...
4.85 million files were ingested using a three-level hierarchy (74 top-level nodes, each with 256 second-level nodes, each of those with 256 third-level nodes, and one 10KB datastream in each third-level node), taking 111 hours. After each batch, three REST API operations were timed: listing the top level of the repository ("toplist"), listing a second-level node ("dirlist"), and retrieving a file ("fileget"). Performance retrieving files and listing the second-level nodes did not degrade with larger numbers of objects. However, the time to list the top level of the repository increased roughly linearly as more objects were added, and became increasingly erratic.
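The timing procedure described above could be sketched roughly as follows. This is only an illustration, not the script used for these tests: the base URL, the node paths, and the helper names `http_get` and `time_operations` are all hypothetical, and the sketch assumes each operation is a single HTTP GET.

```python
import time
from urllib.request import urlopen


def http_get(url):
    """Default fetcher: perform a GET and read the whole response body."""
    with urlopen(url) as resp:
        return resp.read()


def time_operations(urls, fetch=http_get):
    """Run one request per named operation and return {name: elapsed seconds}."""
    timings = {}
    for name, url in urls.items():
        start = time.perf_counter()
        fetch(url)
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical URLs: the actual repository paths are not given in the text.
BASE = "http://localhost:8080/rest"
URLS = {
    "toplist": BASE + "/",                        # list the top level
    "dirlist": BASE + "/node-a/node-b",           # list a second-level node
    "fileget": BASE + "/node-a/node-b/node-c/ds", # retrieve one datastream
}
```

Calling `time_operations(URLS)` after each ingest batch and appending the result to a log would give one data point per batch for each of the three operations. The `fetch` parameter is there so the harness can be exercised without a running repository.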
6.5 million files in a 4-level hierarchy
6.5 million files were ingested into a test repository running Fedora 4.0-beta1 (lib-devsandbox1.ucsd.edu) using a four-level hierarchy (25 top-level nodes, 64 child nodes under each node at the second through fourth levels, and one 10KB datastream in each bottom-level node), taking 30 days. After each batch, three REST API operations were timed: listing the top level of the repository ("toplist"), listing a third-level node ("dirlist"), and retrieving a file ("fileget"). Performance retrieving files did not degrade with larger numbers of objects. However, the time to list the top level of the repository increased roughly linearly as more objects were added, and the time to list a third-level node increased more rapidly, with increasing variability as more objects were created.
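As a quick check on the figures above, the object count and the average ingest rate can be derived from the stated fan-outs and duration. The helper name `leaf_count` is made up for this sketch, and the rate is a simple average over the full 30 days, not a measured value.

```python
from math import prod


def leaf_count(fanouts):
    """Bottom-level node count for a hierarchy with the given per-level fan-outs."""
    return prod(fanouts)


FANOUTS = [25, 64, 64, 64]                 # four-level hierarchy from the text

total_files = leaf_count(FANOUTS)          # 6553600, about 6.5 million
per_top_level = leaf_count(FANOUTS[1:])    # 262144 (256K) under each top-level node

ingest_seconds = 30 * 24 * 3600            # 30 days, as stated
avg_rate = total_files / ingest_seconds    # average files per second

print(total_files, per_top_level, round(avg_rate, 2))
```

Each top-level node holds 64 × 64 × 64 = 262,144 bottom-level nodes, and the overall average works out to roughly 2.5 files per second.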
The duration of ingesting each batch of 256K objects also increased steadily:
Federated filesystem
Files in a single directory
...