Fedora 4 can handle at least 10 million objects, but performance degrades if a single node has too many children and/or grandchildren.

...

4.85 million files were ingested using a three-level hierarchy (74 top-level nodes, 256 second-level nodes in each, 256 third-level nodes in each, and one 10KB datastream in each), taking 111 hours. After each batch, three REST API operations were timed: listing the top level of the repository ("toplist"), listing a second-level node ("dirlist"), and retrieving a file ("fileget"). Performance retrieving files and listing the second-level nodes did not degrade with larger numbers of objects. However, the time to list the top level of the repository degraded roughly linearly as more objects were added, and became increasingly erratic.
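The three timed operations could be measured with a small harness along the following lines. This is a minimal sketch, not the script used for the test: the base URL, the node paths, and the choice to report the median of a few repeats are all assumptions made for illustration.

```python
import time
import urllib.request
from statistics import median

# Hypothetical base URL and paths; adjust to the repository under test.
BASE_URL = "http://localhost:8080/rest"
OPERATIONS = {
    "toplist": "/",                              # list the top level of the repository
    "dirlist": "/node-00/node-00",               # list a second-level node (hypothetical path)
    "fileget": "/node-00/node-00/node-00/file",  # retrieve one file (hypothetical path)
}


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def time_operations(fetch=http_get, base_url=BASE_URL,
                    operations=OPERATIONS, repeats=3):
    """Return the median wall-clock seconds per operation over `repeats` runs."""
    results = {}
    for name, path in operations.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetch(base_url + path)
            samples.append(time.perf_counter() - start)
        results[name] = median(samples)
    return results
```

Calling `time_operations()` after each ingest batch and logging the result next to the current object count would give one data point per batch for each of the three operations. The `fetch` parameter is injectable so the harness can be exercised without a running repository.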

...

10 million files in a 4-level hierarchy

10 million files were ingested into a test repository running Fedora 4.0-beta1 (lib-devsandbox1.ucsd.edu) using a four-level hierarchy (39 top-level nodes, 64 child nodes at each of the second through fourth levels, and one 10KB datastream in each bottom-level node), taking 54 days (averaging 186K objects/day). After each batch, three REST API operations were timed: listing the top level of the repository ("toplist"), listing a third-level node ("dirlist"), and retrieving a file ("fileget"). Performance retrieving files did not degrade with larger numbers of objects. However, the time to list the top level of the repository degraded roughly linearly as more objects were added, and the time to list a third-level node increased more rapidly, with increasing variability as more objects were created.
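As a quick check on the hierarchy sizes, the number of files is the product of the fan-out at each level, since each bottom-level node holds one datastream. The sketch below assumes 39 top-level nodes with a fan-out of 64 at the three levels beneath them, and compares that with the three-level layout (74, 256, 256) described earlier; the function name is illustrative only.

```python
from math import prod


def leaf_count(fanouts):
    """Number of bottom-level nodes given the fan-out at each level."""
    return prod(fanouts)


three_level = leaf_count([74, 256, 256])    # three-level test layout
four_level = leaf_count([39, 64, 64, 64])   # four-level test layout

print(three_level)               # 4849664, i.e. about 4.85 million
print(four_level)                # 10223616, i.e. about 10 million
print(round(three_level / 111))  # 43691 files per hour over 111 hours
```

The four-level layout roughly doubles the object count while cutting the number of children per node from 256 to 64.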

...