The following Java command-line options and system properties can be used to configure memory and storage locations:
- -Xmx2048m: the maximum heap size Java can use
- -Dfcrepo.home=/path/to/data: the directory for permanent data
- -Djava.io.tmpdir=/path/to/tmpdir: the directory for temporary files. Data uploaded to a federated filesystem via the REST API is written to a temp file in this directory before being moved to the federated filesystem, so this directory needs enough free space for the largest file you will upload.
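Because REST API uploads to a federated filesystem are staged in the java.io.tmpdir directory first, a pre-flight free-space check can save a failed multi-hour upload. A minimal sketch, assuming it runs on the repository host (or can see the same temp directory); tmpdir_has_room is a hypothetical helper, not part of Fedora:

```python
import shutil
import tempfile


def tmpdir_has_room(tmpdir: str, upload_size_bytes: int, headroom_bytes: int = 0) -> bool:
    """Return True if `tmpdir` has enough free space to stage one upload.

    Hypothetical helper: `tmpdir` should be the directory passed to
    -Djava.io.tmpdir; `headroom_bytes` keeps extra space in reserve.
    """
    free_bytes = shutil.disk_usage(tmpdir).free
    return free_bytes >= upload_size_bytes + headroom_bytes


# Usage: can the local temp directory stage a 256 GB upload plus 10 GB of headroom?
print(tmpdir_has_room(tempfile.gettempdir(), 256 * 1024**3, headroom_bytes=10 * 1024**3))
```

The check is only a snapshot: concurrent uploads share the same temp directory, so leave headroom accordingly.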
Ingesting Large Files via the REST API
Based on the tests below, we believe arbitrarily large files can be ingested and downloaded via the REST API (tested up to 1TB). The only apparent limitations are the disk space available to store the files and a sufficiently large Java heap.
Note: To enable fast access to large files, set "contentBasedSha1" : "false" in the filesystem connector (external source) configuration. Otherwise the repository computes a SHA-1 digest of the content to identify it, which can take on the order of hours for files larger than about 50 GB. For more on this benchmarking, see: https://wiki.duraspace.org/display/FF/Design+-+Large+Files.
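To see why content-based SHA-1 identification is slow for very large files, consider a streaming hash: memory stays bounded, but every byte still has to be read and hashed, so the time grows linearly with file size. This sketch uses Python's hashlib and only illustrates the cost; it is not the repository's own implementation:

```python
import hashlib
import io
from typing import BinaryIO


def streaming_sha1(stream: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Return the SHA-1 hex digest of everything readable from `stream`.

    Memory use stays near `chunk_size`, but every byte must still be read and
    hashed, so the run time grows linearly with the size of the content.
    """
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Usage with an in-memory stream; for a real file use open(path, "rb").
print(streaming_sha1(io.BytesIO(b"abc")))  # a9993e364706816aba3e25717850c26c9cd0d89d
```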
REST API Upload/Download Roundtrip
- Platform: Linux 3.12.1-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux, 16GB RAM
- Repository Profile: Single-File
- Workflow Profile: Upload/Download Roundtrip
File Size | Upload Time (Throughput) | Download Time (Throughput) |
---|---|---|
256GB | 15,488,156ms (16.9MB/sec) | 3,306,756ms (79.3MB/sec) |
REST API Upload/Download Roundtrip
- Platform: lib-devsandbox1.ucsd.edu (all data on NAS to handle large files)
- Repository Profile: Minimal
- Workflow Profile: Upload/Download Roundtrip
File Size | Upload Time (Throughput) | Download Time (Throughput) |
---|---|---|
256GB | 15,488,156ms (16.9MB/sec) | 3,306,756ms (79.3MB/sec) |
512GB | 31,262,610ms (16.77MB/sec) | 5,386,542ms (97.33MB/sec) |
1TB | 59,631,142ms (17.58MB/sec) | 15,120,135ms (69.35MB/sec) |
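The throughput figures in these tables appear to be the file size in MB, counting 1 GB as 1024 MB (and 1 TB as 1024 GB), divided by the elapsed time in seconds; recomputing them that way reproduces the reported values. A small sketch of that calculation, using the rows of the table above:

```python
def throughput_mb_per_sec(size_gb: float, elapsed_ms: int) -> float:
    """Throughput in MB/sec, treating 1 GB as 1024 MB and 1 TB as 1024 GB."""
    size_mb = size_gb * 1024
    return size_mb / (elapsed_ms / 1000)


# Rows from the roundtrip table: (size in GB, upload ms, download ms)
ROWS = [
    (256, 15_488_156, 3_306_756),
    (512, 31_262_610, 5_386_542),
    (1024, 59_631_142, 15_120_135),
]

for size_gb, up_ms, down_ms in ROWS:
    up = throughput_mb_per_sec(size_gb, up_ms)
    down = throughput_mb_per_sec(size_gb, down_ms)
    print(f"{size_gb} GB: upload {up:.2f} MB/sec, download {down:.2f} MB/sec")
```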
Serving Large Files via Filesystem Federation
Based on the tests below, we believe arbitrarily large files can be projected into the repository via filesystem federation and downloaded via the REST API (tested up to 1TB). The only apparent limitations are the disk space available to store the files and a sufficiently large Java heap.
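On the client side, downloading a binary of hundreds of gigabytes is only practical if the response is streamed to disk in chunks rather than buffered in memory. A minimal standard-library sketch; stream_to_file and download are hypothetical helpers, the URL in the usage comment is a placeholder for wherever your deployment exposes the projected node, and authentication is omitted:

```python
import io
import os
import tempfile
import urllib.request
from typing import BinaryIO

CHUNK_SIZE = 8 * 1024 * 1024  # copy 8 MiB at a time


def stream_to_file(src: BinaryIO, dest_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy `src` to `dest_path` in fixed-size chunks; return the bytes written."""
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written


def download(url: str, dest_path: str) -> int:
    """Stream the body of an HTTP GET on `url` to `dest_path`."""
    with urllib.request.urlopen(url) as response:
        return stream_to_file(response, dest_path)


if __name__ == "__main__":
    # Offline demo: an in-memory stream stands in for the HTTP response.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "copy.bin")
        print(stream_to_file(io.BytesIO(b"x" * 10), path, chunk_size=3))
    # Hypothetical real call; the URL is a placeholder for your deployment:
    # download("http://localhost:8080/rest/projection/bigfile.bin", "bigfile.bin")
```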
Filesystem Federation Download Tests
- Platform: Linux 3.12.1-1-ARCH #1 SMP PREEMPT x86_64 GNU/Linux, 16GB RAM
- Repository Profile: Single-File, with an additional external source:

```json
"externalSources" : {
  "home-directory" : {
    "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
    "directoryPath" : "/tmp/projection",
    "projections" : [ "default:/projection => /" ],
    "readOnly" : true,
    "addMimeTypeMixin" : true
  }
}
```
File Size | Projection Directory Request Duration | First Projected Node Request Duration | Download Duration | Throughput |
---|---|---|---|---|
2 GB | 0m35.117s | 0m34.572s | 0m8.236s | 248.66 MB/sec |
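The durations in this table appear to be in the minutes-and-seconds format printed by the shell time command. A short sketch that parses that format and recomputes the throughput column, again counting 1 GB as 1024 MB; parse_time_output is a hypothetical helper:

```python
import re

# Matches durations such as "0m8.236s": whole minutes, then (fractional) seconds.
_TIME_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def parse_time_output(value: str) -> float:
    """Convert a duration like '0m8.236s' to seconds."""
    match = _TIME_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# 2 GB row: size in MB divided by the download duration in seconds.
download_s = parse_time_output("0m8.236s")
print(f"{2 * 1024 / download_s:.2f} MB/sec")  # 248.66 MB/sec
```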
Filesystem Federation Download Tests
- Platform: lib-devsandbox1.ucsd.edu (all data on NAS to handle large files)
- Repository Profile: Minimal, with filesystem federation:

```json
"externalSources" : {
  "filesystem" : {
    "classname" : "org.modeshape.connector.filesystem.FileSystemConnector",
    "directoryPath" : "/mnt/isilon/fedora-dev/federated",
    "projections" : [ "default:/projection => /" ],
    "readOnly" : true,
    "addMimeTypeMixin" : true,
    "contentBasedSha1" : "false"
  }
}
```
Objects | Datastream Size | Projection Directory Request Duration | Projected Node Request Duration | Download Duration | Download Throughput |
---|---|---|---|---|---|
1 | 1 GB | 417 ms | 35 ms | 17,333 ms | 59.08 MB/sec |
1 | 2 GB | 528 ms | 219 ms | 26,902 ms | 76.13 MB/sec |
1 | 4 GB | 432 ms | 54 ms | 47,581 ms | 86.08 MB/sec |
1 | 8 GB | 583 ms | 90 ms | 90,705 ms | 90.31 MB/sec |
1 | 16 GB | 691 ms | 452 ms | 176,508 ms | 92.82 MB/sec |
1 | 32 GB | 445 ms | 34 ms | 348,488 ms | 94.03 MB/sec |
1 | 64 GB | 750 ms | 460 ms | 699,937 ms | 93.63 MB/sec |
1 | 128 GB | 800 ms | 90 ms | 1,412,640 ms | 92.79 MB/sec |
1 | 256 GB | 530 ms | 70 ms | 2,768,570 ms | 94.69 MB/sec |
1 | 512 GB | 490 ms | 80 ms | 5,893,420 ms | 88.96 MB/sec |
1 | 1 TB | 420 ms | 40 ms | 11,322,330 ms | 92.61 MB/sec |
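The configuration for these tests sets "contentBasedSha1" : "false", as the large-file note recommends. A sketch for checking an externalSources configuration for that setting; sources_hashing_content is a hypothetical helper, and treating a missing key as "hashing enabled" is an assumption drawn from that note:

```python
import json

# The externalSources fragment used for these tests, wrapped in braces so it parses as JSON.
CONFIG = """
{
  "externalSources": {
    "filesystem": {
      "classname": "org.modeshape.connector.filesystem.FileSystemConnector",
      "directoryPath": "/mnt/isilon/fedora-dev/federated",
      "projections": ["default:/projection => /"],
      "readOnly": true,
      "addMimeTypeMixin": true,
      "contentBasedSha1": "false"
    }
  }
}
"""


def sources_hashing_content(config: dict) -> list[str]:
    """Return names of external sources that do not set contentBasedSha1 to false.

    Accepts either the string "false" (as in the config above) or a JSON boolean.
    A missing key is assumed to mean hashing is enabled.
    """
    sources = config.get("externalSources", {})
    return [
        name
        for name, settings in sources.items()
        if str(settings.get("contentBasedSha1", "true")).lower() != "false"
    ]


print(sources_hashing_content(json.loads(CONFIG)))  # []
```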
Direct Comparison of Different Transfer Methods
Based on the tests below, we believe arbitrarily large files can be uploaded and downloaded via the REST API, using either repository storage or a federated filesystem (tested up to 1TB). The only apparent limitations are the disk space available to store the files, the capacity of the temp directory, and a sufficiently large Java heap.
- Platform: lib-devsandbox1.ucsd.edu (all data on NAS to handle large files)
- Repository Profile: Federation
- Workflow Profile: Repository/Federation/NFS/SCP Comparison
Comparison of Upload and Download Times for Different Transfer Methods
Transfer Method | File Size | Upload | Download |
---|---|---|---|
REST API (Federated) | 1TB | 732 min (84 GB/hr) | 246 min (250 GB/hr) |
REST API (Repository) | 1TB | 339 min (181 GB/hr) | 250 min (246 GB/hr) |
SCP | 1TB | 383 min (160 GB/hr) | |
NFS | 1TB | 336 min (183 GB/hr) | |
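The rates in these comparison tables work out as gigabytes per hour: 1 TB (1024 GB) in 732 minutes is 1024 / (732 / 60) ≈ 84 GB/hr, or roughly 24 MB/sec. A quick sketch of both conversions:

```python
def gb_per_hour(size_gb: float, minutes: float) -> float:
    """Transfer rate in GB/hr for `size_gb` moved in `minutes`."""
    return size_gb / (minutes / 60)


def mb_per_sec(size_gb: float, minutes: float) -> float:
    """The same rate in MB/sec, treating 1 GB as 1024 MB."""
    return size_gb * 1024 / (minutes * 60)


# 1 TB (1024 GB) rows from the comparison table: (label, minutes)
for label, minutes in [("REST API (Federated) upload", 732), ("NFS upload", 336)]:
    print(f"{label}: {gb_per_hour(1024, minutes):.0f} GB/hr, {mb_per_sec(1024, minutes):.1f} MB/sec")
```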
Copying Files Between Federated Filesystem and Repository Storage
Source | Destination | File Size | Copy Time |
---|---|---|---|
Repository storage | Federated filesystem | 1TB | 402 min (153 GB/hr) |
Federated filesystem | Repository storage | 1TB | 345 min (178 GB/hr) |