The Sync Tool is a utility that provides a simple way to move files from a local file system to DuraCloud and then keep the files in DuraCloud synchronized with those on the local system.
Download the Sync Tool from the Downloads page.
The Sync Tool can be installed using one of the installers on the downloads page linked above. Once installed, the Sync Tool will default to running in GUI mode. To run in command line mode, open a terminal window (or command prompt) and navigate to the Sync Tool installation directory. Once there, execute the Sync Tool JAR file using: "java -jar duracloudsync.jar --help". This will print the usage information for the tool.
Using the prefix option, the content IDs that the Sync Tool creates for files moved to DuraCloud can be made to begin with a consistent text value. This is useful, for example, to include the name of a top-level directory in the path, or to run the sync from a new sub-directory while keeping the full path used by all existing stored content. Suppose the path to a local file (found within the watch directory) is "dir1/file.txt" and you would like the resulting content stored in DuraCloud to be "a/b/c/dir1/file.txt". To achieve that result, the destination prefix would need to be set to "a/b/c/".
Adding or changing a prefix for content that has already been transferred to DuraCloud will result in those files being duplicated in DuraCloud storage. The duplicate files can be removed by using the "sync deletes" option, but this will cause all content in the destination space that does not include the prefix to be deleted (along with any content that is not found in the local watch directories). Be cautious when using this feature if you have already uploaded content to your DuraCloud space.
If you use a prefix to include a file path (such as a top-level directory name), remember to include the "/" character at the end of your prefix. For example, using the prefix "dir1/" with file "file.txt", your final content ID will be "dir1/file.txt". If you were to forget the slash, your prefix would be "dir1", which would lead to a content ID of "dir1file.txt", which is likely not what you want.
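The effect of the trailing slash can be seen in a small shell sketch. The function `content_id` is a hypothetical helper, not part of the Sync Tool; it only mimics the rule described above, in which the prefix is prepended verbatim to the file's path relative to the watch directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Sync Tool): mimics how a prefix is
# prepended verbatim to a file's path relative to the watch directory.
content_id() {
    prefix=$1
    relative_path=$2
    printf '%s%s\n' "$prefix" "$relative_path"
}

content_id 'a/b/c/' 'dir1/file.txt'   # prints a/b/c/dir1/file.txt
content_id 'dir1/'  'file.txt'        # prints dir1/file.txt
content_id 'dir1'   'file.txt'        # prints dir1file.txt (slash forgotten)
```

Because the prefix is treated as plain text rather than as a directory, nothing inserts a separator for you.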
Optimizing Transfer Rate
When using the Sync Tool to transfer data sets with a very large number of files (hundreds of thousands or more), users occasionally run into out-of-memory errors. Users with sufficient memory on their machines can usually remedy this by increasing the maximum heap space available to the Java VM. We recommend starting with a setting of at least 1 GB when working with sets of over 100,000 files. If the problem persists, increase the value until the errors stop. To increase the heap space, use the -Xmx Java option. See the Java documentation for more information on setting the heap space.
An alternative solution is to upload files in smaller sets. The prefix option can be used to ensure that files are added to DuraCloud with the preferred ID values.
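One way to upload in smaller sets, sketched here under several assumptions, is to sync each top-level subdirectory separately and use the subdirectory name as the prefix, so that content IDs match what a single sync of the parent directory would have produced. The function name `print_batch_commands`, the example path, and the `{parameters}` placeholder (standing for the host, space, and credential options) are illustrative. The sketch only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Illustrative sketch: print one Sync Tool command per subdirectory of a
# parent directory. Each command uses -x to exit when the run completes and
# -a to prefix content IDs with the subdirectory name (note the trailing slash).
print_batch_commands() {
    parent_dir=$1
    for sub_dir in "$parent_dir"/*/; do
        [ -d "$sub_dir" ] || continue      # nothing matched: no subdirectories
        sub_dir=${sub_dir%/}               # drop the trailing slash left by the glob
        name=$(basename "$sub_dir")
        printf 'java -jar duracloudsync-{version}.jar -c "%s" -a "%s/" -x {parameters}\n' \
            "$sub_dir" "$name"
    done
}

# Example: print_batch_commands /data/archive
```

Files that sit directly in the parent directory are not covered by this loop and would need a separate run without a prefix.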
To run the SyncTool in UI mode with 1 GB of heap memory space, download the Jar version of the SyncTool and execute the following on the command line:
java -Xmx1g -jar duracloudsync-{version}.jar
To run the SyncTool in command-line mode with 1 GB of heap memory space, download the Jar version of the SyncTool and execute the above command followed by the command line parameter values.
As of DuraCloud version 4.0.0, the Sync Tool requires Java 8 to run. The latest version of Java can be downloaded from here.
You must have Java version 8 or above installed on your local system. If Java is not installed, or if a previous version is installed, you will need to download and install Java. To determine if the correct version of Java is installed, open a terminal or command prompt and enter
java -version
The version displayed should be 1.8.0 or above. If running this command generates an error, Java is likely not installed.
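The version check can also be scripted. The sketch below is an illustration, not something shipped with the Sync Tool: `java_version_ok` and `installed_java_version` are hypothetical helper names, and the parsing assumes the common version formats ("1.8.0_292" for Java 8 and earlier, "11.0.2" or "17" for later releases).

```shell
#!/bin/sh
# Sketch: decide whether a Java version string meets the "8 or above"
# requirement. Java 8 and earlier report "1.x"; later releases report "x".
java_version_ok() {
    version=$1
    major=${version%%.*}            # text before the first dot
    if [ "$major" = "1" ]; then
        rest=${version#1.}          # e.g. "8.0_292"
        major=${rest%%.*}           # e.g. "8"
    fi
    # A non-numeric value makes the test fail, which counts as "not OK".
    [ "$major" -ge 8 ] 2>/dev/null
}

# Extract the quoted version string from `java -version`, which writes
# its output to standard error.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Example:
#   if java_version_ok "$(installed_java_version)"; then echo "Java is OK"; fi
```

If Java is not installed, `installed_java_version` prints nothing and `java_version_ok` reports failure.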
To display the help for the Sync Tool, run
java -jar duracloudsync-{version}.jar --help
When running the Sync Tool for the first time, you will need to use these options:
Short Option | Long Option | Argument Expected | Required | Description | Default Value (if optional) |
---|---|---|---|---|---|
-h | --host | Yes | Yes | The host address of the DuraCloud DuraStore application | |
-r | --port | Yes | No | The port of the DuraCloud DuraStore application | 443 |
-i | --store-id | Yes | No | The Store ID for the DuraCloud storage provider | The primary storage provider is used |
-s | --space-id | Yes | Yes | The ID of the DuraCloud space where content will be stored | |
-u | --username | Yes | Yes | The username necessary to perform writes to DuraStore | |
-p | --password | Yes | No | The password necessary to perform writes to DuraStore. If not specified, the Sync Tool first checks for an environment variable named "DURACLOUD_PASSWORD" and, if it exists, uses its value as the password; otherwise you will be prompted to enter the password. Note that when using the environment variable or the -p parameter, you must escape your password according to the conventions of your command-line shell. In bash, for example, any dollar ($) or backslash (\) characters must be escaped with a backslash, so the password p$ssw\rd would need to be entered as p\$ssw\\rd. Many other special characters also need to be escaped; consult a list of bash special characters for reference. | Not set |
-c | --content-dirs | Yes | Yes | A list of the directory paths to monitor and sync with DuraCloud. If multiple directories are included in this list, they should be separated by a space. | |
-j | --jump-start | No | No | This option indicates that the sync tool should not attempt to check if content to be synchronized is already in DuraCloud, but should instead transfer all content. This option is best used for new data sets. | Not set |
-a | --prefix | Yes | No | A prefix to be added to the content IDs of all files in the content directories as they are added to DuraCloud. The same prefix applies to all files in all content directories. For example, if a content directory is C:/users/bob/pictures with one file in it, C:/users/bob/pictures/001.jpg, and the prefix value is "bobs-pictures/", the file would be given a DuraCloud content ID of bobs-pictures/001.jpg. Note that the name of the content directory is not included in the path, so if you would like for it to appear as part of the content ID, you will need to include it in the prefix. Also note that the prefix does not need to be a directory name, it can be any value. If, however, you would like for it to appear as a directory path, do not forget to end the prefix with a "/" character. | Not set |
-w | --work-dir | Yes | No | The state of the sync tool is persisted to this directory. If not specified, this value will default to a directory named duracloud-sync-work in the user's home directory. | duracloud-sync-work |
-f | --poll-frequency | Yes | No | The time (in ms) to wait between each poll of the content directories | 10000 (10 seconds) |
-t | --threads | Yes | No | The number of threads in the pool used to manage file transfers | 3 |
-m | --max-file-size | Yes | No | The maximum size of a stored file in GB (value must be between 1 and 5), larger files will be split into pieces | 1 |
-n | --rename-updates <suffix> | No | No | Indicates that when a local file is changed, the original copy of the file in DuraCloud should be renamed before the new local version is transferred to DuraCloud. The newest version of the file will retain the original file name, while older versions will have a suffix value along with a date appended to the name. For example, a local file named "myfile.txt" is copied to DuraCloud by the Sync Tool. The local file is updated, and the Sync Tool is run again with this parameter enabled. The result is that DuraCloud will contain "myfile.txt", which is the updated version of the file, and "myfile.txt.orig.<date>" (with <date> replaced by the date on which the file was updated), which is the original version of the file. If "myfile.txt" is updated again, another version file will be created. Specify an optional suffix to override the default ("orig"). To prevent updates altogether, see option -o. (Note that this option cannot be used together with either the -o or the -d options.) | orig |
-o | --no-update | No | No | Indicates that changed files should not be updated. To perform updates without overwriting, see option -n. | |
-d | --sync-deletes | No | No | Indicates that deletes performed on files within the content directories should also be performed on those files in DuraCloud; if this option is not included all deletes are ignored | Not set |
-x | --exit-on-completion | No | No | Indicates that the sync tool should exit once it has completed a scan of the content directories and synced all files; if this option is included, the sync tool will not continue to monitor the content directories for changes | Not set |
-l | --clean-start | No | No | Indicates that the sync tool should perform a clean start, ensuring that all files in all content directories are checked against DuraCloud, even if those files have not changed locally since the last run of the sync tool | Not set |
-e | --exclude | Yes | No | The full path to a file which specifies a set of exclusion rules. The purpose of the exclusion rules is to indicate that certain files or directories should not be transferred to DuraCloud. The rules must be listed one per line in the file. The rules will match only on the name of a file or directory, not an entire path, so path separators should not be included in rules. Rules are not case sensitive (so a rule "test.log" will match a file "test.LOG"). The rules may include wildcard characters ? and *. The ? matches a single character, while * matches 0 or more characters. Examples of valid rules: | Not set |
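The exclusion rule semantics described in the table can be approximated with shell pattern matching. This is only a sketch for experimenting with rules before using them; it is not how the Sync Tool itself evaluates them. The function names and the sample rules in the comments are hypothetical.

```shell
#!/bin/sh
# Rough approximation of the exclusion rules: match on the file or directory
# name only, ignore case, treat ? as one character and * as zero or more.
# Caveat: shell patterns also give [ ] a special meaning, which the
# Sync Tool rules are not documented to do.
matches_rule() {
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    rule=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case $name in
        $rule) return 0 ;;
        *)     return 1 ;;
    esac
}

# Succeeds if the name of the given path matches any rule in the rules file.
# Example rules file (hypothetical contents, one rule per line):
#   *.log
#   temp?
is_excluded() {
    path=$1
    rules_file=$2
    file_name=$(basename "$path")      # rules never see the directory part
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue     # skip blank lines
        if matches_rule "$file_name" "$line"; then
            return 0
        fi
    done < "$rules_file"
    return 1
}
```

For instance, with the sample rules above, `is_excluded /data/app/Server.LOG rules.txt` succeeds because only the name `Server.LOG` is compared, case-insensitively, against `*.log`.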
When the Sync Tool runs, it creates a backup of your configuration in the work directory that you specify. When running the tool again, you can make use of this file to keep from having to re-enter all of the options specified on the initial run. In this case you need only a single option:
Short Option | Long Option | Argument Expected | Required | Description |
---|---|---|---|---|
-g | --config-file | Yes | Yes | Read configuration from this file (a file containing the most recently used configuration can be found in the work-dir, named synctool.config) |
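A re-run from the saved configuration might be wrapped as follows. This is a sketch under assumptions: `rerun_command` is a hypothetical helper, the default work directory is taken to be `duracloud-sync-work` in the user's home directory as described for the -w option, and the function only prints the command rather than executing it.

```shell
#!/bin/sh
# Sketch: print the command for re-running the Sync Tool from its saved
# configuration, if one exists in the given work directory.
SYNC_JAR="duracloudsync-{version}.jar"   # replace {version} with your version

rerun_command() {
    work_dir=${1:-"$HOME/duracloud-sync-work"}   # default work directory
    config_file="$work_dir/synctool.config"
    if [ -f "$config_file" ]; then
        printf 'java -jar %s -g "%s"\n' "$SYNC_JAR" "$config_file"
    else
        printf 'No saved configuration found at %s\n' "$config_file" >&2
        return 1
    fi
}

# Example: rerun_command /path/to/work-dir
```

Checking for the file first avoids starting the tool with a -g value that points at nothing, for example when a custom work directory was used on the first run.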
Command to sync the contents of a single local content directory to DuraCloud.
java -jar duracloudsync-{version}.jar -c C:\files\important -h test.duracloud.org -s important-dir-backup -u myname -p mypassword
Command to sync the contents of multiple local content directories to DuraCloud.
java -jar duracloudsync-{version}.jar -c C:\files\important C:\Users\me\Documents\important -h test.duracloud.org -s important-dir-backup -u myname -p mypassword
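As an alternative to passing -p on the command line, the password can be supplied through the DURACLOUD_PASSWORD environment variable described in the options table. The sketch below assumes a bash-compatible shell and uses the sample password from that table. It shows that single quotes produce the same value as the backslash-escaped form; the variable name `escaped_form` is only for illustration.

```shell
#!/bin/sh
# Single quotes keep every character literal, so $ and \ need no escaping.
DURACLOUD_PASSWORD='p$ssw\rd'
export DURACLOUD_PASSWORD

# The backslash-escaped form from the options table yields the same value.
escaped_form=p\$ssw\\rd

# With the variable exported, -p can be left off the command line:
#   java -jar duracloudsync-{version}.jar {parameters}
```

A password that itself contains a single quote cannot be written this way without further quoting.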
While the Sync Tool is running, these commands are available. Just type them on the command line where the tool is running. These commands are not available when running in exit-on-completion mode.
Short Command | Long Command | Description |
---|---|---|
x | exit | Tells the Sync Tool to end its activity and close |
c | config | Prints the configuration of the Sync Tool (the same information is printed at startup) |
s | status | Prints the current status of the Sync Tool |
l <Level> | N/A | Changes the log level to <Level> (may be any of DEBUG, INFO, WARN, ERROR) |
h | help | Prints the runtime command help |
As noted above, the Sync Tool can be run in one of two modes, one which allows it to run continually, and the other which allows it to exit once it completes transferring all current files. The mode you choose will determine the way in which you deploy the Sync Tool on a server. The following examples assume the use of the bash shell.
To start the Sync Tool in continually running mode, you would use a command like this:
nohup java -jar duracloudsync-{version}.jar {parameters} > ~/synctool-output.log 2>&1 &
To run the Sync Tool in exit-on-completion mode, for example from a scheduled job, you could use a script like this:
#!/bin/bash
if ps -ef | grep -v grep | grep duracloudsync ; then
    echo 'DuraCloud Sync is Running'
    exit 0
else
    echo 'Starting DuraCloud Sync'
    java -jar duracloudsync-{version}.jar -x [parameters] >> ~/synctool-output.log 2>&1 &
    exit 0
fi
The -x parameter is included here to ensure the Sync Tool exits after completing its run. This script also includes a check to ensure that the tool is not already running before starting.