Introduction
The Sync Tool is a utility that provides a simple way to copy files from a local file system to DuraCloud and then keep the files in DuraCloud synchronized with those on the local system.
Download
How the Sync Tool Works
- When you run the Sync Tool for the first time, you must provide DuraCloud connection information (host, username, and password, plus the port if it differs from the default) as well as the ID of the space where all of your files will be stored. You must also provide a list of the local directories to sync to DuraCloud and a work directory where the Sync Tool keeps its own state.
- When the Sync Tool starts up, it looks through all of the files in each of the local content directories and adds them to its internal queue for processing. Each of those files is then written to your DuraCloud space. While this initial write is happening, a listener is set up to watch for file changes within each of the content directories. When a change occurs (a file is added, updated, or deleted), that change is added to the queue, and the appropriate action is taken to make the DuraCloud space consistent with the local file: the file is either written to the space or, if the --sync-deletes option is set, deleted from the space.
- You can stop the Sync Tool at any time by typing 'x' or 'exit' on the command line where it is running. It will stop all listeners, complete any file transfers that are in progress, and close down.
- When you restart the Sync Tool, if you point it at the same work directory, it will pick up where it left off. Because the Sync Tool continually writes backups of its internal queue while it runs, on restart it first reads the most recent backup and begins processing the files listed there. It then scans the content directories for files that have been added or updated since the last backup, and it pulls a list of files from the DuraCloud space to check whether any local files have been deleted. Any changes detected are added to the internal queue, and the Sync Tool continues to run as usual.
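The change detection described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the Sync Tool's code: the class and method names (SyncChangeDetector, diff, restartChanges) are hypothetical, local files are modeled as a map from relative path to last-modified time, detection is modeled as polling (in line with the --poll-frequency option), and the DuraCloud space listing is modeled as a plain set of content IDs. Reloading the backed-up queue itself is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the change detection described above; not the Sync Tool's code.
// Local files are modeled as relative path -> last-modified time (ms).
final class SyncChangeDetector {

    enum Action { WRITE, DELETE }

    // One queued change. DELETE entries would only be acted on when --sync-deletes is set.
    static final class Change {
        final String path;
        final Action action;

        Change(String path, Action action) {
            this.path = path;
            this.action = action;
        }

        @Override
        public String toString() {
            return action + " " + path;
        }
    }

    // Polling: compare the previous snapshot with the current one.
    // An empty previous snapshot queues every file, as on a first run.
    static List<Change> diff(Map<String, Long> previous, Map<String, Long> current) {
        List<Change> queue = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(current).entrySet()) {
            Long before = previous.get(e.getKey());
            if (before == null || !before.equals(e.getValue())) {
                queue.add(new Change(e.getKey(), Action.WRITE)); // added or updated
            }
        }
        for (String path : new TreeSet<>(previous.keySet())) {
            if (!current.containsKey(path)) {
                queue.add(new Change(path, Action.DELETE)); // removed locally
            }
        }
        return queue;
    }

    // Restart: re-queue files modified after the last queue backup, and queue a delete
    // for every ID listed in the DuraCloud space that no longer exists locally.
    static List<Change> restartChanges(Map<String, Long> current, long lastBackupMillis,
                                       Set<String> remoteIds) {
        List<Change> queue = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(current).entrySet()) {
            if (e.getValue() > lastBackupMillis) {
                queue.add(new Change(e.getKey(), Action.WRITE));
            }
        }
        for (String id : new TreeSet<>(remoteIds)) {
            if (!current.containsKey(id)) {
                queue.add(new Change(id, Action.DELETE));
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        Map<String, Long> before = new TreeMap<>();
        before.put("a.txt", 100L);
        before.put("b.txt", 100L);
        Map<String, Long> after = new TreeMap<>();
        after.put("a.txt", 200L); // updated
        after.put("c.txt", 150L); // added; b.txt was deleted
        System.out.println(diff(before, after)); // [WRITE a.txt, WRITE c.txt, DELETE b.txt]
    }
}
```

Comparing modification times is only one way to decide that a file "changed since the last backup"; the guide does not say which test the Sync Tool actually applies.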
Operational notes
- Restarting
- You can restart the Sync Tool by using the -g command line option to point to the Sync Tool configuration file (named synctool.config), which is written into the work directory.
- If you would like the Sync Tool to perform a clean start rather than a restart (i.e. you would like it to compare all files in the content directories to DuraCloud) you will need to either point it to a new work directory, or clear out the existing work directory.
- The Sync Tool will perform a clean start (not a restart) if the list of content directories differs from the list used in the previous run. This ensures that all files in all content directories are processed properly.
- Collisions
- The Sync Tool allows you to sync multiple local directories into a single space within DuraCloud. Because of this, there is the possibility of file naming collisions, where two local files resolve to the same DuraCloud ID. If this happens, one file will be overwritten by the other. There are a few ways to ensure that this does not occur:
- Ensure that the top level files and directories within the set of content directories do not have overlapping names.
- Sync only a single directory to a space. You can run multiple copies of the Sync Tool, each over a single local directory, syncing to its own DuraCloud space.
- Work Directory: the following files and directories can be found in the work directory (specified using the -w command line parameter).
- Config Files
- When the Sync Tool starts up, it writes the list of parameters and values provided by the user on startup to a file called synctool.config in the work directory. This file can be used to restart the Sync Tool, using the -g parameter to point to the file's location. You can also restart the Sync Tool by indicating the same set of options as used originally. The -g parameter is for convenience only and is not required in any circumstance. Note that this file is overwritten each time the Sync Tool is run with a different set of parameters, so you may choose to copy the file elsewhere (or give it a new name) if you would like to keep a copy of a particular configuration set.
- You may also see a file named synctool.config.bak in the work directory, which is compared against the current configuration to determine whether a restart is possible. For a restart to occur, the list of content directories (-c parameter) must be the same as in the previous execution of the tool, and there must be at least one changed list backup (see Changed List Directory below).
- Changed List Directory
- While the Sync Tool is running it is constantly updating the list of files which have been changed (when starting the first time, this includes all files in the directories that need to be synced). In order to allow the Sync Tool to restart after it has been stopped, this list of files is continually backed up into the changedList directory. There is no reason to edit these files, but you may choose to delete the changedList directory along with the config files mentioned above to ensure that the Sync Tool does not attempt to perform a restart.
- Logs Directory
- Information about what the Sync Tool is doing while it is running can be found in the sync-tool.log file. It is a good idea to monitor this file for errors and warnings as this information is not printed to the console.
- The duracloud.log file is useful for application debugging when the information in the sync-tool.log file is insufficient to understand a problem.
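The collision risk described under Collisions can be checked before a sync. The Java sketch below assumes, consistent with the advice to keep top-level names distinct, that a file's DuraCloud ID is its path relative to the content directory that contains it, with "/" as the separator. That mapping and the names CollisionCheck and findCollisions are assumptions for illustration, not documented Sync Tool behavior.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical pre-sync collision check; not part of the Sync Tool.
// ASSUMPTION: a file's DuraCloud ID is its path relative to the content directory
// that contains it, using "/" as the separator.
final class CollisionCheck {

    // Returns each assumed content ID that more than one local file would map to.
    static Map<String, List<String>> findCollisions(Map<String, List<String>> filesByContentDir) {
        Map<String, List<String>> owners = new TreeMap<>();
        for (Map.Entry<String, List<String>> dir : filesByContentDir.entrySet()) {
            Path root = Paths.get(dir.getKey());
            for (String file : dir.getValue()) {
                // Path relative to its content directory, normalized to forward slashes.
                String id = root.relativize(Paths.get(file)).toString().replace('\\', '/');
                owners.computeIfAbsent(id, k -> new ArrayList<>()).add(file);
            }
        }
        owners.values().removeIf(files -> files.size() < 2); // drop IDs with a single owner
        return owners;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new TreeMap<>();
        input.put("/data/a", Arrays.asList("/data/a/docs/readme.txt", "/data/a/x.txt"));
        input.put("/data/b", Arrays.asList("/data/b/docs/readme.txt"));
        System.out.println(findCollisions(input));
    }
}
```

In this example both content directories contain a top-level docs directory, so docs/readme.txt is reported; renaming one of them, or syncing /data/b to its own space, removes the collision.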
Prerequisites
- You must have Java version 6 or above installed on your local system. If Java is not installed, you will need to download and install it. To determine if the correct version of Java is installed, open a terminal or command prompt and enter
java -version
The version displayed should be 1.6.0 or above. If running this command generates an error, Java is likely not installed.
- You must have downloaded the Sync Tool. It is available as a link near the top of this page.
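Reading the version number takes a little care: releases up to Java 8 report versions such as 1.6.0_45, while Java 9 and later use a different format such as 11.0.2 or 17. The Java sketch below shows one way to interpret either form against the Java 6 requirement; JavaVersionCheck and its methods are hypothetical helpers, not part of the Sync Tool, and the program checks the JVM that runs it via the java.version system property.

```java
// Hypothetical helper, not part of the Sync Tool: extracts the Java feature release.
final class JavaVersionCheck {

    // "1.6.0_45" -> 6 (pre-Java 9 format); "11.0.2" -> 11, "17" -> 17, "21-ea" -> 21.
    static int featureRelease(String version) {
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // The guide requires Java 6 (reported as 1.6.0) or above.
    static boolean meetsSyncToolRequirement(String version) {
        return featureRelease(version) >= 6;
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.version");
        System.out.println(running + " meets the Java 6 requirement: "
                + meetsSyncToolRequirement(running));
    }
}
```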
Starting the Sync Tool
- To run the Sync Tool, open a terminal or command prompt and navigate to the directory where the Sync Tool is located.
- To display the help for the Sync Tool, run
java -jar synctool-0.7.0-driver.jar
- When running the Sync Tool for the first time, you will need to use these options:
Short Option | Long Option | Argument Expected | Required | Description | Default Value (if optional)
--- | --- | --- | --- | --- | ---
-h | --host | Yes | Yes | The host address of the DuraCloud DuraStore application |
-r | --port | Yes | No | The port of the DuraCloud DuraStore application | 443
-i | --store-id | Yes | No | The Store ID for the DuraCloud storage provider | The primary storage provider is used
-s | --space-id | Yes | Yes | The ID of the DuraCloud space where content will be stored |
-u | --username | Yes | Yes | The username necessary to perform writes to DuraStore |
-p | --password | Yes | Yes | The password necessary to perform writes to DuraStore |
-c | --content-dirs | Yes | Yes | A list of the directory paths to monitor and sync with DuraCloud; separate multiple directories with a space |
-w | --work-dir | Yes | Yes | The directory to which the state of the Sync Tool is persisted |
-f | --poll-frequency | Yes | No | The time (in ms) to wait between each poll of the content directories | 10000 (10 seconds)
-t | --threads | Yes | No | The number of threads in the pool used to manage file transfers | 3
-m | --max-file-size | Yes | No | The maximum size of a stored file in GB (value must be between 1 and 5); larger files will be split into pieces | 1
-d | --sync-deletes | No | No | Indicates that deletes performed on files within the content directories should also be performed on those files in DuraCloud; if this option is not included, all deletes are ignored | Not set
-x | --exit-on-completion | No | No | Indicates that the Sync Tool should exit once it has completed a scan of the content directories and synced all files; if this option is included, the Sync Tool will not continue to monitor the content directories | Not set
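As a worked example of the -m (--max-file-size) option: a file larger than the configured maximum is split into pieces. The Java sketch below computes how many pieces a file would produce, assuming that 1 GB means 1024^3 bytes and that every piece except the last is exactly the maximum size. The guide does not state how sizes are rounded or how pieces are laid out, so both are assumptions, and ChunkMath is a hypothetical name.

```java
// Hypothetical arithmetic for --max-file-size; the byte definition of "GB" is an assumption.
final class ChunkMath {
    static final long BYTES_PER_GB = 1024L * 1024L * 1024L; // assumed binary gigabyte

    // Number of stored pieces for a file of fileBytes with a maximum piece size of maxGb.
    static long pieceCount(long fileBytes, int maxGb) {
        if (maxGb < 1 || maxGb > 5) {
            throw new IllegalArgumentException("--max-file-size must be between 1 and 5 GB");
        }
        long maxBytes = maxGb * BYTES_PER_GB;
        if (fileBytes <= maxBytes) {
            return 1; // small enough to store as a single file
        }
        return (fileBytes + maxBytes - 1) / maxBytes; // ceiling division
    }

    public static void main(String[] args) {
        long sevenAndAHalfGb = 7L * BYTES_PER_GB + BYTES_PER_GB / 2;
        System.out.println(pieceCount(sevenAndAHalfGb, 1)); // 8 pieces at the 1 GB default
        System.out.println(pieceCount(sevenAndAHalfGb, 5)); // 2 pieces at the 5 GB maximum
    }
}
```

Raising -m toward its maximum of 5 therefore reduces the number of pieces stored for very large files.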
- When the Sync Tool runs, it creates a backup of your configuration in the work directory that you specify. When running the tool again, you can use this file to avoid re-entering all of the options specified on the initial run. In this case you need only a single option:
Short Option | Long Option | Argument Expected | Required | Description
--- | --- | --- | --- | ---
-g | --config-file | Yes | Yes | Read configuration from this file (a file containing the most recently used configuration can be found in the work-dir, named synctool.config)
- An example of running the Sync Tool (use the JAR file name of the version you downloaded):
java -jar synctool-0.8.0-driver.jar -w C:\tools\synctool\backup -c C:\files\important -f 2000 -h test.duracloud.org -s important-dir-backup -t 5 -u myname -p mypassword
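Before a first run, it can help to confirm that a command line contains every option marked Required in the options table. The Java sketch below is a hypothetical pre-flight check (RequiredOptionsCheck and missing are illustrative names, not part of the Sync Tool). It treats -g as sufficient on its own, since a config file supplies the other options, and it only matches whole arguments, so an option value that happens to equal an option name would fool it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check for a first-run command line; not part of the Sync Tool.
final class RequiredOptionsCheck {

    // Options marked "Required" in the first-run options table (short form, long form).
    static final String[][] REQUIRED = {
        {"-h", "--host"}, {"-s", "--space-id"}, {"-u", "--username"},
        {"-p", "--password"}, {"-c", "--content-dirs"}, {"-w", "--work-dir"}
    };

    // Returns the short names of required options that do not appear in args.
    static List<String> missing(String... args) {
        Set<String> present = new HashSet<>(Arrays.asList(args));
        List<String> result = new ArrayList<>();
        if (present.contains("-g") || present.contains("--config-file")) {
            return result; // a config file supplies the remaining options
        }
        for (String[] option : REQUIRED) {
            if (!present.contains(option[0]) && !present.contains(option[1])) {
                result.add(option[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Pass the same arguments you would give to the Sync Tool.
        List<String> absent = missing(args);
        System.out.println(absent.isEmpty() ? "All required options present" : "Missing: " + absent);
    }
}
```

For instance, a command line that passes the password with -w instead of -p is reported as missing -p.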
Runtime commands
- While the Sync Tool is running, these commands are available. Just type them on the command line where the tool is running.
Short Command | Long Command | Description
--- | --- | ---
x | exit | Tells the Sync Tool to end its activity and close
c | config | Prints the configuration of the Sync Tool (the same information is printed at startup)
s | status | Prints the current status of the Sync Tool
l <Level> | N/A | Changes the log level to <Level> (may be any of DEBUG, INFO, WARN, ERROR)
h | help | Prints the runtime command help
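The runtime commands above map naturally onto a small command parser. This Java sketch is hypothetical (RuntimeCommandParser, Command, and parse are illustrative names, not the Sync Tool's implementation). It shows how the short and long forms could resolve to the same command, and how l <Level> could be checked against the four levels listed; whether the real tool accepts lowercase level names is not stated, so the sketch requires them exactly as written.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the runtime commands; not the Sync Tool's implementation.
final class RuntimeCommandParser {

    enum Command { EXIT, CONFIG, STATUS, LOG_LEVEL, HELP, UNKNOWN }

    // Levels accepted by "l <Level>", as listed in the runtime commands table.
    static final List<String> LEVELS = Arrays.asList("DEBUG", "INFO", "WARN", "ERROR");

    // Resolves one line typed at the console; short and long forms map to the same command.
    static Command parse(String line) {
        String[] words = line.trim().split("\\s+");
        switch (words[0]) {
            case "x":
            case "exit":
                return Command.EXIT;
            case "c":
            case "config":
                return Command.CONFIG;
            case "s":
            case "status":
                return Command.STATUS;
            case "h":
            case "help":
                return Command.HELP;
            case "l": {
                // "l" needs exactly one argument naming a supported level.
                boolean valid = words.length == 2 && LEVELS.contains(words[1]);
                return valid ? Command.LOG_LEVEL : Command.UNKNOWN;
            }
            default:
                return Command.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        for (String line : new String[] {"x", "status", "l DEBUG", "l TRACE"}) {
            System.out.println(line + " -> " + parse(line));
        }
    }
}
```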