After the connector is installed for the first time, all existing data is uploaded from the local datastore to S3. After this first run, the local datastore serves as a local cache. All operations except writes first look for the item in the local cache; if no record is found there, the item is fetched from S3.
The local cache has a size limit, set with the cacheSize parameter. When the cache grows beyond this limit, it is purged automatically to clear older items and reclaim space. Purging never deletes files that belong to in-progress asynchronous uploads.
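The purge behavior described above can be pictured with a small sketch. This is not the connector's implementation: the class name `LocalCache`, its fields, and the oldest-first eviction order are assumptions made for illustration. It only shows the idea of reclaiming space while skipping entries whose upload is still pending.

```python
from collections import OrderedDict


class LocalCache:
    """Toy size-bounded cache; names and eviction order are illustrative assumptions."""

    def __init__(self, cache_size):
        self.cache_size = cache_size    # size limit in bytes, analogous to cacheSize
        self.entries = OrderedDict()    # key -> size in bytes, oldest entry first
        self.pending_uploads = set()    # keys whose asynchronous upload is unfinished
        self.current_size = 0

    def put(self, key, size, pending=False):
        # Replace any previous entry for the same key.
        if key in self.entries:
            self.current_size -= self.entries.pop(key)
        self.entries[key] = size
        self.current_size += size
        if pending:
            self.pending_uploads.add(key)
        else:
            self.pending_uploads.discard(key)
        if self.current_size > self.cache_size:
            self.purge()

    def purge(self):
        # Walk from oldest to newest and stop as soon as the cache fits again.
        for key in list(self.entries):
            if self.current_size <= self.cache_size:
                break
            if key in self.pending_uploads:
                continue  # never evict a file that is still being uploaded
            self.current_size -= self.entries.pop(key)
```

In this sketch, an old entry with a pending upload survives the purge while a newer, already uploaded entry is evicted in its place.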
Things to note about the local cache mechanism:
- It can be disabled by setting the cacheSize parameter to 0. In this case, all operations are performed directly against S3 and the local cache is ignored completely.
- If the size of a file exceeds the size of the local cache, it will be served directly from S3.
- When the cache is being purged, it will not be available to the S3 data store. Files from the local cache that have pending uploads will still be available.
- Files deleted from the S3 data store will also be deleted from the local cache.
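The notes above can be summarized as a read-through lookup. The following is a minimal sketch under stated assumptions: a plain dictionary stands in for the S3 bucket, the class `CachedStore` and its methods are hypothetical, purging is left out, and filling the cache on a miss is an assumption rather than something this document states.

```python
class CachedStore:
    """Toy read-through store; a dict stands in for the S3 bucket (illustrative only)."""

    def __init__(self, remote, cache_size):
        self.remote = remote            # key -> bytes, simulating S3
        self.cache_size = cache_size    # analogous to cacheSize; 0 disables the cache
        self.cache = {}                 # key -> bytes held locally

    def read(self, key):
        # Check the local cache first, unless it is disabled.
        if self.cache_size > 0 and key in self.cache:
            return self.cache[key]
        data = self.remote[key]
        # Files larger than the whole cache are served from the remote store only.
        if self.cache_size > 0 and len(data) <= self.cache_size:
            self.cache[key] = data      # assumption: a miss populates the cache
        return data

    def delete(self, key):
        # Deleting from the remote store also drops the local copy.
        self.remote.pop(key, None)
        self.cache.pop(key, None)
```

With a cache size of 10 bytes, a 3-byte file is cached after its first read, while a 50-byte file is always read from the simulated remote store.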
Multi-Threaded Content Migration from FileSystem DataStore to S3
Multi-threading can be configured in order to speed up file operations to or from the S3 data store. This can be particularly useful for initial migrations from a local datastore where large amounts of data need to be uploaded.
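To illustrate why several threads help during an initial migration, here is a generic sketch using Python's standard thread pool. It does not reflect how the connector schedules its work: the function `migrate`, the `upload` callable, and the `workers` argument are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def migrate(local_files, upload, workers=4):
    """Upload every (key, data) pair with a pool of threads; return the keys that failed."""
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit one upload task per file and remember which key each task belongs to.
        futures = {
            pool.submit(upload, key, data): key
            for key, data in local_files.items()
        }
        for future in as_completed(futures):
            if future.exception() is not None:
                failed.append(futures[future])
    return sorted(failed)
```

Because uploads are mostly network-bound, running several of them at once typically shortens the total migration time. Returning the failed keys lets the caller retry only those files.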
Asynchronous Upload to S3
The asyncUploadLimit parameter limits the number of asynchronous uploads to the S3 data store. Once this limit is reached, each further upload is performed synchronously until one of the asynchronous uploads completes. To disable asynchronous uploads, set the asyncUploadLimit parameter to 0. The default value is 100.
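The fallback from asynchronous to synchronous uploads can be sketched with a semaphore that counts in-flight uploads. This is an illustration only: the class `Uploader`, the `upload` callable, and the cap of 8 worker threads are assumptions, not details of the connector.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Uploader:
    """Toy writer that caps in-flight asynchronous uploads (all names are illustrative)."""

    def __init__(self, upload, async_upload_limit=100):
        self.upload = upload    # callable(key, data) that performs the transfer
        self.slots = None       # None means asynchronous uploads are disabled
        self.pool = None
        if async_upload_limit > 0:
            self.slots = threading.BoundedSemaphore(async_upload_limit)
            # Assumption: at most 8 worker threads, regardless of the limit.
            self.pool = ThreadPoolExecutor(max_workers=min(async_upload_limit, 8))

    def write(self, key, data):
        """Upload data and report whether it went out 'async' or 'sync'."""
        if self.slots is not None and self.slots.acquire(blocking=False):
            future = self.pool.submit(self.upload, key, data)
            # Free the slot once the background upload finishes, even if it failed.
            future.add_done_callback(lambda _: self.slots.release())
            return "async"
        # Limit reached (or feature disabled): upload in the calling thread.
        self.upload(key, data)
        return "sync"
```

With a limit of 2 and two uploads still in flight, a third call to `write` runs in the caller's thread. With a limit of 0, every call is synchronous.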
Asynchronous Upload Cache
The connector also uses an upload cache for asynchronous uploads. It tracks the status of each upload, adding new uploads to the cache and removing finished ones as needed.