Storage Elements in AEM 6.1

Overview of Storage in AEM 6

One of the most important changes in AEM 6 is the innovation at the repository level.

Currently, there are two node storage implementations available in AEM 6: Tar storage and MongoDB storage.

Tar Storage

Running a freshly installed AEM instance with Tar Storage

By default, AEM 6 uses Tar storage to store nodes and binaries, with the default configuration options. To configure its storage settings manually, follow the procedure below:

  1. Download the AEM 6 quickstart jar and place it in a new folder.

  2. Unpack AEM by running:

    java -jar cq-quickstart-6.jar -unpack

  3. Create a folder named crx-quickstart\install in the installation directory.

  4. Create a file called org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.cfg in the newly created folder.

  5. Edit the file and set the configuration options. The following options are available for Segment Node Store, which is the basis of AEM's Tar storage implementation:

    • repository.home: Path to the repository home, under which various repository-related data is stored. By default, segment files are stored under the crx-quickstart/segmentstore directory.
    • tarmk.size: Maximum size of a segment in MB. The default is 256MB.

     

  6. Start AEM.
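Steps 3 to 5 above can also be scripted. The following is a minimal sketch, assuming a POSIX shell on a Unix-like system. AEM_HOME is a hypothetical variable introduced here for illustration, and the option values are examples rather than recommendations: 256 is the documented default for tarmk.size, and the repository.home value is an assumption chosen to match the crx-quickstart/repository/segmentstore path used by the cleanup commands later in this document.

```shell
#!/bin/sh
# Sketch only: create the install folder and a Segment Node Store configuration.
# AEM_HOME is a hypothetical variable; it defaults to the current directory.
AEM_HOME="${AEM_HOME:-.}"
INSTALL_DIR="$AEM_HOME/crx-quickstart/install"
CFG_FILE="$INSTALL_DIR/org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.cfg"

# Step 3: create the install folder (no error if it already exists).
mkdir -p "$INSTALL_DIR"

# Steps 4 and 5: write the configuration file with the two documented options.
# The values below are illustrative assumptions, not tuned settings.
cat > "$CFG_FILE" <<'EOF'
repository.home=crx-quickstart/repository
tarmk.size=256
EOF

echo "Wrote $CFG_FILE"
```

Run the script from the installation directory (or set AEM_HOME to it) before starting AEM for the first time.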

Mongo Storage

Running a freshly installed AEM instance with Mongo Storage

AEM 6 can be configured to run with MongoDB storage by following the procedure below:

  1. Download the AEM 6 quickstart jar and place it into a new folder.

  2. Unpack AEM by running the following command:

    java -jar cq-quickstart-6.jar -unpack

  3. Make sure that MongoDB is installed and an instance of mongod is running. For more info, see Installing MongoDB.

  4. Create a folder named crx-quickstart\install in the installation directory.

  5. Configure the node store by creating a configuration file, named after the configuration you want to use, in the crx-quickstart\install directory.

    The Document Node Store (which is the basis for AEM's MongoDB storage implementation) uses a file called org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg

  6. Edit the file and set your configuration options. The following options are available:

    • mongouri: The MongoURI required to connect to the Mongo database. The default is mongodb://localhost:27017
    • db: Name of the Mongo database. By default, new AEM 6 installations use aem-author as the database name.
    • cache: The cache size in MB. This is distributed among the various caches used in DocumentNodeStore. The default is 256.
    • changesSize: Size in MB of the capped collection used in Mongo for caching the diff output. The default is 256.
    • customBlobStore: Boolean value indicating that a custom data store will be used. The default is false.

  7. Create a configuration file with the PID of the data store you wish to use, and edit the file to set the configuration options. For more info, please see Configuring Node Stores and Data Stores.

  8. Start the AEM 6 jar with a MongoDB storage backend by running:

    java -jar cq-quickstart-6.jar -r crx3,crx3mongo
            

    Code samples are intended for illustration purposes only.

    Here, -r specifies the backend run mode. In this example, AEM starts with MongoDB support.
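The Document Node Store configuration from steps 4 to 6 can be scripted in the same way. This is a minimal sketch, assuming a POSIX shell and a mongod instance reachable on the default port. AEM_HOME is a hypothetical variable introduced for illustration, and every option value shown is simply the default listed above.

```shell
#!/bin/sh
# Sketch only: create the install folder and a Document Node Store configuration.
# AEM_HOME is a hypothetical variable; it defaults to the current directory.
AEM_HOME="${AEM_HOME:-.}"
INSTALL_DIR="$AEM_HOME/crx-quickstart/install"
CFG_FILE="$INSTALL_DIR/org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg"

# Step 4: create the install folder (no error if it already exists).
mkdir -p "$INSTALL_DIR"

# Steps 5 and 6: write the configuration file using the documented defaults.
cat > "$CFG_FILE" <<'EOF'
mongouri=mongodb://localhost:27017
db=aem-author
cache=256
changesSize=256
customBlobStore=false
EOF

echo "Wrote $CFG_FILE"
```

Adjust mongouri and db to match your MongoDB deployment before starting AEM with the crx3,crx3mongo run modes.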

Maintaining the Repository

As data is never overwritten in a tar file, disk usage increases even when only existing data is updated. To counter the growing size of the repository, AEM employs a garbage collection mechanism called Revision Cleanup. The mechanism reclaims disk space by removing obsolete data from the repository, and has three phases: estimation, compaction, and cleanup. In the past, revision cleanup was often referred to as compaction.

 

Caution

Offline revision cleanup is the only supported way of performing revision cleanup. Therefore, the revision cleanup task in the Operations Dashboard should be disabled. 

Note

If you are using an external data store, make sure you run Data Store Garbage Collection after the offline revision process is complete.

For additional information on node and data store configuration, see Configuring Node Stores and Data Stores.

Note

For more information about the revision cleanup process, see the Frequently Asked Questions.

Performing Offline Revision Cleanup

Caution

Different versions of the Oak-run tool need to be used depending on the Oak version you use with your AEM installation. Please check the version requirements list below before using the tool:

  • For Oak versions 1.0.0 through 1.0.11 or 1.1.0 through 1.1.6, use Oak-run version 1.0.11
  • For Oak versions newer than the above, use the version of Oak-run that matches the Oak core of your AEM installation.
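The version rule above can be expressed as a small helper. This is a sketch under stated assumptions: oak_run_version_for is a hypothetical function name, and the Oak version is assumed to be a plain numeric major.minor.patch string with no suffix.

```shell
#!/bin/sh
# Sketch only: map an Oak core version to the Oak-run version to use,
# following the two rules listed above. Hypothetical helper, not part of oak-run.
oak_run_version_for() {
    oak_version="$1"

    # Split "major.minor.patch" on dots into positional parameters.
    old_ifs="$IFS"
    IFS=.
    set -- $oak_version
    IFS="$old_ifs"
    major="${1:-0}"
    minor="${2:-0}"
    patch="${3:-0}"

    if [ "$major" -eq 1 ] && [ "$minor" -eq 0 ] && [ "$patch" -le 11 ]; then
        # Oak 1.0.0 through 1.0.11
        echo "1.0.11"
    elif [ "$major" -eq 1 ] && [ "$minor" -eq 1 ] && [ "$patch" -le 6 ]; then
        # Oak 1.1.0 through 1.1.6
        echo "1.0.11"
    else
        # Newer versions: use the Oak-run version matching the Oak core.
        echo "$oak_version"
    fi
}

oak_run_version_for "1.0.8"   # prints 1.0.11
oak_run_version_for "1.2.7"   # prints 1.2.7
```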

Adobe provides a tool called Oak-run for performing revision cleanup. It can be downloaded at the following location:

http://mvnrepository.com/artifact/org.apache.jackrabbit/oak-run/

The tool is a runnable jar that can be manually run to compact the repository. The process is called offline revision cleanup because the repository needs to be shut down in order to properly run the tool. Make sure to plan the cleanup in accordance with your maintenance window.

Normal operation of the tool also requires old checkpoints to be cleared before the maintenance takes place.

For tips on how to increase the performance of the cleanup process, see Increasing the Performance of Offline Revision Cleanup.

The procedure to run the tool is:

  1. Always make sure you have a recent backup of the AEM instance.

    Shut down AEM.

  2. Use the tool to find old checkpoints:

    java -jar oak-run.jar checkpoints install-folder/crx-quickstart/repository/segmentstore
            

  3. Then, delete the unreferenced checkpoints:

    java -jar oak-run.jar checkpoints install-folder/crx-quickstart/repository/segmentstore rm-unreferenced
            

  4. Finally, run the compaction and wait for it to complete:

    java -jar oak-run.jar compact install-folder/crx-quickstart/repository/segmentstore
            

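Steps 2 to 4 can be chained so that a failure in one step stops the sequence. The wrapper below is a minimal sketch, not an Adobe-provided script: the function names and the OAK_RUN_JAR, SEGMENT_STORE, and DRY_RUN variables are hypothetical, and the path is the placeholder used in the commands above. It defaults to a dry run that only prints the commands, so they can be reviewed before anything touches the repository.

```shell
#!/bin/sh
# Sketch only: run the three oak-run steps in order, stopping at the first failure.
# AEM must already be shut down and backed up (step 1) before a real run.
OAK_RUN_JAR="${OAK_RUN_JAR:-oak-run.jar}"
SEGMENT_STORE="${SEGMENT_STORE:-install-folder/crx-quickstart/repository/segmentstore}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; anything else = execute them

# Print a command and, unless in dry-run mode, execute it.
run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

offline_cleanup() {
    # Step 2: list checkpoints. Step 3: remove unreferenced ones. Step 4: compact.
    run_step java -jar "$OAK_RUN_JAR" checkpoints "$SEGMENT_STORE" &&
    run_step java -jar "$OAK_RUN_JAR" checkpoints "$SEGMENT_STORE" rm-unreferenced &&
    run_step java -jar "$OAK_RUN_JAR" compact "$SEGMENT_STORE"
}

offline_cleanup
```

To execute the commands for real, set DRY_RUN=0 and point OAK_RUN_JAR and SEGMENT_STORE at the actual jar and segment store directory.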

Increasing the Performance of Offline Revision Cleanup

Since version 1.0.22, the oak-run tool has included several features aimed at increasing the performance of the revision cleanup process and keeping the maintenance window as short as possible.

These features are controlled through the following command line parameters:

  • -Dtar.memoryMapped. Enables memory mapped operations for tar files, which greatly increases performance. Set it to true or false. It is highly recommended that you enable this feature in order to speed up compaction.
  • -Dupdate.limit. Defines the threshold for the flush of a temporary transaction to disk. The default value is 5000000.
  • -Dcompress-interval. Number of compaction map entries to keep until compressing the current map. The default is 1000000. If enough heap memory is available, increase this value for faster throughput.
  • -Dcompaction-progress-log. The number of compacted nodes that will be logged. The default value is 1500000, which means that the first 1500000 compacted nodes will be logged during the operation. Use this in conjunction with the -Dlogback.configurationFile parameter.
  • -Dlogback.configurationFile. Path to a Logback configuration file used for logging, for example one that enables logging of the nodes that are being compacted.
  • -Dtar.PersistCompactionMap. Set this parameter to true to use disk space instead of heap memory for compaction map persistence. Requires oak-run version 1.4 or higher. For further details, see question 6 in the FAQ section.
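The -Dlogback.configurationFile parameter expects a Logback configuration file. The sketch below writes a minimal one from a shell here-document. It is an illustration rather than the configuration shipped or recommended by Adobe: the file name logback.xml, the console appender, and the choice of enabling INFO logging for the whole org.apache.jackrabbit.oak.plugins.segment package are all assumptions, so check the logger names against the Oak version you run.

```shell
#!/bin/sh
# Sketch only: write a minimal Logback configuration for oak-run.
# LOGBACK_FILE is a hypothetical variable; it defaults to ./logback.xml.
LOGBACK_FILE="${LOGBACK_FILE:-logback.xml}"

cat > "$LOGBACK_FILE" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Print log events to the console. -->
  <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- Assumption: compaction progress is logged under the segment package. -->
  <logger name="org.apache.jackrabbit.oak.plugins.segment" level="INFO"/>

  <root level="WARN">
    <appender-ref ref="console"/>
  </root>
</configuration>
EOF

echo "Wrote $LOGBACK_FILE"
```

Pass the resulting file to oak-run with -Dlogback.configurationFile=logback.xml.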

Caution

Memory mapped file operations do not work correctly on some versions of Windows. Make sure that you use the tool without the -Dtar.memoryMapped parameter on Windows platforms, otherwise the revision cleanup will fail.

An example of the parameters in use:

java -Dtar.memoryMapped=true -Dupdate.limit=5000000 -Dcompress-interval=10000000 -Dcompaction-progress-log=1500000 -Dlogback.configurationFile=logback.xml -Xmx8g -jar oak-run-*.jar compact <repository>
        

Note

Use as much heap memory as possible for faster I/O operations. It is recommended you use at least eight gigabytes for most common deployments.
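Because memory mapped operations are unreliable on some versions of Windows, a launch script may want to add -Dtar.memoryMapped only on other platforms. The following is a minimal sketch: build_jvm_opts is a hypothetical helper, the platform check relies on the uname -s prefixes commonly reported by Cygwin, MinGW, and MSYS shells, and it conservatively skips memory mapping on all of them rather than on specific Windows versions. The parameter values are the ones from the example above.

```shell
#!/bin/sh
# Sketch only: assemble oak-run JVM options, leaving out memory mapping on Windows.
# build_jvm_opts is a hypothetical helper; its argument is the output of `uname -s`.
build_jvm_opts() {
    os_name="$1"
    opts="-Dupdate.limit=5000000 -Dcompress-interval=10000000 -Xmx8g"
    case "$os_name" in
        CYGWIN*|MINGW*|MSYS*)
            # Windows-like shell environments: omit -Dtar.memoryMapped.
            ;;
        *)
            opts="-Dtar.memoryMapped=true $opts"
            ;;
    esac
    echo "$opts"
}

JVM_OPTS="$(build_jvm_opts "$(uname -s)")"

# Print the command instead of running it, so it can be reviewed first.
echo "java $JVM_OPTS -jar oak-run.jar compact <repository>"
```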

Performance Tuning and Maintenance Recommendations

Follow the recommendations below to keep the repository running efficiently:

  1. Make sure you run Offline Revision Cleanup whenever possible during scheduled maintenance hours.
  2. If you are using an external data store, make sure you run Data Store Garbage Collection after revision cleanup has been completed.
  3. Follow the recommendations in this knowledgebase article for tips on improving the performance of your AEM instance.

Revision Cleanup Frequently Asked Questions

     1. What is the supported way of performing revision cleanup?

  • Offline revision cleanup is the only supported way of performing revision cleanup. Online revision cleanup is present in AEM 6.2 under restricted support. For more information, see Online Revision Cleanup.

     2. How frequently should Offline Revision Cleanup be performed?

  • It depends on the repository growth rate. As a general rule of thumb, for average content repositories, it is recommended that you perform revision cleanup every 2 weeks for an author instance, and once per quarter for a publish instance.

     3. What are the factors that determine the duration of the Offline Revision Cleanup?

  • The repository size and the number of revisions that need to be cleaned up determine the duration of the cleanup.

     4. What's the worst that can happen if you do not perform revision cleanup?

  • The AEM instance will run out of disk space, which will cause outages in production. It is highly recommended that you follow the monitoring best practices as mentioned in Maintenance and Monitoring.

     5. What is the difference between a revision and a page version?

  • Oak revision: Oak organizes all the content in a large tree hierarchy that consists of nodes and properties. Each snapshot or revision of this content tree is immutable, and changes to the tree are expressed as a sequence of new revisions. Typically, each content modification triggers a new revision. See also http://jackrabbit.apache.org/dev/ngp.html.
  • Page Version: Versioning creates a "snapshot" of a page at a specific point in time. Typically, a new version is created when a page is activated. For more information, see Working with Page Versions.

     6. How can you speed up the Offline Revision Cleanup task if it does not complete within 8 hours?