Storage Elements in AEM 6

Overview of platform changes

One of the most important changes in AEM 6 is the innovation at the repository level.

Currently, there are two node storage implementations available in AEM 6: Tar storage and MongoDB storage.

Tar Storage

The Tar storage uses tar files. It stores the content as various types of records within larger segments. Journals are used to track the latest state of the repository.

It was built around several key design principles:

  • Immutable Segments

The content is stored in segments that can be up to 256KiB in size. They are immutable, which makes it easy to cache frequently accessed segments and reduce system errors that may corrupt the repository.  

Each segment is identified by a unique identifier (UUID) and contains a continuous subset of the content tree. In addition, segments can reference other content. Each segment keeps a list of UUIDs of other referenced segments.

  • Locality

Related records like a node and its immediate children are usually stored in the same segment. This makes searching the repository very fast and avoids most cache misses for typical clients that access more than one related node per session.

  • Compactness

The formatting of records is optimized for size to reduce IO costs and to fit as much content in caches as possible.

Running a freshly installed AEM instance with Tar Storage

By default, AEM 6 uses the Tar storage to store nodes and binaries, with the default configuration options. To configure its storage settings manually, follow the procedure below:

  1. Download the AEM 6 quickstart jar and place it in a new folder.

  2. Unpack AEM by running:

    java -jar cq-quickstart-6.jar -unpack

  3. Create a folder named crx-quickstart\install in the installation directory.

  4. Create a file called org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.cfg in the newly created folder.

  5. Edit the file and set the configuration options. The following options are available for Segment Node Store, which is the basis of AEM's Tar storage implementation:

    • repository.home: Path to the repository home, under which various repository-related data is stored. By default, segment files are stored under the crx-quickstart/segmentstore directory.
    • tarmk.size: Maximum size of a tar file in MB. The default is 256.

     

  6. Start AEM.
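Steps 3 to 5 above can be sketched as a small shell helper. This is a minimal sketch, not an official script: the function name write_segment_config is hypothetical, the AEM directory is passed in as an argument, and only tarmk.size is written, using the default value stated above. repository.home is left unset so that the default location applies.

```shell
#!/bin/sh
# Sketch: create crx-quickstart/install under the given AEM directory and
# write a Segment Node Store configuration file into it.
# write_segment_config is a hypothetical helper name, not part of AEM.
write_segment_config() {
    aem_dir="$1"
    install_dir="$aem_dir/crx-quickstart/install"
    cfg="$install_dir/org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.cfg"

    # Step 3: create the install folder (no error if it already exists).
    mkdir -p "$install_dir" || return 1

    # Steps 4 and 5: create the file and set the option.
    # tarmk.size is given in MB; 256 is the documented default.
    cat > "$cfg" <<'EOF'
tarmk.size=256
EOF

    # Print the path of the file that was written.
    printf '%s\n' "$cfg"
}
```

Calling write_segment_config with the folder that holds the quickstart jar (for example, write_segment_config .) writes the file before the first start, so AEM can pick it up in step 6.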

Mongo Storage

The MongoDB storage leverages MongoDB for sharding and clustering. The repository tree is kept in one MongoDB database where each node is a separate document.

It has several particularities:

  • Revisions

For each update (commit) of the content, a new revision is created. A revision is basically a string that consists of three elements:

  1. A timestamp derived from the system time of the machine it was generated on
  2. A counter to distinguish revisions created with the same timestamp
  3. The cluster node id where the revision was created

  • Branches

Branches are supported, which allows clients to stage multiple changes and make them visible with a single merge call.

  • Previous documents

MongoDB storage adds data to a document with every modification. However, it only deletes data if a cleanup is explicitly triggered. Old data is moved when a certain threshold is met. Previous documents only contain immutable data, which means they only contain committed and merged revisions.

  • Cluster node metadata

Data about active and inactive cluster nodes is kept in the database in order to facilitate cluster operations.
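To make the three elements of a revision described above more concrete, the following sketch assembles a revision-like string from a timestamp, a counter, and a cluster node id. The helper name make_revision is hypothetical, and the exact layout (an "r" prefix followed by hexadecimal fields separated by dashes) is an assumption for illustration rather than a statement of the storage format.

```shell
#!/bin/sh
# Sketch: build a revision-like string from its three elements.
# make_revision is a hypothetical helper; the "r<timestamp>-<counter>-<clusterId>"
# layout with hexadecimal fields is an assumption for illustration.
make_revision() {
    timestamp_ms="$1"   # system time in milliseconds when the revision was created
    counter="$2"        # distinguishes revisions created with the same timestamp
    cluster_id="$3"     # id of the cluster node where the revision was created
    printf 'r%x-%x-%x\n' "$timestamp_ms" "$counter" "$cluster_id"
}
```

For example, make_revision "$(( $(date +%s) * 1000 ))" 0 1 builds a string for the current second on cluster node 1. Two commits in the same millisecond on the same node would differ only in the counter field.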

 

[Figure: A typical AEM cluster setup with MongoDB storage]

Running a freshly installed AEM instance with Mongo Storage

AEM 6 can be configured to run with MongoDB storage by following the procedure below:

  1. Download the AEM 6 quickstart jar and place it into a new folder.

  2. Unpack AEM by running the following command:

    java -jar cq-quickstart-6.jar -unpack

  3. Make sure that MongoDB is installed and an instance of mongod is running. For more info, see Installing MongoDB.

  4. Create a folder named crx-quickstart\install in the installation directory.

  5. Configure the node store by creating a configuration file with the name of the configuration you want to use in the crx-quickstart\install directory.

    The Document Node Store (which is the basis for AEM's MongoDB storage implementation) uses a file called org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg

     

  6. Edit the file and set your configuration options. The following options are available:

     

    • mongouri: The MongoURI required to connect to Mongo Database. The default is mongodb://localhost:27017
    • db: Name of the Mongo database. By default new AEM 6 installations use aem-author as the database name.
    • cache: The cache size in MB. This is distributed among various caches used in DocumentNodeStore. The default is 256
    • changesSize: Size in MB of capped collection used in Mongo for caching the diff output. The default is 256
    • customBlobStore: Boolean value indicating that a custom data store will be used. The default is false.

  7. Create a configuration file with the PID of the data store you wish to use and edit the file in order to set the configuration options. For more info, please see Configuring Node Stores and Data Stores.

  8. Start the AEM 6 jar with a MongoDB storage backend by running:

    java -jar cq-quickstart-6.jar -r crx3,crx3mongo
            

    Code samples are intended for illustration purposes only.

    Here, -r sets the backend run mode. In this example, AEM starts with MongoDB support.
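Steps 4 to 6 of the procedure above can be sketched as a shell helper that writes the Document Node Store configuration file. This is a minimal sketch: the function name write_document_config is hypothetical, and every value written is simply the default listed above, so adjust mongouri and db to match your MongoDB deployment.

```shell
#!/bin/sh
# Sketch: create crx-quickstart/install under the given AEM directory and
# write a Document Node Store configuration file using the documented defaults.
# write_document_config is a hypothetical helper name, not part of AEM.
write_document_config() {
    aem_dir="$1"
    install_dir="$aem_dir/crx-quickstart/install"
    cfg="$install_dir/org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.cfg"

    # Step 4: create the install folder (no error if it already exists).
    mkdir -p "$install_dir" || return 1

    # Steps 5 and 6: create the file and set the options.
    # cache and changesSize are given in MB.
    cat > "$cfg" <<'EOF'
mongouri=mongodb://localhost:27017
db=aem-author
cache=256
changesSize=256
customBlobStore=false
EOF

    # Print the path of the file that was written.
    printf '%s\n' "$cfg"
}
```

Run write_document_config with the folder that holds the quickstart jar before starting AEM with the crx3,crx3mongo run modes, so the configuration is in place on first start.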

Maintaining the Repository

Compacting Tar Files

As data is never overwritten in a tar file, the disk usage increases even when only updating existing data. To make up for the growing size of the repository, AEM employs a garbage collection mechanism called Tar Compaction. The mechanism will reclaim disk space by removing obsolete data from the repository.
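One simple way to observe this growth, and later the effect of compaction, is to measure the size of the segment store directory on disk. The sketch below uses the standard du utility; the helper name segmentstore_size_kb is hypothetical, and the path you pass in depends on your installation.

```shell
#!/bin/sh
# Sketch: report the disk usage of a segment store directory in kilobytes.
# segmentstore_size_kb is a hypothetical helper name, not part of AEM.
segmentstore_size_kb() {
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    du -sk "$1" | cut -f1
}
```

Running segmentstore_size_kb install-folder/crx-quickstart/repository/segmentstore before and after a compaction run gives a rough measure of how much space was reclaimed.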

Revision Clean Up

By default, tar file compaction runs automatically each night between 2 am and 5 am. It can also be triggered manually in the Operations Dashboard via a maintenance job called Revision Clean Up.

To start Revision Clean Up you need to:

  1. Go to the AEM Welcome Screen.

  2. In the main AEM window, go to Tools - Operations - Dashboard - Maintenance or directly browse to http://localhost:4502/libs/granite/operations/content/maintenance.html

  3. Click on Daily Maintenance Window.

  4. Hover over the Revision Clean Up window and press the Start button.


The icon will turn orange to indicate that the Revision Clean Up job is running. You can stop it at any time by hovering the mouse over the icon and pressing the Stop button.

Invoking Revision Garbage Collection via the JMX Console

  1. Open the JMX Console by going to http://localhost:4502/system/console/jmx

  2. Click the RevisionGarbageCollection MBean.

  3. In the next window, click startRevisionGC() and then Invoke to start the Revision Garbage Collection job.

Note

Due to the mechanics of the garbage collection, the first run will actually consume an additional 256 MB of disk space. Subsequent runs will work as expected and start shrinking the repository.

Offline Compaction via the Oak-run Tool

For faster compaction of the Tar files (e.g. to trim non-production environments) and for situations where normal garbage collection doesn't work, Adobe provides a manual Tar compaction tool called Oak-run. It can be downloaded here:

* oak-run-1.0.13.jar
Oak-run 1.0.13 Tar Compaction Tool (WARNING: Only for instances running Oak 1.0.12 or later version).

Caution

If you have not yet applied the Oak 1.0.12 hotfix or a later version to your AEM instance, use oak-run-1.0.8.jar below, not the newer jar listed above. The newer jar is not compatible with repositories using older Oak versions.

* oak-run-1.0.8.jar
Oak-run 1.0.8 Tar Compaction Tool (For instances with Oak version < 1.0.12).

The tool is a runnable jar that can be manually run to compact the repository. The procedure is called offline compaction because the repository needs to be shut down in order to properly run the tool.

Normal operation of the tool also requires old checkpoints to be cleared before the compaction takes place.

The procedure to run the tool is:

  1. Shut down AEM.

  2. Use the tool to find old checkpoints:

    java -jar oak-run.jar checkpoints install-folder/crx-quickstart/repository/segmentstore
            


  3. Then, delete the unreferenced checkpoints:

    java -jar oak-run.jar checkpoints install-folder/crx-quickstart/repository/segmentstore rm-unreferenced
            


  4. Finally, run the compaction and wait for it to complete:

    java -jar oak-run.jar compact install-folder/crx-quickstart/repository/segmentstore
            

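Steps 2 to 4 above can be chained in a small wrapper so that compaction only runs if the checkpoint steps succeed. This is a minimal sketch: the names run_step, offline_compact, and the DRY_RUN variable are hypothetical, the jar and segment store paths are passed in as arguments, and AEM must already be shut down (step 1) before calling it. The three commands are the ones shown in the procedure.

```shell
#!/bin/sh
# Sketch: run the offline compaction steps in order, stopping at the first failure.
# run_step, offline_compact and DRY_RUN are hypothetical names, not part of Oak-run.

# Print the command instead of running it when DRY_RUN is set to a non-empty value.
run_step() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

offline_compact() {
    oak_run_jar="$1"    # path to the oak-run jar that matches your Oak version
    segmentstore="$2"   # path to crx-quickstart/repository/segmentstore

    # Step 2: list old checkpoints.
    run_step java -jar "$oak_run_jar" checkpoints "$segmentstore" &&
    # Step 3: delete the unreferenced checkpoints.
    run_step java -jar "$oak_run_jar" checkpoints "$segmentstore" rm-unreferenced &&
    # Step 4: run the compaction.
    run_step java -jar "$oak_run_jar" compact "$segmentstore"
}
```

Setting DRY_RUN=1 before calling offline_compact oak-run.jar install-folder/crx-quickstart/repository/segmentstore prints the three commands without executing them, which is a cheap way to check the paths before touching the repository.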
