Introduction to the AEM Platform
The AEM platform in AEM 6 is based on Apache Jackrabbit Oak.
Apache Jackrabbit Oak is an effort to implement a scalable and performant hierarchical content repository for use as the foundation of modern world-class web sites and other demanding content applications.
It is the successor to Jackrabbit 2 and is used by AEM 6 as the default backend for its content repository, CRX.
Design principles and goals
Oak implements the JSR-283 (JCR 2.0) spec. Its principal design objectives are:
- Better support for big repositories
- Multiple distributed cluster nodes for high availability
- Better performance
- Support for many child nodes and access control lists (ACLs)
The purpose of the Storage layer is to:
- Implement a tree model
- Make storage pluggable
- Provide a clustering mechanism
The Oak Core adds several layers to the storage layer:
- Access control
- Search and Indexing
The main objective of the Oak JCR is to transform JCR semantics into tree operations. It is also responsible for:
- Implementing the JCR API
- Containing commit hooks that implement JCR constraints
In addition, non-Java implementations are now possible and part of the Oak JCR concept.
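The idea of commit hooks that enforce constraints before a change is accepted can be sketched as follows. This is a minimal illustration and not Oak's actual API: `CommitFailed`, `require_primary_type`, and `commit` are hypothetical names, and the content tree is modelled as a plain dictionary from path to properties.

```python
from typing import Callable, Dict, List

Tree = Dict[str, dict]  # hypothetical model: path -> properties
Hook = Callable[[Tree, Tree], Tree]


class CommitFailed(Exception):
    """Raised by a hook to reject a commit (hypothetical name)."""


def require_primary_type(before: Tree, after: Tree) -> Tree:
    # Example constraint: every node must carry a jcr:primaryType property.
    for path, props in after.items():
        if "jcr:primaryType" not in props:
            raise CommitFailed(f"{path} has no jcr:primaryType")
    return after


def commit(before: Tree, after: Tree, hooks: List[Hook]) -> Tree:
    # Each hook sees the state before the commit and the proposed state,
    # and either returns a (possibly modified) state or raises.
    state = after
    for hook in hooks:
        state = hook(before, state)
    return state


base = {"/": {"jcr:primaryType": "rep:root"}}
good = {**base, "/content": {"jcr:primaryType": "nt:unstructured"}}
bad = {**base, "/content": {}}

accepted = commit(base, good, [require_primary_type])
try:
    commit(base, bad, [require_primary_type])
    rejected = False
except CommitFailed:
    rejected = True
```

Running the hooks as a chain keeps each constraint in its own small function, so a commit is accepted only if every hook lets it through.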
The Oak storage layer provides an abstraction layer for the actual storage of the content.
Currently, there are two storage implementations available in AEM 6: Tar Storage and MongoDB Storage.
The Tar storage uses tar files. It stores the content as various types of records within larger segments. Journals are used to track the latest state of the repository.
It was built around several key design principles:
- Immutable Segments
The content is stored in segments that can be up to 256 KiB in size. Segments are immutable, which makes it easy to cache frequently accessed segments and reduces the risk that a system error corrupts the repository.
Each segment is identified by a unique identifier (UUID) and contains a continuous subset of the content tree. In addition, segments can reference other content. Each segment keeps a list of UUIDs of other referenced segments.
- Locality
Related records, such as a node and its immediate children, are usually stored in the same segment. This makes searching the repository very fast and avoids most cache misses for typical clients that access more than one related node per session.
- Compactness
The formatting of records is optimized for size to reduce IO costs and to fit as much content in caches as possible.
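The segment properties described above (immutability, a UUID per segment, a size limit, and a list of referenced segments) can be modelled in a few lines. This is only a sketch under stated assumptions: the `Segment` class, `new_segment`, and the `cache` dictionary are hypothetical and do not reflect the on-disk tar format.

```python
import uuid
from dataclasses import dataclass
from typing import Tuple

MAX_SEGMENT_SIZE = 256 * 1024  # 256 KiB upper bound stated in the text


@dataclass(frozen=True)
class Segment:
    """Hypothetical model of an immutable segment."""
    segment_id: uuid.UUID
    data: bytes
    references: Tuple[uuid.UUID, ...] = ()

    def __post_init__(self):
        # Reject payloads larger than the maximum segment size.
        if len(self.data) > MAX_SEGMENT_SIZE:
            raise ValueError("segment exceeds 256 KiB")


def new_segment(data: bytes, references=()) -> Segment:
    # A segment records only the UUIDs of the segments it refers to.
    return Segment(uuid.uuid4(), data, tuple(s.segment_id for s in references))


child = new_segment(b"child records")
parent = new_segment(b"parent records", references=[child])

# Because segments never change, a cache keyed by UUID needs no invalidation.
cache = {s.segment_id: s for s in (child, parent)}
```

Attempting to assign to a field of a `Segment` raises an error, which is the property that makes cached copies safe to share.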
The MongoDB storage leverages MongoDB for sharding and clustering. The repository tree is kept in one MongoDB database where each node is a separate document.
It has several particularities:
- Revisions
For each update (commit) of the content, a new revision is created. A revision is a string that consists of three elements:
- A timestamp derived from the system time of the machine it was generated on
- A counter to distinguish revisions created with the same timestamp
- The cluster node id where the revision was created
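The three elements of a revision can be combined as in the sketch below. The `Revision` and `RevisionFactory` names are hypothetical, and the string encoding (an `r` prefix followed by hexadecimal fields joined by dashes) is an assumption made for illustration. The sketch also ignores clocks that move backwards.

```python
import time
from typing import NamedTuple


class Revision(NamedTuple):
    timestamp_ms: int   # system time of the machine that created it
    counter: int        # distinguishes revisions with the same timestamp
    cluster_id: int     # cluster node that created the revision

    def __str__(self) -> str:
        # Assumed encoding: "r" plus hex fields joined by dashes.
        return f"r{self.timestamp_ms:x}-{self.counter:x}-{self.cluster_id:x}"


class RevisionFactory:
    def __init__(self, cluster_id: int, clock=lambda: int(time.time() * 1000)):
        self.cluster_id = cluster_id
        self.clock = clock
        self.last_ts = -1
        self.counter = 0

    def next(self) -> Revision:
        ts = self.clock()
        if ts == self.last_ts:
            self.counter += 1          # same millisecond: bump the counter
        else:
            self.last_ts, self.counter = ts, 0
        return Revision(ts, self.counter, self.cluster_id)


# A fixed clock forces two revisions onto the same timestamp.
factory = RevisionFactory(cluster_id=1, clock=lambda: 0x13F3875B5D1)
first, second = factory.next(), factory.next()
print(first, second)
```

With the fixed clock, the two revisions share a timestamp and cluster node id and differ only in the counter, which is what keeps them distinct.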
- Branches
Branches are supported, which allows clients to stage multiple changes and make them visible with a single merge call.
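Staging changes on a branch and publishing them with one merge call might look like the following. This is a toy in-memory model, not the MongoDB storage implementation: `Store`, `Branch`, `set`, and `merge` are hypothetical names.

```python
class Store:
    """Hypothetical in-memory store with a single visible head state."""

    def __init__(self):
        self.head = {}

    def branch(self) -> "Branch":
        return Branch(self)


class Branch:
    def __init__(self, store: Store):
        self.store = store
        self.staged = {}

    def set(self, path: str, value: str) -> None:
        # Staged changes are visible only inside the branch.
        self.staged[path] = value

    def merge(self) -> None:
        # One call publishes every staged change at once.
        self.store.head = {**self.store.head, **self.staged}
        self.staged = {}


store = Store()
branch = store.branch()
branch.set("/content/a", "1")
branch.set("/content/b", "2")
visible_before_merge = dict(store.head)  # still empty at this point
branch.merge()
```

Other readers of `store.head` see either none of the staged changes or all of them, never a partial set.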
- Previous documents
MongoDB storage adds data to a document with every modification. However, it only deletes data if a cleanup is explicitly triggered. When a certain threshold is met, old data is moved out of the main document into a previous document. Previous documents only contain immutable data, which means they only contain committed and merged revisions.
- Cluster node metadata
Data about active and inactive cluster nodes is kept in the database in order to facilitate cluster operations.
A typical AEM cluster setup with MongoDB storage:
What is different from Jackrabbit 2?
Because Oak is designed to be backwards compatible with the JCR 1.0 standard, there are almost no changes at the user level. However, there are some noticeable differences that you need to take into account when setting up an Oak-based AEM installation:
- Oak does not create indexes automatically. Because of this, custom indexes will need to be created when necessary.
- Unlike Jackrabbit 2, where sessions always reflect the latest state of the repository, an Oak session reflects a stable view of the repository from the time the session was acquired. This is due to the MVCC (multiversion concurrency control) model on which Oak is based.
- Same name siblings (SNS) are not supported in Oak.
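The session behaviour described in the list above, where a session keeps a stable view from the moment it was acquired, can be illustrated with a small model. This is a sketch only: `Repository`, `Session`, `login`, and `get` are hypothetical names, and `refresh` here only loosely mirrors the idea of explicitly refreshing a JCR session.

```python
class Repository:
    """Hypothetical repository that keeps every committed revision."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree

    def commit(self, changes: dict) -> int:
        # Each commit creates a new revision and leaves older ones untouched.
        self.revisions.append({**self.revisions[-1], **changes})
        return len(self.revisions) - 1

    def login(self) -> "Session":
        return Session(self)


class Session:
    def __init__(self, repo: Repository):
        self.repo = repo
        self.base = len(repo.revisions) - 1  # pinned at acquisition time

    def get(self, path: str):
        # Reads always go to the pinned revision, not the latest one.
        return self.repo.revisions[self.base].get(path)

    def refresh(self) -> None:
        # Explicitly move the session to the latest revision.
        self.base = len(self.repo.revisions) - 1


repo = Repository()
repo.commit({"/content/title": "old"})
session = repo.login()
repo.commit({"/content/title": "new"})   # committed after the session was acquired
stale = session.get("/content/title")
session.refresh()
fresh = session.get("/content/title")
```

In this model the session keeps returning the value from its own revision until it is refreshed, so code that relied on sessions always reflecting the latest state has to refresh explicitly.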