CRX Clustering

You are reading the CRX 2.3 version of CRX Clustering.
This version has reached End of core support. For further details see our technical support periods.

CRX provides built-in clustering capability that enables multiple repository instances running on separate machines to be joined together to act as a single repository. The repository instances within a cluster are referred to as cluster nodes (not to be confused with the unrelated concept of the JCR node, which is the unit of storage within a repository).

Each node of the cluster is a complete, fully functioning CRX instance with a distinct address on the network. Any operation (read or write) can be addressed to any cluster node.

When a write operation updates the data in one cluster node, that change is immediately and automatically applied to all other nodes in the cluster. This guarantees that any subsequent read from any cluster node will return the same updated result.

How Clustering Works

A CRX cluster consists of one master node and some number of slave nodes. A single standalone CRX instance is simply a master node with zero slave nodes.


Typically, each node runs on a distinct physical machine. The nodes must all be able to communicate directly with each other over TCP/IP.

Each cluster node retains an independent identity on the network, meaning that each can be written to and read from independently. However, whenever a write operation is received by a slave node, it is redirected to the master node, which makes the change to its own content while also guaranteeing that the same change is made to all slave nodes, ensuring synchronization of content.

In contrast, when a read request is received by any cluster node (master or slave) that node serves the request immediately and directly. Since the content of each node is synchronized by the clustering mechanism, the results will be consistent regardless of which particular node is read from.

The usual way to take advantage of this is to have a load-balancing webserver (for example, the Dispatcher) mediate read requests and distribute them across the cluster nodes, thus increasing read responsiveness.

Architecture and Configuration

When setting up a clustered installation, there are two main deployment decisions to be made:

  • How the cluster fits into the larger architecture of the installation
  • How the cluster is configured internally

 

The two main options for cluster architecture are:

  • Active/active clustering
  • Active/passive clustering

The cluster configuration defines which internal storage mechanisms are used and how these are shared or synchronized across cluster nodes. The three most commonly used configurations are:

 

  • Shared nothing
  • Shared data store
  • Shared external database

First we will look at cluster architecture. Cluster configuration will be covered later on.

CQ without Clustering

Because CQ is built on top of the CRX repository, CRX clustering can be employed in CQ installations to improve performance. To understand the various ways that CRX clustering can fit into the larger CQ architecture we will first take a look at two common, non-clustered CQ architectures: single publish and multiple publish.

Single Publish

A CQ installation consists of two separate environments: author and publish.

The author environment is used for adding, deleting and editing pages of the website. When a page is ready to go live, it is activated, causing the system to replicate the page to the publish environment from where it is served to the viewing audience.

In the simplest case, the author environment consists of a single CQ instance running on a single server machine and the publish instance consists of another single CQ instance running on another machine. In addition, a Dispatcher is usually installed between the publish server and the web for caching.


Multiple Publish

A common variation on the installation described above is to install multiple publish instances (usually on separate machines). When a change made on the author instance is ready to go live, it is replicated simultaneously to all the publish instances.

As long as all activations and deactivations of pages are performed identically on all publish instances, the content of the publish instances will stay in sync. Depending on the configuration of the front-end server, requests from web surfers are dealt with in one of two ways:

  • The incoming requests are distributed among the publish instances by the load-balancing feature of the Dispatcher.
  • The incoming requests are all forwarded to a primary publish instance until it fails, at which point all requests are forwarded to the secondary publish instance (in this arrangement there are usually only two instances).

Note

The dispatcher is an additional piece of software provided by Adobe in conjunction with CQ that can be installed as a module within any of the major web servers (Microsoft IIS, Apache, etc.). Load-balancing is a feature of the dispatcher itself, and its configuration is described here.

The configuration of failover, on the other hand, is typically a feature of the web server within which the dispatcher is installed. For information on configuring failover, therefore, please consult the documentation specific to your web server.


Is Multiple Publish a Form of Clustering?

Note

The architecture shown above describes two possible configurations for a multiple publish system: load balancing and failover.

The first setup is sometimes described as a form of active/active clustering and the second as a form of active/passive clustering. However, these architectures are not considered true CRX clustering because:

  • This solution is specific to the CQ system since the concept of activation and replication of pages is itself specific to CQ. So this is not a generic solution for all CRX applications.
  • Synchronization of the publish instances is dependent on an external process (the replication process configured in the author instance), so the publish instances are not in fact acting as a single system.
  • If content is written to a publish instance from the external web (as is the case with user generated content such as forum comments) CQ uses reverse-replication to copy the content to the author instance, from where it is replicated back to all the publish instances. While this system is sufficient in many cases, it lacks the robustness of content synchronization under true clustering.

For these reasons the multiple publish arrangement is not a true clustering solution as the term is usually employed.

Clustering Architecture

Clustering architectures usually fall into one of two categories: active/active or active/passive.

Active/Active

In an active/active setup the processing load is divided equally among all nodes in the cluster using a load balancer such as the CQ Dispatcher. In normal circumstances, all nodes in the cluster are active and running at the same time. When a node fails, the load balancer redistributes that node's requests across the remaining nodes.


Active/Passive

In an active/passive setup, a front-end server passes all requests back to only one of the cluster nodes (called the primary), which then serves all of these requests itself. A secondary node (there are usually just two) runs on standby. Because it is part of the cluster, the secondary node remains in sync with the primary; it just does not serve any requests itself. However, if the primary node fails then the front-end server detects this and a "failover" occurs, where the server redirects all requests to the secondary node instead.


Hot Backup

The Active/Passive setup described above can also be thought of and used as a "hot backup" system. The secondary server (the slave) functions as the backup for the primary server. Because the clustering system automatically synchronizes the two servers, no manual backing-up is required. In the case of the failure of the primary server, the backup can be immediately deployed by activating the secondary connection.

CQ with Clustering

Above we discussed two common CQ architectures that do not use CRX clustering and then gave two general examples of how CRX clustering can work (active/active and active/passive). Now we will put these together and see how CRX clustering can be used within a CQ installation.

Publish Clustering

There are a number of options for using true CRX clustering within a CQ installation. The first one we will look at is publish clustering.

In the publish clustering arrangement, the publish instances of the CQ installation are combined into a single cluster. The front-end server (including the dispatcher) is then configured either for active/active behavior (where load is equally distributed across cluster nodes) or active/passive behavior (where load is only redirected on failover).

The following diagram illustrates a publish cluster arrangement:

(Diagram: publish cluster arrangement)

Note

Hot Backup

A variation on publish clustering with failover is to have the secondary publish server completely disconnected from the web, functioning instead as a continually updated and synchronized backup server. If at any time the backup server needs to be put online, this can be done manually by reconfiguring the front-end server.

Author Clustering

Clustering can also be employed to improve the performance of the CQ author environment. An arrangement using both publish and author clustering is shown below:

(Diagram: combined author and publish clustering)

Other Variations

It is also possible to set up other variations on the author clustering theme by pairing the author cluster with either multiple (non-clustered) publish instances or with a single publish instance. However, such variants are rarely used in practice.

Clustering and Performance

When faced with a performance problem in a single-instance CQ system (either at the publish or author level), clustering may provide a solution. However, the extent of the improvement, if any, depends upon where the performance bottleneck is located.

Under CQ/CRX clustering, read performance scales linearly with the number of nodes in the cluster. However, additional cluster nodes will not increase write performance, since all writes are serialized through the master node.

Clearly, the increase in read performance will benefit the publish environment, since it is primarily a read system. Perhaps surprisingly, clustering can also benefit the author environment because even in the author environment the vast majority of interactions with the repository are reads. In the usual case 97% of repository requests in an author environment are reads, while only 3% are writes.

This means that despite the fact that CQ clustering does not scale writes, clustering can still be a very effective way of improving performance both at the publish and author level.

However, while increasing the number of cluster nodes will increase read performance, a bottleneck will still be reached when the frequency of requests gets to the point where the 3% of requests that are writes overwhelm the capabilities of the single master node.

In addition, while the percentage of writes under normal authoring conditions is about 3%, this can rise in situations where the authoring system handles a large number of more complex processes like workflows and multisite management.

In cases where a write bottleneck is reached, additional cluster nodes will not improve performance. In such circumstances the correct strategy is to increase the hardware performance through improved CPU speed and increased memory.

Configuration Options

Clustering can be configured in a number of ways. The common options are:

  • Shared Nothing
  • Shared Data Store
  • Shared External Database

The configuration parameters are found in the file /crx-quickstart/repository/repository.xml.

Shared Nothing

This is the default configuration. In this configuration all elements of CRX storage are held per cluster node and synchronized over the network. No shared storage is used. The TarPersistenceManager is used for the persistence store and version storage, the TarJournal is used for the journal and the ClusterDataStore is used for the data store.

In most cases you will not need to manually configure the repository.xml file for a shared-nothing cluster, since the GUI clustering setup automatically takes care of it (see GUI Setup of Shared Nothing Clustering).

<Repository>
    ...
    <DataStore class="com.day.crx.core.data.ClusterDataStore"/>
    ...
    <Workspace name="${wsp.name}" simpleLocking="true">
        ...
        <PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager"/>
        ...
    </Workspace>
    ...
    <Cluster>
        <Journal class="com.day.crx.persistence.tar.TarJournal"/>
    </Cluster>
    ...
</Repository>
        

Code samples are intended for illustration purposes only.

Shared Data Store

In this configuration the workspace stores and the journal are maintained per cluster node as above, but the ClusterDataStore is configured to be shared among the cluster nodes. The parameter path points to the location on a shared file system where the data store will be held. Every node in the cluster must be configured to point to the same shared location.

<DataStore class="com.day.crx.core.data.ClusterDataStore">
    <param name="minRecordLength" value="4096"/>
    <param name="path" value="${rep.home}/shared/datastore"/>
</DataStore>

        

Code samples are intended for illustration purposes only.

Caution

Formerly, the FileDataStore implementation was sometimes used for shared data store installations. This configuration is now deprecated. If a shared data store is required, the ClusterDataStore implementation should be used.

Shared External Database

In some cases you may wish to store all your content in an RDBMS. In such cases the shared external database configuration may be appropriate. In this configuration all cluster nodes share a common persistence store, journal and data store, which are all held in a single external RDBMS.

 

<Repository>
   ...
   <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
      <param name="url" value="jdbc:mysql://192.168.2.34:3306/crx1"/>
      <param name="user" value="crx"/>
      <param name="password" value="crxpassword"/>
      <param name="minRecordLength" value="4096"/>
      <param name="maxConnections" value="30"/>
   </DataStore>  
   ...
   <Workspace name="${wsp.name}" simpleLocking="true">
      ...
      <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
         <param name="url" value="jdbc:mysql://192.168.2.34:3306/crx1"/>
         <param name="user" value="crx"/>
         <param name="password" value="crxpassword"/>
         <param name="schemaObjectPrefix" value="${wsp.name}"/>
      </PersistenceManager>
      ...
   </Workspace>
   ...
   <Cluster>
      <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
         <param name="revision" value="${rep.home}/revision.log" />
         <param name="driver" value="com.mysql.jdbc.Driver"/>
         <param name="url" value="jdbc:mysql://192.168.2.34:3306/crx1" />
         <param name="user" value="crx"/>
         <param name="password" value="crxpassword"/>
         <param name="databaseType" value="mysql"/>
     </Journal>
   </Cluster>
   ...
</Repository>
        

Code samples are intended for illustration purposes only.

In addition, the external database is sometimes used to store the version storage as well:

 

<Repository>
   ...
   <Versioning rootPath="${rep.home}/version">
      ...
      <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
         <param name="url" value="jdbc:mysql://192.168.2.34:3306/crx1"/>
         <param name="user" value="crx"/>
         <param name="password" value="crxpassword"/>
         <param name="schemaObjectPrefix" value="version_"/>
      </PersistenceManager>
      ...
   </Versioning>
   ...
</Repository>
        

Code samples are intended for illustration purposes only.

When configuring CRX to use an external database, typically, a single backend database system is used to store all the elements (Workspaces, Data Store, Journal and Version Storage) for all instances in the cluster. Note that nothing prevents this backend database system from itself being a clustered system.

For details on the available database storage mechanisms, see Persistence Managers and Other Storage Elements.

Cluster Properties and Cluster Node ID

The file crx-quickstart/repository/cluster_node.id contains the cluster node ID. Each instance within a cluster must have a unique ID.

This file is automatically created by the system. By default it contains a randomly generated UUID, but it can be any string. When cloning a cluster node, this file must be changed or deleted on the copy, because if two nodes (instances) within a cluster contain the same cluster node ID, only one of them will be able to connect.

Example file:

crx-quickstart/repository/cluster_node.id
08d434b1-5eaf-4b1c-b32f-e9abedf05f23
        

Code samples are intended for illustration purposes only.

The file crx-quickstart/repository/cluster.properties contains cluster configuration properties. The file is automatically updated by the system if the cluster configuration is changed in the GUI.

Example file:

crx-quickstart/repository/cluster.properties
#Cluster properties
#Wed May 23 16:02:06 CEST 2012
cluster_id=86cab8df-3aeb-4985-8eb5-dcc1dffb8e10
addresses=10.0.2.2,10.0.2.3
members=08d434b1-5eaf-4b1c-b32f-e9abedf05f23,fd11448b-a78d-4ad1-b1ae-ec967847ce94
        

Code samples are intended for illustration purposes only.

The cluster_id property contains the identifier for this cluster; each node in the cluster must have the same cluster ID. By default this is a randomly generated UUID, but it can be any string.

The addresses property contains a comma separated list of the IP addresses of all nodes in this cluster. This list is used at the startup of each cluster node to connect to the other nodes that are already running. The list is not needed if all cluster nodes are running on the same computer (which may be the case in certain circumstances, such as testing).

The members property contains a comma-separated list of the cluster node IDs that participate in the cluster. This property is not required for the cluster to work; it is for informational purposes only.

System Properties

The following system properties affect the cluster behavior:

socket.connectTimeout: The maximum number of milliseconds to wait for the master to respond (default: 1000). A timeout of zero means infinite (block until connection established or an error occurs).

socket.receiveTimeout: The maximum number of milliseconds to wait for a reply from the master or slave (SO_TIMEOUT; default: 60000). A timeout of zero means infinite.

com.day.crx.core.cluster.DisableReverseHostLookup: Disable the reverse lookup from the master to the slave when connecting (default: false). If not set, the master checks if the slave is reachable using InetAddress.isReachable(connectTimeout).
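These are standard Java system properties, so on a quickstart deployment they can be passed as -D options when starting the instance. The following is a sketch only; the values shown are illustrative, not recommendations:

# start a quickstart instance with illustrative cluster-related system properties
java -Xmx512m \
     -Dsocket.connectTimeout=2000 \
     -Dsocket.receiveTimeout=120000 \
     -Dcom.day.crx.core.cluster.DisableReverseHostLookup=true \
     -jar crx-quickstart-*.jar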

Clustering Setup

In this section we describe the process of setting up a CRX cluster.

Clustering Requirements

CRX clustering is designed to operate using network connectivity equivalent to that available between servers in a data centre. Optimal cluster performance requires very low network latency between nodes, available bandwidth of at least 100 Mbit/s, and network links between nodes that are not congested with non-cluster traffic. The most common situation, where servers are connected to a common network switch, is ideal.

Because of the need for rapid round-trip communication between cluster nodes during an update, the performance of a CRX cluster is sensitive to the latency between nodes. Where network latency can be kept in the millisecond range, cluster operation is likely to be satisfactory, even when the nodes do not share the same physical location. However, physically separate datacentres often experience network latencies that are higher than 10 ms, and this latency reduces cluster performance.
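As a rough sanity check that inter-node latency really is in the millisecond range, you can measure round-trip times between the machines that will host the cluster nodes. This is only a sketch; the host name is a placeholder:

# measure round-trip latency from one prospective cluster node to another;
# look at the reported avg value
ping -c 20 cluster-node-2.example.com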

The following requirements should be observed when setting up clustering:

  • Each cluster node (CRX instance) should be on its own dedicated machine. During development and testing one can install multiple instances on a single machine, but for a production environment this is not recommended.
  • For shared-nothing clustering, the primary infrastructure requirement is that the network connecting the cluster nodes has high reliability, high availability and low latency.
  • For shared data store and shared external database clustering, the shared storage (be it file-based or RDBMS-based) should be hosted on a high-reliability, high-availability storage system. For file storage the recommended technologies include enterprise-class SAN systems, NFS servers and CIFS servers. For database storage a high-availability setup running either Oracle or MySQL is recommended. Note that since, in either case, the data is ultimately stored on a shared system, the reliability and availability of the cluster in general depends greatly on the reliability and availability of this shared system.

GUI Setup of Shared Nothing Clustering

By default a freshly installed CRX instance runs as a single-master, zero-slave, shared-nothing cluster. Additional slave nodes can be added to the master easily through the cluster configuration GUI.

If you wish to deploy a shared data store or shared external database cluster, or if you wish to tweak the settings of the default shared-nothing cluster, you will have to perform a manual configuration. In this section we describe the GUI deployment of a shared-nothing cluster. Manual configuration is covered in the next section.

  1. Install two or more CRX instances. In a production environment, each would be installed on a dedicated server. For development and testing, you may install multiple instances on the same machine.

  2. Ensure that all the instance machines are networked together and visible to each other over TCP/IP.

    Caution

    Cluster instances communicate with each other through port 8088. Consequently this port must be open on all servers within a cluster. If one or more of the servers is behind a firewall, that firewall must be configured to open port 8088 (a quick connectivity check is sketched after this procedure).

    If you need to use another port (for example, due to firewall restrictions), use the Manual Cluster Setup approach. You can configure the cluster communication port to another port number, in which case that port must be visible through any firewall that may be in place.

    To configure the cluster communications port, change the portList parameter in the <Journal> element of repository.xml as described here.

  3. Decide which instance will be the master instance. Note the host name and port of this instance. For example, if you are running the master instance on your local machine, its address might be localhost:4502.

  4. Every instance other than the master will be a slave instance.

    You will need to connect each slave instance to the master by going to their respective cluster configuration pages here:

    http://<slave-address>/libs/granite/cluster/content/admin.html

    file
  5. In the Cluster Configuration page enter the address of the master instance in the field marked Master URL, as follows:

        http://<master-address>/

    For example, if both your slave and master are on your local machine, you might enter

        http://localhost:4502/

    Once you have filled in the Master URL, enter your Username and Password on the master instance and click Join. You must have administrator access to set up a cluster.

  6. Joining the cluster may take a few minutes.

    Allow some time before refreshing the master and slave UIs in the browser. Once the slave is properly connected to the master you should see something similar to the following on the master and slave cluster UIs:

    (Screenshot: cluster status as shown on the master and slave cluster UIs)

    Note

    In some cases, a restart of the slave instance might be required to avoid stale sessions. Additionally, while the slave is joining the cluster you may get a 503 Service Unavailable response from that server. This is normal and temporary. The best way to see the actual progress of the cluster joining process is to watch the error.log.
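Before joining the instances, it can be useful to confirm that the cluster port mentioned in the caution above is actually reachable between the machines. A minimal sketch, assuming the default port 8088 and a placeholder host name:

# run from one cluster machine; succeeds only if the other machine
# accepts TCP connections on the cluster port
nc -zv cluster-node-2.example.com 8088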

Note

When configuring file-based persistence managers such as the Tar PM (as opposed to database-based PMs), the file system location specified (for example, the location where the Tar PM is configured to store its tar files) should be true local storage, not network storage. The use of network storage in such cases will degrade performance.

Manual Cluster Setup

In some cases a user may wish to set up a cluster without using the GUI. There are two ways to do this: manual slave join and manual slave clone.

The first method, manual slave join, is the same as the standard GUI procedure except that it is done without the GUI. Using this method, when a slave is added, the content of the master is copied over to it and a new search index on the slave is built from scratch. In cases where a pre-existing instance with a large amount of content is being "clusterized", this process can take a long time.

In such cases it is recommended to use the second method, manual slave clone. In this method the master instance is copied to a new location, either at the file system level (i.e., the crx-quickstart directory is copied over) or using the online backup feature, and the new instance is then adjusted to play the role of slave. This avoids rebuilding the index and, for large repositories, can save a lot of time.

Caution

The parameter preferredMaster is experimental. It is not supported as it has been found to cause synchronization issues in cluster setups.

Manual Slave Join

The following steps are similar to joining a cluster node using the GUI. That means the data is copied over the network, and the search index is re-created (which may take some time):

If a quickstart deployment is being used (i.e., stand-alone, without an application server) then:

On the master

  • Copy the files crx-quickstart-*.jar and license.properties to the desired directory.
  • Start the instance:
    java -Xmx512m -jar *.jar
  • Verify that the instance is up and running, then stop the instance.
  • In the file crx-quickstart/repository/cluster.properties, add the IP address of the slave instance you are about to add (if that slave is on a different machine; otherwise this step is not needed).
  • Stop the instance.
  • If a shared data store is to be used:
    • Change crx-quickstart/repository/repository.xml, as described in Shared Data Store and Data Store, above.
    • Assuming you have configured <shared>/datastore/ as the shared location, copy the contents of crx-quickstart/repository/repository/datastore to the directory <shared>/datastore (see the sketch after this list).
  • Start the instance, and verify that it still works.
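The copy of the data store to the shared location can be done with a normal filesystem copy. A minimal sketch, assuming /mnt/shared/datastore is the shared location configured in repository.xml:

# create the shared location and copy the existing data store contents into it
mkdir -p /mnt/shared/datastore
cp -rp crx-quickstart/repository/repository/datastore/. /mnt/shared/datastore/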

On the slave

  • Copy the files crx-quickstart-*.jar and license.properties to the desired directory (usually on a different machine from the master, unless you are just testing).
  • Unpack the JAR file:
    java -Xmx512m -jar crx-quickstart-*.jar -unpack
  • Copy the files repository.xml and cluster.properties from the master:
    cp ../n1/crx-quickstart/repository/repository.xml crx-quickstart/repository/
    cp ../n1/crx-quickstart/repository/cluster.properties crx-quickstart/repository/
  • Copy the namespaces and node types from the master:
    cp -r ../n1/crx-quickstart/repository/repository/namespaces/ crx-quickstart/repository/repository/
    cp -r ../n1/crx-quickstart/repository/repository/nodetypes/ crx-quickstart/repository/repository/
  • If this new slave is on a different machine from the master, append the IP address of the master to the cluster.properties file of the slave:
    echo "addresses=x.x.x.x" >> crx-quickstart/repository/cluster.properties
    where x.x.x.x is replaced by the correct address. As mentioned above, the IP address of the slave should be added to the master's cluster.properties file as well.
  • Start the slave instance:
    java -Xmx512m -jar crx-quickstart-*.jar

If an application server deployment is being used (i.e., the CQ or CRX war file is being deployed) then:

On the master

  • Deploy the war file into the application server. See Installing CRX in an Application Server and Installing CQ in an application server.
  • Stop the application server.
  • In the file crx-quickstart/repository/cluster.properties, add the IP address of the slave instance you are about to add (if that slave is on a different machine; otherwise this step is not needed).
  • If a shared data store is to be used, change crx-quickstart/repository/repository.xml, as described in Shared Data Store and Data Store, above.
  • Move the datastore directory to the required place (the shared location configured in repository.xml).
  • Start the application server, and verify that it still works.

On the slave

  • Deploy the war file into the application server.
  • Stop the application server.
  • Copy the files repository.xml and cluster.properties from the master:
    cp ../n1/crx-quickstart/repository/repository.xml crx-quickstart/repository/
    cp ../n1/crx-quickstart/repository/cluster.properties crx-quickstart/repository/
  • Copy the namespaces and node types from the master:
    cp -r ../n1/crx-quickstart/repository/repository/namespaces/ crx-quickstart/repository/repository/
    cp -r ../n1/crx-quickstart/repository/repository/nodetypes/ crx-quickstart/repository/repository/
  • If this new slave is on a different machine from the master, append the IP address of the master to the cluster.properties file of the slave:
    echo "addresses=x.x.x.x" >> crx-quickstart/repository/cluster.properties
    where x.x.x.x is replaced by the correct address. As mentioned above, the IP address of the slave should be added to the master's cluster.properties file as well.
  • Start the application server.

Manual Slave Cloning

The following steps clone the master instance and change that clone into a slave, preserving the existing search index:

Master

Your existing repository will be the master instance.

If it is feasible to stop the master instance:

  • Stop the master instance, either through the GUI switch, the command line stop script or, in the case of an application server deployment, by stopping the application server.
  • In the case of a quickstart deployment, copy the crx-quickstart directory of the master over to the location where you want the slave installed, using a normal filesystem copy (cp, for example).
  • In the case of an application server installation, copy the exploded war file from the master to the same location in the slave application server (see Installing CRX in an Application Server and Installing CQ in an application server).
  • Restart the master.

If it is not feasible to stop the master instance:

  • Do an online backup of the instance to the new slave location. The online backup tool can be made to write the copy directly into another directory or to a zip file which you can then unpack in the new location. See here for details. The process can be automated using curl or wget. For example:

    curl -c login.txt "http://localhost:7402/crx/login.jsp?UserId=admin&Password=xyz&Workspace=crx.default"

    curl -b login.txt -f -o progress.txt "http://localhost:7402/crx/config/backup.jsp?action=add&&zipFileName=&targetDir=<targetDir>"
     

Slave

In the new slave instance directory (or, in the application server case, the exploded war file directory of the slave):

  • Modify the file crx-quickstart/repository/cluster_node.id so that it contains a unique cluster node ID. This ID must differ from the IDs of all other nodes in the cluster.
  • Add the node ID of the master instance and all other slave nodes (apart from this one) separated by commas to the file crx-quickstart/repository/cluster.properties. For example, you could use something like the following command (with the capitalized items replaced with the actual IDs used):

    echo "members=MASTER_NODE_ID,SLAVE_NODE_1_ID" >> crx- quickstart/repository/cluster.properties

  • Add the master instance IP address and the IP address of all other slave instances (apart from this one) to the file crx-quickstart/repository/cluster.properties. For example, you could use something like the following command (with the capitalized items replaced with the actual addresses used):

    echo "addresses=MASTER_IP_ADDRESS,SLAVE_1_IP_ADDRESS" >> crx-quickstart/repository/cluster.properties

  • Remove the file sling.id.file, which stores the instance id by which nodes in a cluster are distinguished from each other. This file will get re-generated with a new ID if missing, so deleting it is sufficient.
    The file is in different places depending on installation, so it needs to be found:

    rm -i $(find . -type f -name sling.id.file)

  • Start the slave instance. It will join the cluster without re-indexing. Note: Once the slave is started, the master cluster.properties file will automatically be updated by appending the node ID and IP address of the slave.

Troubleshooting

Out-of-Sync Cluster Nodes


In some cases, when the master instance is stopped while the other cluster instances are still running, the master instance cannot re-join the cluster after being restarted.

This can occur in cases where a write operation was in progress at the moment that the master node was stopped, or where a write operation occurred a few seconds before the master instance was stopped. In these cases, the slave instance may not have received all changes from the master instance. When the master is then restarted, CRX will detect that it is out of sync with the remaining cluster instances and the repository will not start. Instead, an error message is written to the server.log saying the repository is not available, and the following (or a similar) error message appears in the files crx-quickstart/logs/crx/error.log and crx-quickstart/logs/stdout.log:

ClusterTarSet: Could not open (ClusterTarSet.java, line 710)
java.io.IOException: This cluster node and the master are out of sync. Operation stopped.
Please ensure the repository is configured correctly.
To continue anyway, please delete the index and data tar files on this cluster node and restart.
Please note the Lucene index may still be out of sync unless it is also deleted.
...
java.io.IOException: Init failed
...
RepositoryImpl: failed to start Repository: Cannot instantiate persistence manager 
...
RepositoryStartupServlet: RepositoryStartupServlet initializing failed
        

Code samples are intended for illustration purposes only.

Avoiding Out-of-Sync Cluster Instances

To avoid this problem, ensure that the slave cluster instances are always stopped before the master is stopped.

If you are not sure which cluster instance is currently the master, open the page http://localhost:port/crx/config/cluster.jsp. The master ID listed there will match the contents of the file crx-quickstart/repository/cluster_node.id of the master cluster instance.
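The check can also be scripted. The following sketch assumes a quickstart instance on port 4502 and admin credentials (both placeholders):

# print this instance's cluster node ID
cat crx-quickstart/repository/cluster_node.id

# fetch the cluster status page and compare the master ID it reports with the ID above
curl -u admin:admin "http://localhost:4502/crx/config/cluster.jsp"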

Automatic Out-Of-Sync Prevention

To avoid desynchronization of cluster nodes the following mechanism is used:

  1. As soon as a cluster of two or more nodes is connected, a marker file named clustered.txt is created on each cluster node in its repository root directory (crx-quickstart/repository/clustered.txt). This includes the master node and all slave nodes.
  2. When a master node detects that there are no longer any slave nodes attached to it (for example, if all slave nodes have failed) then it automatically deletes its own clustered.txt file.
  3. The file is not deleted when a slave node is stopped normally. 
  4. Also, the file is not deleted when any node (master or slave) is stopped abnormally (i.e., if the node process is killed or a crash occurs).

The result of this is that, in a cluster that has failed, any node that was previously a slave will have this marker file, while the former master will not have this file unless it was shut down abnormally.

If the file exists on a cluster node when it is restarted, that cluster node will only start as a slave, not a master. It will then attempt to reconnect to a master, once a second, for 60 seconds (by default, see below). If it fails to connect in that time then the startup is aborted and the node shuts down.

In order to restart the cluster in this case you must restart the master first, make sure it is up and running, and then restart the slaves.
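Before restarting the nodes, it can help to confirm which role each instance had by looking for the marker file. A minimal sketch, run in each instance's installation directory:

# a node with the marker file will only start as a slave; a node without it can start as master
if [ -f crx-quickstart/repository/clustered.txt ]; then
    echo "clustered.txt present: this node will start as a slave"
else
    echo "clustered.txt absent: this node can start as master"
fi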

 

If the file clustered.txt exists on all nodes in a cluster, including the former master, there is a much higher chance that the cluster is out of sync. Therefore the administrator must elect one of the nodes to be the new master, remove clustered.txt from that node, and clone that node to create new slave nodes (cloning is described in Manual Slave Cloning, above).

The choice of which of the nodes should be the new master can be complex. In general it is best to choose the node which has the most data, since this usually indicates that it was the most recently updated (recall that the Tar PM is append-only, so over short periods of time we can rely on the larger store being the most recently updated). To determine this, the administrator must compare the sizes of the most recent data_*.tar file in each of the following locations (a sketch of this check follows the list):

  • the default workspace (crx-quickstart/repository/workspaces/crx.default/data_*.tar),
  • the version workspace (crx-quickstart/repository/version/data_*.tar), and
  • the journal (crx-quickstart/repository/tarJournal/data_*.tar).
(where * is a five-digit number; for example, the first data file created on a fresh install is always data_00000.tar)
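The comparison can be scripted. The following sketch, run on each cluster node from the installation directory, prints the newest data_*.tar file in each of the three locations listed above:

# print the most recent data tar file (highest number) in each location
for d in crx-quickstart/repository/workspaces/crx.default \
         crx-quickstart/repository/version \
         crx-quickstart/repository/tarJournal; do
    ls -l "$d"/data_*.tar | tail -1
done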

 

Note that this method of determining which node to make the new master is not foolproof, and other factors may need to be taken into consideration.

The maximum number of seconds that a slave will attempt to connect with a master can be changed by setting the system property

com.day.crx.core.cluster.WaitForMasterRetries (default: 60)

There is a one second delay after each attempt, so the default results in 60 attempts before giving up.
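For example, to allow roughly five minutes of reconnection attempts on a quickstart instance (a sketch; the memory setting simply mirrors the start commands used earlier on this page):

# allow about 300 reconnection attempts (one per second) before the slave gives up
java -Xmx512m -Dcom.day.crx.core.cluster.WaitForMasterRetries=300 -jar crx-quickstart-*.jar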

 

Recovering an Out-of-Sync Cluster Instance

To re-join a cluster instance that is out of sync, there are a number of solutions:

  • Create a new repository and join the cluster node as normal.
  • Use the Online Backup feature to create a cluster node. In many cases this is the fastest way to add a cluster node.
  • Restore an existing backup of the cluster instance node and start it.
  • As described in the error message, delete the index and data tar files that are out-of-sync on this cluster node and restart. Note that the Lucene search index may still be out of sync unless it is also deleted. This procedure is discouraged as it requires more knowledge of the repository, and may be slower than using the online backup feature (especially if the Lucene index needs to be re-built).

Time-to-Sync

The time that it takes to synchronize an out-of-sync cluster node depends on two factors:

  • The length of time that the node has been disconnected from the cluster.
  • The rate of change of information in the live node.

Combined, these factors determine how much data has changed during the time that the node has been disconnected, and therefore the amount of data that needs to be transferred and written to that node to re-synchronize it.

Since the time taken depends on these variables, it will differ from installation to installation. However, by making some realistic worst-case scenario projections, we can estimate an example sync time:

  • Assume that a cluster node will not be out of sync for any more than 24 hours, since in most cases this is sufficient time for manual intervention to occur.
  • Also assume that the website in question is the main corporate website for an enterprise of approximately 10,000 employees. 24 hours of work on a website in such an organization is projected as follows:
    • import 250 2MB images
    • create 2400 pages
    • perform 10000 page modifications
    • activate 2400 pages

Internal testing at Adobe has shown that given the above assumptions the results for a CQ 5 installation using CRX clustering are:

  • Synchronization took 13 minutes and 38 seconds.
  • The author response time increased from 400ms to 700ms during the synchronization (including network latency).
  • The initial crx-quickstart folder size was 1736MB.
  • The crx-quickstart folder size after the simulated 24h was 2588MB.
  • Therefore a total transfer of 852MB was needed to perform the synchronization.

Locking in Clusters

Active clusters do not support session-scoped locks. Open-scoped locks or application-side solutions for synchronizing write operations should be used instead.

Tar PM Optimization

The Tar PM stores its data in standard Unix-style tar files. Occasionally, you may want to optimize these storage files to increase the speed of data access. Optimizing a Tar PM clustered system is essentially identical to optimizing a stand-alone Tar PM instance (see Optimizing Tar Files).

The only difference is that to optimize a cluster, you must run the optimization process on the master instance. If the optimization is started on the master instance, the shared data as well as the local cache of the other cluster instances is optimized automatically. There is a small delay before the changes are propagated (a few seconds). If one cluster instance is not running while optimizing, the tar files of that cluster instance are automatically optimized the next time the instance is started.

Manual Failover

In an active/passive environment with two nodes, under normal operating conditions, all incoming requests are served by the master, while the slave maintains content synchronization with the master but does not itself serve any requests. Ideally, if the master node fails, the slave, having identical content to the master, would automatically jump in and start serving requests, with no noticeable downtime.

Of course, there are many cases where this ideal behavior is not possible, and a manual intervention by an administrator is required. The reason for this is that, in general, the slave cannot know with certainty what type of failure has occurred, and the type of failure (and there are many types) dictates the appropriate response. Hence, intelligent intervention is usually required.

For example, imagine a two node active/passive cluster with master A and slave B. A and B keep track of each other's state through a "heartbeat" signal. That is, they periodically ping each other and wait for a response.

Now, if B finds that its pings are going unanswered for some relatively long period, there are, generally speaking, two possible reasons:

  1. A is not responding because it is inoperable.
  2. A is still operating normally and the lack of response is due to some other reason, most likely a failure of the network connection between the two nodes.

If (1) is the case then the logical thing is for B to become the master and for requests to be redirected to and served by B. In this situation, if B does not take on A's former role, the service will be down; an emergency situation.

But if (2) is the case then the logical thing to do is for B to simply wait and continue pinging until the network connection is restored, at which point it will resynchronize with the master, applying all changes that occurred on the master during the down time. In this situation, if B instead assumes that A is down and takes over, there would be two functioning master nodes, in other words a "split-brain" situation where there is a possibility that the two nodes become desynchronized.

Because these two scenarios are in conflict and the slave cannot distinguish between them, the default setting in CRX is that upon repeated failures to reconnect to a non-responsive master the slave does not become the master. However, see below for a case where you may wish to alter this default behavior.

Assuming this default behavior, it is at this point that a manual intervention is needed.

Your first priority is to ensure that at least one node is up and serving requests. Once you have accomplished this, you can worry about starting the other nodes, synchronizing them, and so forth. Here is the procedure to follow:

  1. First, determine which of the scenarios above applies.
  2. If the master node A is still responding and serving requests then you have scenario (2) and the problem lies in the connection between master and slave.
    • In this case no emergency measures need to be taken; since the service is still functional, you must simply re-establish the cluster.
    • First, stop the slave B.
    • Ensure that the network problem has been solved.
    • Restart B. It should automatically rejoin the cluster. If it is out of sync you can troubleshoot it according to the tips given in Recovering an Out-of-Sync Cluster Instance.
  3. On the other hand, if A is down then you have scenario (1). In this case your first priority should be to get a functioning instance up and running and serving requests as soon as possible.
    • You have two choices: Restart A and keep it as master or redirect requests to B and make it the new master. Which one you choose should depend on which can be achieved most quickly and with the highest likelihood of success.
    • If the problem with A is easy to fix then restart it and ensure that it is functioning properly. Then restart B and ensure that it properly rejoins the cluster.
    • If the problem with the master looks more involved, it may be easier to redirect incoming requests to the slave and restart the slave, making it the new master.
      • To do this you must first stop B and remove the file
        crx-quickstart/repository/clustered.txt. This is a flag file that CRX creates to keep track of whether a restarted system should regard itself as master or slave. The presence of the file indicates to the system that before the restart it was a slave, so it will attempt to automatically rejoin its cluster. The absence of the file indicates that the instance was, before the restart, a master (or a lone unclustered instance), in which case it does not attempt to rejoin any cluster.
      • Now restart B.
      • Once you have confirmed that B is up and running and serving requests you can work on fixing A. When A is in working order you can join it to B, except this time A will be the slave and B the master. Alternatively, you may switch back to the original arrangement. Just don't forget about the clustered.txt file!
      • It may be that A now reports that it is out of sync with cluster node B. See Recovering an Out-of-Sync Cluster Instance for information on fixing this.

Enabling Automatic Failover

As discussed above, the default failover behavior in a CRX cluster requires manual intervention because in the typical case a slave cannot distinguish between an inoperable master instance and an inoperable network connection to the master, and it is only in the former case that an actual failover should be performed.

In some cases, however, it may make sense to enable automatic failover. This is only recommended if the slave and master have two or more distinct but redundant network connections to one another.

In such a case a slave will only detect heartbeat timeout from the master if either the master is down, or all the redundant network connections are down.

Since the chance of all redundant connections being down at the same time is less than that of a single failed connection, we can have more confidence that a heartbeat timeout really does mean that the master is down. Obviously this depends on the redundant connections being truly independent, to the greatest extent possible. Having even more than two such redundant, independent connections would further increase our confidence in the reliability of the heartbeat monitor.

The level of confidence in the heartbeat monitor is necessarily a judgement call on the part of the administrator of the cluster. But if it is determined that the level of confidence is high enough, then the default failover behavior can be changed by setting the parameter becomeMasterOnTimeout to true in the <Cluster> element of repository.xml.

Cluster Solution Scenarios

Apart from the general guidelines above, there are a number of specific actions to be taken with respect to particular types of cluster failures. The tables below summarize these scenarios. This information should be used in concert with the procedure above.

Emergency

Scenario | Expected behavior | Restore
Master shutdown with high load on both nodes | Shutdown within 2 min; failover; possibly out-of-sync | Restart old master. Master may be out of sync and may need to be restored.
Slave shutdown with load on both nodes | Shutdown within 1 min | Restart slave
Master process dies (kill -9) | Failover | Restart old master. Master may be out of sync and may need to be restored.
Complete hardware failure on master (e.g. power failure) with default config | Slave may be blocked or become master | Restart slave. Master may be out of sync and may need to be restored.
Heavy load on slave node (DOS behavior) | Should still work, slow | Remove load, restart slave.
Heavy load on master node (DOS behavior) | Should still work, slow | Remove load, restart master.
Master runs out of disk space | Master should close/kill itself | Free space, restart master.
Master runs out of memory | Master should close/kill itself | Restart master.

Network Problems

Scenario | Expected behavior | Restore
Network failure on master/slave, restore network within 5 min | Slave should re-join | Not required
Network failure on master/slave, restore network after 7 min | Slave should re-join | Not required
Set "becomeMasterOnTimeout", stop network for 6 min | Slave should become master (split brain) | Restore previous master or slave from the other node
Set "becomeMasterOnTimeout", set system property "socket.receiveTimeout" to "10000" (10 seconds); stop network for 1 min | Slave should become master (split brain) |
Persistent network failure on master | Master continues, slave blocked | Restore network
Network failure on master/slave with load on both nodes | Master continues, slave blocked | Restore network
Slow connection between the nodes | Should still work, slow | Improve connection

Maintenance Operations

Scenario | Expected behavior | Restore
Slave shutdown without load | Shutdown within 1 min | Restart old slave
Master shutdown without load | Shutdown within 1 min; failover | Restart old master

Project QA with Clustering

When testing a clustered CRX-based project (a website built on CQ, for example) the following QA tests should be passed at least once before content entry (with final code but not all content) and at least once after content entry and before the project goes live:

  • Cluster slave shutdown and restart: Verify rejoin to cluster and basic smoke test (sanity check).
  • Cluster master shutdown and restart: Verify rejoin to cluster and basic smoke test (sanity check).
  • Cluster slave kill -9 or pull plug: Verify rejoin to cluster and basic smoke test (sanity check).
  • Cluster master kill -9 or pull plug: Verify rejoin to cluster and basic smoke test (sanity check).
  • Recover whole cluster from backup (disaster recovery).