Replication

Replication agents are central to Adobe Experience Manager (AEM) as the mechanism used to:

  • Publish (activate) content from an author to a publish environment.
  • Explicitly flush content from the Dispatcher cache.
  • Return user input (for example, form input) from the publish environment to the author environment (under control of the author environment).

Requests are queued to the appropriate agent for processing.

Replicating from Author to Publish

Replication, to a publish instance or dispatcher, takes place in several steps:

  • the author requests that certain content be published (activated); this can be initiated by a manual request, or by automatic triggers which have been preconfigured.
  • the request is passed to the appropriate default replication agent; an environment can have several default agents which will always be selected for such actions.
  • the replication agent "packages" the content and places it in the replication queue.
  • in the Websites tab the colored status indicator is set for the individual pages.
  • the content is lifted from the queue and transported to the publish environment using the configured protocol; usually this is HTTP.
  • a servlet in the publish environment receives the request and publishes the received content; the default servlet is http://localhost:4503/bin/receive.
  • multiple author and publish environments can be configured.
file

Replicating from Publish to Author

Features such as comments and forms, allow users to enter information on a publish instance. For this a type of replication is needed to return this information to the author environment, from where it is redistributed to other publish environments. However, due to security considerations, any traffic from the publish to the author environment must be strictly controlled.

This is known as reverse replication and functions using an agent in the publish environment which references the author environment. This agent places the input into an outbox. This outbox is matched with replication listeners in the author environment. The listeners poll the outboxes to collect any input made and then distribute it as necessary. This ensures that the author environment controls all traffic.

Replication - Out of the Box

The Geometrixx website that is included in a standard installation of AEM can be used to illustrate both forms of replication.

To follow this example and use the default replication agents you need to Install AEM with:

  • the author environment on port 4502
  • the publish environment on port 4503

Nota

The replication queues:

  • Default Agent (publish)
  • Reverse Replication (outbox)
  • Reverse Replication Agent (publish_reverse)

should all be enabled by default. If you want to check the status of either the agent or the queue use the Tools console to check. See monitoring your Replication Agents.

Replication (Author to Publish)

  1. Navigate to the support page on the author environment.
        http://localhost:4502/content/geometrixx/en/support.html

  2. Edit the page to add some new text.

  3. Activate Page to publish the changes.

  4. Open the support page on the publish environment:
        http://localhost:4503/content/geometrixx/en/support.html

  5. You can now see the changes that you entered on author.

This replication is actioned from the author environment by the:

  • Default Agent (publish)
    This agent replicates content to the default publish instance.
    Details of this (configuration and logs) can be accessed from the Tools console of the author environment; or:
    http://localhost:4502/etc/replication/agents.author/publish.html.

Reverse Replication (Publish to Author)

  1. Navigate to the Geometrixx Outdoors Community Hiking page on the publish environment:
        http://localhost:4503/content/geometrixx-outdoors/en/community/hiking.html

  2. At the top of the page, click Sign In Or Register. Enter the credentials for the default admin account and click Sign In.

  3. Below the Best GPS for Hiking topic, click No Comments Yet.

    file
  4. Enter a comment and click Post Comment.

    Your comment appears below the Comments heading.

  5. Open the Hiking community page on the author environment:
        http://localhost:4502/content/geometrixx-outdoors/en/community/hiking.html.
    Click the Comment button to see the comment that you entered on the publish page.

  6. Open CRXDE Lite for the author environment:
        http://localhost:4502/crx/de

  7. Navigate to the user generated content:
        /content/usergenerated/content/geometrixx-outdoors/en/community/hiking/journal/2012/12/best_gps_for_hiking/jcr:content/comments/1Bucket

  8. Here you can see that your comment has been replicated from the publish to the author environment.

This replication is actioned from the publish environment by the:

  • Reverse Replication (outbox)
    This agent stores reverse replicated content in the outbox (repo://var/replication/outbox), which acts as a queue.
    Details of this (configuration and logs) can be accessed from the Tools console of the author environment; or:
    http://localhost:4502/etc/replication/agents.publish/outbox.html

This agent transfers content to the author environment, by communicating with the:

  • Reverse Replication Agent (publish_reverse)
    This agent polls the default publish instance to retrieve reverse replicated content from the outbox.
    Details of this (configuration and logs) can be accessed from the Tools console of the author environment; or:
    http://localhost:4502/etc/replication/agents.author/publish_reverse.html

Nota

Reverse replication does not happen when a node is created. This is a known issue. Work around this issue by using the workflow launcher with the following configuration:

  • Event Type: Node Created
  • Node Type: cq:Page
  • Globbing: <branch where pages are created>(/.*)
  • Workflow: Reverse Replication
  • Run Modes: publish

Replication Agents - Out of the Box

The following agents are available in a standard AEM installation:

  • Default Agent
    Used for replicating from author to publish.
  • Dispatcher Flush
    This is used for managing the Dispatcher cache. See Invalidating Dispatcher Cache from the Authoring Environment and Invalidating Dispatcher Cache from a Publishing Instance for more information.
  • Reverse Replication
    Used for replicating from publish to author.
  • Static Agent
    This is an "Agent that stores a static representation of a node into the filesystem.".
    For example with the default settings, content pages and dam assets are stored under /tmp, either as HTML or the appropriate asset format. See the Settings and Rules tabs for the configuration.
    This was requested so that when the page is requested directly from the application server the content can be seen. This is a specialized agent and (probably) will not be required for most instances.

Replication Agents - Configuration Parameters

When configuring a replication agent from the Tools console, four tabs are available within the dialog:

Settings

  • Name

    A unique name for the replication agent.

  • Description

    A description of the purpose this replication agent will serve.

  • Enabled

    Indicates whether the replication agent is currently enabled.

    When the agent is enabled the queue will be shown as:

    • Active when items are being processed.
    • Idle when the queue is empty.
    • Blocked when items are in the queue, but cannot be processed; for example, when the receiving queue is disabled.
  • Serialization Type

    The type of serialization:

    • Default: Set if the agent is to be automatically selected.
    • Dispatcher Flush: Select this if the agent is to be used for flushing the dispatcher cache.
  • Retry Delay

    The delay (waiting time in milliseconds) between two retries, should a problem be encountered.

    Default: 60000

  • Agent User Id

    Depending on the environment, the agent will use this user account to:

    • collect and package the content from the author environment
    • create and write the content on the publish environment

    Leave this field empty to use the system user account (the account defined in sling as the administrator user; by default this is admin).

    Precaución

    For an agent on the author environment this account must have read access to all paths that you want to have replicated.

    Precaución

    For an agent on the publish environment this account must have the create/write access required to replicate the content.

    Nota

    This can be used as a mechanism for selecting specific content for replication.

  • Log Level

    Specifies the level of detail to be used for log messages.

    • Error: only errors will be logged
    • Info: errors, warnings and other informational messages will be logged
    • Debug: a high level of detail will be used in the messages, primarily for debug purposes

    Default: Info

  • Use for reverse replication

    Indicates whether this agent will be used for reverse replication; returns user input from the publish to author environment.

Transport

  • URI

    This specifies the receiving servlet at the target location. In particular, you can specify the hostname (or alias) and context path to the target instance here.

    For example:

    • A Default Agent may replicate to http://localhost:4503/bin/receive
    • A Dispatcher Flush agent may replicate to http://localhost:8000/dispatcher/invalidate.cache

    The protocol specified here (HTTP or HTTPS) will determine the transport method.

    For Dispatcher Flush agents, the URI property is used only if you use path-based virtualhost entries to differentiate between farms, you use this field to target the  farm to invalidate. For example, farm #1 has a virtual host of www.mysite.com/path1/* and farm #2 has a virtual host of www.mysite.com/path2/*. You can use a URL of /path1/invalidate.cache to target the first farm and /path2/invalidate.cache to target the second farm.

  • User

    User name of the account to be used for accessing the target.

  • Password

    Password for the account to be used for accessing the target.

  • NTLM Domain

    Domain for NTML authentication.

  • NTLM Host

    Host for NTML authentication.

  • Enable relaxed SSL

    Enable if you want self-certified SSL certificates to be accepted.

  • Allow expired certs

    Enable if you want expired SSL certificates to be accepted.

Proxy

The following settings are only needed if a proxy is needed:

  • Proxy Host

    Hostname of the proxy used for transport.

  • Proxy Port

    Port of the proxy.

  • Proxy User

    User name of the account to be used.

  • Proxy Password

    Password of the account to be used.

  • Proxy NTLM Domain

    The proxy NTLM domain.

  • Proxy NTLM Host

    The proxy NTLM domain.

Extended

  • Interface

    Here you can define the socket interface to bind to.

    This sets the local address to be used when creating connections. If this is not set, the default address will be used. This is useful for specifying the interface to use on multi-homed or clustered systems.

  • HTTP Method

    The HTTP method to be used.

    For a Dispatcher Flush agent this is nearly always GET and should not be changed (POST would be another possible value).

  • HTTP Headers

    These are used for Dispatcher Flush agents and specify elements that must be flushed.

    For a Dispatcher Flush agent the three standard entries should not need changing:

    • CQ-Action:{action}
    • CQ-Handle:{path}
    • CQ-Path:{path}

    These are used, as appropriate, to indicate the action to be used when flushing the handle or path. The sub-parameters are dynamic:

    • {action} indicates a replication action
    • {path} indicates a path

    They are substituted by the path/action relevant to the request and therefore do not need to be "hardcoded":

    Nota

    If you have installed AEM in a context other than the recommended default context, then you will need to register the context in the HTTP Headers. For example:
        CQ-Handle:/<yourContext>{path}

  • Close Connection

    Enable to close the connection after each request.

  • Connect Timeout

    Timeout (in milliseconds) to be applied when trying to establish a connection.

  • Socket Timeout

    Timeout (in milliseconds) to be applied when waiting for traffic after a connection has been established.

  • Protocol Version

    Version of the protocol; for example 1.0 for HTTP/1.0.

Triggers

These settings are used to define triggers for automated replication:

  • Ignore default

    If checked, the agent is excluded from default replication; this means it will not be used if a content author issues a replication action.

  • On Modification

    Here a replication by this agent will be automatically triggered when a page is modified. This is mainly used for Dispatcher Flush agents, but also for reverse replication.

  • On Distribute

    If checked, the agent will automatically replicate any content that is marked for distribution when it is modified.

  • On-/Offtime reached

    This will trigger automatic replication (to activate or deactivate a page as appropriate) when the ontimes or offtimes defined for a page occur. This is primarily used for Dispatcher Flush agents.

  • On Receive

    If checked, the agent will chain replicate whenever receiving replication events.

  • No Status Update

    When checked the agent will not force a replication status update.

  • No Versioning

    When checked the agent will not force versioning of activated pages.

Configuring your Replication Agents

Configuring your Replication Agents from the Author Environment

From the Tools tab in the author environment you can configure replication agents that reside in either the author environment (Agents on author) or the publish environment (Agents on publish). The following procedures illustrate the configuration of an agent for the author environment, but can be used for both.

Nota

When a dispatcher handles HTTP requests for author or publish instances, the HTTP request from the replication agent must include the PATH header. In addition to the following procedure, you must add the PATH header to the dispatcher list of client headers. (See /clientheaders (Client Headers).)

  1. Access the Tools tab in AEM.

  2. Click Replication (left pane to open the folder).

  3. Double-click Agents on author (either the left or the right pane).

  4. Click the appropriate agent name (which is a link) to show detailed information on that agent.

  5. Click Edit to open the configuration dialog:

    file
  6. The values provided should be sufficient for a default installation. If you make changes then click OK to save them (see Replication Agents - Configuration Parameters for more details of the individual parameters).

Nota

A standard installation of AEM specifies admin as the user for transport credentials within the default replication agents.

This should be changed to a site-specific replication user account with the privileges to replicate the required path(s).

Configuring Reverse Replication

Reverse replication is used to get user content generated on a publish instance back to an author instance. This is commonly used for moderated forums, blogs, surveys and registration forms, amongst others.

For security reasons, most network topologies do not allow connections from the "Demilitarized Zone" (a subnetwork that exposes the external services to an untrusted network such as the Internet).

As the publish environment is usually in the DMZ, to get content back to the author environment the connection must be initiated from the author instance. This is done with:

  • an outbox in the publish environment where the content is placed.
  • an agent (publish) in the author environment which periodically polls the outbox for new content.

To do this you need:

A reverse replication agent in the author environment

This acts as the active component to collect information from the outbox in the publish environment:

If you want to use reverse replication then ensure that this agent is activated.

file
A reverse replication agent in the publish environment (an outbox)

This is the passive element as it acts as an "outbox". User input is placed here, from where it is collected by the agent in the author environment.

file

Configuring Replication for Multiple Publish Instances

Upon installation a default agent is already configured for replication to a publish instance running on port 4503 of the localhost.

To configure replication for an additional publish instance you need to create, and configure, a new replication agent:

  1. Open the Tools tab in AEM.

  2. Select Replication, then Agents on author in the left panel.

  3. Select New....

  4. Set the Title and Name, then select Replication Agent.

  5. Click Create to create the new agent.

  6. Double-click the new agent item to open the configuration panel.

  7. Click Edit - the Agent Settings dialog will open - the Serialization Type is already defined as Default, this must remain so.

    • In the Settings tab:
      • Activate Enabled.
      • Enter a Description.
      • Set the Retry Delay to 60000.
      • Leave the Serialization Type as Default.
    • In the Transport tab:
      • Enter the required URI for the new publish instance; for example,
            http://localhost:4504/bin/receive.
      • Enter the site-specific user account used for replication.
      • You can configure other parameters as required.
  8. Click OK to save the settings.

You can then test operation by updating, then publishing, a page in the author environment.

The updates will appear on all publish instances that have been configured as above.

If you encounter any problems, you can check the logs on the author instance. Depending on the level of detail required you can also set the Log Level to Debug using the Agent Settings dialog as above.

Nota

This can be combined with use of the Agent User Id to select different content for replicating to the individual publish environments. For each publish environment:

  1. Configure a replication agent for replicating to that publish environment.
  2. Configure a user account; with the access rights required to read the content that will be replicated to that specific publish environment.
  3. Assign the user account as the Agent User Id for the replication agent.

Configuring a Dispatcher Flush agent

Default agents are included with the installation. However, certain configuration is still needed and the same applies if you are defining a new agent:

  1. Open the Tools tab in AEM.

  2. Select Replication, then Agents on publish in the left panel.

  3. Double-click on the Dispatcher Flush item to open the overview.

  4. Click Edit - the Agent Settings dialog will open:

    • In the Settings tab:
      • Activate Enabled.
      • Enter a Description.
      • Leave the Serialization Type as Dispatcher Flush, or set it as such if creating a new agent.
    • In the Transport tab:
      • Enter the required URI for the new publish instance; for example,
            http://localhost:80/dispatcher/invalidate.cache.
      • Enter the site-specific user account used for replication.
      • You can configure other parameters as required.

    For Dispatcher Flush agents, the URI property is used only if you use path-based virtualhost entries to differentiate between farms, you use this field to target the  farm to invalidate. For example, farm #1 has a virtual host of www.mysite.com/path1/* and farm #2 has a virtual host of www.mysite.com/path2/*. You can use a URL of /path1/invalidate.cache to target the first farm and /path2/invalidate.cache to target the second farm.

    Nota

    If you have installed AEM in a context other than the recommended default context, then you need to configure the HTTP Headers in the Extended tab.

  5. Click OK to save the changes.

  6. Return to the Tools tab, from here you can Activate the Dispatcher Flush agent (Agents on publish).

The Dispatcher Flush replication agent is not active on author. You can access the same page in the publish environment by using the equivalent URI; for example, http://localhost:4503/etc/replication/agents.publish/flush.html.

Controlling Access to Replication Agents

Access to the pages used to configure the replication agents can be controlled by using user and/or group page permissions on the etc/replication node.

Nota

Setting such permissions will not affect users replicating content (e.g. from the Websites console or sidekick option). The replication framework does not use the "user session" of the current user to access replication agents when replicating pages.

Configuring your Replication Agents from CRXDE Lite

Various parameters of your replication agents can be configured using CRXDE Lite.

If you navigate to /etc/replication you can see the following three nodes:

  • agents.author
  • agents.publish
  • treeactivation

The two agents hold configuration information about the appropriate environment, and are only active when that environment is running. For example, agents.publish will only be used in the publish environment. The following screenshot shows the publish agent in the author environment, as included with AEM WCM:

file

Monitoring your Replication Agents

To monitor a replication agent:

  1. Access the Tools tab in AEM.

  2. Click Replication.

  3. Double-click the link to agents for the appropriate environment (either the left or the right pane); for example Agents on author.

    The resulting window shows an overview of all your replication agents for the author environment, including their target and status.

  4. Click the appropriate agent name (which is a link) to show detailed information on that agent:

    file

    Here you can:

    • See whether the agent is enabled.
    • See the target of any replications.
    • See whether the replication queue is currently active (enabled).
    • See whether there are any items in the queue.
    • Refresh or Clear to update the display of queue entries; this helps you see items enter and leave the queue.
    • View Log to access the log of any actions by the replication agent.
    • Test Connection to the target instance.
    • Force Retry on any queue items if required.

    Precaución

    Do not use the "Test Connection" link for the Reverse Replication Outbox on a publish instance.

    If a replication test is performed for an Outbox queue, any items that are older than the test replication will be re-processed with every reverse replication.

    If such items already exist in a queue, they can be found with the following XPath JCR query and should be removed.

    /jcr:root/var/replication/outbox//*[@cq:repActionType='TEST'] 

​