
About Incremental Index

You can use Incremental Index to index "pieces" of your live or staged website, such as a collection of frequently changed pages.

Using Incremental Index

An incremental index typically takes only seconds to run, which makes it useful for large websites that can take many hours to index completely.
When you run an incremental index, status information such as the start time, the elapsed time, and any errors that occur during indexing is displayed. Information about the status of your last index is also displayed.
You can stop or restart the incremental indexing process at any time.
While the new incremental index builds for your live website, customers can continue to search your site using your last incremental index.

Configuring an incremental index of a staged website

You can configure which website pages you want to include in your incremental index by specifying website URLs and URL masks.
To configure an incremental index of a staged website
  1. On the product menu, click Index > Incremental Index > Configuration.
  2. On the Incremental Index Configuration page, use the following fields to specify which pages you want to index:
    Add or Update URLs
    Specify URLs.
    The search robot indexes only those specified documents that have changed since you last indexed.
    Additionally, the search robot follows links that are contained within the specified documents and indexes only those documents that have changed.
    This field must contain only document URLs, not masks. For example:
    https://www.mydomain.com/products/new.html
    You can use the following keywords with the URL:
    • noindex
      If you do not want to index the text on the page that matches a specified URL, but you want to follow the page's links, add noindex after the URL as in the following example:
      https://www.mydomain.com/products/new.html noindex
      Be sure you separate noindex from the URL with a space; a comma is not a valid separator.
    • nofollow
      If you want to index the text on the page that matches the specified URL, but you do not want to follow the page's links, add nofollow after the URL as in the following example:
      https://www.mydomain.com/products/new.html nofollow
      Be sure you separate nofollow from the URL with a space; a comma is not a valid separator.
    Find and Update URL Masks
    Specify simple URL masks: full paths, partial paths, or paths that use wildcards or regular expressions.
    The search robot finds all matching documents and indexes only those documents that have changed since the last time you indexed.
    Additionally, the search robot follows links within the matching documents and indexes only those pages that have changed. The following is an example of a wildcard mask:
    https://www.mydomain.com/products/household/*.html
    You can also use regular expressions as in the following example:
    regexp ^https://www\.mydomain\.com/products/household/.*\.html$
    You can also use the keywords nofollow and noindex as described in Add or Update URLs above.
    Include and Exclude URL Masks
    Specify simple include or exclude URL masks: full paths, partial paths, or paths that use wildcards or regular expressions.
    The search robot finds and indexes ("include") or ignores ("exclude") documents based on the type of mask that is specified.
    When the search robot indexes a site, it applies the masks in the order in which they appear. For example, the following list of masks:
    include https://www.mydomain.com/products/household/lightbulbs*.html
    exclude https://www.mydomain.com/products/
    indexes pages such as lightbulbs1.html and lightbulbs2.html, but does not index any other pages under the /products/ directory.
    A mask that appears earlier in the list always takes precedence over one that appears later. In particular, if the search robot encounters a document that matches both an include mask and an exclude mask, the mask listed first takes precedence.
    You can also use the keywords nofollow and noindex as described in Add or Update URLs above.
    Include and Exclude Date Masks
    Specify include or exclude date masks. Each date mask combines a date condition with a URL mask: a full path, partial path, or path that uses wildcards or regular expressions.
    The search robot finds and indexes ("include") or ignores ("exclude") documents based on both the URL and the date of documents.
    You can use the following types of date masks:
    • include-days NNN
      The search robot indexes all documents that match the specified URL mask and are NNN or more days old.
      You can follow the URL mask with one or more of the following keywords:
      • nofollow
      • noindex
      • server-date
      For example, the following mask includes all documents in the /archive/support/ folder that are 0 or more days old, which in practice means every document in that folder:
      include-days 0 https://www.mydomain.com/archive/support/
    • include-date YYYY-MM-DD
      The search robot indexes all documents that match the specified URL mask and are dated on or before YYYY-MM-DD.
      You can follow the URL mask with one or more of the following keywords:
      • nofollow
      • noindex
      • server-date
      The following mask example includes all documents in the /archive/ folder dated on or before July 25, 2011:
      include-date 2011-07-25 https://www.mydomain.com/archive/
    • exclude-days NNN
      The search robot does not index documents that match the specified URL mask and are NNN or more days old.
      Optionally, you can follow the URL mask with the keyword server-date.
      The following mask example excludes all PDF files that are 90 days old or older from your index:
      exclude-days 90 *.pdf
    • exclude-date YYYY-MM-DD
      The search robot does not index documents that match the specified URL mask and are dated on or before YYYY-MM-DD.
      Optionally, you can follow the URL mask with the keyword server-date.
      The following mask example excludes all documents in the /archive/ folder dated on or before April 23, 2004:
      exclude-date 2004-04-23 https://www.mydomain.com/archive/
    Delete URLs
    Specify URLs.
    The search robot finds and deletes the specified documents from your search index. If a specified page is already in your search index, the robot deletes it before it adds or updates any other pages.
    This field must contain only document URLs, not masks.
    Find and Delete URL Masks
    Specify simple URL masks: full paths, partial paths, or paths that use wildcards or regular expressions.
    If the specified URL mask matches pages in your search index, the search robot deletes the pages before it adds or updates any other pages. For example:
    https://www.mydomain.com/products/1998/household/*
    You can also use regular expressions as in the following example:
    regexp ^https://www\.mydomain\.com/products/199[567]/.*$
  3. Click Save Changes.
  4. (Optional) Do one of the following:
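The Add or Update URLs behavior described in step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the `Page` type, the in-memory `site` dictionary, and the helpers `parse_url_entry` and `incremental_crawl` are hypothetical. The sketch also assumes that links are followed recursively from every visited page, that the noindex and nofollow keywords apply only to the listed page, and that a page counts as changed when its last-modified time is later than the previous index.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime

VALID_KEYWORDS = {"noindex", "nofollow"}


@dataclass
class Page:
    """A hypothetical in-memory stand-in for a web page."""
    last_modified: datetime
    text: str
    links: list = field(default_factory=list)


def parse_url_entry(line: str) -> tuple:
    """Split 'URL [noindex] [nofollow]' into (url, set_of_keywords)."""
    tokens = line.split()
    url, keywords = tokens[0], set(tokens[1:])
    # A comma is not a valid separator, so reject 'URL,noindex' style entries.
    if any(url.endswith("," + kw) for kw in VALID_KEYWORDS):
        raise ValueError("separate keywords from the URL with a space, not a comma")
    unknown = keywords - VALID_KEYWORDS
    if unknown:
        raise ValueError(f"unknown keyword(s): {sorted(unknown)}")
    return url, keywords


def incremental_crawl(entries, site, index, last_indexed):
    """Re-index changed pages reachable from the entries; return the updated URLs."""
    updated, seen = [], set()
    # Keywords apply only to the listed page, not to the pages it links to.
    queue = deque(parse_url_entry(entry) for entry in entries)
    while queue:
        url, keywords = queue.popleft()
        if url in seen or url not in site:
            continue
        seen.add(url)
        page = site[url]
        # Only pages changed since the last index are (re)indexed.
        if "noindex" not in keywords and page.last_modified > last_indexed:
            index[url] = page.text
            updated.append(url)
        # Links are followed, even from unchanged pages, unless nofollow is set.
        if "nofollow" not in keywords:
            queue.extend((link, set()) for link in page.links)
    return updated


if __name__ == "__main__":
    base = "https://www.mydomain.com/products/"
    site = {
        base + "new.html": Page(datetime(2024, 2, 1), "new", [base + "a.html", base + "b.html"]),
        base + "a.html": Page(datetime(2023, 12, 1), "unchanged"),
        base + "b.html": Page(datetime(2024, 3, 1), "changed"),
    }
    index = {}
    print(incremental_crawl([base + "new.html noindex"], site, index, datetime(2024, 1, 1)))
    # prints ['https://www.mydomain.com/products/b.html']
```

With noindex on new.html, the page itself is skipped but its links are still followed, so only b.html, which changed after the previous index, is re-indexed.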
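The URL mask fields in step 2 (Find and Update URL Masks, and Include and Exclude URL Masks) can be illustrated with the following Python sketch. The matching rules are assumptions inferred from the examples on this page, not documented product behavior: a mask that starts with `regexp ` is treated as a regular expression searched against the URL, a mask containing `*` as a wildcard pattern in which `*` matches any run of characters, and any other mask as a full or partial path matched as a URL prefix. The helper names `strip_keywords`, `mask_matches`, and `first_matching_action` are hypothetical. The last function applies the first-match-wins ordering described for include and exclude masks.

```python
import re

KEYWORDS = {"noindex", "nofollow"}


def strip_keywords(entry: str) -> tuple:
    """Split trailing noindex/nofollow keywords off a mask entry."""
    tokens = entry.split()
    keywords = set()
    while tokens and tokens[-1] in KEYWORDS:
        keywords.add(tokens.pop())
    return " ".join(tokens), keywords


def mask_matches(mask: str, url: str) -> bool:
    """Test a URL against a mask (assumed semantics; see the lead-in)."""
    if mask.startswith("regexp "):
        # Regular-expression mask: the pattern follows the 'regexp' keyword.
        return re.search(mask[len("regexp "):], url) is not None
    if "*" in mask:
        # Wildcard mask: escape everything, then let each '*' match any characters.
        return re.fullmatch(re.escape(mask).replace(r"\*", ".*"), url) is not None
    # Full or partial path: treat the mask as a URL prefix.
    return url.startswith(mask)


def first_matching_action(rules, url):
    """Return 'include' or 'exclude' from the first rule whose mask matches, else None."""
    for rule in rules:
        action, rest = rule.split(maxsplit=1)
        mask, _keywords = strip_keywords(rest)
        if mask_matches(mask, url):
            return action  # earlier rules take precedence over later ones
    return None


if __name__ == "__main__":
    p = "https://www.mydomain.com/products/"
    print(mask_matches(p + "household/*.html", p + "household/lamps.html"))  # True
    print(mask_matches(r"regexp ^https://www\.mydomain\.com/products/household/.*\.html$",
                       p + "garden/hose.html"))  # False
    rules = ["include " + p + "household/lightbulbs*.html", "exclude " + p]
    print(first_matching_action(rules, p + "household/lightbulbs1.html"))  # include
    print(first_matching_action(rules, p + "household/lamps.html"))  # exclude
    print(first_matching_action(rules[::-1], p + "household/lightbulbs1.html"))  # exclude
```

Reversing the two rules causes lightbulbs1.html to be excluded, because the exclude mask now appears first. The sketch returns None when no mask matches; what the search robot does with such a document is not described here.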
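The date mask rules from step 2 can be sketched in Python as follows. This is a hypothetical illustration, not the product's logic: the function names `url_matches` and `evaluate_date_mask` are invented, the URL part supports only wildcard and prefix masks (regular expressions are left out for brevity), a document's age is assumed to be the whole number of days between its date and today, and the server-date keyword is accepted but its effect is not modeled.

```python
import re
from datetime import date

# Keywords each mask type may be followed by, per the date mask list above.
ALLOWED = {
    "include": {"nofollow", "noindex", "server-date"},
    "exclude": {"server-date"},
}


def url_matches(mask: str, url: str) -> bool:
    """Assumed mask semantics: '*' is a wildcard; otherwise a URL-prefix match."""
    if "*" in mask:
        return re.fullmatch(re.escape(mask).replace(r"\*", ".*"), url) is not None
    return url.startswith(mask)


def evaluate_date_mask(entry: str, url: str, doc_date: date, today: date):
    """Return 'include' or 'exclude' if the date mask applies to url, else None."""
    kind, value, mask, *keywords = entry.split()
    action, unit = kind.split("-")  # e.g. 'include-days' -> ('include', 'days')
    if action not in ALLOWED or unit not in ("days", "date"):
        raise ValueError(f"unknown date mask type: {kind}")
    if not set(keywords) <= ALLOWED[action]:
        raise ValueError(f"keyword(s) not allowed after {kind}: {keywords}")
    if not url_matches(mask, url):
        return None
    if unit == "days":
        # NNN or more days old.
        applies = (today - doc_date).days >= int(value)
    else:
        # Dated on or before YYYY-MM-DD.
        applies = doc_date <= date.fromisoformat(value)
    return action if applies else None


if __name__ == "__main__":
    today = date(2024, 5, 1)
    print(evaluate_date_mask("exclude-days 90 *.pdf",
                             "https://www.mydomain.com/docs/guide.pdf", date(2024, 1, 1), today))
    # exclude (the file is 121 days old)
    print(evaluate_date_mask("include-date 2011-07-25 https://www.mydomain.com/archive/",
                             "https://www.mydomain.com/archive/2011/notes.html", date(2011, 8, 1), today))
    # None (the document is dated after 2011-07-25)
```

Because server-date is not modeled, the sketch returns the same result whether or not that keyword is present.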
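The Delete URLs and Find and Delete URL Masks fields in step 2 state that matching pages are deleted before any other pages are added or updated. The following Python sketch shows only that ordering. The index is modeled as a plain dictionary from URL to text, the function names `mask_matches` and `apply_batch` are hypothetical, and the mask semantics (a `regexp ` prefix, `*` as a wildcard, otherwise a URL prefix) are assumptions inferred from the examples on this page.

```python
import re


def mask_matches(mask: str, url: str) -> bool:
    """Assumed semantics: 'regexp ' prefix, '*' wildcard, else URL prefix."""
    if mask.startswith("regexp "):
        return re.search(mask[len("regexp "):], url) is not None
    if "*" in mask:
        return re.fullmatch(re.escape(mask).replace(r"\*", ".*"), url) is not None
    return url.startswith(mask)


def apply_batch(index: dict, delete_urls, delete_masks, updates: dict) -> dict:
    """Apply deletions before any additions or updates (index maps URL -> text)."""
    # 1. Delete URLs: exact document URLs only.
    for url in delete_urls:
        index.pop(url, None)
    # 2. Find and Delete URL Masks: remove every indexed page that matches.
    for mask in delete_masks:
        for url in [u for u in index if mask_matches(mask, u)]:
            del index[url]
    # 3. Only after all deletions are pages added or updated.
    index.update(updates)
    return index


if __name__ == "__main__":
    p = "https://www.mydomain.com/products/"
    index = {
        p + "1996/catalog.html": "old catalog",
        p + "1998/household/lamps.html": "old lamps",
        p + "2024/new.html": "draft",
    }
    apply_batch(
        index,
        delete_urls=[],
        delete_masks=[p + "1998/household/*",
                      r"regexp ^https://www\.mydomain\.com/products/199[567]/.*$"],
        updates={p + "2024/new.html": "final"},
    )
    print(index)  # {'https://www.mydomain.com/products/2024/new.html': 'final'}
```

Both delete masks are taken from the examples above; the page dated 1996 is removed by the regular expression and the 1998 household page by the wildcard mask before the update is applied.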

Setting the incremental index schedule for a live website

You can select the Incremental Index frequency and the base time that is used to crawl and update your incremental index.
The time that you select is interpreted in the local time zone configured in Account Settings.
Web servers are often scheduled to go down for maintenance in the middle of the night. If your server is down during a scheduled index time, the indexing process will fail. Be sure that you select a time of day when your web server is available.
The index schedule only applies to your live index; you cannot schedule staged indexes.
To set the incremental index schedule for a live website
  1. On the product menu, click Index > Incremental Index > Live Schedule.
  2. On the Incremental Index Schedule page, in the Incrementally Index drop-down list, select the indexing frequency in hours or minutes.
  3. In the Base Time drop-down list, select the time at which you want the incremental indexing schedule to start.
  4. Click Save Changes.
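The relationship between the indexing frequency and the base time can be sketched in Python. This page does not spell out the exact rule, so the sketch assumes that runs fall at the base time plus whole multiples of the selected interval, that the interval divides 24 hours evenly so the series lines up from day to day, and that all times are naive values in the time zone configured in Account Settings. The function names `next_run` and `upcoming_runs` are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, base: time, every: timedelta) -> datetime:
    """First run at or after `now` in the series base + k * every (k any integer).

    Assumes naive datetimes in the account's local time zone and an interval
    that divides 24 hours evenly.
    """
    anchor = datetime.combine(now.date(), base)
    # k = ceil((now - anchor) / every), computed with timedelta floor division.
    k = -((anchor - now) // every)
    return anchor + k * every


def upcoming_runs(now: datetime, base: time, every: timedelta, count: int) -> list:
    """List the next `count` scheduled runs."""
    first = next_run(now, base, every)
    return [first + i * every for i in range(count)]


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 9, 15)
    print(next_run(now, time(2, 0), timedelta(hours=4)))  # 2024-05-01 10:00:00
```

Under these assumptions, you could use `upcoming_runs` to check that none of the next runs falls inside your web server's maintenance window.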

Running an incremental index of a live or staged website

To run an incremental index of a live or staged website
  1. On the product menu, do one of the following:
    • Click Index > Incremental Index > Live Index.
    • Click Index > Incremental Index > Staged Index.
  2. Click Incremental Index Now.
  3. (Optional) If indexing errors occurred, click View Errors to view the associated log.

Viewing the incremental index log of a live or staged website

When a live incremental index or a staged incremental index is complete, you can view its associated log to troubleshoot any errors that occurred.
You cannot export or save logs. The log remains available for viewing until the next index runs.
To view the incremental index log of a live or staged website
  1. On the product menu, do one of the following:
    • Click Index > Incremental Index > Live Log.
    • Click Index > Incremental Index > Staged Log.
  2. On the log page, at the top or bottom, do any of the following:
    • Use the navigation options First, Prev, Next, Last, or Go to line to move through the log.
    • Use the display options Errors only, Wrap line, or Show to refine what you see.