
About Scripted Index

With Scripted Index you can write, update, and maintain incremental indexing options without the need to log in. The search robot reads instructions from a text file that is hosted on your server.

Using Scripted Index

About configuring scripted incremental indexing

To use Scripted Index, go to the Scripted Incremental Index Configuration page and specify the URL of a script file (a plain text file) that is located on your server, for example https://www.mysite.com/indexlist.txt. As your site changes, you can add command blocks to the text file either manually or automatically (with a script triggered by the arrival of information from a news feed, stock ticker, or other altered file).
When the scripted incremental index begins, the search robot reads the text file and runs the new commands that it finds there. By default, the search robot processes only the new command blocks, which it identifies by their date-specifiers. Unless you check Clear Date when you configure Scripted Index, the search robot "remembers" the date-specifier of the most recently processed block.
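As a rough illustration of the automatic approach, the following Python sketch builds one command block and appends it to a local copy of the script file. The function names, the sample URL, and the idea of editing a local copy are assumptions made for illustration; publishing the file to your server is not shown, and the sketch assumes the existing file already ends with a blank line.

```python
from email.utils import formatdate


def build_block(commands, timestamp=None, author=None):
    """Return one command block as text, ending with a blank-line separator."""
    lines = []
    if author:
        # Comment lines start with "#" and may precede the date command.
        lines.append(f"# Added by {author}")
    # usegmt=True yields the HTTP 1.1 style, for example
    # "Sun, 06 Nov 1994 08:49:37 GMT", with a leading zero on single-digit days.
    lines.append("date " + formatdate(timeval=timestamp, usegmt=True))
    lines.extend(commands)
    return "\n".join(lines) + "\n\n"


def append_block(path, commands, timestamp=None, author=None):
    """Append a block at the end of the file, where the newest blocks belong."""
    with open(path, "a", encoding="utf-8", newline="\n") as handle:
        handle.write(build_block(commands, timestamp, author))


if __name__ == "__main__":
    # timestamp is given in epoch seconds; omit it to use the current time.
    print(build_block(["update https://www.mysite.com/news/"], timestamp=784111777), end="")
```

Because each new block is stamped with the current time and written at the end of the file, the blocks stay in oldest-to-newest order as long as nothing else edits the file out of sequence.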

About the script file

The script file that you specify in the URL is a plain text file that is located on your server. You can use carriage returns, line feeds, or both for the end-of-line sequence. A blank line contains zero or more white space characters followed by an end-of-line sequence. All commands are case-insensitive.
The text file is organized in blocks that describe the information that the search robot uses when it performs a scripted incremental index.
Blocks are ordered by date, with the oldest blocks at the top of the text file and the most recent blocks at the bottom. Each block begins with a single line that contains a date command followed by a date-specifier, continues with its action commands, and ends with a blank-line separator.
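The layout rules above (any of the three end-of-line sequences, blank lines as block separators, case-insensitive commands) can be sketched in Python as follows. This is only an illustrative reading of those rules, not the search robot's actual parser; the function names are hypothetical, and trimming surrounding white space from each line is an assumption.

```python
import re


def split_blocks(text):
    """Split script-file text into blocks, each a list of non-blank lines."""
    blocks, current = [], []
    # Accept CR, LF, or CRLF as the end-of-line sequence.
    for line in re.split(r"\r\n|\r|\n", text):
        if line.strip() == "":
            # A blank line (zero or more white space characters) closes a block.
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line.strip())
    if current:
        blocks.append(current)
    return blocks


def commands(block):
    """Return (command, argument) pairs for a block, skipping comment lines."""
    pairs = []
    for line in block:
        if line.startswith("#"):
            continue  # a comment line is neither a command nor a blank line
        parts = line.split(None, 1)
        argument = parts[1] if len(parts) > 1 else ""
        pairs.append((parts[0].lower(), argument))  # commands are case-insensitive
    return pairs
```

A tool like this can be useful for checking a generated file before it is published, for example to confirm that every block starts with a date or seconds command.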
Command
Description
date-command
The first line of each block starts with one of two date commands:
  • date
    Use the "date" command to indicate that the date-specifier will consist of a day, date, time, and time zone.
  • seconds
    Use seconds to indicate that the date-specifier will consist of a time in epoch seconds (for example, 784111777). When using seconds, make sure that the number of seconds increases between blocks.
date-specifier
The date-specifier command typically records either the ordinal date and time (date command) or the time in epoch seconds (seconds command) that the block information was added to the file. For example:
date Sun, 06 Nov 1994 08:49:37 GMT (HTTP 1.1 style)
date Sunday, 06-Nov-94 08:49:37 GMT (HTTP 1.0 style)
date Sun Nov 6 08:49:37 1994 (Unix asctime() date style)
seconds 784111777 (Unix epoch-seconds style)
A leading zero is required for all ordinal dates lower than the 10th when using the HTTP 1.1 style. For example, November 6th is 06 Nov, not 6 Nov.
The search robot "remembers" the date-specifier of the most recently processed block and only indexes information that it considers to be "newer." (Real-time does not matter to the search robot. Instead, the time in relation to other previously processed times is what matters.)
After the search robot reads a block with a date-specifier of 10:00 p.m., for example, it does not read any blocks that record times before 10:00 p.m., regardless of when the index operation runs. In a worst-case scenario, you might mistakenly enter the year "2040" instead of "2004" in your date-specifier. In that case, the search robot indexes the 2040 block during the next indexing operation and then refuses to read any other blocks (unless one post-dates 2040). If this happens, remove all previously processed blocks from the text file, check Clear Date on the configuration page, and then push the file live.
comment line
Begin comment lines with the "#" character.
Each comment line must be a line of its own; you cannot type end-of-line comments.
A comment line is not considered a blank line. It can also appear anywhere in a block, even before a date or seconds command as in the following example:
    #Added by Cathy Read after the Y2K seminar
    date Wed, 29 Dec 1999 09:32:20 GMT
action-command
Each text block can contain as many action commands as you want. The following action-command options correspond to those for standard incremental indexing:
  • add
    Use with URL. The search robot only indexes the specified URLs that have changed since your last indexing operation. Additionally, the search robot follows links that are contained within specified documents and indexes only those documents that have changed.
    You can follow the URL with nofollow or noindex keywords as in the following example:
    add https://www.mydomain.com/ noindex
  • update
    Use with URL mask. The search robot finds and updates all documents that match the specified URL mask.
    You can follow the URL with nofollow or noindex keywords as in the following example:
    update https://www.mydomain.com/products/
  • include or exclude
    Use with URL mask. The search robot finds and indexes ("include") or ignores ("exclude") documents based on the type of mask specified.
    For example,
    include https://www.mydomain.com/products/household/lightbulbs*.html
    or
    exclude https://www.mydomain.com/archive/
  • include-date or exclude-date
    Use with URL mask. The search robot finds and indexes ("include") or ignores ("exclude") documents based on both the URL and the date of the documents. The following types of masks are available:
    • include-days NNN
      The search robot indexes all documents that match the specified URL mask and are NNN days or more old.
      You can follow the URL mask with the keywords nofollow, noindex, and/or server-date.
    • include-date YYYY-MM-DD
      The search robot indexes all documents that match the specified URL mask and are as old or older than the date YYYY-MM-DD, where "YYYY" is the four-digit year, "MM" is the one- or two-digit month (1-12), and "DD" is the one- or two-digit day (1-31).
      You can follow the URL mask with the keywords nofollow, noindex, and/or server-date.
    • exclude-days NNN
      Disables indexing of all documents that match the specified URL mask and are NNN days or more old.
      You can follow the URL mask with the keyword server-date.
    • exclude-date YYYY-MM-DD
      Disables indexing of all documents that match the specified URL mask and are as old or older than the date YYYY-MM-DD.
      You can follow the URL mask with the keyword server-date.
  • delete
    Specify URLs. The search robot removes documents from the index that are identified by the URL.
  • deletemask
    The search robot removes documents from the index that match the specified URL mask.
See also About URL Masks.
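To make the four date-specifier styles concrete, the following Python sketch converts a date or seconds line into epoch seconds so that blocks can be compared with one another. It is an illustration only: the function name is hypothetical, and treating the asctime() style (which carries no time zone) as GMT is an assumption rather than documented behavior of the search robot.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def specifier_to_epoch(line):
    """Convert a 'date ...' or 'seconds ...' line to epoch seconds."""
    parts = line.split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"missing date-specifier: {line!r}")
    command, value = parts[0].lower(), parts[1].strip()
    if command == "seconds":
        return int(value)
    if command != "date":
        raise ValueError(f"not a date command: {line!r}")
    try:
        # Unix asctime() style, for example "Sun Nov 6 08:49:37 1994".
        moment = datetime.strptime(value, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        # HTTP 1.1 and HTTP 1.0 styles, including zone names such as GMT.
        moment = parsedate_to_datetime(value)
    if moment.tzinfo is None:
        # Assumption: a specifier without a time zone is treated as GMT.
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp())
```

With this helper, the four example specifiers shown in the date-specifier description all map to the same instant, 784111777.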

Script file example

In the following script file example, the search robot processes the blocks provided that the date-specifiers post-date the date-specifier of the most recently processed block. If that is the case, then the following indexing operations occur:
  • Deletes y2k-problems.html from the index.
  • Adds no-y2k-problems.html to the search index and does not follow any of the links in no-y2k-problems.html.
  • While crawling, excludes URLs that match housewares.html and lightfixtures.html from the search index.
  • Includes all other directories and documents under www.mydomain.com.
  • Updates all documents within the products and information directories, crawling and indexing all subsidiary links that have changed since the last indexing operation.
  • While crawling, excludes URLs in the archive section of the website if they are dated on or before January 1, 1999.
  • Excludes URLs that match housewares.html and lightfixtures.html from the search index.
  • Indexes files in the help directory, but does not crawl or index any links from those files.
  • Crawls and indexes any other files encountered for www.mydomain.com.
# Start of file. 
# Added by John Smith 
date Thu, 01 Jan 2004 16:05:53 PST 
exclude https://www.mydomain.com/housewares.html 
exclude https://www.mydomain.com/lightfixtures.html 
include https://www.mydomain.com/ 
delete https://www.mydomain.com/y2k-problems.html 
add https://www.mydomain.com/no-y2k-problems.html nofollow 
 
date Fri, 02 Jan 2004 20:19:08 PST 
# Added by the wire service updater 
exclude-date 1999-01-01 https://www.mydomain.com/archive server-date 
exclude https://www.mydomain.com/housewares.html 
exclude https://www.mydomain.com/lightfixtures.html 
include https://www.mydomain.com/help/ nofollow 
include https://www.mydomain.com/ 
# no add files, just update existing files 
# update all files in the "products" directory 
update https://www.mydomain.com/products/ 
# update all files in the "information" directory 
update regexp ^https://www\.mydomain\.com/information/.*$ 
# End of file.
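The "most recently processed block" rule can be sketched in Python as shown below. This is a simplified model for reasoning about the behavior, not the search robot's implementation: it assumes each block's date-specifier has already been converted to epoch seconds, that blocks are read in file order, and that a block is processed only when it is strictly newer than the remembered value. The function names are hypothetical.

```python
from datetime import datetime, timezone


def select_new_blocks(blocks, remembered=None):
    """Pick the blocks one run would process under the assumptions above.

    blocks     -- list of (epoch_seconds, commands) tuples in file order
    remembered -- date-specifier of the most recently processed block,
                  or None after Clear Date
    """
    selected = []
    for stamp, commands in blocks:
        if remembered is None or stamp > remembered:
            selected.append((stamp, commands))
            remembered = stamp  # later blocks must now post-date this one
    return selected, remembered


def epoch(year, month, day):
    """Midnight GMT on the given date, in epoch seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


if __name__ == "__main__":
    blocks = [
        (epoch(2004, 1, 1), ["delete https://www.mydomain.com/y2k-problems.html"]),
        (epoch(2040, 1, 2), ["update https://www.mydomain.com/products/"]),  # typo for 2004
        (epoch(2004, 1, 3), ["include https://www.mydomain.com/help/ nofollow"]),
    ]
    processed, remembered = select_new_blocks(blocks)
    print(len(processed))  # the block dated 2004-01-03 is skipped
```

In this model, a mistyped year of 2040 causes every later block with a 2004 date to be skipped until the remembered value is cleared, which mirrors the worst-case scenario described for the date-specifier.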

Configuring a scripted incremental index

You can specify a script that you have created that writes, updates, and maintains an incremental index, without the need to log in. The search robot reads instructions from the text file that is hosted on your server to perform the incremental index.
To configure a scripted incremental index
  1. On the product menu, click Index > Scripted Index > Configuration.
  2. On the Scripted Incremental Index Configuration page, in the Script File URL field, enter the URL of the script text file that is located on your server.
  3. (Optional) Check Clear Date if you do not want the search robot to "remember" the date-specifier of the most recently processed block.
    By default, the search robot processes only the new blocks of commands in the text file, as determined by each block's date-specifier. To override the default, check Clear Date.
  4. Click Save Changes.
  5. (Optional) Do one of the following:

Setting the scripted incremental index schedule for a live website

You can schedule scripted incremental indexing to occur at regular intervals throughout the day.
The base time that you select is local according to the time zone that is configured in Account Settings.
Web servers are often scheduled to go down for maintenance in the middle of the night. If your server is down during a scheduled index time, the indexing process will fail. Be sure that you select a time of day when your web server is available.
The index schedule only applies to your live index; you cannot schedule staged incremental indexes.
To set the scripted incremental index schedule for a live website
  1. On the product menu, click Index > Scripted Index > Live Schedule.
  2. On the Scripted Incremental Index Schedule page, in the Read the Scripted Incrementally Indexing File drop-down list, select how often, in hours or minutes, you want the search robot to read the scripted incremental index text file.
  3. In the Base Time drop-down list, select the starting time when you want to regenerate a new scripted incremental index.
  4. Click Save Changes.

Running a scripted incremental index of a live or staged website

You can use Scripted Incremental Index to index "pieces" of your live or staged website, such as a collection of frequently changed pages, all without the need to log in.
To use this feature, be sure that you have configured a scripted incremental index text file.
To run a scripted incremental index of a live or staged website
  1. On the product menu, do one of the following:
    • Click Index > Scripted Index > Live Index.
    • Click Index > Scripted Index > Staged Index.
  2. Click Scripted Index Now.
  3. (Optional) If indexing errors occurred, click View Errors to view the associated log.

Viewing the scripted incremental index log of a live or staged website

When a live or staged scripted incremental index is complete, you can view its associated log to troubleshoot any errors that occurred.
You cannot export or save logs. However, a log remains available for viewing until the next index runs.
To view the incremental index log of a live or staged website
  1. On the product menu, do one of the following:
    • Click Index > Scripted Index > Live Log.
    • Click Index > Scripted Index > Staged Log.
  2. On the log page, at the top or bottom, do any of the following:
    • Use the navigation options First, Prev, Next, Last, or Go to line to move through the log.
    • Use the display options Errors only, Wrap line, or Show to refine what you see.