XML Decoder Groups
To process XML files as log sources, you define decoders that extract data from the XML file.
Defining XML decoder groups for XML log sources requires knowledge of the XML file's structure and contents, the data to be extracted, and the fields in which that data is stored. This section provides basic descriptions of the parameters that you can specify for decoders. The manner in which you use any decoder depends on the XML file that contains your source data.
For information about format requirements for XML log sources, see Log Sources. For assistance with defining XML decoders, contact Adobe.
The top level of an XML decoder is a decoder group (XMLDecoderGroup), which is a set of decoder tables that you use to extract data from an XML file of a particular format. If you have XML files of different formats, then you must define a decoder group for each format. Each decoder group consists of one or more decoder tables.
The following table describes the Tables parameter and all of the sub-parameters that you must specify to define an XML decoder group.
Each table in a decoder group represents one level of data to be extracted from the XML file. For example, if you want to extract data about visitors, then you would create a decoder table that consists of the information you want to extract for each visitor. You also can create decoder tables within decoder tables (see Children).
To add a table to a decoder group
The extended fields (for example, x-trackingid, x-email) in which the data is stored. The data to be stored in the field is determined by the Path and/or Operation subfields.
The Path is the field's level within the structured XML file. A field's path is relative to the path of the table in which it is defined. Examples include tag.tag.tag and tag.tag.tag.@attribute. Note that paths are case-sensitive.
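The path notation can be illustrated outside of the product. The following is a minimal Python sketch, not the decoder's actual implementation, that resolves a dotted path relative to a table element by using the standard xml.etree.ElementTree module. The helper name resolve_path, the type attribute, and the sample XML are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def resolve_path(table_element, path):
    """Resolve a dotted path such as 'contact.email' or 'contact.@type'
    relative to table_element. Hypothetical helper for illustration only."""
    parts = path.split(".")
    attribute = None
    if parts[-1].startswith("@"):
        # A trailing @name segment selects an attribute instead of a tag.
        attribute = parts.pop()[1:]
    nodes = [table_element]
    for tag in parts:
        # Tag comparison is exact, so paths are case-sensitive.
        nodes = [child for node in nodes for child in node if child.tag == tag]
    if attribute is not None:
        return [node.get(attribute) for node in nodes if attribute in node.attrib]
    return [(node.text or "").strip() for node in nodes]

# Hypothetical sample data, not taken from a real log source.
SAMPLE = """<logdata>
  <visitor>
    <contact type="primary">
      <email>user@example.com</email>
    </contact>
  </visitor>
</logdata>"""

root = ET.fromstring(SAMPLE)
visitor = root.find("visitor")  # stands in for a table path of logdata.visitor

emails = resolve_path(visitor, "contact.email")
types = resolve_path(visitor, "contact.@type")
wrong_case = resolve_path(visitor, "Contact.Email")  # no match: case differs
```

In this sketch, field paths are resolved from the visitor element rather than from the document root, which mirrors the rule that a field's path is relative to the path of its table.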
An Operation is applied to each line in the specified path to produce an output. The following operations are available:
To add a field to a decoder table
The level within the structured XML file for which the decoder table contains information. For a child XML decoder table, the path is relative to the parent table's path. Note that paths are case-sensitive.
For example, if your XML file contains the structure:
<logdata>
<visitor> ... </visitor>
</logdata>
then the path would be logdata.visitor.
The value of this parameter should always be "Log Entry."
Note: Do not change this value without consulting Adobe.
Optional. One or more embedded decoder tables. Each child includes the Fields, Path, and Table parameters described above.
To add a child to a decoder table
- Right-click Children and click Add new > XMLDecoderTable. Define Field, Operation, and Path as appropriate.
To use an XML file as a log source for a dataset, XML decoder groups and tables must be defined to extract the information that is to be processed into the dataset. In this example, you can see how to define decoder groups and tables for a sample XML log source for a web dataset.
The following XML file contains information about a website visitor, including an Experience Cloud ID, email address, physical address, and information about the visitor's page views.
Since we have a single XML file, we need only one decoder group, which we name "Sample XML Format." This decoder group applies to any other XML files of the same format as this file. To begin constructing XML decoder tables within this decoder group, we must first determine what information we want to extract and the fields in which the data will be stored.
In this example, we extract information about the visitor and the page views associated with that visitor. To do this, we create a top-level (parent) XML decoder table with information about the visitor and an embedded (child) XML decoder table with information about that visitor's page views.
Information for the parent (visitor) table is as follows:
- A data type identifier for each row of data in the XML file. We use "VISITOR" as our identifier so that we can quickly identify rows of data pertaining to the visitor and not to the page views. We store this value in the x-rowtype field.
- The visitor's ID, which we store in the x-trackingid field.
- The visitor's email address (contact.email), which we store in the x-email field.
- The visitor's registration status. If the visitor is a registered user, then we can store the value "1" in the x-is-registered field.
- The Path value is logdata.visitor, and the Table value is Log Entry. For information about these parameters, see the XMLDecoderGroup table above.
Information for the child (page views) table is as follows:
- A data type identifier for each row of data in the XML file. We use "PAGEVIEW" as our identifier so that we can quickly identify rows of data pertaining to the visitor's page views and not to the visitor only. We store this value in the x-rowtype field.
- The visitor's ID. This value is inherited from the parent table and is stored in the x-trackingid field.
- The timestamp of each page view, which is stored in the x-event-time field.
- The URI of each page view, which is stored in the cs-uri-stem field.
- The Path value is pageview, and the Table value is "Log Entry." For information about these parameters, see the XMLDecoderGroup table above.
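The relationship between the parent and child tables described above can be sketched in Python. This is only an illustration of the row structure under stated assumptions, not the decoder's actual behavior: only the paths logdata.visitor, contact.email, and pageview and the field names come from the text, while the id, registered, and time attributes, the uri element, and the decode function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The id, registered, and time attributes and the
# uri element are assumed names, not taken from the real sample file.
SAMPLE = """<logdata>
  <visitor id="v-001" registered="true">
    <contact><email>user@example.com</email></contact>
    <pageview time="2024-01-01 10:00:00"><uri>/index.html</uri></pageview>
    <pageview time="2024-01-01 10:02:30"><uri>/products.html</uri></pageview>
  </visitor>
</logdata>"""

def decode(xml_text):
    """Emit one row per visitor (parent table) and one row per page view
    (child table). Hypothetical helper for illustration only."""
    rows = []
    root = ET.fromstring(xml_text)  # the root element is <logdata>
    for visitor in root.findall("visitor"):  # parent path: logdata.visitor
        tracking_id = visitor.get("id", "")
        parent_row = {
            "x-rowtype": "VISITOR",
            "x-trackingid": tracking_id,
            "x-email": visitor.findtext("contact/email", default=""),
        }
        if visitor.get("registered") == "true":
            parent_row["x-is-registered"] = "1"
        rows.append(parent_row)
        # Child path "pageview" is relative to the parent table's path.
        for pageview in visitor.findall("pageview"):
            rows.append({
                "x-rowtype": "PAGEVIEW",
                "x-trackingid": tracking_id,  # inherited from the parent
                "x-event-time": pageview.get("time", ""),
                "cs-uri-stem": pageview.findtext("uri", default=""),
            })
    return rows

rows = decode(SAMPLE)
for row in rows:
    print(row)
```

Each page view row carries the same x-trackingid as its visitor row, so the x-rowtype value is what distinguishes visitor-level rows from page-view rows.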
The following screen capture shows a portion of the Log Processing Dataset Include file with the resulting XML decoder group for the sample XML file, based on the parent and child XML decoder tables discussed above.
A table showing the output of this decoder for our sample XML file looks something like the following:
You can create a table like the one above in data workbench by using a field viewer interface. For information about the field viewer interface, see Dataset Configuration Tools.
Using #value to read the value of an XML element that has attributes
You can now use the #value tag in XML paths to pull the value of an XML element.
For example, given XML such as <Hit><Page name="Home Page" index="20">home.html</Page></Hit>, you were previously unable to read the value of the <Page> tag (home.html), only its attributes. To read the attributes of the <Page> tag, use Hit.Page.@name and Hit.Page.@index respectively. To read the value of the tag itself, use the Hit.Page.#value expression.
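The difference between the @attribute and #value selectors can be shown with a short Python sketch that uses the standard xml.etree.ElementTree module on the XML fragment from this example. The read_path helper is hypothetical and only mimics the notation. It is not the decoder's implementation.

```python
import xml.etree.ElementTree as ET

def read_path(root, path):
    """Resolve a dotted path whose first segment names the root element
    and whose last segment is '@attribute' or '#value'.
    Hypothetical helper for illustration only."""
    *tags, selector = path.split(".")
    if not tags or tags[0] != root.tag:
        return None
    node = root
    for tag in tags[1:]:
        node = node.find(tag)  # first direct child with this exact tag
        if node is None:
            return None
    if selector == "#value":
        return node.text  # the element's own text content
    if selector.startswith("@"):
        return node.get(selector[1:])  # the named attribute, or None
    raise ValueError("this sketch expects a final @attribute or #value segment")

hit = ET.fromstring('<Hit><Page name="Home Page" index="20">home.html</Page></Hit>')

name = read_path(hit, "Hit.Page.@name")
index = read_path(hit, "Hit.Page.@index")
value = read_path(hit, "Hit.Page.#value")
```

Here name and index hold the attribute values of the <Page> tag, and value holds its text, home.html.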
For example, you can read the value of the <varValue> tag by adding the following field to the decoder:
    7 = XMLDecoderField:
      Field = string: x-varvalue-name-added
      Operation = string: LAST
      Path = string: #value
  Path = string: varValue
  Table = string: Log Entry
Similarly, you can read the value of the <Rep> tag by adding the following field to the decoder:
    7 = XMLDecoderField:
      Field = string: x-rep-name-added
      Operation = string: LAST
      Path = string: Rep.#value
  Path = string: Reps
  Table = string: Log Entry
In contrast, the value of an element that has no attributes can be read directly. For example, the value of a <text> tag under a <line> tag can be read by specifying text as the path or by using line.text, depending on how you have built the decoder.
    2 = XMLDecoderField:
      Field = string: x-chat-text
      Operation = string: LAST
      Path = string: text
  Path = string: line
  Table = string: Log Entry