Understanding Dataset Configuration
Dataset configuration refers to the process of editing the configuration files whose parameters provide the rules for dataset construction.
The constructed dataset physically resides in the temp.db file stored on the data workbench server computer, but the configuration files for the dataset reside within a directory for a profile. A profile contains a set of configuration files that construct a dataset (including its extended dimensions) for a specific analysis purpose. In addition, a profile contains the definitions of entities such as metrics, derived dimensions, workspaces, reports, and visualizations that enable analysts to interact with the dataset and obtain information from it.
The profile whose dataset configuration files you are editing is referred to as your dataset profile. A dataset profile references multiple inherited profiles, which can be any profiles that you create and maintain so that you can configure your Adobe application to best fit your analysis needs. A dataset profile also may reference internal profiles that are provided with your Adobe application to form the basis for all of the functionality available in your application.
For more information about the different types of profiles that are available with Adobe applications, see the Data Workbench User Guide.
A dataset profile for any Adobe application must contain the following configuration files on the Insight Server machine:
- Profile.cfg: Lists the inherited profiles and processing servers for the profile. Processing servers are the Insight Server DPUs that process the data for the profile. If you have installed an Insight Server cluster, you can specify multiple Insight Server computers to run a single profile. For instructions on adding inherited profiles to a dataset profile's Profile.cfg file, installing an Insight Server cluster, or configuring a dataset profile to run on an Insight Server cluster, see the Server Products Installation and Administration Guide.
- Dataset\Transformation.cfg: Controls the transformation phase of the dataset construction process and typically configures the dataset for profile-specific analysis. See Transformation. For more information about the file itself, see Transformation Configuration File.
- Dataset Include Files: A dataset include file contains a subset of the parameters found in the dataset profile's Log Processing.cfg or Transformation.cfg file, but it is stored and managed within an inherited profile. Dataset include files supplement the main dataset configuration files. For more information, see Dataset Include Files.
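As a quick sanity check on the list above, the following minimal sketch reports which of the named files are missing from a profile directory. It is an illustration only, not an Adobe tool: the function name `missing_required_files` and the directory you pass in are assumptions, and only the two file paths named explicitly in the list are checked. Extend the tuple if your profile requires others.

```python
from pathlib import Path

# Relative paths taken from the list above. "Dataset/Transformation.cfg"
# mirrors the documented Dataset\Transformation.cfg location.
REQUIRED_FILES = ("Profile.cfg", "Dataset/Transformation.cfg")


def missing_required_files(profile_dir):
    """Return the required relative paths that are not files under profile_dir.

    Hypothetical helper: profile_dir is whatever directory holds your
    dataset profile on the server machine.
    """
    root = Path(profile_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

Calling `missing_required_files` with the path to your dataset profile returns an empty list when both files are present.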
The dataset profile provided to you during the implementation of your Adobe application contains a set of dataset configuration files that you can open, edit, and save using the Profile Manager.
For information about the Profile Manager, see the Insight User Guide.
Although not required for all datasets, the following files enable you to control other aspects of the dataset construction process:
- Log Processing Mode.cfg: Lets you pause processing of data into a dataset, specify offline sources, or specify the frequency at which the data workbench server saves its state files. See Additional Configuration Files.
- Server.cfg: Specifies the default data cache size (in bytes) for data workbench machines that connect to the data workbench server. See Additional Configuration Files.
- Transform.cfg and Transform Mode.cfg: These files are available only if you have licensed the data transformation functionality to use with your Adobe application. The Transform.cfg file contains the parameters that define the log sources and data transformations for transformation functionality. The transformations that you define manipulate the source data and output it in a format that you specify. The Transform Mode.cfg file lets you pause processing of data into a dataset, specify offline sources, or specify the frequency at which the Insight Server running transformation functionality saves its state files. See Transform Functionality.
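To see which of these optional files a profile actually contains, a short script can search the profile directory by file name. The sketch below is a hedged illustration: this section does not state which subdirectory each optional file lives in, so the code searches recursively. The function name `find_optional_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the list above. Their locations inside the profile
# are not given in this section, so the search below is recursive.
OPTIONAL_FILES = (
    "Log Processing Mode.cfg",
    "Server.cfg",
    "Transform.cfg",
    "Transform Mode.cfg",
)


def find_optional_files(profile_dir):
    """Map each optional file name to a sorted list of matching file paths."""
    root = Path(profile_dir)
    return {
        name: sorted(p for p in root.rglob(name) if p.is_file())
        for name in OPTIONAL_FILES
    }
```

An empty list for a given name means that file was not found anywhere under the directory, which is expected for datasets that do not need it.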
For information about specific dataset configuration tasks, use the table below to locate the tasks of interest to you:
| If you would like to... | See... |
|---|---|
| Define log sources | |
| Determine which log entries enter the dataset during log processing | |
| Enable the splitting of tracking IDs with large amounts of event data | |
| Configure an Insight Server to run as a file server unit | |
| Configure an Insight Server to run as a centralized normalization server | |
| Set the time zone to be used for creating time dimensions and making time conversions | |
| Make minor changes to the dataset configuration files included with the internal profiles provided by Adobe | |
| Specify new fields of data to be passed from log processing to transformation | |
| Create extended dimensions | |
| Define parameters to use throughout log processing or transformation | |
| Learn about the Insight interfaces that enable you to monitor or manage your dataset | |
| Hide certain extended dimensions so they do not show on the dimension menu in Insight | |
| Override certain dataset configuration files in a profile that you cannot or do not want to modify | |
| Reprocess your dataset | |