
Query dataset data using Data Access API

This document provides a step-by-step tutorial that covers how to locate, access, and download data stored within a dataset using the Data Access API in Adobe Experience Platform. You will also be introduced to some of the unique features of the Data Access API, such as paging and partial downloads.

Getting started

This tutorial requires a working understanding of how to create and populate a dataset. See the dataset creation tutorial for more information.
The following sections provide additional information that you will need to know in order to successfully make calls to the Platform APIs.

Reading sample API calls

This tutorial provides example API calls to demonstrate how to format your requests. These include paths, required headers, and properly formatted request payloads. Sample JSON returned in API responses is also provided. For information on the conventions used in documentation for sample API calls, see the section on how to read example API calls in the Experience Platform troubleshooting guide.

Gather values for required headers

In order to make calls to Platform APIs, you must first complete the authentication tutorial. Completing the authentication tutorial provides the values for each of the required headers in all Experience Platform API calls, as shown below:
  • Authorization: Bearer {ACCESS_TOKEN}
  • x-api-key: {API_KEY}
  • x-gw-ims-org-id: {IMS_ORG}
All resources in Experience Platform are isolated to specific virtual sandboxes. All requests to Platform APIs require a header that specifies the name of the sandbox the operation will take place in:
  • x-sandbox-name: {SANDBOX_NAME}
For more information on sandboxes in Platform, see the sandbox overview documentation.
All requests that contain a payload (POST, PUT, PATCH) require an additional header:
  • Content-Type: application/json
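
As a minimal sketch, the headers listed above could be assembled in one place and reused across calls. The function name build_headers and its parameters are hypothetical and not part of any Adobe SDK; the header names are the ones given in this section.

```python
def build_headers(access_token, api_key, ims_org, sandbox_name, has_payload=False):
    """Return the headers this tutorial lists for Platform API calls.

    build_headers is an illustrative helper, not an official API.
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": ims_org,
        "x-sandbox-name": sandbox_name,
    }
    # Requests that carry a payload (POST, PUT, PATCH) also need a content type.
    if has_payload:
        headers["Content-Type"] = "application/json"
    return headers


# Placeholder values; substitute the ones from the authentication tutorial.
get_headers = build_headers("{ACCESS_TOKEN}", "{API_KEY}", "{IMS_ORG}", "{SANDBOX_NAME}")
```

Keeping the Content-Type header behind a flag mirrors the rule above that only requests with a payload need it.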

Sequence diagram

This tutorial follows the steps outlined in the sequence diagram below, highlighting the core functionality of the Data Access API.
The Catalog API allows you to retrieve information regarding batches and files. The Data Access API allows you to access and download these files over HTTP as either full or partial downloads, depending on the size of the file.

Locate the data

Before you can begin to use the Data Access API, you need to identify the location of the data that you want to access. In the Catalog API, there are two endpoints that you can use to browse an organization's metadata and retrieve the ID of a batch or file that you want to access:
  • GET /batches : Returns a list of batches under your organization
  • GET /dataSetFiles : Returns a list of files under your organization
For a comprehensive list of endpoints in the Catalog API, please refer to the API Reference.

Retrieve a list of batches under your IMS Organization

Using the Catalog API, you can return a list of batches under your organization:
API format
GET /batches

Request
curl -X GET 'https://platform.adobe.io/data/foundation/catalog/batches/' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Response
The response includes an object that lists all of the batches related to the IMS Organization, with each top-level value representing a batch. The individual batch objects contain the details for that specific batch. The response below has been minimized for space.
{
    "{BATCH_ID_1}": {
        "imsOrg": "{IMS_ORG}",
        "created": 1516640135526,
        "createdClient": "{CREATED_CLIENT}",
        "createdUser": "{CREATED_BY}",
        "updatedUser": "{CREATED_BY}",
        "updated": 1516640135526,
        "status": "processing",
        "version": "1.0.0",
        "availableDates": {}
    },
    "{BATCH_ID_2}": {
    ...
    }
}
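
Because the response is an object keyed by batch ID rather than an array, picking out batches means iterating over its keys. The sketch below assumes a response body shaped like the sample above; the batch IDs and the helper name batch_ids_by_status are made up for illustration.

```python
import json

# Trimmed-down stand-in for a GET /batches response body; the IDs are invented.
SAMPLE_BODY = """
{
    "batch-one": {"status": "processing", "created": 1516640135526},
    "batch-two": {"status": "success", "created": 1516640135999}
}
"""


def batch_ids_by_status(batches, status):
    """Return the IDs (top-level keys) of the batches whose status matches."""
    return [
        batch_id
        for batch_id, details in batches.items()
        if details.get("status") == status
    ]


batches = json.loads(SAMPLE_BODY)
successful = batch_ids_by_status(batches, "success")
```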

Filter the list of batches

Filters are often required to find a particular batch in order to retrieve relevant data for a particular use case. Parameters can be added to a GET /batches request in order to filter the returned response. The request below will return all batches created after a specified time, within a particular data set, sorted by when they were created.
API format
GET /batches?createdAfter={START_TIMESTAMP}&dataSet={DATASET_ID}&orderBy={SORT_BY}

  • {START_TIMESTAMP} : The start timestamp in milliseconds (for example, 1514836799000).
  • {DATASET_ID} : The dataset identifier.
  • {SORT_BY} : Sorts the response by the value provided. For example, desc:created sorts the objects by creation date in descending order.
Request
curl -X GET 'https://platform.adobe.io/data/foundation/catalog/batches?createdAfter=1521053542579&dataSet=5cd9146b21dae914b71f654f&orderBy=desc:created' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Response
{
    "{BATCH_ID_3}": {
        "imsOrg": "{IMS_ORG}",
        "relatedObjects": [
            {
                "id": "5c01a91863540f14cd3d0439",
                "type": "dataSet"
            },
            {
                "id": "00998255b4a148a2bfd4804c2f327324",
                "type": "batch"
            }
        ],
        "status": "success",
        "metrics": {
            "recordsFailed": 0,
            "recordsWritten": 2,
            "startTime": 1550791835809,
            "endTime": 1550791994636
        },
        "errors": [],
        "created": 1550791457173,
        "createdClient": "{CLIENT_CREATED}",
        "createdUser": "{CREATED_BY}",
        "updatedUser": "{CREATED_BY}",
        "updated": 1550792060301,
        "version": "1.0.116"
    },
    "{BATCH_ID_4}": {
        "imsOrg": "{IMS_ORG}",
        "status": "success",
        "relatedObjects": [
            {
                "type": "batch",
                "id": "00aff31a9ae84a169d69b886cc63c063"
            },
            {
                "type": "dataSet",
                "id": "5bfde8c5905c5a000082857d"
            }
        ],
        "metrics": {
            "startTime": 1544571333876,
            "endTime": 1544571358291,
            "recordsRead": 4,
            "recordsWritten": 4
        },
        "errors": [],
        "created": 1544571077325,
        "createdClient": "{CLIENT_CREATED}",
        "createdUser": "{CREATED_BY}",
        "updatedUser": "{CREATED_BY}",
        "updated": 1544571368776,
        "version": "1.0.3"
    }
}

A full list of parameters and filters can be found in the Catalog API reference.
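
The filtered request can be sketched as a small URL builder, assuming the query parameter names used in the sample request above (createdAfter, dataSet, orderBy). The helper names are hypothetical. The timestamp conversion shows how a date becomes the millisecond value the API expects.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

CATALOG_BATCHES_URL = "https://platform.adobe.io/data/foundation/catalog/batches"


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch.

    Naive datetimes would be interpreted as local time, so pass an aware one.
    """
    return int(moment.timestamp() * 1000)


def build_batches_url(created_after, dataset_id, order_by="desc:created"):
    """Build the filtered GET /batches URL (illustrative helper)."""
    query = urlencode(
        {
            "createdAfter": to_epoch_millis(created_after),
            "dataSet": dataset_id,
            "orderBy": order_by,
        },
        safe=":",  # keep the colon in "desc:created" readable
    )
    return f"{CATALOG_BATCHES_URL}?{query}"


url = build_batches_url(
    datetime(2018, 1, 1, 19, 59, 59, tzinfo=timezone.utc),
    "5cd9146b21dae914b71f654f",
)
```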

Retrieve a list of all files belonging to a particular batch

Now that you have the ID of the batch that you want to access, you can use the Data Access API to get a list of files belonging to that batch.
API format
GET /batches/{BATCH_ID}/files

  • {BATCH_ID} : Batch identifier of the batch that you are trying to access.
Request
curl -X GET 'https://platform.adobe.io/data/foundation/export/batches/5c6f332168966814cd81d3d3/files' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Response
{
    "data": [
        {
            "dataSetFileId": "8dcedb36-1cb2-4496-9a38-7b2041114b56-1",
            "dataSetViewId": "5cc6a9b60d4a5914b7940a7f",
            "version": "1.0.0",
            "created": "1558522305708",
            "updated": "1558522305708",
            "isValid": false,
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io:443/data/foundation/export/files/8dcedb36-1cb2-4496-9a38-7b2041114b56-1"
                }
            }
        }
    ],
    "_page": {
        "limit": 100,
        "count": 1
    }
}

  • data._links.self.href : The URL to access this file.
The response contains a data array that lists all the files within the specified batch. Files are referenced by their file ID, which is found under the dataSetFileId field.
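
A short sketch of reading that response: it pairs each dataSetFileId with its self link. The sample dictionary copies the fields shown in the response above; list_batch_files is a hypothetical helper name.

```python
def list_batch_files(response):
    """Return (file_id, url) pairs from a GET /batches/{BATCH_ID}/files response."""
    return [
        (entry["dataSetFileId"], entry["_links"]["self"]["href"])
        for entry in response.get("data", [])
    ]


# Subset of the sample response shown in this section.
sample_response = {
    "data": [
        {
            "dataSetFileId": "8dcedb36-1cb2-4496-9a38-7b2041114b56-1",
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io:443/data/foundation/export/files/8dcedb36-1cb2-4496-9a38-7b2041114b56-1"
                }
            },
        }
    ],
    "_page": {"limit": 100, "count": 1},
}

files = list_batch_files(sample_response)
```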

Access a file using a file ID

Once you have a unique file ID, you can use the Data Access API to access the specific details about the file, including its name, size in bytes, and a link to download it.
API format
GET /files/{FILE_ID}

  • {FILE_ID} : The identifier of the file you want to access.
Request
curl -X GET 'https://platform.adobe.io/data/foundation/export/files/8dcedb36-1cb2-4496-9a38-7b2041114b56-1' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Depending on whether the file ID points to an individual file or a directory, the data array returned may contain a single entry or a list of files belonging to that directory. Each file element will contain details such as the file's name, size in bytes, and a link to download the file.
Case 1: File ID points to a single file
Response
{
    "data": [
        {
            "name": "{FILE_NAME}.parquet",
            "length": "249058",
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io/data/foundation/export/files/{FILE_ID_1}?path={FILE_NAME_1}.parquet"
                }
            }
        }
    ],
    "_page": {
        "limit": 100,
        "count": 1
    }
}

  • {FILE_NAME}.parquet : The name of the file.
  • _links.self.href : The URL to download the file.
Case 2: File ID points to a directory
Response
{
    "data": [
        {
            "dataSetFileId": "{FILE_ID_2}",
            "dataSetViewId": "460590b01ba38afd1",
            "version": "1.0.0",
            "created": "150151267347",
            "updated": "150151267347",
            "isValid": true,
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io/data/foundation/export/files/{FILE_ID_2}"
                }
            }
        },
        {
            "dataSetFileId": "{FILE_ID_3}",
            "dataSetViewId": "460590b01ba38afd1",
            "version": "1.0.0",
            "created": "150151267685",
            "updated": "150151267685",
            "isValid": true,
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io/data/foundation/export/files/{FILE_ID_3}"
                }
            }
        }
    ],
    "_page": {
        "limit": 100,
        "count": 2
    }
}

  • data._links.self.href : The URL to download the associated file.
This response returns a directory containing two separate files, with IDs {FILE_ID_2} and {FILE_ID_3}. In this scenario, you will need to follow the URL of each file in order to access the file.
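
The two cases can be told apart in code. The sketch below assumes, based only on the two sample responses above, that downloadable files carry a name field while directory entries carry a dataSetFileId field; treat that rule as an inference rather than documented behavior. The helper name split_entries is hypothetical.

```python
def split_entries(response):
    """Separate downloadable files from directory entries in a /files response.

    Assumption: entries with a "name" key are files (case 1) and all other
    entries are directory members that need a follow-up request (case 2).
    """
    files, directory_entries = [], []
    for entry in response.get("data", []):
        href = entry["_links"]["self"]["href"]
        if "name" in entry:
            # "length" arrives as a string in the sample, so convert it.
            files.append((entry["name"], int(entry["length"]), href))
        else:
            directory_entries.append((entry["dataSetFileId"], href))
    return files, directory_entries


single_file = {
    "data": [
        {
            "name": "profiles.parquet",
            "length": "249058",
            "_links": {"self": {"href": "https://example.invalid/files/id-1?path=profiles.parquet"}},
        }
    ]
}
directory = {
    "data": [
        {"dataSetFileId": "id-2", "_links": {"self": {"href": "https://example.invalid/files/id-2"}}},
        {"dataSetFileId": "id-3", "_links": {"self": {"href": "https://example.invalid/files/id-3"}}},
    ]
}

case_1 = split_entries(single_file)
case_2 = split_entries(directory)
```

The example.invalid URLs are placeholders so the sketch runs without contacting any server.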

Retrieve the metadata of a file

You can retrieve the metadata of a file by making a HEAD request. This returns the file's metadata headers, including its size in bytes and file format.
API format
HEAD /files/{FILE_ID}?path={FILE_NAME}

  • {FILE_ID} : The file's identifier.
  • {FILE_NAME} : The file name (for example, profiles.parquet).
Request
curl -I 'https://platform.adobe.io/data/foundation/export/files/8dcedb36-1cb2-4496-9a38-7b2041114b56-1?path=profiles.parquet' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Response
The response headers contain the metadata of the queried file, including:
  • Content-Length : Indicates the size of the payload in bytes.
  • Content-Type : Indicates the type of file.
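
The HEAD call can be sketched with Python's standard urllib. The helper names are hypothetical, and the request object is only constructed here, not sent; sending it would require valid credentials.

```python
import urllib.request
from urllib.parse import quote

EXPORT_FILES_URL = "https://platform.adobe.io/data/foundation/export/files"


def build_head_request(file_id, file_name, headers):
    """Build (but do not send) a HEAD request for a file's metadata."""
    url = f"{EXPORT_FILES_URL}/{quote(file_id)}?path={quote(file_name)}"
    return urllib.request.Request(url, headers=headers, method="HEAD")


def read_file_metadata(response_headers):
    """Pick the size and type out of a mapping of response headers."""
    return {
        "size_bytes": int(response_headers["Content-Length"]),
        "content_type": response_headers["Content-Type"],
    }


request = build_head_request(
    "8dcedb36-1cb2-4496-9a38-7b2041114b56-1",
    "profiles.parquet",
    {"Authorization": "Bearer {ACCESS_TOKEN}"},
)

# To send it for real (needs valid header values):
#     with urllib.request.urlopen(request) as response:
#         metadata = read_file_metadata(response.headers)
```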

Access the contents of a file

You can also access the contents of a file using the Data Access API.
API format
GET /files/{FILE_ID}?path={FILE_NAME}

  • {FILE_ID} : The file's identifier.
  • {FILE_NAME} : The file name (for example, profiles.parquet).
Request
curl -X GET 'https://platform.adobe.io/data/foundation/export/files/8dcedb36-1cb2-4496-9a38-7b2041114b56-1?path=profiles.parquet' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Response
A successful response returns the file's contents.
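
Since the body is the raw file, a client will usually want to write it to disk in pieces rather than hold it in memory. The sketch below is one way to do that with the standard library; save_stream is a hypothetical helper, and an in-memory stream stands in for the HTTP response so the example runs offline.

```python
import io
import os
import shutil
import tempfile


def save_stream(response, destination_path, chunk_size=1024 * 1024):
    """Copy a binary file-like response to disk, chunk_size bytes at a time."""
    with open(destination_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file, chunk_size)
    return destination_path


# Demonstration: io.BytesIO plays the role of the HTTP response body.
demo_path = os.path.join(tempfile.mkdtemp(), "profiles.parquet")
save_stream(io.BytesIO(b"example bytes"), demo_path)
with open(demo_path, "rb") as saved_file:
    saved = saved_file.read()
```

The object returned by urllib.request.urlopen is file-like, so it could be passed to save_stream in place of the in-memory stream.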

Download partial contents of a file

The Data Access API allows for downloading files in chunks. A Range header can be specified during a GET /files/{FILE_ID} request to download a specific range of bytes from a file. If the range is not specified, the API will download the entire file by default.
The HEAD example in the previous section gives the size of a specific file in bytes.
API format
GET /files/{FILE_ID}?path={FILE_NAME}

  • {FILE_ID} : The file's identifier.
  • {FILE_NAME} : The file name (for example, profiles.parquet).
Request
curl -X GET 'https://platform.adobe.io/data/foundation/export/files/8dcedb36-1cb2-4496-9a38-7b2041114b56-1?path=profiles.parquet' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -H 'Range: bytes=0-99'

  • Range: bytes=0-99 : Specifies the range of bytes to download. If this is not specified, the API will download the entire file. In this example, the first 100 bytes will be downloaded.
Response
The response body includes the first 100 bytes of the file (as specified by the Range header in the request) along with HTTP status 206 (Partial Content). The response also includes the following headers:
  • Content-Length: 100 (the number of bytes returned)
  • Content-Type: application/parquet (a Parquet file was requested, so the response content type is Parquet)
  • Content-Range: bytes 0-99/249058 (the requested range, 0-99, out of the total number of bytes, 249058)
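
Byte ranges are inclusive at both ends, which is why bytes=0-99 yields 100 bytes. The sketch below computes the ranges needed to fetch a whole file in fixed-size chunks and parses a Content-Range value like the one above. The helper names are hypothetical, and the parser handles only the numeric form shown in this section.

```python
import re


def byte_ranges(total_size, chunk_size):
    """Yield inclusive (start, end) offsets that cover total_size bytes."""
    for start in range(0, total_size, chunk_size):
        yield start, min(start + chunk_size, total_size) - 1


def range_header(start, end):
    """Format the Range header for an inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


def parse_content_range(value):
    """Parse 'bytes START-END/TOTAL' into a tuple of three integers."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value)
    if match is None:
        raise ValueError(f"Unsupported Content-Range value: {value!r}")
    start, end, total = map(int, match.groups())
    return start, end, total


# The 249058-byte sample file split into chunks of at most 100000 bytes.
chunks = list(byte_ranges(249058, 100000))
first_header = range_header(0, 99)
parsed = parse_content_range("bytes 0-99/249058")
```

The total size would typically come from the Content-Length returned by the HEAD request described earlier.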

Configure API response pagination

Responses within the Data Access API are paginated. By default, the maximum number of entries per page is 100. Paging parameters can be used to modify the default behavior.
  • limit : You can specify the number of entries per page according to your requirements using the "limit" parameter.
  • start : The offset can be set by the "start" query parameter.
  • & : You can use an ampersand to combine multiple parameters in a single call.
API format
GET /batches/{BATCH_ID}/files?start={OFFSET}
GET /batches/{BATCH_ID}/files?limit={LIMIT}
GET /batches/{BATCH_ID}/files?start={OFFSET}&limit={LIMIT}

  • {BATCH_ID} : Batch identifier of the batch that you are trying to access.
  • {OFFSET} : The index at which the result array starts (for example, start=0).
  • {LIMIT} : Controls how many results are returned in the result array (for example, limit=1).
Request
curl -X GET 'https://platform.adobe.io/data/foundation/export/batches/5c102cac7c7ebc14cd6b098e/files?start=0&limit=1' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Response
The response contains a "data" array with a single element, as specified by the request parameter limit=1. This element is an object containing the details of the first available file, as specified by the start=0 parameter in the request (remember that in zero-based numbering, the first element is "0").
The _links.next.href value contains the link to the next page of responses, where you can see that the start parameter has advanced to start=1.
{
    "data": [
        {
            "dataSetFileId": "{FILE_ID_1}",
            "dataSetViewId": "5a9f264c2aa0cf01da4d82fa",
            "version": "1.0.0",
            "created": "1521053793635",
            "updated": "1521053793635",
            "isValid": false,
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io/data/foundation/export/files/{FILE_ID_1}"
                }
            }
        }
    ],
    "_page": {
        "limit": 1,
        "count": 6
    },
    "_links": {
        "next": {
            "href": "https://platform.adobe.io/data/foundation/export/batches/5c102cac7c7ebc14cd6b098e/files?start=1&limit=1"
        },
        "page": {
            "href": "https://platform.adobe.io/data/foundation/export/batches/5c102cac7c7ebc14cd6b098e/files?start=0&limit=1",
            "templated": true
        }
    }
}
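
Following _links.next.href from page to page can be sketched as a generator. Two things here are assumptions: that the last page omits _links.next (the sample above does not show a final page), and the fetch_page callable, which is a hypothetical function that takes a URL and returns the decoded JSON body. A dictionary of fake pages stands in for the API so the example runs offline.

```python
def iter_entries(first_url, fetch_page):
    """Yield every entry in "data" across pages, following _links.next.href.

    Assumes the final page has no _links.next entry.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("data", [])
        url = page.get("_links", {}).get("next", {}).get("href")


# Fake two-page result set keyed by URL; the IDs are invented.
FAKE_PAGES = {
    "files?start=0&limit=1": {
        "data": [{"dataSetFileId": "file-a"}],
        "_links": {"next": {"href": "files?start=1&limit=1"}},
    },
    "files?start=1&limit=1": {
        "data": [{"dataSetFileId": "file-b"}],
        "_links": {},
    },
}

all_ids = [
    entry["dataSetFileId"]
    for entry in iter_entries("files?start=0&limit=1", FAKE_PAGES.__getitem__)
]
```

Passing the fetch function in as an argument keeps the paging logic independent of whichever HTTP client makes the real calls.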