
Privacy request processing in the Data Lake

Adobe Experience Platform Privacy Service processes customer requests to access, opt out of sale, or delete their personal data as delineated by legal and organizational privacy regulations.
This document covers essential concepts related to processing privacy requests for customer data stored in the Data Lake.

Getting started

It is recommended that you have a working understanding of the following Experience Platform services before reading this guide:
  • Privacy Service: Manages customer requests for accessing, opting out of sale, or deleting their personal data across Adobe Experience Cloud applications.
  • Catalog Service: The system of record for data location and lineage within Experience Platform. Provides an API that can be used to update dataset metadata.
  • Experience Data Model (XDM) System: The standardized framework by which Experience Platform organizes customer experience data.
  • Identity Service: Solves the fundamental challenge posed by the fragmentation of customer experience data by bridging identities across devices and systems.

Understanding identity namespaces

Adobe Experience Platform Identity Service bridges customer identity data across systems and devices. Identity Service uses identity namespaces to provide context to identity values by relating them to their system of origin. A namespace can represent a generic concept such as an email address ("Email") or associate the identity with a specific application, such as an Adobe Advertising Cloud ID ("AdCloud") or Adobe Target ID ("TNTID").
Identity Service maintains a store of globally defined (standard) and user-defined (custom) identity namespaces. Standard namespaces are available for all organizations (for example, "Email" and "ECID"), while your organization can also create custom namespaces to suit its particular needs.
For more information about identity namespaces in Experience Platform, see the identity namespace overview.

Adding identity data to datasets

When creating privacy requests for the Data Lake, valid identity values (and their associated namespaces) must be provided for each individual customer in order to locate their data and process it accordingly. Therefore, all datasets that are subject to privacy requests must contain an identity descriptor in their associated XDM schema.
Any datasets based on schemas that do not support identity descriptor metadata (such as ad-hoc datasets) currently cannot be processed in privacy requests.
This section walks through the steps of adding an identity descriptor to an existing dataset's XDM schema. If you already have a dataset with an identity descriptor, you can skip ahead to the next section.
When deciding which schema fields to set as identities, keep in mind the limitations of using nested map-type fields, which are described in the appendix.
There are two methods of adding an identity descriptor to a dataset schema:

Using the UI

In the Experience Platform user interface, the Schemas workspace allows you to edit your existing XDM schemas. To add an identity descriptor to a schema, select the schema from the list and follow the steps for setting a schema field as an identity field in the Schema Editor tutorial.
Once you have set the appropriate fields within the schema as identity fields, you can proceed to the next section on submitting privacy requests.

Using the API

This section assumes you know the unique URI ID value of your dataset's XDM schema. If you do not know this value, you can retrieve it by using the Catalog Service API. After reading the getting started section of the developer guide, follow the steps for listing or looking up Catalog objects to find your dataset. The schema ID can be found under schemaRef.id.
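If you look up your dataset programmatically, you can read the schema ID straight out of the Catalog Service response. The following Python sketch is a minimal illustration under stated assumptions: the function name is hypothetical, and it assumes the response body is a JSON object keyed by dataset ID, with each dataset carrying a schemaRef object that has an id field. The sample body uses placeholder values, not real data.

```python
import json


def schema_ids_from_catalog_response(body: str) -> dict:
    """Map each dataset ID in a Catalog dataset response to its schemaRef.id value.

    Assumes the body is a JSON object keyed by dataset ID, where each dataset
    may carry a "schemaRef" object with an "id" field.
    """
    datasets = json.loads(body)
    schema_ids = {}
    for dataset_id, dataset in datasets.items():
        schema_ref = dataset.get("schemaRef") or {}
        if "id" in schema_ref:  # skip entries that have no schema reference
            schema_ids[dataset_id] = schema_ref["id"]
    return schema_ids


# Hypothetical response body; {DATASET_ID} and the schema URI are placeholders.
sample_body = """
{
  "{DATASET_ID}": {
    "name": "Sample dataset",
    "schemaRef": {
      "id": "https://ns.adobe.com/{TENANT_ID}/schemas/fbc52b243d04b5d4f41eaa72a8ba58be"
    }
  }
}
"""

print(schema_ids_from_catalog_response(sample_body))
```

The resulting schema ID is the value to use for xdm:sourceSchema in the descriptor request.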
This section includes calls to the Schema Registry API. For important information related to using the API, including knowing your {TENANT_ID} and the concept of containers, see the getting started section of the developer guide.
You can add an identity descriptor to a dataset's XDM schema by making a POST request to the /descriptors endpoint in the Schema Registry API.
API format
POST /descriptors

Request
The following request defines an identity descriptor on an "email address" field in a sample schema.
curl -X POST \
  https://platform.adobe.io/data/foundation/schemaregistry/tenant/descriptors \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}' \
  -d '
      {
        "@type": "xdm:descriptorIdentity",
        "xdm:sourceSchema": "https://ns.adobe.com/{TENANT_ID}/schemas/fbc52b243d04b5d4f41eaa72a8ba58be",
        "xdm:sourceVersion": 1,
        "xdm:sourceProperty": "/personalEmail/address",
        "xdm:namespace": "Email",
        "xdm:property": "xdm:code",
        "xdm:isPrimary": false
      }'
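If you prefer to script this call instead of running curl, you can assemble the same request with Python's standard library. The following sketch is a minimal illustration, not an Adobe SDK. The function name build_descriptor_request is hypothetical. The sketch only builds the request object, using the same URL, headers, and payload as the curl example. Replace the placeholders in braces with your own values before sending.

```python
import json
import urllib.request

DESCRIPTORS_URL = "https://platform.adobe.io/data/foundation/schemaregistry/tenant/descriptors"


def build_descriptor_request(access_token, api_key, ims_org, sandbox_name, descriptor):
    """Build (without sending) a POST /descriptors request equivalent to the curl call."""
    return urllib.request.Request(
        DESCRIPTORS_URL,
        data=json.dumps(descriptor).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "x-gw-ims-org-id": ims_org,
            "x-sandbox-name": sandbox_name,
        },
    )


# Same payload as the curl example; {TENANT_ID} must be replaced with your tenant ID.
example_descriptor = {
    "@type": "xdm:descriptorIdentity",
    "xdm:sourceSchema": "https://ns.adobe.com/{TENANT_ID}/schemas/fbc52b243d04b5d4f41eaa72a8ba58be",
    "xdm:sourceVersion": 1,
    "xdm:sourceProperty": "/personalEmail/address",
    "xdm:namespace": "Email",
    "xdm:property": "xdm:code",
    "xdm:isPrimary": False,
}

descriptor_request = build_descriptor_request(
    "{ACCESS_TOKEN}", "{API_KEY}", "{IMS_ORG}", "{SANDBOX_NAME}", example_descriptor
)
```

Passing descriptor_request to urllib.request.urlopen sends it. Note that urlopen raises urllib.error.HTTPError for error status codes instead of returning them.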

The request payload uses the following properties:
  • @type: The type of descriptor being created. For identity descriptors, the value must be "xdm:descriptorIdentity".
  • xdm:sourceSchema: The unique URI ID of your dataset's XDM schema.
  • xdm:sourceVersion: The version of the XDM schema specified in xdm:sourceSchema.
  • xdm:sourceProperty: The path to the schema field that the descriptor is being applied to.
  • xdm:namespace: One of the standard identity namespaces recognized by Privacy Service, or a custom namespace defined by your organization.
  • xdm:property: Either "xdm:id" or "xdm:code", depending on the namespace being used under xdm:namespace.
  • xdm:isPrimary: An optional boolean value. When true, this indicates that the field is a primary identity. Schemas may contain only one primary identity. Defaults to false if not included.
Response
A successful response returns HTTP status 201 (Created) and the details of the newly created descriptor.
{
  "@type": "xdm:descriptorIdentity",
  "xdm:sourceSchema": "https://ns.adobe.com/{TENANT_ID}/schemas/fbc52b243d04b5d4f41eaa72a8ba58be",
  "xdm:sourceVersion": 1,
  "xdm:sourceProperty": "/personalEmail/address",
  "xdm:namespace": "Email",
  "xdm:property": "xdm:code",
  "xdm:isPrimary": false,
  "meta:containerId": "tenant",
  "@id": "f3a1dfa38a4871cf4442a33074c1f9406a593407"
}
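A script that creates descriptors can confirm success from the status code and read the @id from the response body, for example to record which descriptor it created. The following minimal Python sketch assumes you already have the status code and raw body of the response. The function name is hypothetical, and the sample body is an abridged copy of the response above.

```python
import json


def descriptor_id_from_response(status, body):
    """Return the @id of a newly created descriptor, raising if creation did not succeed."""
    if status != 201:  # 201 (Created) signals success for POST /descriptors
        raise RuntimeError(f"descriptor creation failed with HTTP {status}: {body}")
    return json.loads(body)["@id"]


sample_response = (
    '{"@type": "xdm:descriptorIdentity", "meta:containerId": "tenant", '
    '"@id": "f3a1dfa38a4871cf4442a33074c1f9406a593407"}'
)
print(descriptor_id_from_response(201, sample_response))
```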

Submitting requests

This section covers how to format privacy requests for the Data Lake. It is strongly recommended that you review the Privacy Service UI or Privacy Service API documentation for complete steps on how to submit a privacy job, including how to properly format submitted user identity data in request payloads.
The following subsections describe how to make privacy requests for the Data Lake using the Privacy Service UI or API.

Using the UI

When creating job requests in the UI, be sure to select AEP Data Lake and/or Profile under Products in order to process jobs for data stored in the Data Lake or Real-time Customer Profile, respectively.

Using the API

When creating job requests in the API, any userIDs that are provided must use a specific namespace and type depending on the data store they apply to. IDs for the Data Lake must use "unregistered" for their type value, and a namespace value that matches one of the privacy labels that have been added to applicable datasets.
In addition, the include array of the request payload must include the product values for the different data stores the request is being made to. When making requests to the Data Lake, the array must include the value aepDataLake.
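The following Python sketch assembles a job payload that follows these two rules: every userID uses type "unregistered", and the include array lists "aepDataLake". The function name and parameters are hypothetical. The expandIds and priority values are copied from the example request that follows; they are not requirements stated in this guide.

```python
import json


def build_data_lake_job(ims_org, user_key, actions, id_values, namespace, regulation):
    """Assemble a privacy job payload targeting the Data Lake."""
    return {
        "companyContexts": [{"namespace": "imsOrgID", "value": ims_org}],
        "users": [
            {
                "key": user_key,
                "action": list(actions),
                # Data Lake IDs must use the "unregistered" type.
                "userIDs": [
                    {"namespace": namespace, "value": value, "type": "unregistered"}
                    for value in id_values
                ],
            }
        ],
        # The Data Lake product value must appear in the include array.
        "include": ["aepDataLake"],
        "expandIds": False,
        "priority": "normal",
        "regulation": regulation,
    }


payload = build_data_lake_job(
    "{IMS_ORG}", "user12345", ["access", "delete"],
    ["ajones@acme.com", "jdoe@example.com"], namespace="email_label", regulation="ccpa",
)
print(json.dumps(payload, indent=2))
```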
The following request creates a new privacy job for the Data Lake, using the unregistered "email_label" namespace. It also includes the product value for the Data Lake in the include array:
curl -X POST \
  https://platform.adobe.io/data/core/privacy/jobs \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {IMS_ORG}' \
  -d '{
    "companyContexts": [
      {
        "namespace": "imsOrgID",
        "value": "{IMS_ORG}"
      }
    ],
    "users": [
      {
        "key": "user12345",
        "action": ["access","delete"],
        "userIDs": [
          {
            "namespace": "email_label",
            "value": "ajones@acme.com",
            "type": "unregistered"
          },
          {
            "namespace": "email_label",
            "value": "jdoe@example.com",
            "type": "unregistered"
          }
        ]
      }
    ],
    "include": ["aepDataLake"],
    "expandIds": false,
    "priority": "normal",
    "regulation": "ccpa"
}'
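Before submitting a job, a script can check the payload against the Data Lake requirements described above. The following Python sketch is illustrative. The function name and the optional known_namespaces parameter are hypothetical. The sketch assumes the job targets only the Data Lake, so every userID is expected to use the "unregistered" type.

```python
def data_lake_job_problems(payload, known_namespaces=None):
    """List ways a privacy job payload breaks the Data Lake rules.

    known_namespaces is an optional, hypothetical set of namespace values you
    expect to match your datasets; when given, other namespaces are reported.
    """
    problems = []
    if "aepDataLake" not in payload.get("include", []):
        problems.append('include array does not contain "aepDataLake"')
    for user in payload.get("users", []):
        for user_id in user.get("userIDs", []):
            label = f'user {user.get("key")!r}, ID {user_id.get("value")!r}'
            if user_id.get("type") != "unregistered":
                problems.append(f'{label}: type must be "unregistered"')
            if known_namespaces is not None and user_id.get("namespace") not in known_namespaces:
                problems.append(f"{label}: unexpected namespace {user_id.get('namespace')!r}")
    return problems


example_job = {
    "include": ["aepDataLake"],
    "users": [
        {
            "key": "user12345",
            "userIDs": [
                {"namespace": "email_label", "value": "ajones@acme.com", "type": "unregistered"},
                # Deliberately not "unregistered", so it is reported.
                {"namespace": "email_label", "value": "jdoe@example.com", "type": "standard"},
            ],
        }
    ],
}

print(data_lake_job_problems(example_job, known_namespaces={"email_label"}))
```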

Delete request processing

When Experience Platform receives a delete request from Privacy Service, Platform sends confirmation to Privacy Service that the request has been received and affected data has been marked for deletion. The records are then removed from the Data Lake within seven days. During that seven-day window, the data is soft-deleted and is therefore not accessible by any Platform service.
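Platform sends its confirmation when data is marked for deletion, not when the data is physically removed. For that reason, you may want to track when the seven-day window closes. The following Python sketch computes that time. The function names are hypothetical. The sketch assumes the window is counted from the moment Platform confirms the request, because this guide does not state the exact starting point.

```python
from datetime import datetime, timedelta, timezone

# The Data Lake removes soft-deleted records within seven days.
SOFT_DELETE_WINDOW = timedelta(days=7)


def latest_removal_time(confirmed_at: datetime) -> datetime:
    """Latest time by which records should be physically removed.

    Assumption: the seven-day window is counted from when Platform confirms
    the delete request to Privacy Service.
    """
    return confirmed_at + SOFT_DELETE_WINDOW


def may_still_be_soft_deleted(confirmed_at: datetime, now: datetime) -> bool:
    """True while records may still exist physically but are not accessible."""
    return confirmed_at <= now < latest_removal_time(confirmed_at)


confirmed = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
print(latest_removal_time(confirmed).isoformat())  # 2024-03-08T12:00:00+00:00
```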
In future releases, Platform will send confirmation to Privacy Service after data has been physically deleted.

Next steps

By reading this document, you have been introduced to the important concepts involved with processing privacy requests for the Data Lake. It is recommended that you continue reading the documentation provided throughout this guide in order to deepen your understanding of how to manage identity data and create privacy jobs.
See the document on privacy request processing for Real-time Customer Profile for steps on processing privacy requests for the Profile store.

Appendix

The following section contains additional information for processing privacy requests in the Data Lake.

Labeling nested map-type fields

It is important to note that there are two kinds of nested map-type fields that do not support privacy labeling:
  • A map-type field within an array-type field
  • A map-type field within another map-type field
Privacy jobs involving either of these field types will eventually fail. For this reason, it is recommended that you avoid using nested map-type fields to store private customer data. Relevant consumer IDs should be stored as a non-map datatype within the identityMap field (itself a map-type field) for record-based datasets, or the endUserID field for time-series-based datasets.
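To catch these cases before they cause privacy jobs to fail, you can scan a schema for map-type fields nested inside arrays or other maps. The following Python sketch is illustrative, not an Adobe tool, and rests on three assumptions:

  • the schema is a fully expanded JSON Schema with no $ref or allOf indirection
  • a field is map-type when it is an object with an additionalProperties schema and no fixed properties
  • a map anywhere beneath an array or map counts as nested

The sample schema and its field names, other than identityMap, are hypothetical.

```python
def is_map_field(node):
    """Assumed map encoding: an object with an additionalProperties schema and no fixed properties."""
    return (
        node.get("type") == "object"
        and isinstance(node.get("additionalProperties"), dict)
        and "properties" not in node
    )


def find_nested_maps(node, path="", inside_collection=False):
    """Return paths of map-type fields located under an array-type or map-type field."""
    found = []
    if is_map_field(node):
        if inside_collection:
            found.append(path)
        # Everything stored as a map value is nested inside this map.
        found += find_nested_maps(node["additionalProperties"], path + "/*", True)
    elif node.get("type") == "array" and isinstance(node.get("items"), dict):
        # Everything stored as an array item is nested inside this array.
        found += find_nested_maps(node["items"], path + "[]", True)
    for name, child in node.get("properties", {}).items():
        found += find_nested_maps(child, f"{path}/{name}", inside_collection)
    return found


sample_schema = {
    "type": "object",
    "properties": {
        "identityMap": {  # map of arrays of objects: no nested map
            "type": "object",
            "additionalProperties": {
                "type": "array",
                "items": {"type": "object", "properties": {"id": {"type": "string"}}},
            },
        },
        "preferences": {  # map inside another map: not supported
            "type": "object",
            "additionalProperties": {
                "type": "object",
                "additionalProperties": {"type": "string"},
            },
        },
        "orders": {  # map inside an array: not supported
            "type": "array",
            "items": {"type": "object", "additionalProperties": {"type": "string"}},
        },
    },
}

print(find_nested_maps(sample_schema))
```

In the sample, identityMap is modeled as a map whose values are arrays of objects, so it is not reported. Only a map placed beneath it would be flagged.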