Platform SDK guide

This tutorial explains how to convert code that uses data_access_sdk_python to the new Python platform_sdk, with examples in both Python and R. It covers the following operations: building authentication, basic reading of data, and basic writing of data.

Build authentication

Authentication is required to make calls to Adobe Experience Platform. It consists of an API key, an IMS Org ID, a user token, and a service token.

Python

If you are using Jupyter Notebook, use the code below to build the client_context:
client_context = PLATFORM_SDK_CLIENT_CONTEXT

If you are not using Jupyter Notebook, or you need to change the IMS Org, use the code sample below:
from platform_sdk.client_context import ClientContext
client_context = ClientContext(api_key={API_KEY},
              org_id={IMS_ORG},
              user_token={USER_TOKEN},
              service_token={SERVICE_TOKEN})
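
One way to avoid hard-coding the four credentials is to read them from environment variables before building the context. The following is a minimal sketch and not part of the SDK: the helper name load_credentials and the PLATFORM_* variable names are assumptions chosen for illustration. It returns a dictionary whose keys match the ClientContext keyword arguments shown above.

```python
import os

# Hypothetical environment variable names (not defined by the SDK);
# adjust them to match your own setup.
ENV_TO_ARG = {
    "PLATFORM_API_KEY": "api_key",
    "PLATFORM_IMS_ORG": "org_id",
    "PLATFORM_USER_TOKEN": "user_token",
    "PLATFORM_SERVICE_TOKEN": "service_token",
}


def load_credentials(env=None):
    """Collect the four credentials as ClientContext keyword arguments."""
    env = os.environ if env is None else env
    # Fail early with a clear message if any credential is unset or empty.
    missing = [name for name in ENV_TO_ARG if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(sorted(missing)))
    return {arg: env[name] for name, arg in ENV_TO_ARG.items()}


# Usage (requires platform_sdk to be installed):
# from platform_sdk.client_context import ClientContext
# client_context = ClientContext(**load_credentials())
```

Failing early on a missing value keeps an incomplete set of credentials from reaching the first Platform call.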

R

If you are using Jupyter Notebook, use the code below to build the client_context:
library(reticulate)
use_python("/usr/local/bin/ipython")
psdk <- import("platform_sdk")

py_run_file("../.ipython/profile_default/startup/platform_sdk_context.py")
client_context <- py$PLATFORM_SDK_CLIENT_CONTEXT

If you are not using Jupyter Notebook, or you need to change the IMS Org, use the code sample below:
library(reticulate)
use_python("/usr/local/bin/ipython")
psdk <- import("platform_sdk")
client_context <- psdk$client_context$ClientContext(api_key={API_KEY},
              org_id={IMS_ORG},
              user_token={USER_TOKEN},
              service_token={SERVICE_TOKEN})

Basic reading of data

With the new Platform SDK, the maximum read size is 32 GB, with a maximum read time of 10 minutes.
If a read takes too long, try one of the following filtering options:
Filter by offset and limit
Filter by date
Filter by selected columns
Get sorted results
Note: The IMS Org is set within the client_context.

Python

To read data in Python, please use the code sample below:
from platform_sdk.dataset_reader import DatasetReader
dataset_reader = DatasetReader(client_context, "{DATASET_ID}")
df = dataset_reader.limit(100).read()
df.head()

R

To read data in R, please use the code sample below:
DatasetReader <- psdk$dataset_reader$DatasetReader
dataset_reader <- DatasetReader(client_context, "{DATASET_ID}") 
df <- dataset_reader$read() 
df

Filter by offset and limit

Because filtering by batch ID is no longer supported, use offset and limit to scope the data you read.

Python

df = dataset_reader.limit(100).offset(1).read()
df.head()

R

df <- dataset_reader$limit(100L)$offset(1L)$read() 
df
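
Combining limit and offset also makes it possible to read a dataset in pages rather than in one large call. The sketch below is illustrative only. It assumes that limit() and offset() each return the reader, as the chained calls above suggest, and that offset counts rows. The read_in_pages helper and the FakeReader class are hypothetical names, and FakeReader exists only so the sketch can run without the SDK.

```python
def read_in_pages(reader, page_size, max_pages):
    """Yield successive pages by advancing offset in steps of page_size.

    Assumes limit() and offset() return the reader and that offset
    counts rows; neither is confirmed by the SDK documentation here.
    """
    for page_number in range(max_pages):
        page = reader.limit(page_size).offset(page_number * page_size).read()
        if len(page) == 0:
            break  # no rows left
        yield page
        if len(page) < page_size:
            break  # last, partially filled page


class FakeReader:
    """Hypothetical stand-in for DatasetReader, used only to run the sketch."""

    def __init__(self, rows):
        self._rows = rows
        self._limit = None
        self._offset = 0

    def limit(self, n):
        self._limit = n
        return self

    def offset(self, n):
        self._offset = n
        return self

    def read(self):
        end = None if self._limit is None else self._offset + self._limit
        return self._rows[self._offset:end]


pages = list(read_in_pages(FakeReader(list(range(25))), page_size=10, max_pages=5))
print([len(p) for p in pages])  # prints [10, 10, 5]
```

With a real dataset_reader in place of FakeReader, each page would be a dataframe, and len(page) would give its row count.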

Filter by date

Date filtering is now granular to the timestamp, rather than to the day.

Python

df = dataset_reader.where(
    dataset_reader['timestamp'].gt('2019-04-10 15:00:00')
    .And(dataset_reader['timestamp'].lt('2019-04-10 17:00:00'))
).read()
df.head()

R

df2 <- dataset_reader$where(
    dataset_reader['timestamp']$gt('2018-12-10 15:00:00')$
    And(dataset_reader['timestamp']$lt('2019-04-10 17:00:00'))
)$read()
df2
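
The timestamp bounds used in a date filter can also be computed instead of typed by hand. The sketch below uses only the Python standard library. The time_window helper is a hypothetical name, and the sketch assumes the filter accepts strings in the same layout as the examples above.

```python
from datetime import datetime, timedelta

# Same layout as the timestamp strings in the examples above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_window(end, hours):
    """Return (start, end) strings covering the `hours` before `end`."""
    start = end - timedelta(hours=hours)
    return start.strftime(TIMESTAMP_FORMAT), end.strftime(TIMESTAMP_FORMAT)


start, end = time_window(datetime(2019, 4, 10, 17, 0, 0), hours=2)
print(start, end)  # prints 2019-04-10 15:00:00 2019-04-10 17:00:00

# The bounds would then be passed to the filter, for example:
# df = dataset_reader.where(
#     dataset_reader['timestamp'].gt(start)
#     .And(dataset_reader['timestamp'].lt(end))
# ).read()
```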

The new Platform SDK supports the following operations:
Operation                          Function
Equals ( = )                       eq()
Greater than ( > )                 gt()
Greater than or equal to ( >= )    ge()
Less than ( < )                    lt()
Less than or equal to ( <= )       le()
And ( & )                          And()
Or ( | )                           Or()

Filter by selected columns

To further refine your reading of data, you can also filter by column name.

Python

df = dataset_reader.select(['column-a','column-b']).read()

R

df <- dataset_reader$select(c('column-a','column-b'))$read() 

Get sorted results

Results can be sorted by one or more columns of the target dataset, each with its own order (asc or desc).
In the following example, the dataframe is sorted by "column-a" first, in ascending order. Rows with the same value for "column-a" are then sorted by "column-b" in descending order.

Python

df = dataset_reader.sort([('column-a', 'asc'), ('column-b', 'desc')]).read()

R

df <- dataset_reader$sort(list(tuple('column-a', 'asc'), tuple('column-b', 'desc')))$read()
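
To make the ordering concrete, the sketch below reproduces the same two-key sort on a small in-memory list using plain Python. It does not call the SDK. The sample rows and the sort_rows helper are made up for illustration.

```python
def sort_rows(rows, keys):
    """Sort dictionaries by several (column, order) pairs.

    Python's sort is stable, so sorting by the least significant key
    first and the most significant key last yields the combined order.
    """
    for column, order in reversed(keys):
        rows = sorted(rows, key=lambda row: row[column], reverse=(order == "desc"))
    return rows


rows = [
    {"column-a": 2, "column-b": 1},
    {"column-a": 1, "column-b": 1},
    {"column-a": 1, "column-b": 3},
    {"column-a": 2, "column-b": 5},
]

for row in sort_rows(rows, [("column-a", "asc"), ("column-b", "desc")]):
    print(row["column-a"], row["column-b"])
# prints:
# 1 3
# 1 1
# 2 5
# 2 1
```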

Basic writing of data

The IMS Org is set within the client_context.
To write data in Python or R, use one of the following examples:

Python

from platform_sdk.models import Dataset
from platform_sdk.dataset_writer import DatasetWriter

dataset = Dataset(client_context).get_by_id("{DATASET_ID}")
dataset_writer = DatasetWriter(client_context, dataset)
write_tracker = dataset_writer.write({PANDA_DATAFRAME}, file_format='json')

R

dataset <- psdk$models$Dataset(client_context)$get_by_id("{DATASET_ID}")
dataset_writer <- psdk$dataset_writer$DatasetWriter(client_context, dataset)
write_tracker <- dataset_writer$write({PANDA_DATAFRAME}, file_format='json')

Next steps

Once you have configured the platform_sdk data loader, the data undergoes preparation and is then split into the train and val datasets. To learn about data preparation and feature engineering, see the section on data preparation and feature engineering in the tutorial for creating a recipe using JupyterLab notebooks.