
Package source files into a recipe

This tutorial provides instructions on how to package the provided Retail Sales sample source files into an archive file, which can then be used to create a recipe in Adobe Experience Platform Data Science Workspace by following the recipe import workflow in either the UI or the API.
Concepts to understand:
  • Recipes : A recipe is Adobe's term for a model specification: a top-level container representing a specific machine learning or artificial intelligence algorithm (or ensemble of algorithms), the processing logic, and the configuration required to build and execute a trained model, thereby helping to solve specific business problems.
  • Source files : Individual files in your project that contain the logic for a recipe.

Recipe creation

Recipe creation starts with packaging source files to build an archive file. Source files define the machine learning logic and algorithms used to solve a specific problem, and are written in Python, R, PySpark, or Scala. Built archive files take the form of a Docker image. Once built, the packaged archive file is imported into Data Science Workspace to create a recipe in the UI or using the API.

Docker based model authoring

A Docker image allows a developer to package up an application with all the parts it needs, such as libraries and other dependencies, and ship it out as one package.
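To illustrate the idea, a minimal Dockerfile for a Python recipe might look like the sketch below. This is an assumption-laden example only: the sample repository ships its own Dockerfile per recipe, and the base image and file names here ( requirements.txt , retail.py ) are hypothetical.

```dockerfile
# Illustrative sketch only -- not the repository's actual Dockerfile.
# Assumes a Python recipe whose logic lives in retail.py with
# dependencies listed in requirements.txt.
FROM python:3.7

WORKDIR /app

# Package the dependencies alongside the application
COPY requirements.txt .
RUN pip install -r requirements.txt

# Package the recipe's source files
COPY retail.py .
```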
The built Docker image is pushed to the Azure Container Registry using credentials supplied to you during the recipe creation workflow.
To obtain your Azure Container Registry credentials, log in to Adobe Experience Platform at platform.adobe.com. In the left navigation, select Workflows . Select Import Recipe , followed by Launch .
The Configure page opens. Provide an appropriate Recipe Name , for example, "Retail Sales recipe", and optionally provide a description or documentation URL. Once complete, click Next .
Select the appropriate Runtime , then choose a Classification for Type . Your Azure Container Registry credentials are generated once complete.
Type is the class of machine learning problem the recipe is designed for; it is used after training to help tailor the evaluation of the training run.
  • For Python recipes select the Python runtime.
  • For R recipes select the R runtime.
  • For PySpark recipes select the PySpark runtime. An artifact type auto-populates.
  • For Scala recipes select the Spark runtime. An artifact type auto-populates.
Note the values for Docker Host , Username , and Password . These are used to build and push your Docker image in the workflows outlined below.
The Source URL is provided after completing the steps outlined below. The configuration file is explained in subsequent tutorials found in next steps .

Package the source files

Start by obtaining the sample codebase from the experience-platform-dsw-reference repository.

Build Python Docker image

If you have not already done so, clone the GitHub repository onto your local system with the following command:
git clone https://github.com/adobe/experience-platform-dsw-reference.git

Navigate to the directory experience-platform-dsw-reference/recipes/python/retail . Here, you will find the scripts login.sh and build.sh , which are used to log in to Docker and to build the Python Docker image. If you have your Docker credentials ready, enter the following commands in order:
# for logging in to Docker
./login.sh
 
# for building Docker image
./build.sh

Note that when executing the login script, you need to provide the Docker host, username, and password. When building, you are required to provide the Docker host and a version tag for the build.
Once the build script is complete, you are given a Docker source file URL in your console output. For this specific example, it will look something like:
# URL format: 
{DOCKER_HOST}/ml-retailsales-python:{VERSION_TAG}

Copy this URL and move on to the next steps .
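Concretely, the placeholders in the URL format resolve to the Docker host from the Configure page and the version tag you supplied to the build script. The values below are hypothetical, for illustration only:

```shell
# Hypothetical values; substitute the Docker host shown on the
# Configure page and the version tag you supplied to build.sh.
DOCKER_HOST="mycompany.azurecr.io"
VERSION_TAG="1.0"

# Assemble the full Docker source file URL for the Python retail image
IMAGE_URL="${DOCKER_HOST}/ml-retailsales-python:${VERSION_TAG}"
echo "$IMAGE_URL"
# prints mycompany.azurecr.io/ml-retailsales-python:1.0
```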

Build R Docker image

If you have not already done so, clone the GitHub repository onto your local system with the following command:
git clone https://github.com/adobe/experience-platform-dsw-reference.git

Navigate to the directory experience-platform-dsw-reference/recipes/R/Retail - GradientBoosting inside your cloned repository. Here, you will find the scripts login.sh and build.sh , which you will use to log in to Docker and to build the R Docker image. If you have your Docker credentials ready, enter the following commands in order:
# for logging in to Docker
./login.sh
 
# for building Docker image
./build.sh

Note that when executing the login script, you need to provide the Docker host, username, and password. When building, you are required to provide the Docker host and a version tag for the build.
Once the build script is complete, you are given a Docker source file URL in your console output. For this specific example, it will look something like:
# URL format: 
{DOCKER_HOST}/ml-retail-r:{VERSION_TAG}

Copy this URL and move on to the next steps .

Build PySpark Docker image

Start by cloning the GitHub repository onto your local system with the following command:
git clone https://github.com/adobe/experience-platform-dsw-reference.git

Navigate to the directory experience-platform-dsw-reference/recipes/pyspark/retail . The scripts login.sh and build.sh are located here and are used to log in to Docker and to build the Docker image. If you have your Docker credentials ready, enter the following commands in order:
# for logging in to Docker
./login.sh
 
# for building Docker image
./build.sh

Note that when executing the login script, you need to provide the Docker host, username, and password. When building, you are required to provide the Docker host and a version tag for the build.
Once the build script is complete, you are given a Docker source file URL in your console output. For this specific example, it will look something like:
# URL format: 
{DOCKER_HOST}/ml-retailsales-pyspark:{VERSION_TAG}

Copy this URL and move on to the next steps .

Build Scala Docker image

Start by cloning the GitHub repository onto your local system with the following command in a terminal:
git clone https://github.com/adobe/experience-platform-dsw-reference.git

Next, navigate to the directory experience-platform-dsw-reference/recipes/scala/retail , where you can find the scripts login.sh and build.sh . These scripts are used to log in to Docker and to build the Docker image. If you have your Docker credentials ready, enter the following commands in order:
# for logging in to Docker
./login.sh
 
# for building Docker image
./build.sh

When executing the login script, you need to provide the Docker host, username, and password. When building, you are required to provide the Docker host and a version tag for the build.
Once the build script is complete, you are given a Docker source file URL in your console output. For this specific example, it will look something like:
# URL format: 
{DOCKER_HOST}/ml-retailsales-spark:{VERSION_TAG}

Copy this URL and move on to the next steps .

Next steps

This tutorial went over packaging source files into a recipe, the prerequisite step for importing a recipe into Data Science Workspace. You should now have a Docker image in Azure Container Registry along with the corresponding image URL. You are now ready to begin the tutorial on importing a packaged recipe into Data Science Workspace, either in the UI or using the API.

Building binaries (deprecated)

Binaries are not supported in new PySpark and Scala recipes and are set to be removed in a future release. Follow the Docker workflows above when working with PySpark and Scala. The following workflows apply only to Spark 2.3 recipes.

Build PySpark binaries (deprecated)

If you have not already done so, clone the GitHub repository onto your local system with the following command:
git clone https://github.com/adobe/experience-platform-dsw-reference.git

Navigate into the cloned repository on your local system and run the following commands in order to build the .egg file required for importing a PySpark recipe:
cd recipes/pyspark
./build.sh

The .egg file is generated in the dist folder.
You can now move on to the next steps .

Build Scala binaries (deprecated)

If you have not already done so, run the following command to clone the GitHub repository to your local system:
git clone https://github.com/adobe/experience-platform-dsw-reference.git

To build the .jar artifact used to import a Scala recipe, navigate to your cloned repository and follow the steps below:
cd recipes/scala/
./build.sh

The generated .jar artifact with dependencies is found in the target directory.
You can now move on to the next steps .