Create and publish a machine learning model walkthrough
Pretend you own an online retail website. When customers shop on your website, you want to present them with personalized product recommendations that expose a variety of the other products your business offers. Over the span of your website's existence, you have continuously gathered customer data, and you want to use this data to generate personalized product recommendations.
Adobe Experience Platform Data Science Workspace provides the means to achieve your goal using the prebuilt Product Recommendations Recipe. Follow this tutorial to see how you can access and understand your retail data, create and optimize a machine learning Model, and generate insights in Data Science Workspace.
This tutorial reflects the workflow of Data Science Workspace, and covers the following steps for creating a machine learning Model:
- Prepare your data
- Train and evaluate your Model
- Operationalize your Model
Before starting this tutorial, you must have the following prerequisites:
- Access to Adobe Experience Platform. If you do not have access to an IMS Organization in Experience Platform, please speak to your system administrator before proceeding.
- Enablement assets. Please reach out to your account representative to have the following items provisioned for you:
  - Recommendations Recipe
  - Recommendations Input Dataset
  - Recommendations Input Schema
  - Recommendations Output Dataset
  - Recommendations Output Schema
  - Golden Data Set postValues
  - Golden Data Set Schema
- Download the three required Jupyter Notebook files from the Adobe public Git repository. These are used to demonstrate the JupyterLab workflow in Data Science Workspace.
- A working understanding of the following key concepts used in this tutorial:
  - Experience Data Model: The standardization effort led by Adobe to define standard schemas, such as Profile and ExperienceEvent, for Customer Experience Management.
  - Datasets: A storage and management construct for actual data. A dataset is a physically instantiated instance of an XDM Schema.
  - Batches: Datasets are made up of batches. A batch is a set of data collected over a period of time and processed together as a single unit.
  - JupyterLab: An open-source web-based interface for Project Jupyter that is tightly integrated into Experience Platform.
Prepare your data
To create a machine learning Model that makes personalized product recommendations to your customers, previous customer purchases on your website must be analyzed. This section explores how this data is ingested into Platform through Adobe Analytics, and how that data is transformed into a Feature dataset to be used by your machine learning Model.
Explore the data and understand the schemas
- Log in to Adobe Experience Platform and click Datasets to list all existing datasets, then select the dataset that you would like to explore. In this case, select the Analytics dataset Golden Data Set postValues.
- Select Preview Dataset near the top right to examine sample records, then click Close.
- Select the link under Schema in the right rail to view the schema for the dataset, then go back to the dataset details page.
The other datasets have been pre-populated with batches for previewing purposes. You can view these datasets by repeating the above steps.
- Golden Data Set postValues (schema: Golden Data Set schema): Analytics source data from your website.
- Recommendations Input Dataset (schema: Recommendations Input Schema): The training dataset, produced from the Analytics data by a feature pipeline. It is used to train the Product Recommendations machine learning Model. Each record pairs a userid with an itemid, representing a product purchased by that customer.
- Recommendations Output Dataset (schema: Recommendations Output Schema): The dataset in which scoring results are stored. It contains the list of recommended products for each customer.
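As a rough illustration of what a feature pipeline does here, the sketch below reduces raw website events to unique userid and itemid purchase pairs. It is plain Python, not the pipeline shipped with the Recipe; the `event_type` field, the sample identifiers, and the function name `build_feature_rows` are assumptions made for this example.

```python
from typing import Iterable


def build_feature_rows(events: Iterable[dict]) -> list:
    """Reduce raw events to unique (userid, itemid) purchase pairs."""
    seen = set()
    rows = []
    for event in events:
        # "event_type" is a hypothetical field name used only for this sketch.
        if event.get("event_type") != "purchase":
            continue
        pair = (event["userid"], event["itemid"])
        if pair not in seen:
            seen.add(pair)
            rows.append({"userid": pair[0], "itemid": pair[1]})
    return rows


# Hypothetical sample events: one view and one repeated purchase are dropped.
raw_events = [
    {"userid": "u1", "itemid": "sku-1", "event_type": "purchase"},
    {"userid": "u1", "itemid": "sku-2", "event_type": "view"},
    {"userid": "u2", "itemid": "sku-1", "event_type": "purchase"},
    {"userid": "u1", "itemid": "sku-1", "event_type": "purchase"},
]
feature_rows = build_feature_rows(raw_events)
```

Only the purchase events survive, and the repeated purchase by the same customer collapses into a single row.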
Train and evaluate your Model
Now that your data is prepared and the Recipe is ready to be used, you can create, train, and evaluate your machine learning Model.
Create a Model
A Model is an instance of a Recipe, enabling you to train and score with data at scale.
- In Adobe Experience Platform, navigate to Models from the left navigation column, then click Recipes at the top of the page to display a list of all available Recipes for your organization.
- Locate the provided Recommendations Recipe and click its name to open the Recipe Overview page. Click Create a Model, either from the center of the page (if there are no existing Models) or from the top right of the Recipe Overview page.
- A list of available input datasets for training is shown. Select Recommendations Input Dataset and click Next.
- Provide a name for the Model, for example "Product Recommendations Model". Available configurations for the Model are listed, containing settings for the Model's default training and scoring behaviors. No changes are needed as these configurations are specific to your organization. Review the configurations and click Finish.
- The Model is now created, and its Overview page appears with a newly generated training run. A training run is generated by default when a Model is created.
You can choose to wait for the training run to finish, or continue to create a new training run in the following section.
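The relationship between a Recipe, a Model, and its training runs can be pictured with a few plain Python objects. This is only a conceptual sketch, not the Platform API: the class names, the `create_model` helper, and the default value of 5 for `num_recommendations` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingRun:
    input_dataset: str
    hyperparameters: dict


@dataclass
class Model:
    name: str
    recipe: str
    training_runs: list = field(default_factory=list)


def create_model(recipe: str, name: str, input_dataset: str, default_config: dict) -> Model:
    """Create a Model from a Recipe and attach its default training run."""
    model = Model(name=name, recipe=recipe)
    # Mirrors the behavior described above: creating a Model starts one run.
    model.training_runs.append(TrainingRun(input_dataset, dict(default_config)))
    return model


model = create_model(
    recipe="Recommendations Recipe",
    name="Product Recommendations Model",
    input_dataset="Recommendations Input Dataset",
    default_config={"num_recommendations": 5},  # hypothetical default value
)
```

Any later training run would be appended to the same Model, which is why one Model can accumulate several runs to compare.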
Train the Model using custom Hyperparameters
- On the Model Overview page, click Train near the top right to create a new training run. Select the same input dataset you used when creating the Model and click Next.
- The Configuration page appears. Here you can configure the training run's "num_recommendations" value, also known as a Hyperparameter. A trained and optimized Model will utilize the best-performing Hyperparameters based on the results of the training run. Hyperparameters cannot be learned, so they must be assigned before training runs occur. Adjusting Hyperparameters may change the accuracy of the trained Model. Since optimizing a Model is an iterative process, multiple training runs may be required before a satisfactory evaluation is achieved. Set num_recommendations to 10.
- An additional data point appears on the Model evaluation chart once the new training run completes. This may take up to several minutes.
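To make the idea of a Hyperparameter concrete, the sketch below trains a toy popularity-based recommender in plain Python. It is not the algorithm used by the Recommendations Recipe; the function name `train` and the sample rows are assumptions. The point it illustrates is that `num_recommendations` is supplied before training rather than learned from the data.

```python
from collections import Counter


def train(rows: list, num_recommendations: int) -> dict:
    """Fit a toy popularity model; num_recommendations is fixed up front."""
    popularity = Counter(row["itemid"] for row in rows)
    purchased = {}
    for row in rows:
        purchased.setdefault(row["userid"], set()).add(row["itemid"])
    # Most popular items first; ties keep first-seen order.
    ranked = [item for item, _ in popularity.most_common()]
    # Recommend popular items the customer has not bought, capped by the Hyperparameter.
    return {
        user: [item for item in ranked if item not in owned][:num_recommendations]
        for user, owned in purchased.items()
    }


# Hypothetical training rows in the userid/itemid shape described earlier.
rows = [
    {"userid": "u1", "itemid": "A"},
    {"userid": "u1", "itemid": "B"},
    {"userid": "u2", "itemid": "A"},
    {"userid": "u2", "itemid": "C"},
    {"userid": "u3", "itemid": "A"},
]
run_small = train(rows, num_recommendations=1)
run_large = train(rows, num_recommendations=2)
```

Two runs over the same data differ only in the Hyperparameter, which is the kind of comparison repeated training runs let you make.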
Evaluate the Model
Each time a training run completes, you can view the resulting evaluation metrics to determine how well the Model performed.
- Review the evaluation metrics (Precision and Recall) for each completed training run by clicking on the training run.
- Explore the information provided for each evaluation metric. The higher these metrics, the better the Model performed.
- You can see the dataset, schema, and configuration parameters used for each training run on the right rail.
- Navigate back to the Model page and identify the top-performing training run by comparing the evaluation metrics of the completed runs.
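For readers unfamiliar with the two metrics, the following sketch computes precision and recall for a single customer's recommendation list using their standard definitions. How the Recipe aggregates them across customers is not covered here, and the item identifiers and the helper name `precision_recall` are hypothetical.

```python
def precision_recall(recommended: list, relevant: set) -> tuple:
    """Return (precision, recall) for one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    # Precision: share of recommended items the customer actually wanted.
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: share of wanted items that were recommended.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical held-out purchases for one customer.
relevant = {"A", "B", "C", "D"}

# Hypothetical recommendation lists from two training runs.
run_5 = ["A", "B", "X", "Y", "Z"]
run_10 = ["A", "B", "C", "X", "Y", "Z", "W", "V", "U", "T"]

metrics_5 = precision_recall(run_5, relevant)
metrics_10 = precision_recall(run_10, relevant)
```

In this made-up case the longer list raises recall while lowering precision, so it helps to look at both metrics together when comparing training runs.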
Operationalize your Model
The final step in the Data Science workflow is to operationalize your model in order to score and consume insights from your data store.
Score and generate insights
- On the product recommendations Model Overview page, click the name of the best-performing training run, that is, the run with the highest recall and precision values.
- At the top right of the training run details page, click Score.
- Select the Recommendations Input Dataset as the scoring input dataset, which is the same dataset you used when you created the Model and executed its training runs. Then, click Next.
- Select the Recommendations Output Dataset as the scoring output dataset. Scoring results will be stored in this dataset as a batch.
- Review the scoring configurations. These parameters contain the input and output datasets selected earlier along with the appropriate schemas. Click Finish to begin the scoring run. The run may take several minutes to complete.
View scored insights
Once the scoring run has successfully completed, you will be able to preview the results and view the insights generated.
- On the scoring runs page, click on the completed scoring run, then click Preview Scoring Results Dataset on the right rail.
- In the preview table, each row contains product recommendations for a particular customer, labeled as recommendations and userId respectively. Since the num_recommendations Hyperparameter was set to 10 in the sample screenshots, each row of recommendations can contain up to 10 product identities delimited by a number sign (#).
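If you export the scored rows for further processing, the number-sign delimited recommendations value can be split into a list. The sketch below assumes each row is available as a Python dictionary with the recommendations and userId columns described above; the sample identifiers and the function name `parse_recommendations` are hypothetical.

```python
def parse_recommendations(row: dict) -> tuple:
    """Split a '#'-delimited recommendations string into a list of item IDs."""
    raw = row["recommendations"]
    # Skip empty fragments, for example from a trailing delimiter.
    items = [item for item in raw.split("#") if item]
    return row["userId"], items


# Hypothetical scored row.
scored_row = {"userId": "u1", "recommendations": "sku-7#sku-3#sku-9"}
user_id, items = parse_recommendations(scored_row)
```

With num_recommendations set to 10, each parsed list would hold at most 10 entries.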
Well done, you have successfully generated product recommendations!
This tutorial introduced you to the workflow of Data Science Workspace, demonstrating how raw, unprocessed data can be turned into useful information through machine learning. To learn more about using Data Science Workspace, continue to the next guide on creating the retail sales schema and dataset.