Create an Apache Spark on Azure HDInsights base connection using the Flow Service API

NOTE
The Apache Spark on Azure HDInsights connector is in beta. See the Sources overview for more information on using beta-labeled connectors.

A base connection represents the authenticated connection between a source and Adobe Experience Platform.

This tutorial walks you through the steps to create a base connection for Apache Spark on Azure HDInsights (hereinafter referred to as “Spark”) using the Flow Service API.

Getting started

This guide requires a working understanding of the following components of Adobe Experience Platform:

  • Sources: Experience Platform allows data to be ingested from various sources while providing you with the ability to structure, label, and enhance incoming data using Platform services.
  • Sandboxes: Experience Platform provides virtual sandboxes which partition a single Platform instance into separate virtual environments to help develop and evolve digital experience applications.

The following sections provide additional information that you need in order to connect to Spark using the Flow Service API.

Gather required credentials

In order for Flow Service to connect with Spark, you must provide values for the following connection properties:

| Credential | Description |
| --- | --- |
| host | The IP address or host name of the Spark server. |
| username | The user name that you use to access the Spark server. |
| password | The password corresponding to the user. |
| connectionSpec.id | The connection specification returns a source’s connector properties, including authentication specifications related to creating the base and source connections. The connection specification ID for Spark is: 6a8d82bc-1caf-45d1-908d-cadabc9d63a6 |

For more information about getting started, refer to this Spark document.
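
One way to keep these credential values out of source code is to read them from environment variables before building the request. The sketch below assumes hypothetical variable names (SPARK_HOST, SPARK_USERNAME, SPARK_PASSWORD); they are not defined by Platform, and any naming scheme would work equally well.

```python
import os

# Hypothetical environment variable names chosen for this sketch;
# Platform does not define or require them.
ENV_VARS = {
    "host": "SPARK_HOST",
    "username": "SPARK_USERNAME",
    "password": "SPARK_PASSWORD",
}


def load_spark_credentials(env=None):
    """Return host, username, and password, or raise if any value is missing."""
    env = os.environ if env is None else env
    creds = {key: env.get(var, "") for key, var in ENV_VARS.items()}
    missing = [ENV_VARS[key] for key, value in creds.items() if not value]
    if missing:
        raise ValueError("Missing Spark credentials: " + ", ".join(missing))
    return creds
```

Failing early on a missing value avoids sending a request that Flow Service would reject for incomplete authentication parameters.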

Using Platform APIs

For information on how to successfully make calls to Platform APIs, see the guide on getting started with Platform APIs.
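
The curl request in this guide sends five headers. As a minimal sketch, a small helper can assemble them from values you have already obtained. The function name build_platform_headers is illustrative only, and its arguments correspond to the {ACCESS_TOKEN}, {API_KEY}, {ORG_ID}, and {SANDBOX_NAME} placeholders used in that request.

```python
def build_platform_headers(access_token, api_key, org_id, sandbox_name):
    """Assemble the headers sent by the curl request in this guide."""
    values = {
        "access_token": access_token,
        "api_key": api_key,
        "org_id": org_id,
        "sandbox_name": sandbox_name,
    }
    # Reject empty values up front rather than sending an incomplete request.
    empty = [name for name, value in values.items() if not value]
    if empty:
        raise ValueError("Empty header values: " + ", ".join(empty))
    return {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox_name,
        "Content-Type": "application/json",
    }
```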

Create a base connection

A base connection retains information between your source and Platform, including your source’s authentication credentials, the current state of the connection, and your unique base connection ID. The base connection ID allows you to explore and navigate files from within your source and identify the specific items that you want to ingest, including information regarding their data types and formats.

To create a base connection ID, make a POST request to the /connections endpoint while providing your Spark authentication credentials as part of the request body.

API format

POST /connections

Request

The following request creates a base connection for Spark:

curl -X POST \
    'https://platform.adobe.io/data/foundation/flowservice/connections' \
    -H 'Authorization: Bearer {ACCESS_TOKEN}' \
    -H 'x-api-key: {API_KEY}' \
    -H 'x-gw-ims-org-id: {ORG_ID}' \
    -H 'x-sandbox-name: {SANDBOX_NAME}' \
    -H 'Content-Type: application/json' \
    -d '{
        "name": "Spark test connection",
        "description": "A Spark test connection",
        "auth": {
            "specName": "HDInsights Basic Authentication",
            "params": {
                "host": "{HOST}",
                "username": "{USERNAME}",
                "password": "{PASSWORD}"
            }
        },
        "connectionSpec": {
            "id": "6a8d82bc-1caf-45d1-908d-cadabc9d63a6",
            "version": "1.0"
        }
    }'

| Parameter | Description |
| --- | --- |
| auth.params.host | The host of the Spark server. |
| auth.params.username | The username associated with your Spark connection. |
| auth.params.password | The password associated with your Spark connection. |
| connectionSpec.id | The Spark connection specification ID: 6a8d82bc-1caf-45d1-908d-cadabc9d63a6. |
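
For readers scripting this call, the same request could be assembled in Python with the standard library. This is a sketch under stated assumptions: the helper names are illustrative, and the payload mirrors the curl example above rather than an exhaustive schema. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint and connection spec ID as given in this guide.
FLOW_SERVICE_CONNECTIONS_URL = (
    "https://platform.adobe.io/data/foundation/flowservice/connections"
)
SPARK_CONNECTION_SPEC_ID = "6a8d82bc-1caf-45d1-908d-cadabc9d63a6"


def build_connection_payload(host, username, password,
                             name="Spark test connection",
                             description="A Spark test connection"):
    """Return the JSON body from the curl example as a Python dict."""
    return {
        "name": name,
        "description": description,
        "auth": {
            "specName": "HDInsights Basic Authentication",
            "params": {
                "host": host,
                "username": username,
                "password": password,
            },
        },
        "connectionSpec": {"id": SPARK_CONNECTION_SPEC_ID, "version": "1.0"},
    }


def build_create_connection_request(payload, headers):
    """Wrap the payload in a POST request object without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        FLOW_SERVICE_CONNECTIONS_URL, data=body, headers=headers, method="POST"
    )
```

Passing the resulting object to urllib.request.urlopen would send it. The headers argument should carry the same headers as the curl example: authorization, API key, organization ID, sandbox name, and content type.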

Response

A successful response returns details of the newly created connection, including its unique identifier (id). This ID is required to explore your data in the next tutorial.

{
    "id": "a45f2f58-e3a2-46ba-9f2f-58e3a2b6baf2",
    "etag": "\"900009d6-0000-0200-0000-5e8500010000\""
}
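
If you are handling the response in a script, the connection ID can be pulled out of the JSON body as shown in this illustrative sketch. The function name is an assumption, and only the id and etag fields that appear in the sample response are read.

```python
import json


def parse_connection_response(body):
    """Return (connection_id, etag) from a create-connection response body."""
    data = json.loads(body)
    connection_id = data.get("id")
    if not connection_id:
        raise ValueError("Response does not contain a connection id")
    # The etag value includes literal double quotes, as in the sample
    # response; it is returned unchanged here.
    return connection_id, data.get("etag")
```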

Next steps

By following this tutorial, you have created a Spark base connection using the Flow Service API. You can use this base connection ID in subsequent tutorials, starting with exploring your data.
