Retrieving data ingestion error diagnostics

Last update: May 25, 2023

Topics: Data Ingestion

Created for: Developer

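Adobe Experience Platform provides two methods for uploading and ingesting data. You can use batch ingestion, which lets you insert data using various file types (such as CSV), or streaming ingestion, which lets you insert data into Platform in real time using streaming endpoints.

This document explains how to monitor batch ingestion and manage partial batch ingestion errors, and provides a reference for partial batch ingestion error types.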

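Getting started

This guide requires a working understanding of the following components of Adobe Experience Platform:

Experience Data Model (XDM) System: The standardized framework by which Experience Platform organizes customer experience data.
Adobe Experience Platform Data Ingestion: The methods by which data can be sent to Experience Platform.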

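Reading sample API calls

This tutorial provides example API calls to demonstrate how to format your requests. These include paths, required headers, and properly formatted request payloads. Sample JSON returned in API responses is also provided. For information on the conventions used in the documentation for sample API calls, see the section on how to read example API calls in the Experience Platform troubleshooting guide.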

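Gather values for required headers

To make calls to Platform APIs, you must first complete the authentication tutorial. Completing the authentication tutorial provides the values for each of the required headers in all Experience Platform API calls, as shown below: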

Authorization: Bearer {ACCESS_TOKEN}
x-api-key: {API_KEY}
x-gw-ims-org-id: {ORG_ID}

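All resources in Experience Platform, including those belonging to the Schema Registry, are isolated to specific virtual sandboxes. All requests to Platform APIs require a header that specifies the name of the sandbox in which the operation will take place: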

x-sandbox-name: {SANDBOX_NAME}

NOTE

For more information on sandboxes in Platform, see the sandbox overview documentation.
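The four headers listed above can be collected in one place so that every request sends the same set. The following is a minimal Python sketch; the function name `build_platform_headers` and the sample values are illustrative assumptions, not part of the Platform API.

```python
def build_platform_headers(access_token, api_key, org_id, sandbox_name):
    """Return the four headers that this guide lists as required.

    The function name and argument names are hypothetical; only the
    header names themselves come from the guide.
    """
    return {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox_name,
    }


# Placeholder values; real values come from the authentication tutorial.
headers = build_platform_headers("my-token", "my-key", "my-org", "my-sandbox")
for name, value in headers.items():
    print(f"{name}: {value}")
```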

Downloading error diagnostics

Adobe Experience Platform allows users to download the error diagnostics of the input files. The diagnostics are retained in Platform for up to 30 days.
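Because diagnostics are kept for a limited time, it can help to estimate when they will stop being available. The sketch below assumes that the 30-day window starts at the batch's `completed` timestamp (epoch milliseconds, as it appears in the batch status response). The guide does not state when the window starts, so treat that starting point as an assumption. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated in this guide.
RETENTION = timedelta(days=30)


def diagnostics_expiry(completed_ms):
    """Estimate when diagnostics expire.

    ASSUMPTION: the retention window is counted from the batch's
    `completed` time; the guide does not confirm this.
    """
    completed = datetime.fromtimestamp(completed_ms / 1000, tz=timezone.utc)
    return completed + RETENTION


def diagnostics_available(completed_ms, now):
    """Return True if `now` falls inside the estimated retention window."""
    return now <= diagnostics_expiry(completed_ms)


expiry = diagnostics_expiry(1576741722026)
print(expiry.date().isoformat())
```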

List input files

The following request retrieves a list of all the files provided in a finalized batch.

API format

GET /export/batches/{BATCH_ID}/meta?path=input_files

| Property | Description |
| --- | --- |
| {BATCH_ID} | The ID of the batch you are looking up. |

Request

curl -X GET 'https://platform.adobe.io/data/foundation/export/batches/af838510-2233-11ea-acf0-f3edfcded2d2/meta?path=input_files' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Response

A successful response returns JSON objects detailing where the diagnostics were saved.

{
    "_page": {
        "count": 2,
        "limit": 100
    },
    "data": [
        {
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io/data/foundation/export/batches/af838510-2233-11ea-acf0-f3edfcded2d2/meta?path=input_files/fileMetaData1.json"
                }
            },
            "length": "1337",
            "name": "fileMetaData1.json"
        },
        {
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io/data/foundation/export/batches/af838510-2233-11ea-acf0-f3edfcded2d2/meta?path=input_files/fileMetaData2.json"
                }
            },
            "length": "1042",
            "name": "fileMetaData2.json"
        }
    ]
}
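To work with this listing programmatically, the entries under `data` can be reduced to the fields needed for the follow-up request. The following Python sketch parses a trimmed copy of the response above (the `_links` objects are omitted for brevity). The helper name `list_input_files` is an assumption, and `length` is converted from the string form shown in the sample.

```python
import json


def list_input_files(response_body):
    """Return (name, length) pairs for every entry under "data"."""
    payload = json.loads(response_body)
    return [(item["name"], int(item["length"])) for item in payload.get("data", [])]


# Trimmed copy of the sample response shown above.
sample_body = """
{
    "_page": {"count": 2, "limit": 100},
    "data": [
        {"length": "1337", "name": "fileMetaData1.json"},
        {"length": "1042", "name": "fileMetaData2.json"}
    ]
}
"""

files = list_input_files(sample_body)
for name, length in files:
    print(name, length)
```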

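Retrieve input file diagnostics

Once you have retrieved the list of input files, you can use the following request to retrieve the diagnostics for an individual file.

API format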

GET /export/batches/{BATCH_ID}/meta?path=input_files/{FILE}

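| Property | Description |
| --- | --- |
| {BATCH_ID} | The ID of the batch you are looking up. |
| {FILE} | The name of the file you are accessing. |

Request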

curl -X GET 'https://platform.adobe.io/data/foundation/export/batches/af838510-2233-11ea-acf0-f3edfcded2d2/meta?path=input_files/fileMetaData1.json' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'
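The list request and the single-file request differ only in the value of the `path` query parameter, so one helper can build both URLs. This is a minimal sketch: `build_meta_url` is a hypothetical name, and the base URL is taken from the curl examples in this guide. Note that `urlencode` percent-encodes the slash as `%2F`, which is the same form the API uses in the `href` values of the `row_errors` listing.

```python
from urllib.parse import urlencode

# Base URL as used by the curl examples in this guide.
BASE_URL = "https://platform.adobe.io/data/foundation/export"


def build_meta_url(batch_id, file_name=None):
    """Build the URL that lists input files, or that targets one file."""
    path = "input_files" if file_name is None else f"input_files/{file_name}"
    return f"{BASE_URL}/batches/{batch_id}/meta?{urlencode({'path': path})}"


batch_id = "af838510-2233-11ea-acf0-f3edfcded2d2"
list_url = build_meta_url(batch_id)
file_url = build_meta_url(batch_id, "fileMetaData1.json")
print(list_url)
print(file_url)
```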

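Response

A successful response returns JSON objects, each containing a path property that details where the diagnostics were saved. The response returns these objects in JSON Lines format.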

{"path": "F1.json"}
{"path": "etc/F2.json"}
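A JSON Lines body holds one JSON object per line, so it cannot be passed to a JSON parser as a single document. The following sketch splits the body into lines and parses each one; `parse_json_lines` is an illustrative helper name.

```python
import json


def parse_json_lines(body):
    """Parse a JSON Lines body: one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


# The sample response shown above.
body = '{"path": "F1.json"}\n{"path": "etc/F2.json"}\n'

paths = [record["path"] for record in parse_json_lines(body)]
print(paths)
```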

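Retrieve batch ingestion errors

If a batch contains failures, retrieve the error information about those failures so that you can re-ingest the data.

Check status

To check the status of an ingested batch, you must supply the batch's ID in the path of a GET request. To learn more about this API call, read the Catalog endpoint guide.

API format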

GET /catalog/batches/{BATCH_ID}
GET /catalog/batches/{BATCH_ID}?{FILTER}

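| Parameter | Description |
| --- | --- |
| {BATCH_ID} | The id value of the batch whose status you want to check. |
| {FILTER} | A query parameter used to filter the results returned in the response. Multiple parameters are separated by ampersands (&). For more information, read the guide on filtering Catalog data. |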
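Multiple filter parameters are joined with ampersands, which a standard query-string encoder does automatically. The sketch below builds the status URL with and without filters. The helper name `build_batch_status_url` and the parameter names `properties` and `limit` are used here only as illustrative placeholders; the guide on filtering Catalog data lists the parameters that are actually supported.

```python
from urllib.parse import urlencode

# Base URL as used by the curl example in this guide.
CATALOG_URL = "https://platform.adobe.io/data/foundation/catalog"


def build_batch_status_url(batch_id, filters=None):
    """Build the batch status URL, appending optional query filters."""
    url = f"{CATALOG_URL}/batches/{batch_id}"
    if filters:
        # urlencode joins the key=value pairs with "&".
        url += "?" + urlencode(filters)
    return url


batch_id = "af838510-2233-11ea-acf0-f3edfcded2d2"
plain_url = build_batch_status_url(batch_id)
# Placeholder filter names, chosen for illustration only.
filtered_url = build_batch_status_url(batch_id, {"properties": "status", "limit": 1})
print(plain_url)
print(filtered_url)
```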

Request

curl -X GET https://platform.adobe.io/data/foundation/catalog/batches/af838510-2233-11ea-acf0-f3edfcded2d2 \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Response without errors

A successful response returns detailed information about the batch's status.

{
    "af838510-2233-11ea-acf0-f3edfcded2d2": {
        "status": "success",
        "tags": {
            "acp_enableErrorDiagnostics": true,
            "acp_partialIngestionPercent": 5
        },
        "relatedObjects": [
            {
                "type": "dataSet",
                "id": "5deac2648a19d218a888d2b1"
            }
        ],
        "id": "af838510-2233-11ea-acf0-f3edfcded2d2",
        "externalId": "af838510-2233-11ea-acf0-f3edfcded2d2",
        "inputFormat": {
            "format": "parquet"
        },
        "imsOrg": "{ORG_ID}",
        "started": 1576741718543,
        "metrics": {
            "inputByteSize": 568,
            "inputFileCount": 4,
            "inputRecordCount": 519,
            "outputRecordCount": 519,
            "failedRecordCount": 0
        },
        "completed": 1576741722026,
        "created": 1576741597205,
        "createdClient": "{API_KEY}",
        "createdUser": "{USER_ID}",
        "updatedUser": "{USER_ID}",
        "updated": 1576741722644,
        "version": "1.0.5"
    }
}

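| Property | Description |
| --- | --- |
| metrics.failedRecordCount | The number of rows that could not be processed due to parsing, conversion, or validation errors. This value can be derived by subtracting outputRecordCount from inputRecordCount. It is generated for all batches, regardless of whether errorDiagnostics is enabled. |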
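The relationship between the three metrics can be checked directly: the failed count is the input count minus the output count. The following sketch applies that subtraction to the metrics from the error example in this guide (519 input rows, 514 output rows). The function name is an illustrative assumption.

```python
def derived_failed_count(metrics):
    """Derive the failed row count as input rows minus output rows."""
    return metrics["inputRecordCount"] - metrics["outputRecordCount"]


# Metrics taken from the "Response with errors" example in this guide.
metrics = {
    "inputRecordCount": 519,
    "outputRecordCount": 514,
    "failedRecordCount": 5,
}

derived = derived_failed_count(metrics)
print(derived, derived == metrics["failedRecordCount"])
```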

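Response with errors

If the batch has one or more errors and error diagnostics are enabled, the response returns more information about the errors, both within the payload itself and as downloadable error files. Note that a batch that contains errors can still have a status of success.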

{
    "01E8043CY305K2MTV5ANH9G1GC": {
        "status": "success",
        "tags": {
            "acp_enableErrorDiagnostics": true,
            "acp_partialIngestionPercent": 5
        },
        "relatedObjects": [
            {
                "type": "dataSet",
                "id": "5deac2648a19d218a888d2b1"
            }
        ],
        "id": "01E8043CY305K2MTV5ANH9G1GC",
        "externalId": "01E8043CY305K2MTV5ANH9G1GC",
        "inputFormat": {
            "format": "parquet"
        },
        "imsOrg": "{ORG_ID}",
        "started": 1576741718543,
        "metrics": {
            "inputByteSize": 568,
            "inputFileCount": 4,
            "inputRecordCount": 519,
            "outputRecordCount": 514,
            "failedRecordCount": 5
        },
        "completed": 1576741722026,
        "created": 1576741597205,
        "createdClient": "{API_KEY}",
        "createdUser": "{USER_ID}",
        "updatedUser": "{USER_ID}",
        "updated": 1576741722644,
        "version": "1.0.5",
        "errors": [
           {
             "code": "INGEST-1212-400",
             "description": "Encountered 5 errors in the data. Successfully ingested 514 rows. Please review the associated diagnostic files for more details."
           },
           {
             "code": "INGEST-1401-400",
             "description": "The row has corrupted data and cannot be read or parsed. Fix the corrupted data and try again.",
             "recordCount": 2
           },
           {
             "code": "INGEST-1555-400",
             "description": "A required field is either missing or has a value of null. Add the required field to the input row and try again.",
             "recordCount": 3
           }
        ]
    }
}

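| Property | Description |
| --- | --- |
| metrics.failedRecordCount | The number of rows that could not be processed due to parsing, conversion, or validation errors. This value can be derived by subtracting outputRecordCount from inputRecordCount. It is generated for all batches, regardless of whether errorDiagnostics is enabled. |
| errors.recordCount | The number of rows that failed for the specified error code. This value is generated only if errorDiagnostics is enabled. |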
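In the sample response, the per-code `recordCount` values (2 and 3) add up to `metrics.failedRecordCount` (5), while the summary entry `INGEST-1212-400` carries no `recordCount`. The sketch below tallies rows per error code from a trimmed copy of that payload. The helper name is an assumption, and whether the per-code counts always sum to the failed count is not stated in this guide.

```python
def summarize_errors(batch):
    """Return ({code: recordCount}, total) for errors that carry a recordCount."""
    per_code = {
        error["code"]: error["recordCount"]
        for error in batch.get("errors", [])
        if "recordCount" in error
    }
    return per_code, sum(per_code.values())


# Trimmed copy of the "Response with errors" example (descriptions omitted).
batch = {
    "metrics": {"failedRecordCount": 5},
    "errors": [
        {"code": "INGEST-1212-400"},
        {"code": "INGEST-1401-400", "recordCount": 2},
        {"code": "INGEST-1555-400", "recordCount": 3},
    ],
}

per_code, total = summarize_errors(batch)
print(per_code, total)
```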

NOTE

If error diagnostics are not available, the following error message appears instead:

{
       "errors": [{
               "code": "INGEST-1211-400",
               "description": "Encountered errors while parsing, converting or otherwise validating the data. Please resend the data with error diagnostics enabled to collect additional information on failure types"
       }]
}
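A client can check for this error code to decide whether the batch needs to be resent with error diagnostics enabled. The following is a minimal sketch; the helper name `diagnostics_missing` is an assumption, and the code it looks for is the one shown in the message above.

```python
# Error code shown in this guide when no diagnostics were collected.
DIAGNOSTICS_UNAVAILABLE = "INGEST-1211-400"


def diagnostics_missing(payload):
    """Return True if the payload reports that no diagnostics are available."""
    return any(
        error.get("code") == DIAGNOSTICS_UNAVAILABLE
        for error in payload.get("errors", [])
    )


without_diagnostics = {"errors": [{"code": "INGEST-1211-400"}]}
with_diagnostics = {"errors": [{"code": "INGEST-1401-400", "recordCount": 2}]}

print(diagnostics_missing(without_diagnostics))
print(diagnostics_missing(with_diagnostics))
```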

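Next steps

This tutorial covered how to monitor partial batch ingestion errors. For more information on batch ingestion, read the batch ingestion developer guide.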

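Appendix

This section contains supplemental information about ingestion error types.

Partial batch ingestion error types

Partial batch ingestion has three different error types when ingesting data:

Unreadable files
Invalid schemas or headers
Unparsable rows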

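Unreadable files

If the ingested batch has unreadable files, the batch's errors are attached to the batch itself. For more information on retrieving the failed batch, see the retrieving failed batches guide.

Invalid schemas or headers

If the ingested batch has an invalid schema or invalid headers, the batch's errors are attached to the batch itself. For more information on retrieving the failed batch, see the retrieving failed batches guide.

Unparsable rows

If the batch you ingested has unparsable rows, you can use the following request to view a list of the files that contain errors.

API format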

GET /export/batches/{BATCH_ID}/meta?path=row_errors

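| Parameter | Description |
| --- | --- |
| {BATCH_ID} | The id value of the batch from which you are retrieving error information. |

Request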

curl -X GET 'https://platform.adobe.io/data/foundation/export/batches/01EFZ7W203PEKSAMVJC3X99VHQ/meta?path=row_errors' \
  -H 'Authorization: Bearer {ACCESS_TOKEN}' \
  -H 'x-api-key: {API_KEY}' \
  -H 'x-gw-ims-org-id: {ORG_ID}' \
  -H 'x-sandbox-name: {SANDBOX_NAME}'

Response

A successful response returns a list of the files that contain errors.

{
    "data": [
        {
            "name": "conversion_errors_0.json",
            "length": "1162",
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io:443/data/foundation/export/batches/01EFZ7W203PEKSAMVJC3X99VHQ/meta?path=row_errors%2Fconversion_errors_0.json"
                }
            }
        },
        {
            "name": "parsing_errors_0.json",
            "length": "153",
            "_links": {
                "self": {
                    "href": "https://platform.adobe.io:443/data/foundation/export/batches/01EFZ7W203PEKSAMVJC3X99VHQ/meta?path=row_errors%2Fparsing_errors_0.json"
                }
            }
        }
    ],
    "_page": {
        "limit": 100,
        "count": 2
    }
}

You can then retrieve detailed information about the errors using the diagnostics retrieval endpoint.
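Each `href` in the listing carries the file's location in its percent-encoded `path` query parameter. The sketch below decodes that parameter for every entry so the value can be reused in a follow-up request. The helper name `error_file_paths` is an assumption, and the sample dictionary is a trimmed copy of the response above.

```python
from urllib.parse import parse_qs, urlsplit


def error_file_paths(listing):
    """Return the decoded `path` query value from each entry's self link."""
    paths = []
    for item in listing.get("data", []):
        href = item["_links"]["self"]["href"]
        # parse_qs decodes percent-escapes such as %2F back to "/".
        query = parse_qs(urlsplit(href).query)
        paths.append(query["path"][0])
    return paths


base = "https://platform.adobe.io:443/data/foundation/export/batches/01EFZ7W203PEKSAMVJC3X99VHQ/meta"
listing = {
    "data": [
        {"name": "conversion_errors_0.json",
         "_links": {"self": {"href": base + "?path=row_errors%2Fconversion_errors_0.json"}}},
        {"name": "parsing_errors_0.json",
         "_links": {"self": {"href": base + "?path=row_errors%2Fparsing_errors_0.json"}}},
    ]
}

paths = error_file_paths(listing)
print(paths)
```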

A sample response from retrieving the error file is shown below:

{
    "_corrupt_record": "{missingQuotes: 'v1'}",
    "_errors": [{
        "code": "1401",
        "message": "Row is corrupted and cannot be read, please fix and resend."
    }],
    "_filename": "parsing_errors_0.json"
}
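A record like the one above can be condensed into a short report that pairs the rejected row with the reasons it failed. The following sketch works on a single record that has already been parsed into a dictionary; `describe_row_error` is an illustrative name, and the field names are the ones shown in the sample.

```python
def describe_row_error(record):
    """Summarize one error record: source file, raw row, and failure reasons."""
    reasons = [
        f'{error["code"]}: {error["message"]}'
        for error in record.get("_errors", [])
    ]
    return {
        "file": record.get("_filename"),
        "row": record.get("_corrupt_record"),
        "reasons": reasons,
    }


# The sample error record shown above.
record = {
    "_corrupt_record": "{missingQuotes: 'v1'}",
    "_errors": [{
        "code": "1401",
        "message": "Row is corrupted and cannot be read, please fix and resend.",
    }],
    "_filename": "parsing_errors_0.json",
}

report = describe_row_error(record)
print(report["file"], report["reasons"])
```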
