Data Quality user documentation

Bulk loading enables Catalog users to manage large volumes of data, including adding, updating, and removing records. It is accessed through Data Studio Workflow steps.

To use this feature, you need to set up an API key with Catalog create/update permissions and configure the Catalog step settings.

The Bulk Loader helps you:

Manage Objects – Quickly create, update, or delete large numbers of records.
Link Objects – Set up and maintain relationships between objects (such as Customers and Accounts).
Manage Interests – Add, update, or remove interests.

Initial setup

First, in Aperture Data Studio, generate an API key with Catalog create/update permissions.
Next, create a Catalog Step setting and paste the key value into the API key field. Leave all other settings unchanged.

Retrieving object data from the Catalog

To retrieve Catalog object information, use a Source step within a Data Studio workflow.

In the Source step, set the Source to the Step settings configured in the setup instructions above.
Once the source is selected, the Object type dropdown will refresh to display a list of all available Catalog Object types.
Choose the required Object type.

Enable Fetch from Catalog on edit mode to view returned Catalog data without running the workflow. The data is displayed when Show step results is clicked.

Managing objects

To manage Catalog objects, use the Update Catalog step.

First, import the data you want to work with into Data Studio. It is possible to use Catalog data as a Source.
Then, within a Data Studio workflow, connect a Source step to an Update Catalog step.
In the Source step, select the data you want to use, then in the Update Catalog step, choose the primary operation you want to perform.

The possible operations are:

Create: Adds new records to the Catalog. Only records that do not already exist will be created.
Update: Modifies existing records in the Catalog. Records must already exist; no new records will be created.
Update/Create: Updates records if they already exist, or creates them if they do not.
Delete: Removes existing records from the Catalog based on the provided data. This operation permanently deletes the specified records, so it should be used with caution.
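The difference between the four operations can be sketched with a small in-memory model. This is only an illustration in Python, not Data Studio code: the function name `apply_operation`, the dictionary standing in for the Catalog, and the status strings are all hypothetical, and the real step may report an existing or missing record differently.

```python
# Hypothetical in-memory model of the four operations; not Data Studio code.
def apply_operation(catalog, operation, name, fields=None):
    """Apply one operation to `catalog` (a dict of name -> fields).

    Returns an illustrative status string describing what happened.
    """
    exists = name in catalog
    if operation == "Create":
        if exists:
            return "skipped: already exists"  # Create never modifies a record
        catalog[name] = dict(fields or {})
        return "created"
    if operation == "Update":
        if not exists:
            return "skipped: not found"  # Update never creates a record
        catalog[name].update(fields or {})
        return "updated"
    if operation == "Update/Create":
        if exists:
            catalog[name].update(fields or {})
            return "updated"
        catalog[name] = dict(fields or {})
        return "created"
    if operation == "Delete":
        if not exists:
            return "skipped: not found"
        del catalog[name]  # permanent in this model, as in the Catalog
        return "deleted"
    raise ValueError(f"unknown operation: {operation}")


catalog = {"Customers": {"owner": "alice"}}
print(apply_operation(catalog, "Create", "Customers", {"owner": "bob"}))
print(apply_operation(catalog, "Update/Create", "Accounts", {"owner": "bob"}))
print(apply_operation(catalog, "Delete", "Customers"))
```

In this model only Update/Create succeeds regardless of whether the record exists, which is why it is the usual choice when the input mixes new and existing records.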

Once the operation has been selected, map the fields, associations, and interests that need to be created or updated.

If the records you are working with already exist in the Catalog, you can use the object’s unique identifier (UUID) to specify which record should be modified when performing Update, Update/Create, or Delete operations. This is especially useful when multiple objects share the same name, as the UUID ensures that the correct record is targeted.

Enable Fetch from Catalog on edit mode to view the results of the Update Catalog operations without running the workflow. The requested operation is applied, and the outcomes are shown when Show step results is clicked.

Field mapping options

If the Do not ignore empty values option is enabled, any empty or null input values will overwrite and clear the corresponding existing values in the Catalog.
If this option is disabled, empty or null inputs are skipped, and existing Catalog values remain unchanged.
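The effect of this option can be modelled in a few lines of Python. The sketch below is an assumption-laden illustration, not the product's implementation: `merge_fields` and its parameter names are hypothetical, and it treats `None` and the empty string as "empty".

```python
# Hypothetical model of the "Do not ignore empty values" option.
def merge_fields(existing, incoming, do_not_ignore_empty):
    """Return the field values that result from applying `incoming` to `existing`."""
    result = dict(existing)
    for field, value in incoming.items():
        is_empty = value is None or value == ""
        if is_empty and not do_not_ignore_empty:
            continue  # option disabled: the existing value is left unchanged
        # Option enabled: an empty input clears the existing value.
        result[field] = None if is_empty else value
    return result


existing = {"description": "Main table", "owner": "alice"}
incoming = {"description": "", "owner": "bob"}

print(merge_fields(existing, incoming, do_not_ignore_empty=True))
print(merge_fields(existing, incoming, do_not_ignore_empty=False))
```

With the option enabled, the empty `description` input clears the stored description; with it disabled, only `owner` changes.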

Association and Interest mapping options

Overwrite existing values (ignore empty values): Empty or null values are ignored and existing values are preserved.
Overwrite existing values: Empty or null input values will clear existing values in the Catalog.
Add to existing values: New values are added to existing values.
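The three modes can be compared side by side with a small Python model. Everything here is hypothetical (the function `merge_associations`, the mode keys, and the choice to skip values that are already present when adding); it is meant only to make the described behaviour concrete.

```python
# Hypothetical model of the three association/interest mapping modes.
def merge_associations(existing, incoming, mode):
    """Return the association list that results from applying `incoming`."""
    if mode == "overwrite_ignore_empty":
        # Empty input is ignored, so the existing values survive.
        return list(incoming) if incoming else list(existing)
    if mode == "overwrite":
        # Empty input clears the existing values.
        return list(incoming or [])
    if mode == "add":
        merged = list(existing)
        for item in incoming or []:
            if item not in merged:  # assumption: values are not duplicated
                merged.append(item)
        return merged
    raise ValueError(f"unknown mode: {mode}")


current = ["Accounts"]
print(merge_associations(current, [], "overwrite_ignore_empty"))
print(merge_associations(current, [], "overwrite"))
print(merge_associations(current, ["Orders"], "add"))
```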

When adding or updating an object that has multiple associations, convert the association inputs to a List field first. You can do this with a Transform step and a To list function.
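As a rough picture of the shape change involved, the Python sketch below turns a single delimited text value into a list of association names. It does not show the Data Studio To list function itself; the helper `to_list` and the comma delimiter are assumptions made for illustration.

```python
# Hypothetical illustration of converting a delimited value into a list.
def to_list(raw, delimiter=","):
    """Split `raw` on `delimiter`, trimming whitespace and dropping empty parts."""
    if raw is None:
        return []
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


print(to_list("Accounts, Orders ,Invoices"))
```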

Bulk loader results

Results can be retrieved in two ways:

Connect the Source or Update Catalog step to an output step, such as Take snapshot or Export, and run the workflow.
Enable the Fetch from Catalog on edit mode option on the relevant step and click Show step results.

Source step results
The Source step returns a list of objects, including their names, fields, associations, and interest type values, along with the corresponding UUIDs for each.

Update Catalog step results
The Update Catalog step returns the input values alongside the result of the Catalog operation for each input. A result status is provided for each record, along with a message containing additional details where applicable.

Notes

Important considerations

Transaction-based: All data in a bulk load is treated as a single transaction. Either all changes are applied or none are applied.
Approval process bypassed: All bulk-loaded data is processed immediately, without being subject to the standard approval process.
Relationship requirements: If your input data references objects that don't exist in the system, the import will fail.
Duplicate Checking: The system prevents importing duplicate records (same object names or relationships that already exist).
Performance: Very large imports (100,000+ records) may take time to process and validate.
Column headers: Column names in your input data must exactly match the field names in Data Studio (case-sensitive in some cases).
Required fields: Any fields marked as "required" in your object type must have values in the input data.
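The all-or-nothing behaviour and the required-field check can be sketched as follows. This is a simplified Python model under stated assumptions, not the Catalog's implementation: `bulk_load`, the record layout (dictionaries with a `name` key), and the status messages are hypothetical.

```python
# Hypothetical model of an all-or-nothing bulk load with required fields.
def bulk_load(catalog, records, extra_required=()):
    """Validate every record first, then apply all of them or none.

    Returns (catalog_after, status). On failure the original catalog
    object is returned untouched.
    """
    required = ("name",) + tuple(extra_required)
    for index, record in enumerate(records):
        missing = [field for field in required if not record.get(field)]
        if missing:
            # One bad record rejects the whole load.
            return catalog, f"failed: row {index} is missing {missing}"
    updated = dict(catalog)  # work on a copy so a failure leaves no trace
    for record in records:
        updated[record["name"]] = record
    return updated, "ok"


catalog = {"Customers": {"name": "Customers", "owner": "alice"}}
rows = [
    {"name": "Accounts", "owner": "bob"},
    {"name": "Orders", "owner": ""},  # required field left empty
]
result, status = bulk_load(catalog, rows, extra_required=["owner"])
print(status)
print(sorted(result))
```

Because the second row fails validation, the first row is not applied either, and the model's catalog still contains only `Customers`.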

Validation rules

UUID Format: If you are using UUIDs to identify data, make sure they are in a valid format.
Data Types: Dates, numbers, and other values must be in the correct format.
Unique Names: Depending on your object type configuration, duplicate names may not be allowed.
Required Fields: Mandatory fields must be filled in and cannot be left empty.
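Some of these rules can be checked before the data reaches the Update Catalog step. The sketch below uses Python's standard `uuid` module to flag values that are not in the canonical hyphenated form. Whether the Catalog accepts other UUID spellings is not stated here, so the strict check is an assumption, and the helper names are hypothetical.

```python
import uuid


# Hypothetical pre-check; assumes the canonical 8-4-4-4-12 form is expected.
def is_canonical_uuid(value):
    """Return True if `value` is a UUID string in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False  # not a string, or not parseable as a UUID


def invalid_uuid_rows(rows, column="uuid"):
    """Return the indexes of rows whose `column` value fails the check."""
    return [i for i, row in enumerate(rows) if not is_canonical_uuid(row.get(column))]


rows = [
    {"uuid": "123e4567-e89b-12d3-a456-426614174000"},
    {"uuid": "not-a-uuid"},
]
print(invalid_uuid_rows(rows))
```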

When import fails

If your bulk load fails validation:

Check the returned error message to identify the problem.
Fix the issue in your input data (e.g., add missing required fields, remove duplicates).
Reload the corrected data into Data Studio and run the bulk load again.

You can retry as many times as needed. A failed bulk load applies none of its changes, so the Catalog is left as it was.

