Overview

Duplicate store objects enable the management and persistence of Duplicate stores in Data Studio. Stores can be both established and updated using the Find duplicates step, and custom settings can be maintained for each store. When the Find duplicates step is first run, the configured store is established on the Find duplicates server and persisted to disk. Subsequent runs using the same store add new records to the store and update existing ones. Once the store has been established, it can also be used with the Find duplicates query and delete steps.

Duplicate stores list screen

This screen provides information on each of the stores accessible in the current Space. From here users can create, edit, delete, clear, and set the sharing options of Duplicate stores.

Creation and management

From the Duplicate stores list screen, click Create new Duplicate store. To edit an existing store, select the Edit details action.

The External label (Duplicate store ID) is the name of the store on the Find duplicates server. It is generated automatically from the Duplicate store name, formatted to comply with external label restrictions and to be unique in this environment.
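To illustrate how a name could be turned into a compliant, unique label, here is a minimal sketch. It is not Data Studio's actual implementation: the allowed character set (letters, digits and underscores), the 32-character limit, lower-casing and the numeric-suffix scheme are all hypothetical assumptions, and `make_external_label` is an illustrative name only.

```python
import re

# Hypothetical restrictions; the real Find duplicates server rules may differ.
MAX_LABEL_LENGTH = 32
_DISALLOWED = re.compile(r"[^A-Za-z0-9_]+")


def make_external_label(store_name: str, existing_labels: set[str]) -> str:
    """Derive a compliant, unique external label from a store name (sketch)."""
    # Collapse each run of disallowed characters into a single underscore.
    base = _DISALLOWED.sub("_", store_name.strip()).strip("_").lower()
    if not base:
        base = "store"  # hypothetical fallback for names with no usable characters
    base = base[:MAX_LABEL_LENGTH]

    label = base
    counter = 1
    # Append a numeric suffix until the label is unique in this environment,
    # trimming the base so the result stays within the length limit.
    while label in existing_labels:
        suffix = f"_{counter}"
        label = base[: MAX_LABEL_LENGTH - len(suffix)] + suffix
        counter += 1
    return label
```

Under these assumptions, a store named "Customer Records (UK)" would get the label `customer_records_uk`, or `customer_records_uk_1` if that label were already taken.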

Selecting Encrypt Duplicate store encrypts the store when it is established, and an encryption key is then required for all store operations.

If encryption is enabled, an Encryption key field appears on the Create new Duplicate store page. You can use the default encryption key generated by the system or enter your own.

Selecting ‘Include timestamp columns’ appends the ‘Created timestamp’ and ‘Previous updated timestamp’ columns to the output of the Find duplicates step, indicating when updated records were originally created and when they were last modified before the current run.
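The following minimal sketch models what the two timestamp columns report when a record is added or updated. It is a hypothetical in-memory model, not how the Find duplicates server stores data; `upsert_with_timestamps` is an illustrative name, and leaving both columns empty for newly added records is an assumption.

```python
from datetime import datetime
from typing import Optional

# Hypothetical in-memory store: record ID -> (created, last updated).
Store = dict[str, tuple[datetime, datetime]]


def upsert_with_timestamps(
    store: Store, record_id: str, now: datetime
) -> dict[str, Optional[datetime]]:
    """Add or update a record and return the timestamp columns for its output row."""
    if record_id in store:
        created, previous_updated = store[record_id]
        # Keep the original creation time; record this run as the latest update.
        store[record_id] = (created, now)
        return {"Created timestamp": created, "Previous updated timestamp": previous_updated}
    store[record_id] = (now, now)
    # Assumption: a newly added record has no earlier history to report.
    return {"Created timestamp": None, "Previous updated timestamp": None}
```

In this model, updating a record a second time reports its original creation time together with the time of the first update, not the time of the current run.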

The server location for the Duplicate store defaults to the server configured in Settings > Workflow steps > Find duplicates. If ‘Use remote server as default’ is disabled, the embedded Find duplicates server is the default. If it is enabled, the embedded server cannot be selected in the Duplicate store settings.
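The defaulting rule above can be expressed as a small decision function. This is a minimal sketch under stated assumptions: `FindDuplicatesSettings`, its fields, and the helper functions are hypothetical names, and the fallback to the embedded server when no remote server is configured, as well as offering the remote server as a choice when the option is disabled, are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional

EMBEDDED = "embedded"


@dataclass
class FindDuplicatesSettings:
    """Hypothetical model of Settings > Workflow steps > Find duplicates."""
    use_remote_server_as_default: bool
    remote_server: Optional[str] = None  # hypothetical, e.g. a host name


def default_server(settings: FindDuplicatesSettings) -> str:
    # 'Use remote server as default' disabled: the embedded server is the default.
    # Assumption: also fall back to embedded if no remote server is configured.
    if not settings.use_remote_server_as_default or settings.remote_server is None:
        return EMBEDDED
    return settings.remote_server


def selectable_servers(settings: FindDuplicatesSettings) -> list[str]:
    # With the remote server as default, the embedded server is not offered.
    if settings.use_remote_server_as_default and settings.remote_server is not None:
        return [settings.remote_server]
    servers = [EMBEDDED]
    if settings.remote_server is not None:
        servers.append(settings.remote_server)  # assumption: remote still selectable
    return servers
```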

Duplicate stores configured to use the embedded Find duplicates server are retained in the Data Studio repository, in the experianmatch sub-directory. Duplicate stores configured to use a separate Find duplicates server instance are retained on the machine hosting that remote instance.

Under Settings, choose an existing Step setting configuration for rules and blocking keys, or select ‘Custom options’ to enter the rules and blocking keys manually.

Find duplicates step

From the Find duplicates step, you can select a temporary store or a previously created Duplicate store, or create a new one. If you select the Temporary store, the store data is not persisted. If you select a Duplicate store, the store is established and persisted to disk when the Workflow is first run. Once a store is established, the Find duplicates step by default performs a maintenance operation: records are added to the existing store or updated in it. To re-establish the Duplicate store every time the Workflow runs, overwriting the existing store, check the Clear and re-establish store option on the step.
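The three behaviours described above (temporary, maintenance, and clear and re-establish) can be summarised with a small model. This is a hypothetical sketch that deliberately ignores the duplicate matching itself: `StoreMode` and `run_find_duplicates` are illustrative names, and a plain dictionary keyed by record ID stands in for the persisted store on disk.

```python
from enum import Enum


class StoreMode(Enum):
    TEMPORARY = "temporary"
    DUPLICATE_STORE = "duplicate store"


def run_find_duplicates(mode: StoreMode, records: dict, persisted: dict,
                        clear_and_reestablish: bool = False) -> dict:
    """Return the store contents after one Workflow run (model only)."""
    if mode is StoreMode.TEMPORARY:
        # Temporary store: built for this run only; nothing is persisted.
        return dict(records)
    if clear_and_reestablish:
        # Clear and re-establish store: discard existing contents first.
        persisted.clear()
    # First run establishes the store; later runs add new records
    # and update existing ones (maintenance).
    persisted.update(records)
    return persisted
```

For example, running with `{"r1": "Ann"}` and then `{"r1": "Anne", "r2": "Bob"}` leaves the modelled store holding both records with `r1` updated, while a third run with `clear_and_reestablish=True` leaves only the records from that run.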

Sharing

You can share Duplicate stores globally or with specific Spaces. From the Duplicate stores list page, click Sharing options. To use a Duplicate store that's been shared with you, click Include from another Space.