Cluster ID

A cluster is a group of records that are considered to be duplicates (i.e. they represent the same underlying real-world subject). To determine which records belong to the same cluster and should be harmonized into a single record, the Harmonize duplicates step requires a column containing a unique identifier for each cluster.

The Find duplicates step creates a Duplicates: Cluster Id column, and a subsequent Harmonize duplicates step automatically selects this column.

However, if your clusters have already been identified (in Data Studio or by other means), you can skip the Find duplicates step and simply select the column that contains your cluster IDs.
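
To illustrate the role the cluster ID column plays, the following Python sketch groups records by a cluster ID and reduces each group to a single record. It is a conceptual illustration only, not Data Studio's implementation: the column names (`cluster_id`, `name`, `city`), the sample records, and the "most common non-empty value" rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical sample data; column names and values are illustrative only.
records = [
    {"cluster_id": 1, "name": "Jon Smith", "city": "Leeds"},
    {"cluster_id": 1, "name": "John Smith", "city": "Leeds"},
    {"cluster_id": 1, "name": "John Smith", "city": ""},
    {"cluster_id": 2, "name": "Ann Lee", "city": "York"},
]


def group_by_cluster(rows, id_column="cluster_id"):
    """Collect the rows that share the same cluster ID."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[id_column]].append(row)
    return dict(clusters)


def harmonize(cluster_rows, id_column="cluster_id"):
    """Merge one cluster into a single record.

    Assumed rule for this sketch: keep the most common non-empty
    value in each column, or an empty string if none exists.
    """
    merged = {id_column: cluster_rows[0][id_column]}
    for column in cluster_rows[0]:
        if column == id_column:
            continue
        values = [row[column] for row in cluster_rows if row[column]]
        merged[column] = Counter(values).most_common(1)[0][0] if values else ""
    return merged


# One harmonized record per cluster ID.
harmonized = [harmonize(rows) for rows in group_by_cluster(records).values()]
```

Without a column that uniquely identifies each cluster, the grouping step has nothing to key on, which is why the cluster ID column is required whether it comes from the Find duplicates step or from your own data.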