Rules and blocking keys

Rules and blocking keys define how records are matched in Aperture Data Studio. To create a new set of rules or blocking keys, or to view existing ones, go to Step settings > Find duplicates settings. Alternatively, from the Duplicate stores screen, either click Create new Duplicate store or select Edit details from the action menu of an existing Duplicate store.

Configuration

When configuring blocking keys, you can set a blocking key limit: the block size at which the potential matches generated by a blocking key value are ignored. The limit matters because the number of comparisons grows rapidly with block size. Every record in a block needs to be compared with every other record, so a block of n records needs n(n-1)/2 comparisons; a block of 500 records needs 124,750 (almost 125k). You can set a limit for each blocking key individually.
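To make the cost concrete, the sketch below groups records into blocks by a blocking key value, counts the comparisons a block would need, and skips blocks that reach the limit. It is a minimal Python illustration under stated assumptions, not how Aperture Data Studio implements blocking: the function names, the record shape, and the choice to ignore a block whose size equals the limit are all assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Tuple


def pairwise_comparisons(block_size: int) -> int:
    """Comparisons needed when every record in a block is compared once with every other record."""
    return block_size * (block_size - 1) // 2


def candidate_pairs(
    records: List[dict],
    blocking_key: Callable[[dict], Hashable],
    limit: int,
) -> Tuple[List[Tuple[int, int]], List[Hashable]]:
    """Return (candidate pairs, skipped key values) for one hypothetical blocking key."""
    blocks: Dict[Hashable, List[int]] = defaultdict(list)
    for index, record in enumerate(records):
        blocks[blocking_key(record)].append(index)

    pairs: List[Tuple[int, int]] = []
    skipped: List[Hashable] = []
    for key_value, members in blocks.items():
        # Assumption: a block whose size reaches the limit is ignored entirely.
        if len(members) >= limit:
            skipped.append(key_value)
            continue
        # Every record in the block is paired with every other record once.
        pairs.extend(combinations(members, 2))
    return pairs, skipped


print(pairwise_comparisons(500))  # 124750
```

Because each blocking key can have its own limit, in this sketch you would call candidate_pairs once per key with that key's limit and merge the resulting pairs.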

When configuring rules, the following options can be selected:

  • The Purpose of the rules:
    • Search only: Search only rules can be used when searching the duplicate store. You can add as many sets of search only rules as you require. If you specify only search rules for a store, no clusters are formed when the store is populated. However, if your use case does not require clustering (for example, duplicate prevention), maintaining a search only store has performance advantages when records are added or updated.
    • Clustering: Clustering rules are used to form clusters as records are added to the store. Only one set of clustering rules can be defined per store. Clustering rules can also be used when searching the store.
  • Enable cluster management:
    • This option is only available for clustering rules. It allows users with the assigned capability to manually refine clusters that have been generated by the configured rules and blocking keys.
  • Associated blocking keys:
    • You can select a subset of the available blocking keys for each set of rules. Only the associated blocking keys are used to generate potential matches. This extra control means a blocking key can be used for searching but not for clustering, or vice versa.
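The options in this list can be summarised as a small configuration model. The sketch below is hypothetical: the class, field, rule set, and blocking key names are not part of Aperture Data Studio. It only encodes the constraints stated above: any number of search only rule sets, at most one clustering rule set, cluster management only on clustering rules, and each rule set using a subset of the store's blocking keys.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Purpose(Enum):
    SEARCH_ONLY = "search only"
    CLUSTERING = "clustering"


@dataclass
class RuleSet:
    name: str
    purpose: Purpose
    blocking_keys: Set[str]           # associated subset of the store's blocking keys
    cluster_management: bool = False  # only valid for clustering rules


@dataclass
class DuplicateStoreConfig:
    blocking_keys: Set[str]
    rule_sets: List[RuleSet] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the configuration breaks one of the stated constraints."""
        clustering = [r for r in self.rule_sets if r.purpose is Purpose.CLUSTERING]
        if len(clustering) > 1:
            raise ValueError("only one set of clustering rules can be defined per store")
        for rule_set in self.rule_sets:
            if rule_set.cluster_management and rule_set.purpose is not Purpose.CLUSTERING:
                raise ValueError(f"{rule_set.name}: cluster management requires clustering rules")
            unknown = rule_set.blocking_keys - self.blocking_keys
            if unknown:
                raise ValueError(f"{rule_set.name}: unknown blocking keys {sorted(unknown)}")

    def forms_clusters(self) -> bool:
        """A store with only search only rules forms no clusters."""
        return any(r.purpose is Purpose.CLUSTERING for r in self.rule_sets)

    def keys_for(self, rule_set_name: str) -> Set[str]:
        """Only the associated blocking keys generate potential matches for a rule set."""
        for rule_set in self.rule_sets:
            if rule_set.name == rule_set_name:
                return rule_set.blocking_keys
        raise KeyError(rule_set_name)


store = DuplicateStoreConfig(
    blocking_keys={"postcode", "surname_initial", "email"},
    rule_sets=[
        RuleSet("search_by_email", Purpose.SEARCH_ONLY, {"email"}),
        RuleSet("cluster_by_name_address", Purpose.CLUSTERING,
                {"postcode", "surname_initial"}, cluster_management=True),
    ],
)
store.validate()
print(store.forms_clusters())
```

In this model the email key is used only for searching and the name and address keys only for clustering, which is the kind of split the associated blocking keys option allows.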

Default Find duplicates step settings

Aperture Data Studio provides default Find duplicates step settings for use with the Find duplicates step. You can find them in Step settings > Find duplicates settings:

  • Individual: groups records with similar names at similar addresses. For example, GBR_Individual_Default will find individuals in Great Britain. Email addresses, phone numbers, and other identifiers are not taken into account, but they can be added manually.
  • Household: groups records with the same or similar family names at a similar address. For example, GBR_Household_Default will find households in Great Britain.
  • Location: groups records with similar addresses or locations. For example, GBR_Location_Default will find locations in Great Britain.
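The difference between the three levels comes down to which fields must agree. The sketch below is a deliberate simplification and a hypothetical illustration only: it groups records by exact matches on normalised fields, whereas the default settings match similar names and addresses using their own rules and blocking keys. The field names and the normalise helper are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace (hypothetical, much simpler than real standardisation)."""
    return " ".join(text.lower().split())


# Fields that must agree at each level in this simplified illustration.
LEVEL_KEYS: Dict[str, Callable[[dict], Tuple[str, ...]]] = {
    "Individual": lambda r: (normalise(r["forename"]), normalise(r["surname"]), normalise(r["address"])),
    "Household": lambda r: (normalise(r["surname"]), normalise(r["address"])),
    "Location": lambda r: (normalise(r["address"]),),
}


def group(records: List[dict], level: str) -> List[List[int]]:
    """Return groups of record indexes that share the same key at the given level."""
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    key = LEVEL_KEYS[level]
    for index, record in enumerate(records):
        groups[key(record)].append(index)
    return sorted(groups.values())


records = [
    {"forename": "Jane", "surname": "Smith", "address": "1 High St"},
    {"forename": "jane", "surname": "Smith", "address": "1  high st"},
    {"forename": "John", "surname": "Smith", "address": "1 High St"},
    {"forename": "Ann", "surname": "Jones", "address": "1 High St"},
]
for level in LEVEL_KEYS:
    print(level, group(records, level))
```

With this sample data, the two Jane Smith records form one individual, the three Smith records form one household, and all four records form one location.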

Default blocking keys and rules are provided for Australia (AUS), Great Britain (GBR), and United States (USA) as detailed in the table below:

Name | Summary
AUS_Individual_Default | Default Australia individual level rules and blocking keys based on name and address
AUS_Household_Default | Default Australia household level rules and blocking keys based on surname (last name) only and address
AUS_Location_Default | Default Australia location level rules and blocking keys based on address only
GBR_Individual_Default | Default United Kingdom individual level rules and blocking keys based on name and address
GBR_Individual_Alternative | Alternative United Kingdom individual level rules and blocking keys based on name and address that may produce better results for large cities
GBR_Household_Default | Default United Kingdom household level rules and blocking keys based on surname (last name) only and address
GBR_Location_Default | Default United Kingdom location level rules and blocking keys based on address only
USA_Individual_Default | Default United States of America individual level rules and blocking keys based on name and address
USA_Household_Default | Default United States of America household level rules and blocking keys based on surname (last name) only and address
USA_Location_Default | Default United States of America location level rules and blocking keys based on address only
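The setting names in the table follow a consistent pattern: country code, level, then Default (or Alternative). A hypothetical helper that builds a name from that pattern and checks it against the names listed in the table could look like this; it is an illustration only, not an Aperture Data Studio API.

```python
# Names copied from the table of default Find duplicates step settings.
DEFAULT_SETTINGS = {
    "AUS_Individual_Default", "AUS_Household_Default", "AUS_Location_Default",
    "GBR_Individual_Default", "GBR_Individual_Alternative",
    "GBR_Household_Default", "GBR_Location_Default",
    "USA_Individual_Default", "USA_Household_Default", "USA_Location_Default",
}


def setting_name(country: str, level: str, alternative: bool = False) -> str:
    """Build a setting name such as GBR_Individual_Default and check that it is provided."""
    suffix = "Alternative" if alternative else "Default"
    name = f"{country.upper()}_{level.capitalize()}_{suffix}"
    if name not in DEFAULT_SETTINGS:
        raise KeyError(f"no provided Find duplicates setting named {name}")
    return name


print(setting_name("gbr", "individual"))  # GBR_Individual_Default
```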

The summary of each step setting explains the purpose of its blocking keys and rules. To view the full details of a step setting, click it in the Step settings list screen.

Environment and Space level step settings

Find duplicates settings can be configured at Environment level, making them available to all Spaces in the Environment, or at Space level:

  • Environment level: Find duplicates step settings created in the System Space (select System from the Space drop-down) are available to all Spaces within the Environment. This is useful when duplicate stores in different Spaces should all use the same rules and blocking keys. Users automatically see these Find duplicates step settings in their step settings.
  • Space level: Find duplicates step settings created in a specific Space are visible only in that Space.
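Put another way, the settings visible in a Space are those created in the System Space plus those created in that Space itself. The sketch below models this rule with a plain dictionary; the Space names, setting names, and function are invented for illustration and are not part of Aperture Data Studio.

```python
from typing import Dict, List

SYSTEM_SPACE = "System"


def visible_settings(settings_by_space: Dict[str, List[str]], space: str) -> List[str]:
    """Settings visible in a Space: Environment-level (System Space) ones plus the Space's own."""
    visible = set(settings_by_space.get(SYSTEM_SPACE, []))
    visible.update(settings_by_space.get(space, []))
    return sorted(visible)


# Hypothetical Spaces and settings.
settings = {
    "System": ["Shared_Individual"],
    "Marketing": ["Marketing_Household"],
    "Finance": ["Finance_Location"],
}
print(visible_settings(settings, "Marketing"))
```

Here Shared_Individual appears in every Space, while Finance_Location stays visible only in the Finance Space.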