Analyze blocking keys

The Analyze blocking keys section shows you the efficiency and effectiveness of your blocking keys and suggests potential performance improvements.

To begin analyzing your blocking key configuration, click the Analyze button. This process can take several minutes for large datasets.

Once the analysis is complete, you will see a table containing a summary of the performance of each of your blocking keys.

  • The Cost column shows the number of score pairs generated by this blocking key. This is not necessarily the number of actual comparisons, because a record pair may already have been matched by another blocking key.
  • The Matches not found by other keys column shows the number of matched records that were found only by this blocking key. A value of 0 means that every record this blocking key matched was also matched by other blocking keys.
  • The Overlaps with other keys section of the table shows which blocking keys matched the same records and the number of records that they both matched.

To determine which blocking keys are the best candidates for optimization, we suggest looking at the ratio of Cost to Matches not found by other keys and starting with the largest values. If a blocking key generates a large number of score pairs but finds few matches that other blocking keys miss, it may be worth making that blocking key more specific. Note that the relationship between the number of score pairs and overall processing time is not linear, so reducing the number of score pairs by 50% is unlikely to result in a 50% drop in processing time.
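
The ranking described above can be sketched in a few lines of Python. This is only an illustration: the key names and figures are invented, the field names cost and unique_matches are hypothetical stand-ins for the Cost and Matches not found by other keys columns, and the values are assumed to have been copied by hand from the summary table. Treating a key with 0 unique matches as having an infinite ratio, so that it sorts first, is a choice made for this sketch and not something the product prescribes.

```python
import math

# Hypothetical figures copied by hand from the summary table.
# "cost" stands for the Cost column, "unique_matches" for the
# Matches not found by other keys column.
summary = [
    {"key": "FullPostcode", "cost": 120000, "unique_matches": 40},
    {"key": "SurnameDob", "cost": 8000, "unique_matches": 200},
    {"key": "Email", "cost": 5000, "unique_matches": 0},
]


def cost_per_unique_match(row):
    """Return cost divided by unique matches, or infinity when there are none."""
    if row["unique_matches"] == 0:
        # Assumption for this sketch: a key that finds nothing unique
        # is treated as the most expensive and sorts first.
        return math.inf
    return row["cost"] / row["unique_matches"]


# Largest ratio first: these keys are the first candidates for review.
ranked = sorted(summary, key=cost_per_unique_match, reverse=True)

for row in ranked:
    print(row["key"], cost_per_unique_match(row))
```

Keys near the top of the output generate many score pairs for each match that no other key finds, so they are the ones to consider making more specific first.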

Possible improvement

The Possible improvement section lists changes to your existing blocking keys that would make them more efficient. Each blocking key suggestion has a Description column, which describes the change made to the original blocking key, a Definition column, which contains the modified version of the blocking key, and a further four columns, which show the effectiveness and efficiency of the suggested blocking key.

To apply a suggested blocking key change, replace the existing elementSpecifications value for the specified blocking key with the suggestion given in the Definition column. For example, if your blocking keys contain the following:

{
    "description": "FullPostcode",
    "elementSpecifications": [
        {
            "elementType": "POSTCODE",
            "elementModifiers": [
                "STANDARDSPELLING"
            ],
            "includeFromNChars": 5,
            "truncateToNChars": 7
        }
    ]
}

And you want to apply the following suggestion to add surname to your FullPostcode key:

[
    {
        "elementType": "POSTCODE",
        "elementModifiers": [
            "STANDARDSPELLING"
        ],
        "includeFromNChars": 5,
        "truncateToNChars": 7
    },
    {
        "elementType": "SURNAME",
        "algorithm": {
            "name": "DOUBLE_METAPHONE"
        },
        "includeFromNChars": 1,
        "truncateToNChars": 10
    }
]

Then you would replace the value of elementSpecifications in your FullPostcode key with the Definition given and update the description to reflect the change:

{
    "description": "FullPostcode+Surname",
    "elementSpecifications": [
        {
            "elementType": "POSTCODE",
            "elementModifiers": [
                "STANDARDSPELLING"
            ],
            "includeFromNChars": 5,
            "truncateToNChars": 7
        },
        {
            "elementType": "SURNAME",
            "algorithm": {
                "name": "DOUBLE_METAPHONE"
            },
            "includeFromNChars": 1,
            "truncateToNChars": 10
        }
    ]
}
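
If you keep your blocking keys in a JSON file and prefer to script the change, the same replacement can be sketched in Python. This is a minimal sketch under stated assumptions: the helper apply_suggestion is hypothetical and not part of the product, and the blocking keys are assumed to be available as a JSON array of key objects shaped like the examples above. How that array is stored in your wider configuration is not covered here.

```python
import copy
import json


def apply_suggestion(blocking_keys, description, definition, new_description=None):
    """Return a copy of blocking_keys with one key's elementSpecifications replaced.

    Hypothetical helper: the key is located by its "description" value.
    """
    updated = copy.deepcopy(blocking_keys)
    for key in updated:
        if key.get("description") == description:
            key["elementSpecifications"] = copy.deepcopy(definition)
            if new_description is not None:
                key["description"] = new_description
            return updated
    raise KeyError(f"No blocking key with description {description!r}")


# Existing keys, assumed here to be a JSON array of key objects.
blocking_keys = json.loads("""
[
    {
        "description": "FullPostcode",
        "elementSpecifications": [
            {
                "elementType": "POSTCODE",
                "elementModifiers": ["STANDARDSPELLING"],
                "includeFromNChars": 5,
                "truncateToNChars": 7
            }
        ]
    }
]
""")

# The suggestion, pasted from the Definition column.
definition = json.loads("""
[
    {
        "elementType": "POSTCODE",
        "elementModifiers": ["STANDARDSPELLING"],
        "includeFromNChars": 5,
        "truncateToNChars": 7
    },
    {
        "elementType": "SURNAME",
        "algorithm": {"name": "DOUBLE_METAPHONE"},
        "includeFromNChars": 1,
        "truncateToNChars": 10
    }
]
""")

result = apply_suggestion(
    blocking_keys, "FullPostcode", definition, new_description="FullPostcode+Surname"
)
print(json.dumps(result, indent=4))
```

The helper works on a copy, so the original list is left unchanged and can be compared against the result before you save anything.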