Before you can use the Find duplicates workbench, you need a duplicate store, which holds the results of the Find duplicates step.

To access the workbench, open your web browser and go to http://{server name}:{port}, where {server name} is the server name that you use to access Data Studio and {port} is the port number set during installation.

Windows

On Windows, the default TCP/IP port number for the Find duplicates workbench is 26312. If the workbench is installed on your local machine, you can use the shortcut on your desktop or, if you haven't changed the default port, go to http://localhost:26312/.

Linux

On Linux, the default TCP/IP port number for the Find duplicates workbench is 8090. If the workbench is installed on your local machine, you can use the shortcut on your desktop or, if you haven't changed the default port, go to http://localhost:8090/. To change the default port (for example, to TCP/IP port 26312), modify the YAML configuration file.
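As an illustration of how the workbench address is put together, the following Python sketch builds the URL from a server name and a port, falling back to the default ports listed above. The function name workbench_url and its parameters are hypothetical and are not part of the product; the sketch only formats the address and does not check that the workbench is actually running.

```python
from typing import Optional

# Default ports as stated in this guide; an installation may use a different one.
DEFAULT_PORTS = {"windows": 26312, "linux": 8090}


def workbench_url(server_name: str = "localhost",
                  platform: str = "windows",
                  port: Optional[int] = None) -> str:
    """Build the workbench address (hypothetical helper, for illustration only)."""
    if port is None:
        # No explicit port given: use the documented default for the platform.
        port = DEFAULT_PORTS[platform.lower()]
    if not 1 <= port <= 65535:
        raise ValueError(f"Invalid TCP/IP port: {port}")
    return f"http://{server_name}:{port}/"


print(workbench_url())                  # http://localhost:26312/
print(workbench_url(platform="linux"))  # http://localhost:8090/
```

If the port was changed during installation or in the YAML configuration file, pass it explicitly, for example workbench_url("my-server", port=9000), where my-server stands in for your own server name.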

Select the duplicate store

When you start using the workbench, you are first prompted to enter the full path to the duplicate store. We strongly suggest creating a copy of the duplicate store and using the copy instead of the original to avoid problems with file access. The duplicate store path that you specify must be accessible from the machine on which the workbench is installed.
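One way to prepare such a working copy is with a short script. The Python sketch below is a minimal example under stated assumptions: the function name copy_duplicate_store is hypothetical, and because this guide does not say whether a duplicate store is a single file or a folder, the sketch handles both. It refuses to overwrite an existing destination.

```python
import shutil
from pathlib import Path


def copy_duplicate_store(original: str, copy_to: str) -> Path:
    """Copy a duplicate store and return the full path of the copy.

    Hypothetical helper: the store is assumed to be either a single
    file or a folder, so both cases are handled.
    """
    src = Path(original)
    dst = Path(copy_to)
    if not src.exists():
        raise FileNotFoundError(f"Duplicate store not found: {src}")
    if dst.exists():
        raise FileExistsError(f"Destination already exists: {dst}")
    if src.is_dir():
        # Copies the whole folder tree, creating parent folders as needed.
        shutil.copytree(src, dst)
    else:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    # The workbench asks for a full path, so return an absolute one.
    return dst.resolve()
```

The returned path is the value to enter when the workbench prompts for the duplicate store, provided that location is reachable from the machine running the workbench.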

If your duplicate store is encrypted, press the Select encryption key button and select the file containing the encryption key.

Lucene queries

By default, the workbench performs a simple text search on any search terms that you specify. To use more advanced search terms, enable the Lucene query syntax searching option, which allows you to use Lucene query parser syntax when searching for records. When using Lucene query syntax, use the input dataset column names in lowercase as field names.
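To make the lowercase field name rule concrete, the Python sketch below assembles a query string in standard Lucene query parser syntax (field:"value" terms joined with AND). The helper names and the column names Surname and Town are hypothetical examples, and the sketch assumes column names contain no spaces or special characters. How the workbench analyzes and matches the values is not covered here.

```python
def lucene_term(column: str, value: str) -> str:
    """Return one field:"value" term with the column name lowercased."""
    field = column.strip().lower()
    # Escape backslashes and double quotes so the value stays a valid phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def lucene_all(terms: dict) -> str:
    """Join several terms with AND so that every one of them must match."""
    return " AND ".join(lucene_term(col, val) for col, val in terms.items())


# Hypothetical input columns "Surname" and "Town".
query = lucene_all({"Surname": "Smith", "Town": "New York"})
print(query)  # surname:"Smith" AND town:"New York"
```

The resulting string can be pasted into the search box once the Lucene query syntax searching option is enabled.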

Using the workbench

Once you have loaded the duplicate store into the workbench, you will be able to navigate to any of the following four tabs depending on what action you would like to perform:

While using the workbench, you can change the blocking keys and rules at any time by clicking the Change settings dropdown in the top right of the workbench.

To load a new duplicate store, click Change settings > Change duplicate store.