Address searching process

Before Batch API can search on an address, it needs to know what level of searching to undertake and how to return any matches that it finds. You will need to configure the option specifications in the configuration file.

The configuration file contains many settings which govern the basic processing that Batch API performs and allow you to define options such as the default dataset, cleaning options, and how the output address should look. The main configuration file used within the API for these settings is called qaworld.ini.

Batch API goes through a complex process when it attempts to match your address against the dataset(s). Understanding this process helps you to get the most out of Batch API.

Matching keys

If you are using Batch API with the AddressBase® Premium dataset, Batch API can perform key matching against your input data before carrying out the address matching process. This can potentially improve the confidence level of any address matches obtained. Key matching can be carried out against the following two types of data:

  • Unique Property Reference Numbers (UPRNs)
  • Unique Delivery Point Reference Numbers (UDPRNs)

In order to match this information against AddressBase® Premium data, you must first specify which fields in your input data contain UPRNs or UDPRNs. See Set the input address format for information on how to do this. Once key matching has been completed, Batch API begins the normal address matching process.

Address matching

The Batch API process consists of five stages:

  • Stage 1: Pre-process address
  • Stage 2: Match country
  • Stage 3: Match Street, Organization, PO Box and Place
  • Stage 4: Match Premises
  • Stage 5: Select Best Match

Diagram showing all 5 stages

Batch API may return an address as unmatched if the place and street are matched, but the premises are not matched.

The first thing Batch API does is attempt to put the input address into a standard format. In this example, the input address has been submitted as a single line, with address elements separated by commas:

3 Mornington Mews, County Grove, London,SE5

Batch API splits this address at the position of each comma so that the address looks like this:

3 Mornington Mews
County Grove
London
SE5
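
The comma split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Batch API's own code; the function name split_address is hypothetical.

```python
def split_address(single_line: str) -> list[str]:
    """Split a single-line address at each comma.

    Surrounding whitespace is trimmed and empty elements are dropped.
    Illustrative only: the real pre-processing in Batch API may do more.
    """
    return [part.strip() for part in single_line.split(",") if part.strip()]


lines = split_address("3 Mornington Mews, County Grove, London,SE5")
print(lines)
```

Trimming whitespace means that "London,SE5" and "London, SE5" produce the same address lines.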

Once it has completed its formatting, Batch API tries to identify the country that the input address relates to. Batch API does this by matching the contents of the last two lines against a list of countries, ignoring non-alphabetic characters.

If a country is identified in the input address, Batch API goes on to verify that the relevant dataset is installed. If it is not, Batch API marks the address as 'Country not available' and stops the search.

If no country is found in the address, then Batch API tries to move on to the next stage using the default dataset. If no default dataset is set, Batch API will reject the address as 'Unidentified country' and stop the search.
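
The country decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: the country list, the dataset codes, the 'OK' status and the function names are all hypothetical, and only the 'Country not available' and 'Unidentified country' outcomes come from the description above.

```python
import re

# Hypothetical country list mapping names to dataset codes (assumption).
COUNTRIES = {"united kingdom": "GBR", "france": "FRA", "germany": "DEU"}


def normalise(text):
    """Lower-case the text and drop every non-alphabetic character."""
    return re.sub(r"[^a-z]", "", text.lower())


def identify_country(lines, installed, default_dataset=None):
    """Return (status, dataset) for a pre-processed address.

    Only the last two address lines are compared against the country list.
    """
    known = {normalise(name): code for name, code in COUNTRIES.items()}
    for line in lines[-2:]:
        code = known.get(normalise(line))
        if code is not None:
            if code in installed:
                return ("OK", code)
            # Country recognised, but its dataset is not installed.
            return ("Country not available", None)
    if default_dataset is not None:
        # No country in the address: fall back to the default dataset.
        return ("OK", default_dataset)
    return ("Unidentified country", None)
```

For example, identify_country(["3 Mornington Mews", "County Grove", "London", "SE5"], {"GBR"}, "GBR") falls back to the default dataset because neither "London" nor "SE5" is a country name.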

Depending on the country, Batch API may expand street abbreviations, which means that all street descriptors such as 'Rd' or 'Ave' are expanded to 'Road' and 'Avenue', so that they match the descriptors in the dataset.
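
As a rough illustration of abbreviation expansion, the sketch below replaces whole-word street descriptors using a lookup table. The table holds only the two examples mentioned above; the real list used by Batch API is dataset-specific and is not reproduced here, and the function name is hypothetical.

```python
import re

# Illustrative table only: the two abbreviations named in the text.
ABBREVIATIONS = {"rd": "Road", "ave": "Avenue"}


def expand_street_descriptors(line):
    """Expand whole-word abbreviations such as 'Rd' or 'Ave.' in one line."""

    def replace(match):
        word = match.group(0)
        # Look the word up without its trailing full stop, ignoring case.
        return ABBREVIATIONS.get(word.rstrip(".").lower(), word)

    # \b ensures that 'rd' inside '3rd' or 'Bradford' is left alone.
    return re.sub(r"\b[A-Za-z]+\b\.?", replace, line)
```

With this table, "12 Station Rd" becomes "12 Station Road", while "3rd Floor" is left unchanged.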

If you have specified that address fields occur on particular input lines, then Batch API will use these to help decide which elements it can match with.

If you have not made any specifications of this type, Batch API will make some assumptions, in particular that a place or a postal code will not occur in the first address field and that street elements will always occur before place elements. Batch API will make one or more attempts to locate a valid sequence of street and place combinations in the address. At the same time, if Batch API can locate PO box or organization names along with a valid place, it will take these to be potential valid matches.

Batch API will also break single words out of address lines in order to locate the best combination of elements for matching.

By this stage, Batch API has matched the input address as far as the street. To find the full verified address it also needs to match the property information.

After matching a place and a street, Batch API compares the property information in the input address against all the premises in the dataset for that street. If no match is achieved, the input address is marked as partial address found.
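
The premises comparison can be pictured as a lookup within the matched street. The sketch below assumes a simple case-insensitive exact comparison, which is far cruder than whatever Batch API really does; the function name and the 'Premises matched' status are hypothetical.

```python
def match_premises(input_premises, street_premises):
    """Compare the input property information with each premises on the street.

    Returns a (status, premises) pair. Exact, case-insensitive comparison
    is an assumption made for illustration.
    """
    wanted = input_premises.strip().lower()
    for premises in street_premises:
        if premises.strip().lower() == wanted:
            return ("Premises matched", premises)
    # Street and place matched, but no premises did.
    return ("Partial address found", None)
```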

Batch API now retrieves the full verified address and assigns it a 'quality' score by comparing it with the original input address.

During this comparison, Batch API evaluates a number of matching rules and assigns the match a score. If there is more than one match, this process is repeated for all the matches. If there are two or more matches that have the highest score, Batch API marks the input address as either 'Partial address' or 'Multiple match', depending on the matching rules that were passed.
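
The selection step can be sketched as "score every candidate, then check for a tie at the top". Batch API's matching rules are not documented here, so this example substitutes a generic string-similarity ratio from Python's difflib for the score. The 'Matched' and 'Unmatched' statuses are hypothetical, and a tie is always reported as 'Multiple match' because the rules that would choose 'Partial address' instead are not known.

```python
from difflib import SequenceMatcher


def score(input_address, candidate):
    """Stand-in quality score: string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, input_address.lower(), candidate.lower()).ratio()


def select_best_match(input_address, candidates):
    """Return (status, best_candidates) after scoring every candidate."""
    if not candidates:
        return ("Unmatched", [])
    scored = [(score(input_address, c), c) for c in candidates]
    top = max(s for s, _ in scored)
    best = [c for s, c in scored if s == top]
    if len(best) == 1:
        return ("Matched", best)
    # Two or more candidates share the highest score.
    return ("Multiple match", best)
```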
