Validate and enrich data

Assess the quality of your data by defining rules.

To perform validation, you must define one or more validation rules. In the step dialog, click Rules to either:

  • select from the list of available rules or
  • create a new rule and apply it

If you're creating a new rule, click Add group first. Enter the required details, including the pass and fail values, and then click Add.

To view validation results, click the required option:

  • Show passing rows
  • Show failing rows
  • Show all rows
  • Results by rule
  • Results by rule for analysis
  • Results by group
  • Results by group for analysis

If you are viewing Results by rule or Results by group, you can view the passing or failing rows for that rule or rule group using the right-click options or the toolbar buttons.
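The relationship between rules, groups, passing and failing rows and the per-rule and per-group results can be pictured with a small Python sketch. It is not how Data Studio evaluates rules: it assumes that a row passes a group only when it passes every rule in that group, and passes overall only when it passes every rule. `Rule`, `validate` and the sample rules are hypothetical names, and the pass and fail values are treated simply as labels written to each rule's result column.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record, keyed by column name


@dataclass
class Rule:
    """Hypothetical rule definition: a name, a group and a row-level check."""
    name: str
    group: str
    check: Callable[[Row], bool]
    pass_value: str = "Pass"
    fail_value: str = "Fail"


def validate(rows, rules):
    """Return passing rows, failing rows, results by rule and results by group."""
    passing, failing = [], []
    by_rule = {r.name: {"passed": 0, "failed": 0} for r in rules}
    by_group = {r.group: {"passed": 0, "failed": 0} for r in rules}
    for row in rows:
        outcome = {r.name: bool(r.check(row)) for r in rules}
        for r in rules:
            by_rule[r.name]["passed" if outcome[r.name] else "failed"] += 1
        # Assumption: a row passes a group only if it passes every rule in it.
        for group, stats in by_group.items():
            ok = all(outcome[r.name] for r in rules if r.group == group)
            stats["passed" if ok else "failed"] += 1
        # Copy the row and append one result column per rule.
        annotated = dict(row)
        for r in rules:
            annotated[r.name] = r.pass_value if outcome[r.name] else r.fail_value
        # Assumption: a row passes overall only if every rule passes.
        (passing if all(outcome.values()) else failing).append(annotated)
    return passing, failing, by_rule, by_group


rules = [
    Rule("Email present", "Completeness", lambda row: bool(row.get("email"))),
    Rule("Adult", "Validity", lambda row: (row.get("age") or 0) >= 18),
]
rows = [{"email": "info@gmail.com", "age": 30}, {"email": "", "age": 12}]
passing, failing, by_rule, by_group = validate(rows, rules)
print(len(passing), len(failing))
print(by_rule["Adult"])
```

Keeping the per-rule outcomes for each row is what makes both views possible: the same outcomes feed the row-level pass/fail split and the aggregated results by rule and by group.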

More options

  • Show group name in rule results: can be disabled if the group name is not required, for example when all rules belong to a single group.
  • Show rule results in failing rows: can be disabled if per-rule failure information is not required in the final output.

Lineage metadata

To further improve data quality, Data Studio supports capturing lineage metadata for data sources. Lineage metadata can be included in workflow outputs and subsequently used for further integration or processing.

In the Validate step, you can choose which lineage metadata to include in the output using the dropdown under More options. The metadata is included in two of the Validate step's outputs:

  • Show results by rule
  • Show results by group

Interactive Datasets

The results of the validation can be saved to a dataset by including a Take Snapshot step in your workflow. If you select the Interactive (with drill down) option when creating a new dataset in the Take Snapshot step, you can drill into the data to view the passing or failing rows for each rule or rule group in the dataset. Interactivity is supported for the Results by rule and Results by group outputs.

Validate emails based on their format or domain.

Select the Email column and pick one of the two available Validation type options:

  • Format Check: Checks whether the value matches a valid email format. Returns either true or false.
    Examples of valid and invalid email formats:

    Format                                             Result
    info@gmail.com                                     Valid
    first.second-name@gmail.com                        Valid
    first.name+tag@gmail.com                           Valid
    name@info@gmail.com                                Invalid
    name"not"right@test.com                            Invalid
    another.test.com                                   Invalid
    name@incorrect_domain.com                          Invalid
    com.domain@name                                    Invalid
    first_address@gmail.com, second_address@gmail.com  Invalid

    Only one email address can be validated per value; lists of addresses, as in the last example, are rejected.

  • Domain Level: Checks whether the value has a domain that exists and is an email server. This option returns both an overall validity result (true or false) in the Email domain: Result column, and additional information in the Email domain: Error column describing the reason for failure. The possible outcomes are:

    Error                         Result  Description
    Valid                         True    Domain exists and is a mail server.
    Bad format                    False   Email format is invalid.
    Invalid domain                False   Domain validation check failed. The domain may not exist, or may have been flagged as illegitimate, disposable, harmful, nondeterministic or unverifiable.
    Invalid name                  False   Local part validation failed. For example, it may have been identified as a spam trap or a role account such as "admin@server.com".
    Failed to perform DNS lookup  False   An error occurred when attempting to perform the DNS lookup.
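The Format Check examples in the table above can be approximated with a regular expression. This is a minimal sketch, not the rule Data Studio applies: it assumes a dot-separated local part made of letters, digits and `_ % + -`, and a domain of hyphen-friendly labels ending in an alphabetic top-level domain. `is_valid_email_format` and `EMAIL_FORMAT` are hypothetical names, and the Domain Level check (which needs DNS lookups) is not modeled.

```python
import re

# Hypothetical approximation of the Format Check; the actual Data Studio
# rules are not documented here and may differ for edge cases.
LOCAL_PART = r"[A-Za-z0-9_%+-]+(?:\.[A-Za-z0-9_%+-]+)*"
DOMAIN_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
EMAIL_FORMAT = re.compile(rf"{LOCAL_PART}@(?:{DOMAIN_LABEL}\.)+[A-Za-z]{{2,}}")


def is_valid_email_format(value: str) -> bool:
    """True only if the whole value is one well-formed address."""
    return EMAIL_FORMAT.fullmatch(value) is not None


# The examples from the table above, with their documented results.
EXAMPLES = {
    "info@gmail.com": True,
    "first.second-name@gmail.com": True,
    "first.name+tag@gmail.com": True,
    "name@info@gmail.com": False,
    'name"not"right@test.com': False,
    "another.test.com": False,
    "name@incorrect_domain.com": False,
    "com.domain@name": False,
    "first_address@gmail.com, second_address@gmail.com": False,
}
for address, expected in EXAMPLES.items():
    result = is_valid_email_format(address)
    label = "Valid" if result else "Invalid"
    print(f"{address}: {label} (matches table: {result == expected})")
```

Using `fullmatch` rather than `search` is what rejects the list example: a search would find the first address inside the string and wrongly report it as valid.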

Domain-level validation results are cached, and cached results are refreshed every 30 days by default. The cache validity period is configurable in Settings > Workflow steps by changing the Email validation cache validity setting.
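A cache with a validity period behaves roughly like the Python sketch below: a stored result is reused until it is older than the validity period, then the check runs again. It is only an illustration of the idea; `EmailResultCache` and `fake_domain_check` are hypothetical, and keying the cache on the lowercased email value is an assumption, not documented Data Studio behavior.

```python
from datetime import datetime, timedelta
from typing import Callable


class EmailResultCache:
    """Hypothetical cache that reuses domain-level results until they expire."""

    def __init__(self, validity: timedelta = timedelta(days=30),
                 clock: Callable[[], datetime] = datetime.now):
        self.validity = validity  # how long a stored result stays usable
        self.clock = clock        # injectable so expiry can be demonstrated
        self._entries: dict = {}  # key -> (time checked, result)

    def get_or_validate(self, email: str, validate: Callable[[str], str]) -> str:
        key = email.strip().lower()  # assumption: case-insensitive keys
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.validity:
            return entry[1]           # still fresh: no new lookup
        result = validate(key)        # missing or expired: look up again
        self._entries[key] = (now, result)
        return result


lookups = []


def fake_domain_check(email: str) -> str:
    """Stand-in for the real (expensive) domain-level check."""
    lookups.append(email)
    return "Valid"


now = [datetime(2024, 1, 1)]
cache = EmailResultCache(clock=lambda: now[0])
cache.get_or_validate("info@gmail.com", fake_domain_check)  # performs a lookup
now[0] += timedelta(days=10)
cache.get_or_validate("INFO@gmail.com", fake_domain_check)  # served from the cache
now[0] += timedelta(days=25)
cache.get_or_validate("info@gmail.com", fake_domain_check)  # 35 days old: refreshed
print(len(lookups))  # 2
```

A longer validity period means fewer DNS lookups but a greater chance of reusing a result that is out of date.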

Click Show step results to view the results.

Validate global phone numbers.

Once you have connected to the data, specify the following:

  • Select phone column - specify the column containing the phone numbers you want to validate
  • Select country - pick the country to use for phone validation. You can either select a single country or pick a column containing country data. For the latter, ensure that the country values follow the ISO 3166-1 alpha-2, ISO 3166-1 alpha-3 or ISO 3166 country name standards.
  • Select validation type - choose the validation rule(s) that will be applied:
    • Valid phone: shows True for phone numbers that have been successfully validated against the selected country and False otherwise.
    • Valid phone region: shows True for phone numbers that have been successfully validated against the region of the selected country and False otherwise.
    • Possible phone: shows True for phone numbers that are possible for the selected country and False otherwise.
    • Invalid phone: shows True for phone numbers that fail validation against the selected country and False otherwise.
    • Invalid phone region: shows True for phone numbers that fail validation against the region of the selected country and False otherwise.
    • Not possible phone: shows True for phone numbers that are not possible for the selected country and False otherwise.
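When country data comes from a column, each value has to be recognised in one of the accepted forms (alpha-2 code, alpha-3 code or country name). The Python sketch below shows one way to normalise such a column; the four-country table, the case-insensitive matching and the name `resolve_country` are assumptions for illustration, and a real lookup would cover every ISO 3166-1 entry.

```python
from typing import Optional

# Hypothetical, deliberately tiny subset of ISO 3166-1:
# (alpha-2 code, alpha-3 code, English short name).
COUNTRIES = [
    ("AU", "AUS", "Australia"),
    ("DE", "DEU", "Germany"),
    ("FR", "FRA", "France"),
    ("JP", "JPN", "Japan"),
]

# Index every accepted form, case-insensitively (an assumption), to alpha-2.
_LOOKUP = {
    form.casefold(): alpha2
    for alpha2, alpha3, name in COUNTRIES
    for form in (alpha2, alpha3, name)
}


def resolve_country(value: str) -> Optional[str]:
    """Return the alpha-2 code for a code or name, or None if unrecognised."""
    return _LOOKUP.get(value.strip().casefold())


for value in ["FR", "deu", "Japan", " AUS ", "Narnia"]:
    print(repr(value), "->", resolve_country(value))
```

Values that resolve to `None` are the ones worth fixing in the source data before validation, since no country standard matched them.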

Click Show step results to view the results. The following columns will be appended to your data:

  • Validation results - shows the result of each validation rule (one column per applied rule).
  • Phone Country - shows the country for the validated phone number.
  • Phone Number Type - shows the phone type (e.g. mobile or fixed line).
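The appended columns can be pictured with the Python sketch below. It does not validate phone numbers: `PhoneCheck` is a hypothetical stand-in for whatever the validator returns, and `RULES` and `append_phone_columns` are invented names. Treating each "Invalid" and "Not possible" rule as the negation of its positive counterpart is inferred from the True/False descriptions of the validation types above.

```python
from dataclasses import dataclass


@dataclass
class PhoneCheck:
    """Hypothetical result returned by a phone validator for one number."""
    is_valid: bool
    is_valid_region: bool
    is_possible: bool
    country: str
    number_type: str  # e.g. "mobile" or "fixed line"


# Each "Invalid"/"Not possible" rule is the negation of its positive pair,
# as implied by the True/False descriptions of the validation types.
RULES = {
    "Valid phone": lambda c: c.is_valid,
    "Valid phone region": lambda c: c.is_valid_region,
    "Possible phone": lambda c: c.is_possible,
    "Invalid phone": lambda c: not c.is_valid,
    "Invalid phone region": lambda c: not c.is_valid_region,
    "Not possible phone": lambda c: not c.is_possible,
}


def append_phone_columns(row: dict, check: PhoneCheck, applied: list) -> dict:
    """Copy the row and append one column per applied rule plus country and type."""
    out = dict(row)
    for rule in applied:
        out[rule] = RULES[rule](check)
    out["Phone Country"] = check.country
    out["Phone Number Type"] = check.number_type
    return out


check = PhoneCheck(is_valid=True, is_valid_region=True, is_possible=True,
                   country="GB", number_type="mobile")
result = append_phone_columns({"phone": "07700 900123"}, check,
                              ["Valid phone", "Not possible phone"])
print(result)
```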

Depending on your license, you can use this step to validate and enrich addresses in bulk with Experian Batch.

If your data already has address columns tagged, this step automatically picks up all the columns tagged as addresses and lists them as Selected columns.

To enrich valid addresses, choose one of the available Additional datasets. The additional datasets that are available to you will depend on your license.

Using Additional options you can specify how the validated addresses will be returned:

  • Output columns - This defines how a cleansed address is returned from the Validate Addresses step: how many columns there are, which address elements go in each column, and what additional formatting (for example, casing or truncation) is applied.
    • Standard (7-column layout).
    • Component (28-column layout).
    • One custom layout can also be defined for each country.
  • Results columns - This defines what information is returned about how the address was cleansed. You can either return a simple summary of the cleansing action (Good Match, Unmatched, and so on) or a much more detailed breakdown of the match code.
    • Standard (returns the result code).
    • Detailed (returns the result code and additional metadata, including the match success and the confidence of the match).
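The Output columns idea (which address elements go in which column, plus casing and truncation) can be pictured with a hypothetical layout definition in Python. This is not the Experian Batch layout configuration format; `OutputColumn`, `apply_layout` and the element names `premise`, `street`, `town` and `postcode` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutputColumn:
    """Hypothetical column definition in a custom output layout."""
    name: str
    elements: list                    # address element names placed in this column
    separator: str = " "
    upper: bool = False               # casing
    max_length: Optional[int] = None  # truncation


def apply_layout(address: dict, layout: list) -> dict:
    """Place cleansed address elements into columns, then case and truncate."""
    out = {}
    for col in layout:
        text = col.separator.join(address[e] for e in col.elements if address.get(e))
        if col.upper:
            text = text.upper()
        if col.max_length is not None:
            text = text[:col.max_length]
        out[col.name] = text
    return out


cleansed = {"premise": "1", "street": "High Street", "town": "Anytown", "postcode": "AB1 2CD"}
layout = [
    OutputColumn("Line 1", ["premise", "street"], max_length=10),
    OutputColumn("Town", ["town"], upper=True),
    OutputColumn("Postcode", ["postcode"]),
]
print(apply_layout(cleansed, layout))
```

The fixed Standard and Component layouts differ from a custom one only in how many such columns exist and which elements each one holds.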

Find out how to configure Experian Batch for the Validate addresses step.

Result codes

Each address cleansed in Data Studio returns one of the following results:

Validation result     Description
Verified Correct      Experian Batch verified the input address as a good-quality match to a complete address. No corrections or formatting changes were necessary.
Good Match            Experian Batch verified the input address as a good-quality match to a complete address, although minor corrections or formatting changes may have been applied.
Good Premise Partial  Experian Batch was not able to find a full match to a correct address, but found a good match to premise level by excluding organization or sub-premise details.
Tentative Match       Experian Batch found a match to a complete address, but the overall differences between the input and cleaned addresses are significant enough to reduce the confidence in the match.
Multiple Matches      Experian Batch found more than one correct address which matched the input address. This means that no single address could be matched with high confidence.
Poor Match            Experian Batch found a match to an address, but with low confidence. This often means that the cleaned address is not deliverable.
Partial Match         Experian Batch was unable to find a full correct address which matched the input address. This often occurs when the property number is missing from the input address.
Foreign Address       Experian Batch could not find a matching address because the input address referred to a different country.
Unmatched             Experian Batch was unable to match the input address to any correct address.
Processing Failure    The input address did not return any results and may be badly formatted. Report this to whoever manages Aperture Data Studio for your organization.
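Once the step has run, a quick tally of the result values helps decide which addresses need attention. The Python sketch below counts results and splits them into accepted and needs-review groups; which results count as accepted (here Verified Correct and Good Match) is an assumption you would set yourself, and `summarize`, `RESULT_CODES` and `ACCEPTED` are hypothetical names.

```python
from collections import Counter

# The result values listed in the table above.
RESULT_CODES = {
    "Verified Correct", "Good Match", "Good Premise Partial", "Tentative Match",
    "Multiple Matches", "Poor Match", "Partial Match", "Foreign Address",
    "Unmatched", "Processing Failure",
}

# Assumption: which results to accept without review is a local business
# decision, not something defined by Data Studio or Experian Batch.
ACCEPTED = {"Verified Correct", "Good Match"}


def summarize(results: list):
    """Count each result and split the total into accepted vs. needs review."""
    counts = Counter(results)
    unknown = set(counts) - RESULT_CODES
    if unknown:
        raise ValueError(f"Unexpected result values: {sorted(unknown)}")
    accepted = sum(n for code, n in counts.items() if code in ACCEPTED)
    return counts, accepted, len(results) - accepted


counts, accepted, review = summarize(
    ["Good Match", "Verified Correct", "Partial Match", "Good Match", "Unmatched"]
)
print(counts.most_common(1), accepted, review)
```

Rejecting unknown values up front makes it obvious if the input column is not the result column at all.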