Assess the quality of your data by defining rules.
To perform validation, you need to define one or more validation rules. In the step dialog, click Rules to either:
If you're creating a new rule, click Add group first. Enter the required details, including the pass and fail values, and click Add.
To view validation results, click the required option:
If you are viewing Results by rule or Results by group, you can view the passing or failing rows for that rule or rule group using the right-click options or the toolbar buttons.

| Option | Description |
|---|---|
| Show group name in rule results | Can be disabled if the group name is not required, e.g. all rules belong to a single group. |
| Show rule results in failing rows | Can be disabled if failure information per rule is not required in the final output. |

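The rule results and the two options above can be sketched in plain Python to make the pass/fail bookkeeping concrete. This is a minimal sketch, not Data Studio's implementation: the `(rule name, group name, predicate)` tuple, the function names and the output column names are hypothetical, and it assumes a row passes a group only when it passes every rule in that group.

```python
from typing import Callable

# Hypothetical rule representation: (rule name, group name, predicate).
# The predicate returns True when a row passes the rule.
Rule = tuple[str, str, Callable[[dict], bool]]


def results_by_rule(rows: list[dict], rules: list[Rule], show_group_name: bool = True) -> list[dict]:
    """One summary row per rule; the Group column can be switched off."""
    summary = []
    for name, group, check in rules:
        passed = sum(1 for row in rows if check(row))
        entry = {"Rule": name, "Passed": passed, "Failed": len(rows) - passed}
        if show_group_name:
            entry = {"Group": group, **entry}
        summary.append(entry)
    return summary


def results_by_group(rows: list[dict], rules: list[Rule]) -> list[dict]:
    """Assumes a row passes a group only if it passes every rule in that group."""
    groups: dict[str, list[Callable[[dict], bool]]] = {}
    for _, group, check in rules:
        groups.setdefault(group, []).append(check)
    summary = []
    for group, checks in groups.items():
        passed = sum(1 for row in rows if all(c(row) for c in checks))
        summary.append({"Group": group, "Passed": passed, "Failed": len(rows) - passed})
    return summary


def failing_rows(rows: list[dict], rules: list[Rule], show_rule_results: bool = True) -> list[dict]:
    """Rows failing at least one rule, optionally with one pass/fail column per rule."""
    output = []
    for row in rows:
        outcomes = {name: check(row) for name, _, check in rules}
        if all(outcomes.values()):
            continue  # the row passes every rule, so it is not a failing row
        out_row = dict(row)
        if show_rule_results:
            out_row.update({f"{name} passed": ok for name, ok in outcomes.items()})
        output.append(out_row)
    return output
```

Keeping each rule as a predicate means a new rule is just another tuple, and switching an option off only removes columns from the output; the pass/fail counts themselves are unchanged.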
To further improve data quality, Data Studio can capture lineage metadata for data sources. Lineage metadata can be included in workflow outputs and then used for further integration or processing.
In the Validate step, you can select which lineage metadata to include in the output using the dropdown under "More Options". The metadata is included in two of the Validate step's outputs:
The results of the validation can be saved to a dataset by including a Take Snapshot step in your workflow. If the Interactive (with drill down) option is selected when creating a new dataset in the Take Snapshot step, you can drill into the data to view the passing or failing rows for each rule or rule group in the dataset. Interactivity is supported for the Results by rule and Results by group outputs.
Validate emails based on the format or domain address.
Select the Email column and pick one of the two available Validation type options:
Format Check: Checks whether the value matches a valid email format. Returns either true or false.
Examples of valid and invalid email formats:
Only one email address can be validated at a time; lists of emails, as in the last example, will be rejected.
Domain Level: Checks whether the value's domain exists and is a mail server. This option returns both an overall validity result (true or false) in the Email domain: Result column and additional information in the Email domain: Error column describing the reason for any failure. The possible outcomes are:

| Outcome | Result | Description |
|---|---|---|
| Valid | True | Domain exists and is a mail server. |
| Bad format | False | Email format is invalid. |
| Invalid domain | False | Domain validation check failed. The domain may not exist, or may have been flagged as illegitimate, disposable, harmful, nondeterministic or unverifiable. |
| Invalid name | False | Local part validation failed. For example, it may have been identified as a spam trap or role account such as "email@example.com". |
| Failed to perform DNS lookup | False | An error occurred when attempting to perform the DNS lookup. |

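The two validation types above can be illustrated with a short Python sketch. It is an approximation under stated assumptions, not the rules Data Studio applies: the regular expression is deliberately simplified, and `is_mail_server` and `is_flagged_local_part` are hypothetical hooks standing in for the DNS lookup and local-part checks. The order of the checks (format, then domain, then local part) is also an assumption. The sketch shows how a single true/false Result pairs with one of the outcomes in the table.

```python
import re
from enum import Enum
from typing import Callable

# Simplified pattern for illustration only: a local part, one "@" and a
# dotted domain. Data Studio's actual format rules may differ.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_valid_format(value: str) -> bool:
    """Format Check sketch: True only if the whole value is a single address.

    fullmatch() rejects a list such as "a@x.com; b@y.com", because the
    separator and the second address cannot be part of one match.
    """
    return EMAIL_PATTERN.fullmatch(value.strip()) is not None


class DomainOutcome(Enum):
    VALID = "Valid"
    BAD_FORMAT = "Bad format"
    INVALID_DOMAIN = "Invalid domain"
    INVALID_NAME = "Invalid name"
    DNS_FAILURE = "Failed to perform DNS lookup"


def domain_level_check(
    value: str,
    is_mail_server: Callable[[str], bool],
    is_flagged_local_part: Callable[[str], bool],
) -> tuple[bool, DomainOutcome]:
    """Domain Level sketch returning (Result, outcome).

    is_mail_server is a hypothetical hook expected to raise OSError when
    the DNS lookup itself fails; is_flagged_local_part is also hypothetical.
    """
    if not is_valid_format(value):
        return False, DomainOutcome.BAD_FORMAT
    # The format check guarantees exactly one "@".
    local, domain = value.strip().split("@")
    try:
        if not is_mail_server(domain):
            return False, DomainOutcome.INVALID_DOMAIN
    except OSError:
        return False, DomainOutcome.DNS_FAILURE
    if is_flagged_local_part(local):
        return False, DomainOutcome.INVALID_NAME
    return True, DomainOutcome.VALID
```

Passing the lookups in as functions keeps the sketch testable without network access; in practice you would back `is_mail_server` with a real MX-record query.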
Domain-level validation results are cached, and cached results are refreshed every 30 days by default. You can change the cache validity in Settings > Workflow steps using the Email validation cache validity setting.
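The caching behaviour described above amounts to a time-to-live cache keyed by domain. The following is a minimal sketch assuming that results are keyed by lower-cased domain and that an entry expires once it is as old as the validity period; `DomainResultCache` and `cached_domain_check` are hypothetical names, and the 30-day default simply mirrors the setting described in the text.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


class DomainResultCache:
    """Caches domain-level results for a configurable validity period."""

    def __init__(self, validity: timedelta = timedelta(days=30)) -> None:
        self.validity = validity
        self._entries: dict[str, tuple[bool, datetime]] = {}

    def get(self, domain: str, now: datetime) -> Optional[bool]:
        entry = self._entries.get(domain.lower())
        if entry is None:
            return None
        result, cached_at = entry
        if now - cached_at >= self.validity:
            # Expired: drop the entry so the next lookup refreshes it.
            del self._entries[domain.lower()]
            return None
        return result

    def put(self, domain: str, result: bool, now: datetime) -> None:
        self._entries[domain.lower()] = (result, now)


def cached_domain_check(
    cache: DomainResultCache,
    domain: str,
    check: Callable[[str], bool],
    now: datetime,
) -> bool:
    """Return a cached result if still valid, otherwise run check and cache it."""
    cached = cache.get(domain, now)
    if cached is not None:  # a cached False is still a valid hit
        return cached
    result = check(domain)
    cache.put(domain, result, now)
    return result
```

Passing `now` explicitly rather than reading the clock inside the cache keeps expiry easy to test; a shorter validity, such as `DomainResultCache(timedelta(days=7))`, corresponds to lowering the cache validity setting.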
Click Show step results to view the results.
Validate global phone numbers.
Once connected to the data, specify the following:
Click Show step results to view the results. The following columns will be appended to your data:
Use this step to validate and enrich addresses in bulk using Experian Batch, depending on your license.
If your data already has columns tagged as addresses, this step will automatically pick up all of them and list them as Selected columns.
To enrich valid addresses, choose one of the available Additional datasets. The additional datasets that are available to you will depend on your license.
Using Additional options, you can specify how the validated addresses will be returned:
Find out how to configure Experian Batch for the Validate addresses step.
Each address cleansed in Data Studio returns one of the following match results:

| Match result | Description |
|---|---|
| Verified Correct | Experian Batch verified the input address as a good-quality match to a complete address. No corrections or formatting changes were necessary. |
| Good Match | Experian Batch verified the input address as a good-quality match to a complete address, although minor corrections or formatting changes may have been applied. |
| Good Premise Partial | Experian Batch was not able to find a full match to a correct address, but found a good match to premise level by excluding organization or sub-premise details. |
| Tentative Match | Experian Batch found a match to a complete address, but the overall differences between the input and cleaned addresses are significant enough to reduce the confidence in the match. |
| Multiple Matches | Experian Batch found more than one correct address which matched the input address. This means that no single address could be matched with high confidence. |
| Poor Match | Experian Batch found a match to an address, but with low confidence. This often means that the cleaned address is not deliverable. |
| Partial Match | Experian Batch was unable to find a full correct address which matched the input address. This often occurs when the property number is missing from the input address. |
| Foreign Address | Experian Batch could not find a matching address because the input address referred to a different country. |
| Unmatched | Experian Batch was unable to match the input address to any correct address. |
| Processing Failure | The address input has not returned any results and may be of a bad format. Please report this to whoever manages Aperture Data Studio for your organization. |

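Downstream of this step, you may want to route rows according to their match result. The sketch below shows one possible triage, chosen purely for illustration: the "accept", "review", "report" and "reject" buckets are hypothetical and are not defined by Data Studio or Experian Batch. Only the result labels and the advice to report Processing Failure come from the table.

```python
from collections import Counter
from enum import Enum


class MatchResult(Enum):
    VERIFIED_CORRECT = "Verified Correct"
    GOOD_MATCH = "Good Match"
    GOOD_PREMISE_PARTIAL = "Good Premise Partial"
    TENTATIVE_MATCH = "Tentative Match"
    MULTIPLE_MATCHES = "Multiple Matches"
    POOR_MATCH = "Poor Match"
    PARTIAL_MATCH = "Partial Match"
    FOREIGN_ADDRESS = "Foreign Address"
    UNMATCHED = "Unmatched"
    PROCESSING_FAILURE = "Processing Failure"


# Hypothetical policy for illustration; adjust the buckets to your own needs.
ACCEPT = {MatchResult.VERIFIED_CORRECT, MatchResult.GOOD_MATCH}
REVIEW = {
    MatchResult.GOOD_PREMISE_PARTIAL,
    MatchResult.TENTATIVE_MATCH,
    MatchResult.MULTIPLE_MATCHES,
    MatchResult.PARTIAL_MATCH,
    MatchResult.FOREIGN_ADDRESS,
}


def triage(label: str) -> str:
    """Map a match result label to 'accept', 'review', 'report' or 'reject'."""
    result = MatchResult(label)  # raises ValueError for an unknown label
    if result in ACCEPT:
        return "accept"
    if result in REVIEW:
        return "review"
    if result is MatchResult.PROCESSING_FAILURE:
        return "report"  # the table advises reporting these to your administrator
    return "reject"  # Poor Match and Unmatched


def summarize(labels: list[str]) -> Counter:
    """Count how many rows fall into each triage bucket."""
    return Counter(triage(label) for label in labels)
```

Using an `Enum` for the labels means a misspelled or unexpected result fails loudly instead of silently landing in the reject bucket.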