Access lineage metadata

To further improve data quality, Data Studio can capture lineage metadata. This provides traceability for a variety of purposes, including data governance, auditing and root cause analysis.

Lineage metadata is information about the origin of the batches of data in a Dataset, together with their transformations and characteristics. It can be included in Workflow outputs and then used for further integration or processing, so that the metadata can be treated as data for in-depth analysis and record management.

Default lineage metadata

Each Dataset has metadata associated with it which may be automatically captured or defined by the user. The following elements will be captured automatically:

  • Batch ID
  • Column name (except for the Source step)
  • Company
  • Credential
  • Database
  • Dataset
  • File Name
  • Host Name
  • Interface Name
  • Keyspace
  • Port
  • Project
  • Qualified table name
  • REST config file
  • REST sample endpoint
  • Schema
  • Server
  • SID
  • Source type
  • Sub filenames
  • System
  • Table name
  • Timestamp

For details of which elements are captured for each source type, see Default lineage metadata (by source type) below.

Working with metadata

Lineage metadata can be included in the output of these Workflow steps:

For more granular control, you can select individual metadata elements from a checkbox list so that only the relevant metadata is displayed. Each selected metadata property is output as a separate additional column, styled differently so that it is not confused with the data.
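The column-per-property behaviour described above can be pictured with a small sketch. It assumes rows are plain dictionaries and that the hypothetical `append_lineage_columns` helper receives the captured metadata and the user's checkbox selection. The `meta:` column prefix is an assumption that merely stands in for the distinct styling Data Studio applies; it is not the product's column naming.

```python
from typing import Dict, List

Row = Dict[str, object]


def append_lineage_columns(
    rows: List[Row],
    captured: Dict[str, str],
    selected: List[str],
    prefix: str = "meta:",  # hypothetical marker standing in for Data Studio's styling
) -> List[Row]:
    """Return copies of rows with one extra column per selected metadata element."""
    # Only elements that were both captured and ticked in the checkbox list become columns.
    chosen = [name for name in selected if name in captured]
    result: List[Row] = []
    for row in rows:
        extended = dict(row)  # copy so the input rows are left unchanged
        for name in chosen:
            extended[prefix + name] = captured[name]
        result.append(extended)
    return result


if __name__ == "__main__":
    data = [{"id": 1, "city": "Leeds"}, {"id": 2, "city": "York"}]
    lineage = {"Batch ID": "42", "File Name": "customers.csv", "Source type": "CSV"}
    # "Port" is ticked but was not captured for this source, so no column is added for it.
    for row in append_lineage_columns(data, lineage, ["Batch ID", "File Name", "Port"]):
        print(row)
```

Keeping the metadata in separately marked columns, rather than mixing it into existing ones, is what lets it be filtered or exported later without being mistaken for business data.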

JDBC custom metadata

Data Studio supports JDBC connections. In addition to the metadata captured automatically about the source system when data is loaded, you can define custom metadata.

Custom metadata is configured in the customjdbc.json file, or in datadirectJdbc.json for native drivers.

Custom metadata consists of string key/value pairs. To add a custom metadata entry, create a new connectionParam whose name is prefixed with CUSTOM_. Without this prefix, the entry does not appear as custom metadata in Data Studio.
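As a rough illustration of the prefix rule, the sketch below treats a connection's parameters as a dictionary and keeps only the string entries whose names start with `CUSTOM_`. The dictionary shape, the parameter names and the `extract_custom_metadata` helper are hypothetical and do not reproduce the actual layout of customjdbc.json or datadirectJdbc.json; the sketch also assumes the prefix match is case-sensitive.

```python
from typing import Dict

CUSTOM_PREFIX = "CUSTOM_"


def extract_custom_metadata(connection_params: Dict[str, str]) -> Dict[str, str]:
    """Keep only the entries whose key carries the CUSTOM_ prefix.

    Assumption: the match is case-sensitive, and custom metadata values are strings.
    """
    return {
        key: value
        for key, value in connection_params.items()
        if key.startswith(CUSTOM_PREFIX) and isinstance(value, str)
    }


if __name__ == "__main__":
    params = {
        "ServerName": "db01",                # ordinary (hypothetical) driver setting, ignored
        "CUSTOM_Environment": "Production",  # appears as custom metadata
        "CUSTOM_Owner": "Finance team",      # appears as custom metadata
    }
    print(extract_custom_metadata(params))
```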

You can define custom metadata at two levels:

  • when defining an External system (e.g. to label a system as "Development" or "Production")
  • when defining a Dataset

If the same key is defined at both levels, the Dataset-level definition takes precedence. Custom keys that are identical to any of the automatically captured metadata properties are not supported.
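The precedence rule amounts to a dictionary merge in which Dataset-level entries overwrite External system entries with the same key. The sketch below assumes that keys are compared with built-in property names after removing the `CUSTOM_` prefix, and that a clash should be rejected with an error. The `merge_custom_metadata` helper and the `RESERVED` set (a subset of the default elements listed earlier) are illustrative only, not Data Studio's internals.

```python
from typing import Dict

CUSTOM_PREFIX = "CUSTOM_"

# A subset of the automatically captured element names, for illustration only.
RESERVED = {"Batch ID", "Dataset", "Host Name", "Port", "Schema", "Timestamp"}


def _strip_prefix(key: str) -> str:
    """Drop the CUSTOM_ prefix so the key can be compared with built-in names (assumption)."""
    return key[len(CUSTOM_PREFIX):] if key.startswith(CUSTOM_PREFIX) else key


def merge_custom_metadata(system: Dict[str, str], dataset: Dict[str, str]) -> Dict[str, str]:
    """Merge both levels of custom metadata; Dataset-level values win on conflicts."""
    merged = {**system, **dataset}  # the later mapping (Dataset level) takes precedence
    for key in merged:
        if _strip_prefix(key) in RESERVED:
            raise ValueError(f"custom key {key!r} clashes with a built-in metadata property")
    return merged


if __name__ == "__main__":
    system_level = {"CUSTOM_Environment": "Production", "CUSTOM_Owner": "IT"}
    dataset_level = {"CUSTOM_Owner": "Finance"}
    print(merge_custom_metadata(system_level, dataset_level))
    # {'CUSTOM_Environment': 'Production', 'CUSTOM_Owner': 'Finance'}
```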

Default lineage metadata (by source type)

Single file (e.g. CSV)
  • Batch ID
  • Column name (not Source step)
  • Dataset
  • File Name
  • Source type
  • Timestamp

Multiple tab file (e.g. Excel)
  • Batch ID
  • Column name (not Source step)
  • Dataset
  • File Name
  • Source type
  • Sub filenames
  • Timestamp

Oracle
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • SID
  • Source type
  • System
  • Table name
  • Timestamp

SQL Server
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

MySQL
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Source type
  • System
  • Table name
  • Timestamp

DB2
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

Informix
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • Server
  • Source type
  • System
  • Table name
  • Timestamp

PostgreSQL
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

OpenEdge
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

Sybase
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

Hive
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Dataset
  • Port
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

Greenplum
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

Salesforce
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Dataset
  • Host Name
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

MongoDB
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

Redshift
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

Cassandra
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Dataset
  • Host Name
  • Keyspace
  • Port
  • Qualified table name
  • Source type
  • System
  • Table name
  • Timestamp

SparkSQL
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Database
  • Dataset
  • Host Name
  • Port
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

ServiceCloud
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Dataset
  • Host Name
  • Interface Name
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

Google BigQuery
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Dataset
  • Project
  • Qualified table name
  • Source type
  • System
  • Table name
  • Timestamp

AutonomousRestConnector
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Dataset
  • Qualified table name
  • REST configuration file
  • REST endpoint
  • Source type
  • System
  • Table name
  • Timestamp

Eloqua
  • Batch ID
  • Column name (not Source step)
  • Company
  • Credential
  • Dataset
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

Jira
  • Batch ID
  • Column name (not Source step)
  • Company
  • Credential
  • Dataset
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp

SalesCloud
  • Batch ID
  • Column name (not Source step)
  • Credential
  • Dataset
  • Host Name
  • Qualified table name
  • Schema
  • Source type
  • System
  • Table name
  • Timestamp
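
Because each source type captures a different subset of elements, it can be useful to know which elements are captured for all of the source types feeding a Workflow before choosing which metadata columns to select. The sketch below transcribes two of the source types above into sets (with "Column name (not Source step)" shortened to "Column name") and intersects them. The `CAPTURED` mapping and the `common_elements` helper are hypothetical and are not part of Data Studio.

```python
from typing import Dict, FrozenSet, Iterable

# Two entries transcribed from the lists above; other source types can be added the same way.
CAPTURED: Dict[str, FrozenSet[str]] = {
    "Single file (e.g. CSV)": frozenset({
        "Batch ID", "Column name", "Dataset", "File Name", "Source type", "Timestamp",
    }),
    "Oracle": frozenset({
        "Batch ID", "Column name", "Credential", "Dataset", "Host Name", "Port",
        "Qualified table name", "Schema", "SID", "Source type", "System",
        "Table name", "Timestamp",
    }),
}


def common_elements(source_types: Iterable[str]) -> FrozenSet[str]:
    """Return the metadata elements captured for every given source type."""
    sets = [CAPTURED[name] for name in source_types]
    if not sets:
        return frozenset()
    return frozenset.intersection(*sets)


if __name__ == "__main__":
    print(sorted(common_elements(["Single file (e.g. CSV)", "Oracle"])))
    # ['Batch ID', 'Column name', 'Dataset', 'Source type', 'Timestamp']
```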