Export and update data

Data Studio lets you export your cleansed data to a file or a database.

This step allows you to export your Workflow's output data to a file or to an External System that uses a JDBC connection.

Each Workflow can contain multiple Export steps, so you can export data at any point. Each of these steps can be configured individually.

Click Export Settings in the step dialog to specify where and how the data will be exported.

Exporting to a file

  • Specify the Exporter (.csv, .txt, .xls, .xlsx or .json).
  • Choose the Target System - the export location. The default Server export directory will export files into the appropriate Space's export directory on the server. You can also export to an external file system (Apache Hadoop HDFS, Amazon S3, Microsoft Azure Blob Storage, or SFTP server) if one is configured.
  • Enter the name of the file or assign a Workflow parameter for Filename.
    • The Filename is made up of several components. By default, it has two: one for the name of the file and one for a timestamp.
    • The timestamp is the date and time at the point that the Workflow is executed.
    • You can add new components by using @ on your keyboard to select and enter a template item.
    • Components can be deleted by removing the template directly from the Filename.
    • Each component can either be custom text or a timestamp in one of the available datetime formats.
    • An example of the resulting filename is shown as you build up the filename components.
  • To remove the default extension (such as .csv) from exported files, deselect Include default file extension.
    • You can export files with a custom extension by excluding the default and specifying a custom extension at the end of the Filename.
  • Enable Overwrite existing to allow a file with the same name to be exported more than once (the latest copy will replace the previous one). If this option is disabled, exporting to the same file name more than once will result in an error.
  • Select Compression or encryption to export the output as one of the following:
    • None for a normal export file
    • Compressed only to export as a compressed file. Currently, only .zip files are supported. By default, the Compressed filename will be the same as the Filename, with the extension determined by the compression type. You can change this if required.
      • To add new components, use @ on your keyboard to select and enter a template item.
      • To delete components, remove the template directly from the Compressed filename.
      • Each component can either be custom text or a timestamp in one of the available datetime formats.
      • An example of the resulting filename is shown as you build up the compressed filename components.
    • Compress and encrypt to export as an encrypted file. By default, the Encrypted filename will be the same as the Filename, with the extension determined by the encryption type. You can change this if required.
      • Select an existing key based on the Encryption key name. Find out how to add a new encryption key.
      • To add new components, use @ on your keyboard to select and enter a template item.
      • To delete components, remove the template directly from the Encrypted filename.
      • Each component can either be custom text or a timestamp in one of the available datetime formats.
      • An example of the resulting filename is shown as you build up the encrypted filename components.
  • If Include column names is checked, the first row in the exported file will contain column names. This doesn't apply to .json files.
  • Column name style determines how the column names will be displayed in the export file (see the example after this list):
    • None - the column name as it appears in Data Studio.
    • Database style - the column name will be in upper case and underscores will be used to separate words.
    • Lower case - the column name will be in lower case.
    • Upper case - the column name will be in upper case.
    • Humanized - the first word will be capitalized and underscores turned into spaces.
  • Specify the Character set (most sets are supported).
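
For example, a source column named customer_address would be written to the export file as follows under each style (an illustrative sketch based on the descriptions above):

    None             customer_address
    Database style   CUSTOMER_ADDRESS
    Lower case       customer_address
    Upper case       CUSTOMER_ADDRESS
    Humanized        Customer address
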
Custom settings for CSV files

You can specify a delimiter and filename extension for CSV files (created in an Export step or downloaded from a grid of data):

  1. Navigate to the installation folder (by default C:\Program Files\Experian\Aperture Data Studio {version}) and open the server.properties file.

  2. On a new line in the file, add the setting you want to modify:

    Setting name                Default value   Description
    Data.csvDelimiter           ,               Any string; special keywords can be used (see below)
    Data.csvFilenameExtension   csv             All non-alphanumeric characters will be removed

    The special keywords and the delimiter values they produce are:

    Keyword     Value
    BAR         ¦
    COLON       :
    COMMA       ,
    NONE        \0
    PIPE        |
    SEMICOLON   ;
    SPACE       ' '
    TAB         \t

    For example, to change the delimiter to tab, the entry in server.properties should be:

    Data.csvDelimiter=TAB
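
    Similarly, to set both the delimiter and the filename extension at once, the entries might look like this (illustrative values using the PIPE keyword and a txt extension):

    Data.csvDelimiter=PIPE
    Data.csvFilenameExtension=txt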
    

Exporting to an External System using JDBC

  • Select JDBC as the Exporter.
  • Choose a DBMS from the list of available systems. This is a list of the External System connections configured in Data Studio that use a JDBC driver.
  • Choose a Credential for authenticating with the DBMS. These credentials will have been created alongside the External System definition.
  • Select the target table from the list in Table name. These are the tables that you can access in the target DBMS using the selected credential.
  • Mode determines how the data will be exported into the target system (a worked example follows this list):
    • Insert: New rows are inserted into the table. The equivalent SQL statement is:
      "INSERT INTO tablename (column1, …) VALUES (value1, …)"
    • Update: Existing rows are updated. The mapping dialog will show which source columns are mapped to which target columns and which ones have been specified as keys. You can change the mapping and keys at this stage. The keys referred to are the columns that will appear in the equivalent SQL WHERE clause:
      "UPDATE tablename SET (column1=value1, …) WHERE (columnName=value, … )"
    • Delete: Rows are deleted. The keys are the column names used in the SQL WHERE clause:"DELETE FROM tablename WHERE columnName=value, … "
    • Insert or update (if insert fails): If a row insert fails (there is a key collision), then the statement will be retried as an update.
    • Update or insert (if update fails): If the update fails (no record found with the given key), then the statement will be retried as an insert.
  • For Insert mode, you have the option to Truncate table before carrying out the row inserts.
  • For all modes other than Insert, you will need to select one or more Key columns from those in your Workflow data. These columns need to match the key fields in your selected target table.
  • Select Atomic database update if the export should be performed in a single transaction. In this mode, the database may lock the affected table(s) for the duration of the update, which can be a considerable time for large exports.
  • Select Continue Workflow execution on JDBC exports error if you want the Workflow to continue to execute when there are INSERT, UPDATE or DELETE errors. This option is unchecked by default. If it is selected, two new output nodes will appear on the step: Show passing rows and Show failing rows.
    • Show passing rows contains the rows that were exported successfully.
    • Show failing rows contains the rows that failed to export.
  • Commit batch size: Commit a batch after the specified number of records have been sent to the database. This setting is only available when Atomic database update is unset. Note that the final batch usually has fewer records.
  • Pre SQL: Optionally enter a SQL statement that will be executed before the export. The SQL may contain any valid statements and will be executed on the target database.
  • Post SQL: Optionally enter a SQL statement, similar to Pre SQL but which will be executed after the export has completed.
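
As an illustration of how these options combine, the statements below approximate what Data Studio would run on the target database for an Update mode export that has Pre SQL and Post SQL configured. The table, column and key names are hypothetical:

    -- Pre SQL: executed once before the export begins
    DELETE FROM customer_staging;

    -- Update mode: one statement per exported row, with customer_id selected as the Key column
    UPDATE customers SET full_name = 'Jane Doe', city = 'Leeds' WHERE customer_id = 1001;

    -- Post SQL: executed once after the export has completed
    UPDATE export_audit SET last_run = CURRENT_TIMESTAMP WHERE job_name = 'customer_export';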