Snapshots are copies of data that allow you to track changes over time or store your results for use elsewhere. A Snapshot is a type of Dataset that's created by and stored in Data Studio.

Taking Snapshots

Connect the Take Snapshot step's input to the data you'd like to capture in a Snapshot. When the Workflow is run, a new batch of data will be created.

This data can then be added to an existing or new Dataset:

  • Select an existing one from the Dataset list.

  • Use Create new Dataset to create a new one.

    • Name - add a name as it will appear in Data Studio.
    • Summary - an optional short summary of the Dataset.
    • Description - an optional, longer description.
    • Dataset type - choose one of the options: Single batch writes the latest data to the Dataset and keeps no history. Multi batch retains older data in the Dataset, allowing for trend analysis.
    • Interactivity - This option will only be available if you're making a copy of Results by rule or Results by group from the Validate step. Selecting Interactive (with drill down) will result in a Dataset that allows users to drill into the underlying data and view the passing/failing rows for the rules/rule groups.
    • Add Batch Timestamp Column - When selected, an additional column will be added to the Dataset, recording the time at which each batch was created. This option is useful for trend analysis.
    • Allow automatic batch deletion - This will ensure that data batches which have been used in a Workflow are deleted after they have been processed (i.e. after the Workflow using those batches has been run). Use this option so that no batch is processed through a Workflow twice. It's used in conjunction with the Delete batches on completion setting in the Source step.
    • Allow auto-refresh - Not applicable to Snapshots.
    • Publish to ODBC - Make the Dataset visible through ODBC connections.
    • Compression level - Choose one of the options: Row-based to store data by row, Column-based to store data by column or None to not use compression for this Snapshot. The value selected by default is determined by what's been specified in Settings > Performance > Compression. Find out more about the compression levels.
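The effect of the Dataset type, Add Batch Timestamp Column and Allow automatic batch deletion options can be illustrated with a small Python model. This is only a conceptual sketch and not a Data Studio API: the SnapshotDataset class, the run_downstream_workflow function and the "Batch Timestamp" column name are all hypothetical. It also assumes that batches are deleted only when both the Dataset option and the Source step's Delete batches on completion setting are enabled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SnapshotDataset:
    """Hypothetical in-memory stand-in for a Snapshot Dataset (not a Data Studio API)."""
    name: str
    multi_batch: bool = False                # Dataset type: Single batch (False) or Multi batch (True)
    add_batch_timestamp: bool = False        # Add Batch Timestamp Column
    allow_auto_batch_deletion: bool = False  # Allow automatic batch deletion
    batches: list = field(default_factory=list)

    def take_snapshot(self, rows, created=None):
        """Write one new batch, as a Take Snapshot step would on each Workflow run."""
        created = created or datetime.now(timezone.utc)
        batch = [dict(row) for row in rows]
        if self.add_batch_timestamp:
            for row in batch:
                # The column name is an assumption made for this sketch.
                row["Batch Timestamp"] = created.isoformat()
        if self.multi_batch:
            self.batches.append(batch)  # Multi batch: older batches are retained
        else:
            self.batches = [batch]      # Single batch: only the latest data is kept
        return batch

    def rows(self):
        """Return every row from every batch currently held."""
        return [row for batch in self.batches for row in batch]


def run_downstream_workflow(dataset, delete_batches_on_completion):
    """Model a Workflow whose Source step reads the Snapshot Dataset."""
    processed = dataset.rows()
    # Assumption: deletion needs both the Dataset option and the Source step setting.
    if dataset.allow_auto_batch_deletion and delete_batches_on_completion:
        dataset.batches.clear()
    return processed


history = SnapshotDataset(
    "Daily totals",
    multi_batch=True,
    add_batch_timestamp=True,
    allow_auto_batch_deletion=True,
)
history.take_snapshot([{"total": 10}], created=datetime(2024, 1, 1, tzinfo=timezone.utc))
history.take_snapshot([{"total": 12}], created=datetime(2024, 1, 2, tzinfo=timezone.utc))

first_run = run_downstream_workflow(history, delete_batches_on_completion=True)
second_run = run_downstream_workflow(history, delete_batches_on_completion=True)
```

In this model the first run processes both batches, each row carrying the timestamp of its batch, and the second run finds nothing left to process, which is the "no batch is processed twice" behavior described above.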

Using Snapshots

Snapshots can be used like other Datasets:

  • As a Workflow source - add a Source step to a Workflow and select the required Snapshot Dataset.
  • As a View source - choose the required Snapshot Dataset when creating a View.

Snapshot schema changes

If columns have been added to or removed from the data being written to a Snapshot, you may need to update the Snapshot's schema.

Resolving Snapshot warnings

A summary of the unresolved changes will appear as a warning on the step. To fix this, either change the input or resolve the conflicts. To resolve them, click Update schema: {X} schema change(s) (where {X} is the number of changes found) to open the Update Schema dialog, which shows a consolidated list of existing, new and missing columns.

You can now:

  • Include or Exclude new columns in the Snapshot's schema. You can change this at any point later.
  • Set all missing mandatory columns as Optional. Optional columns will be retained and filled with empty values in the target Dataset.

Click Apply to update the schema. The step will become valid and you'll be able to view the step results or run the Workflow.
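The schema resolution described above can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not how Data Studio implements it: the function names are hypothetical, and None is used here to stand in for an "empty value".

```python
def diff_schema(schema_columns, incoming_columns):
    """Classify columns as existing, new or missing, like the Update Schema dialog's list."""
    return {
        "existing": [c for c in schema_columns if c in incoming_columns],
        "new": [c for c in incoming_columns if c not in schema_columns],
        "missing": [c for c in schema_columns if c not in incoming_columns],
    }


def resolve_schema(schema_columns, changes, include_new):
    """Keep every current column and append only the new columns chosen for inclusion."""
    return list(schema_columns) + [c for c in changes["new"] if c in include_new]


def conform_rows(rows, target_columns):
    """Shape rows to the target schema. Missing columns are treated as Optional
    and filled with None (an assumed representation of an empty value)."""
    return [{c: row.get(c) for c in target_columns} for row in rows]


schema = ["id", "name", "email"]
incoming_rows = [{"id": 1, "name": "Ada", "phone": "555-0100", "fax": "555-0101"}]

changes = diff_schema(schema, list(incoming_rows[0]))
target = resolve_schema(schema, changes, include_new={"phone"})  # "fax" is excluded
written = conform_rows(incoming_rows, target)
```

Here "phone" and "fax" are reported as new and "email" as missing. Including only "phone" produces a target schema of id, name, email and phone, and the written row keeps an empty email value rather than dropping the column.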