Change Dataset schema

You may want to change the Dataset schema after creating it. You can do it manually or automatically.

Manual schema changes

You can apply schema changes to a Dataset when uploading new data or refreshing existing data:

  1. Go to Datasets.
  2. Click the required Dataset's Options.
  3. Select Upload new data/Refresh data (the option will depend on whether the selected Dataset was created from a file upload or an external system connection).
  4. (For file upload Datasets) Click Browse for file and select the required file.
  5. Depending on the compatibility of the new data, you'll see the following options:
  • If compatible, you can click Apply to load your data immediately, or click Configure to edit the Dataset first, for example to change column annotations or parser settings.
  • If incompatible, you must click Configure to review and resolve any differences before the data can be added to the Dataset.

Proceed to configure the Dataset. You'll be presented with a consolidated list of existing, new and missing columns.

To completely exclude a column, tick the Exclude checkbox: new columns will not be added and existing/missing ones will be permanently removed from the Dataset.

Optional columns

Mark as optional any column that may be missing from future uploads or data refreshes. When you upload or refresh data manually, you'll be notified of a missing optional column but can still continue. Missing optional columns will not prevent files from being loaded to Dropzones, or Workflows from being run with Auto-refresh sources enabled. All missing optional columns will be populated with empty data.
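
As an illustration only, the effect of "populated with empty data" can be sketched in Python. This is not Data Studio's implementation: the function name fill_missing_optional, the row-as-dictionary layout and the use of None for an empty value are all assumptions made for the example.

```python
from typing import Any, Dict, List


def fill_missing_optional(
    rows: List[Dict[str, Any]], optional_columns: List[str]
) -> List[Dict[str, Any]]:
    """Return copies of the rows with every absent optional column set to None.

    Hypothetical helper: None stands in for "empty data" here.
    """
    filled = []
    for row in rows:
        complete = dict(row)  # copy so the caller's rows are not modified
        for column in optional_columns:
            # Only add the column when the incoming row does not have it.
            complete.setdefault(column, None)
        filled.append(complete)
    return filled


new_rows = [{"id": 1, "name": "Ada"}]
print(fill_missing_optional(new_rows, ["email"]))
```

Columns that are present in the incoming rows are left untouched; only the absent optional ones are added as empty.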

Automatic schema changes

  • Dropzone files
    You can add new data to a Dataset by copying files to the designated Dropzone. If no issues are found, the data will be parsed and uploaded. The result of the upload will fire any related events.
  • Auto-refresh data sources
    New data can be added to a Dataset during the auto-refresh process. Any changes detected will fire related events.

Schema change Events

Data Studio will automatically create the following events (which you can set notifications for):

  • Mandatory columns which are present in the Dataset but missing from the new data will fire a Dataset automatic load failed event. Your data will not be loaded.
  • Optional columns which are present in the Dataset but missing from the new data will not prevent loading, if there are no other errors. These missing columns will appear blank in the Dataset.
  • Columns in the new data which are not present in the Dataset will fire a Dataset automatic load warning. Your data will load, if there are no other errors. These columns will not be automatically added to the Dataset.
  • When your data loads, a Dataset loaded event will be fired.
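
The rules above can be summarised as a small decision function. The Python sketch below is only a model of the documented behaviour, not Data Studio code: classify_load, its parameters and the returned keys are hypothetical names, and it assumes no errors occur other than schema differences. The source does not say whether a warning is also fired when a load fails, so the sketch reports only the failure in that case.

```python
from typing import Dict, List


def classify_load(dataset_columns: Dict[str, bool], incoming: List[str]) -> dict:
    """Model which events an automatic load would fire.

    dataset_columns maps each Dataset column name to True if it is optional.
    incoming lists the column names found in the new data.
    """
    incoming_set = set(incoming)
    missing_mandatory = sorted(
        c for c, optional in dataset_columns.items()
        if not optional and c not in incoming_set
    )
    missing_optional = sorted(
        c for c, optional in dataset_columns.items()
        if optional and c not in incoming_set
    )
    # Columns in the new data that the Dataset does not know about.
    unexpected = sorted(incoming_set - set(dataset_columns))

    if missing_mandatory:
        # A missing mandatory column stops the load.
        return {
            "loaded": False,
            "events": ["Dataset automatic load failed"],
            "blank_columns": [],
            "ignored_columns": [],
        }

    events = []
    if unexpected:
        # Unknown columns raise a warning but are not added to the Dataset.
        events.append("Dataset automatic load warning")
    events.append("Dataset loaded")
    return {
        "loaded": True,
        "events": events,
        "blank_columns": missing_optional,  # appear blank in the Dataset
        "ignored_columns": unexpected,
    }


result = classify_load({"id": False, "email": True}, ["id", "phone"])
print(result["events"])
```

In this example the mandatory column id is present, the optional column email is missing and phone is new, so the model reports a warning followed by a successful load.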

Snapshot schema changes

You may want to update the schema if any columns have been added or removed when writing data to Snapshots.

Updating Snapshot schema

When the Take Snapshot step's input schema doesn't match the selected target Dataset schema, a warning will be shown. Click Update schema: {X} schema change(s), where {X} is the number of changes found, to open the Update Schema dialog and see a consolidated list of existing, new and missing columns.

Review and triage the changes before they take effect:

  • If new columns are added to the input schema, the Take Snapshot step will display a warning. You can Include or Exclude the new column(s) in/from the target Dataset. Until this is done, the step will execute but only write the old columns' data.
  • If columns have been removed from the input schema, the Take Snapshot step will display a warning and become invalid, causing Workflow execution to fail. You'll have to either change the input or resolve the warning manually.
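
The two cases above can be modelled with a simple schema comparison. This Python sketch is an approximation for illustration, not how Data Studio works internally: compare_snapshot_schema and its result keys are hypothetical, and it assumes that {X} counts each new or removed column as one change.

```python
from typing import Dict, List


def compare_snapshot_schema(
    input_columns: List[str], target_columns: List[str]
) -> Dict[str, object]:
    """Compare a Take Snapshot step's input schema with the target Dataset schema."""
    new_columns = [c for c in input_columns if c not in target_columns]
    removed_columns = [c for c in target_columns if c not in input_columns]
    return {
        "new": new_columns,
        "removed": removed_columns,
        # Assumed meaning of {X}: one change per new or removed column.
        "change_count": len(new_columns) + len(removed_columns),
        # Removed columns make the step invalid; new columns only raise a warning.
        "step_valid": not removed_columns,
        # Until new columns are included or excluded, only the old columns are written.
        "columns_written": [c for c in input_columns if c in target_columns],
    }


diff = compare_snapshot_schema(["id", "name", "phone"], ["id", "name"])
print(diff["change_count"], diff["step_valid"], diff["columns_written"])
```

Here phone is new in the input, so the step stays valid and keeps writing id and name until the change is reviewed in the Update Schema dialog.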