Trigger workflow executions

Workflow trigger execution is the process of automatically executing a workflow when a 'watched' file changes. This allows workflows to run unattended whenever a new version of a file is copied or uploaded.

This process is driven by a configuration file (written in YAML) which, when uploaded to Data Studio, will cause certain source files to be monitored such that when they change, the specified workflows are executed.

The following sample YAML will be used to illustrate the intended behavior:

---
workflows:
  - name: Data Validation
  - sourceTriggers:
    - source: Customer v1
      location: C:\ApertureDataStudio\sampledata
      filenamePattern: Customer V\d+\.csv
      appendExtension: .tmp.#
      replaceSource: true

The order of the keywords is hierarchical: the file has to start with three dashes (---) followed by the keyword workflows:. Next, indented with two spaces and a dash is the keyword name: which is the (case-sensitive) workflow name. Lastly, the indented keyword sourceTriggers: starts a new section used to define various parameters (using sub-keywords) for one or more source triggers.

Multiple source triggers are supported; however, the trigger is the logical OR of the data source changes (not AND), meaning that only one data source file has to change in order to trigger the workflow execution.

The parameters for the workflows keyword are:

  • name (required)
    The (case sensitive) name of the workflow that will be triggered when the watched file or files change or appear.

The parameters for the sourceTriggers keyword are:

  • source (required)
    The name of the source used in the workflow to be triggered. This should be the name as shown in the Available data sources tab in the Workflow Designer (rather than the name of the file on disk - the file extension and underscore characters shouldn't appear).

  • location (required)
    This defines both the location of the source file named in the previous parameter and the location in which the watched file will reside. It will be either:

    • the name of a file data store defined in filedatastores.properties or

    • the directory path location on disk where the file is stored

    For example, where the filedatastores.properties file defines a data store as: Sample\u0020Data\u0020source=c\:/ApertureDataStudio/sampledata.

    The location may then be specified either as Sample Data source or as C:\ApertureDataStudio\sampledata. A file path may also be defined for the user's import directory (e.g. C:\ApertureDataStudio\import\5).

  • filenamePattern (required)
    Defines the name of the file(s) to be watched. You may use a regular expression, which has to match the name of a file used as a source input. Note that on Linux, file names are case sensitive.
    For example, Customer V\d+\.csv indicates that any .csv file named Customer V{n} (where {n} is an integer number) will be used as a trigger. In this case, all of the following will be valid trigger files: Customer V1.csv, Customer V2.csv, Customer V99.csv.

  • replaceSource (optional, defaults to false if not present)
    This is a Boolean value that determines whether the watched file becomes the new source for the triggered workflow when it matches the defined filename pattern but has a different name from what is defined in the source keyword. Set replaceSource: true if you want the watched file that triggers the workflow to be used as the source when the workflow is automatically executed.

  • appendExtension (optional)
    An optional parameter indicating that if the trigger file has the same name as the original source, the original file will be backed up rather than overwritten. The backed-up file will be written to the \data\backups folder in the Data Studio database and appended with the given extension.
    If the extension contains a #, the # will be replaced by a number to ensure the uniqueness of the renamed files, as illustrated after this list.
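
For example, with the sample configuration above (appendExtension: .tmp.#), successive backups of Customer V1.csv would be written to the \data\backups folder with names along the lines of:

Customer V1.csv.tmp.1
Customer V1.csv.tmp.2

The numbering here is illustrative; the server simply substitutes a number for # so that each backup name is unique.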

The following example defines multiple workflows with multiple sources. The first workflow has two source triggers defined, meaning the workflow will be executed if either source is updated. The second workflow's source trigger has another simple filename pattern defined, to detect arriving files that have been timestamped in the format yyyymmdd.

---
workflows:
  - name: My First Workflow
  - sourceTriggers:
    - source: Customer Data
      location: c:\data\customer
      filenamePattern: customer.csv
    - source: Product Data
      location: c:\data\product
      filenamePattern: product_\d+\.csv
  - name: My Second Workflow
  - sourceTriggers:
    - source: Order Data
      location: c:\data\order
      filenamePattern: product_20[0-9]{6}.csv
      replaceSource: true
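
With the second workflow's pattern, a file such as product_20240115.csv arriving in c:\data\order would trigger My Second Workflow, whereas product_1999.csv would not. To check which files a filenamePattern will pick up before uploading the YAML file, a short script along the following lines can help. This is only a sketch run outside Data Studio, using the locations and patterns from the example above, and it assumes that the pattern has to match the whole file name (as the earlier Customer V\d+\.csv example suggests):

import re
from pathlib import Path

# Locations and filename patterns copied from the example configuration above.
triggers = [
    ("My First Workflow",  r"c:\data\customer", r"customer.csv"),
    ("My First Workflow",  r"c:\data\product",  r"product_\d+\.csv"),
    ("My Second Workflow", r"c:\data\order",    r"product_20[0-9]{6}.csv"),
]

for workflow, location, pattern in triggers:
    folder = Path(location)
    names = [p.name for p in folder.iterdir()] if folder.is_dir() else []
    # Assumption: the pattern is matched against the entire file name.
    matching = [n for n in names if re.fullmatch(pattern, n)]
    print(f"{workflow}: {pattern} matches {matching or 'no files yet'}")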

You can use filedatastores.properties to define folders on the server that are visible to Data Studio users via the UI.

Any location defined in this file can store workflow sources and be used as a location for watched files. Note that spaces in data store names are escaped as \u0020 and the colon after the drive letter is escaped with a backslash, following standard Java properties file syntax. Here's an example of a filedatastores.properties file:

Sample\u0020Data\u0020source=c\:/ApertureDataStudio/sampledata
Admin\u0020Data\u0020source=c\:/ApertureDataStudio/import/5
Trainee\u0020User\u0020source=c\:/ApertureDataStudio/import/1058
Trigger\u0020source=d\:/ApertureDataStudio/triggers; flatten=true

You upload the YAML file in the same way as any other data file. The only requirement is that the file has to have the .yaml extension.

When a YAML file is uploaded, the FileUploadHandler will parse the file and report any parse errors to the user. The contents of the YAML file will replace the previous upload.

Therefore, to delete a workflow, remove the workflow name from the YAML file and re-upload it. To add a workflow, add it to the YAML file and re-upload it. To modify a workflow, change the details in the YAML file and re-upload it.

All existing triggers can be removed by uploading a YAML file with no workflows:

---
workflows:

The workflow entries in the YAML file will be checked for:

  • The name matches a known workflow (case-sensitive)
  • The source matches a known source name within the workflow
  • The location matches a data store name or directory that is known to the server and exists on disk
  • The filenamePattern is a valid regular expression
  • There are no invalid keywords in the YAML file

This file will be loaded at server startup, so if the server is shut down and restarted, all previously watched workflows will be reloaded just as if the user had re-uploaded the YAML file.

All YAML file uploads and the resulting parse actions are reported back to the user in the UI and in the server's log file. The uploads and all workflow executions are audited as usual.

The administrator may verify that the correct workflow triggers have been loaded by clicking on the username in the top menu and selecting Show Workflow Triggers.

When manually uploading a file with the same name as any of the 'watched' files, you will get an option to either overwrite or create a new version of the file.

To ensure the defined trigger continues to work, you have to overwrite the existing file.

A dialog will appear when the job has completed successfully.

You can set up notifications to report on the state of the triggered workflow.

Because workflows are executed asynchronously, if the user is currently logged in, they will see the Job Completed dialog once a workflow has completed executing.