Common Workflow Language (CWL) Command Line Tool Description, v1.2 §

This version:

Latest stable version:

Authors:

Contributors to v1.2:

Incorporates the work of past authors and contributors to CWL v1.0 and CWL v1.1.

This standard was approved on 2020-08-07 by the CWL leadership team consisting of:

Publisher: Common Workflow Language project, a member project of Software Freedom Conservancy

Abstract §

A Command Line Tool is a non-interactive executable program that reads some input, performs a computation, and terminates after producing some output. Command line programs are a flexible unit of code sharing and reuse; unfortunately, the syntax and input/output semantics among command line programs are extremely heterogeneous. A common layer for describing the syntax and semantics of programs can reduce this incidental complexity by providing a consistent way to connect programs together. This specification defines the Common Workflow Language (CWL) Command Line Tool Description, a vendor-neutral standard for describing the syntax and input/output semantics of command line programs.

Status of this document §

This document is the product of the Common Workflow Language working group. The source for the latest version of this document is available at

https://github.com/common-workflow-language/cwl-v1.2/

The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0.

1. Introduction §

The Common Workflow Language (CWL) working group is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows. The goal is to create specifications like this one that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility.

1.1 Introduction to the CWL Command Line Tool draft standard v1.2.1 §

This specification represents the latest development draft from the CWL project. There are no new features nor behavior changes in CWL v1.2.1 as compared to CWL v1.2. v1.2.1 only fixes typos, adds clarifications, and adds conformance tests. Some changes have been made to the schema defining CWL to aid the auto-generation of libraries for the reading and writing of CWL documents.

Do not write cwlVersion: v1.2.1, nor cwlVersion: v1.2.0. The syntax and meaning of all terms in CWL v1.2.x are the same. However, when reporting results from running the CWL conformance tests, please do report all three components; for example "99% of CWL v1.2.0 required tests" or "100% of CWL v1.2.1 required tests".

See also the Schema-Salad v1.2.1 changelog

1.2 Changelog for v1.2.1 §

  • CWL has been assigned an official IANA Media Type of application/cwl for either JSON or YAML format. For JSON formatted CWL documents, application/cwl+json has also been assigned and can be used. For specifying a YAML formatted CWL document, one can use application/cwl+yaml but that is not an official IANA media-type yet; as of 2023-07-23 the +yaml suffix has yet to be approved. The above has been documented in the Syntax section.
  • There is now an unofficial JSON Schema for CWL documents, donated by Francis Charette-Migneault. This schema captures much, but not all, of the potential complexity of CWL documents. It was created for the draft OGC API - Processes - Part 2: Deploy, Replace, Undeploy standard. To support the testing of this unofficial JSON Schema for CWL, some of the should_fail: true tests have had the label json_schema_invalid added.
  • For consistency, all references to URIs have been replaced with IRIs (Internationalized Resource Identifiers).
  • The difference between $() and ${} was clarified. We now make more explicit that ${...} evaluates to (function() { ... })().
  • The publisher of this document is now explicitly named; it is the Common Workflow Language project, a member project of Software Freedom Conservancy.
  • The Parameter References section has been updated to clarify ambiguity on null and array .length. Three conformance tests to verify this were added as well (params_broken_null, length_for_non_array, user_defined_length_in_parameter_reference).
  • It is now explicit in the description of the size field of a File object that size is measured in bytes, as was already stated in the introduction to the File object description.
  • The concept of "opaque identifier(s)"/"opaque strings" as mentioned in the SecondaryFileSchema, Parameter References, and Runtime Environment sections is now defined in the newly added glossary: they are nonsensical values that are swapped out with a real value later in the evaluation process. Workflow and tool expressions should not rely on them nor try to parse them.
  • The purpose and valid circumstances for using CommandLineTool.id have been made more explicit: it is a unique identifier for that CommandLineTool, and it is only useful for CommandLineTools in a $graph. This id value should not be exposed to users in graphical or terminal user interfaces.
  • How cwl.output.json is used to perform output binding has been clarified, especially with regard to any path and location fields for File and Directory objects referenced within. Two new conformance tests (json_output_path_relative, json_output_location_relative) have been added to verify these clarifications.
  • The BNF grammar description of CWL Parameter References has been reformatted so that symbols get code formatting. :: is replaced with ::= (meaning that the symbol on the left must be replaced with the expression on the right).
  • The example expansion of the stdin shortcut erroneously used ${ } instead of $( ); this has been corrected.

1.2.1 Clarifications to the schema in CWL v1.2.1 to aid autogenerated libraries §

Many different CWL parsers are autogenerated from the official CWL schema by using schema-salad --codegen.

In CWL v1.2.1 we made many clarifications to the schema to enable faster parsing or to produce better results for end users. These changes do not change the CWL syntax or its meaning; we are just now modeling it better.

  • The schema for Requirements has changed to enable faster parsing by autogenerated libraries. The class field is now a static enum with a single permissible value instead of a generic string (for example: class: DockerRequirement for a DockerRequirement hint or requirement.) This allows for autogenerated CWL parsers to recognize any requirement immediately instead of having to check for matching field names and valid values, as was done previously.

  • Likewise, in the schema for CommandLineTool, the class field is now a static enum with a single permissible value (class: CommandLineTool) instead of a generic string.

  • The schema for the CommandLineBinding.position field now has an explicit default value of 0. Previously this was only expressed textually in the description of that field.

  • The schema for the File.streamable field now has an explicit default value of false to match the textual description.

    Note: Other fields like ResourceRequirement.coresMin, .coresMax, .ramMin, .ramMax, .tmpdirMin, .tmpdirMax have not had defaults set in the schema so that parsers can discriminate easily between a value not provided and the default value (0).

  • Everywhere the schema allows a value of type long we also explicitly allow a value of type int: File.size, ResourceRequirement.coresMin, ResourceRequirement.coresMax, ResourceRequirement.ramMin, ResourceRequirement.ramMax, ResourceRequirement.tmpdirMin, ResourceRequirement.tmpdirMax, ResourceRequirement.outdirMin, ResourceRequirement.outdirMax, and ToolTimeLimit.timelimit.

    By JSON rules this is implicit, but by making it explicit we aid autogenerated CWL libraries especially in languages such as Java.

  • The schema for the default field of CommandInputParameter has been expanded from Any? to ["null", File, Directory, Any] so that autogenerated CWL libraries will deserialize any 'File' or 'Directory' objects automatically for the user.

  • The schema for the hints field of CommandLineTool has been expanded from: Any[]? to ["null", { type: array, items: [ ProcessRequirement, Any] } ]. This allows autogenerated CWL parsers to deserialize any of the standard CWL hints instead of forcing the users of those parsers to convert the unserialized hints to normal objects themselves.

1.2.2 Updated Conformance Tests for v1.2.1 §

  • Conformance tests are now referred to by their textual identifiers (id). Previously this was the label field. Tests without a label/id have been given one.

  • tests/loadContents/cwloutput-nolimit.cwl: Made explicit that bigstring is an additional output as generated by the existing mkfilelist.py script's use of cwl.output.json.

  • The number of different software containers used in the conformance tests has been reduced to four. See the list in the CONFORMANCE_TESTS.md instructions.

  • tests/secondaryfiles/rename-inputs.cwl has been simplified by changing it to reference another input parameter instead of using self. The behavior of processing secondary files patterns in order and being able to reference earlier ones later is not part of CWL v1.2.

    This doesn't exactly replicate the previous behavior, because it introduces a new input parameter; however, it does demonstrate the ability to rename a file and have it staged as a secondary file without having to use InitialWorkDirRequirement.

  • directory_input_docker was incorrectly marked as required; it is optional unless the feature ShellCommandRequirement is stated as being supported.

  • glob_outside_outputs_fails was incorrectly marked as required; it is optional unless the feature DockerRequirement is stated as being supported.

  • stage_file_array was incorrectly marked as required; it is optional unless both the features InlineJavascriptRequirement and DockerRequirement are stated as being supported.

  • stage_file_array_basename and stage_file_array_entryname_overrides were both incorrectly marked as required; they are optional unless the features InitialWorkDirRequirement and InlineJavascriptRequirement are both stated as being supported.

  • optional_numerical_output_returns_0_not_null was incorrectly marked as required; it is optional unless InlineJavascriptRequirement is stated as being supported.

  • cwloutput_nolimit and loadcontents_limit were incorrectly marked as optional; this has been corrected as both are required.

  • Made it explicit that if a CommandLineTool contains logically chained commands (e.g. echo a && echo b) then the stdout File/object must include the output of every command. The stdout_chained_commands mandatory conformance test was added to verify this.

  • metadata: The description and content of this test has been updated to be more focused on the metadata present in the CWL file.

1.2.3 New Mandatory Conformance Tests for v1.2.1 §

  • params_broken_null: Test parameter reference that refers to null.something.
  • length_for_non_array: Test parameter reference that refers to length of non-array input.
  • user_defined_length_in_parameter_reference: Test 'length' in a parameter reference where it does not refer to length of an array input.
  • directory_literal_with_literal_file_in_subdir_nostdin: Test non-stdin reference to literal File via a nested Directory literal.
  • colon_in_paths: Confirm that colons are tolerated in input paths, string values, stdout shortcut, and output file & directory names.
  • colon_in_output_path: Confirm that colons are tolerated in output directory names.
  • record_with_default: Confirm that records with defaults are accepted.
  • record_outputeval_nojs: Use of outputEval on a record itself, not the fields of the record (without javascript). An equivalent test with InlineJavascriptRequirement was added as well: record_outputeval.
  • runtime-outdir: Test use of $(runtime.outdir) for outputBinding.glob.
  • stdout_chained_commands: Test that chaining two echo calls causes the workflow runner to emit the combined output to stdout. This is to confirm that the workflow runner will not create an expression such as echo a && echo b > out.txt, but instead will produce the correct echo a && echo b, and capture the output correctly.
  • record_order_with_input_bindings: Test sorting arguments at each level (inputBinding for all levels).
  • json_output_path_relative: Test support for reading cwl.output.json where 'path' is relative path in output dir.
  • json_output_location_relative: Test support for reading cwl.output.json where 'location' is relative reference to output dir.
  • filename_with_hash_mark: Test for correct handling of URL-quoting in a filename to refer to filename containing a hash mark.
  • capture_files: Test that a type error is raised if directories are returned by glob evaluation when type is File.
  • capture_dirs: Test that a type error is raised if files are returned by glob evaluation when type is Directory.
  • capture_files_and_dirs: Test that both files and directories are captured by glob evaluation when type is [Directory, File].
  • very_big_and_very_floats_nojs: Confirm that very big and very small numbers are represented on the command line using decimals, not scientific notation.
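
The last test in the list above checks that numbers reach the command line in plain decimal form. As a rough illustration only (not taken from any CWL implementation), the following Python sketch shows one way a runner could render a number without scientific notation. The function name to_command_line_number is hypothetical.

```python
from decimal import Decimal


def to_command_line_number(value):
    """Render an int or float as plain decimal text, without an exponent.

    Hypothetical helper for illustration; real runners may format differently.
    """
    if isinstance(value, bool):
        raise TypeError("booleans are not treated as numbers in this sketch")
    if isinstance(value, int):
        return str(value)
    # repr() yields the shortest text that round-trips the float;
    # Decimal's "f" format then expands any exponent into plain digits.
    return format(Decimal(repr(value)), "f")
```

With this sketch, a value such as 1e-7 is rendered as 0.0000001 rather than 1e-07.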

1.2.4 New Optional Conformance Tests for v1.2.1 §

1.2.4.1 InlineJavascriptRequirement §

  • record_outputeval: Use of outputEval on a record itself, not the fields of the record.
  • staging-basename: Use of an ExpressionTool to change basename of file, then correctly staging the file using the new name.
  • js-input-record: A test case for JavaScript with an input record.
  • very_big_and_very_floats: Confirm that very big and very small numbers are represented on the command line using decimals, not scientific notation.

1.2.4.2 MultipleInputFeatureRequirement §

  • multiple-input-feature-requirement: MultipleInputFeatureRequirement on workflow outputs.

1.2.4.3 InitialWorkDirRequirement §

  • iwd-subdir: Test emitting a subdirectory from the initial working directory.

1.3 Introduction to the CWL Command Line Tool standard v1.2 §

Since the v1.1 release, v1.2 introduces the following updates to the CWL Command Line Tool standard. Documents should use cwlVersion: v1.2 to make use of new syntax and features introduced in v1.2. Existing v1.1 documents should be trivially updatable by changing cwlVersion; however, CWL documents that relied on previously undefined or underspecified behavior may have slightly different behavior in v1.2.

1.4 Changelog §

  • coresMin and coresMax of ResourceRequirement may now request fractional CPUs.
  • ramMin, ramMax, tmpdirMin, tmpdirMax, outdirMin, and outdirMax of ResourceRequirement now accept floating point values.
  • CommandLineTool can now express intent with an identifier for the type of computational operation.
  • Added conformance tests for order of operations evaluating secondaryFiles on input and ensure that input and output secondaryFiles expressions can return a File object.
  • Clarify there are no limits on the size of file literal contents.
  • When using loadContents it now must fail when attempting to load a file greater than 64 KiB instead of silently truncating the data.
  • Objects, arrays and numbers returned by parameter references or expressions in Dirent.entry that are not a File or Directory object (or array of such) are now specified to be JSON serialized to produce file contents.
  • Note that only enum and record types can be typedef-ed.
  • Escaping in string interpolation has been added to the specification along with conformance tests.
  • It is now possible to have an absolute path in the entryname field in InitialWorkDirRequirement when running in a mandatory container. Together with DockerRequirement.dockerOutputDirectory this makes it possible to control the locations of both input and output files when running in containers.
  • Specify default success/fail interpretation of exit codes when not given.
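
The loadContents change in the list above (fail instead of silently truncating) can be sketched roughly as follows. This is a minimal illustration, not the behavior of any particular runner: the function name load_contents, the choice of ValueError, and decoding as UTF-8 are assumptions of this sketch.

```python
LOAD_CONTENTS_LIMIT = 64 * 1024  # 64 KiB, the limit named in the changelog


def load_contents(path, limit=LOAD_CONTENTS_LIMIT):
    """Return the text of a file, failing if it is larger than the limit.

    Hypothetical helper: the exception type and UTF-8 decoding are assumptions.
    """
    with open(path, "rb") as handle:
        # Read one byte more than allowed so an oversized file is detectable.
        data = handle.read(limit + 1)
    if len(data) > limit:
        raise ValueError(
            f"{path} is larger than {limit} bytes; refusing to truncate"
        )
    return data.decode("utf-8")
```

A file of exactly 64 KiB is accepted in this sketch; only a file greater than the limit is rejected, matching the wording of the changelog entry.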

See also the CWL Workflow Description, v1.2 changelog. For other changes since CWL v1.0, see the CWL Command Line Tool Description, v1.1 changelog.

1.5 Purpose §

Standalone programs are a flexible and interoperable form of code reuse. Unlike monolithic applications, applications and analysis workflows which are composed of multiple separate programs can be written in multiple languages and execute concurrently on multiple hosts. However, POSIX does not dictate computer-readable grammar or semantics for program input and output, resulting in extremely diverse command line grammar and input/output semantics among programs. This is a particular problem in distributed computing (multi-node compute clusters) and virtualized environments (such as Docker containers) where it is often necessary to provision resources such as input files before executing the program.

Often this gap is filled by hard coding program invocation and implicitly assuming requirements will be met, or abstracting program invocation with wrapper scripts or descriptor documents. Unfortunately, where these approaches are application or platform specific, they create a significant barrier to reproducibility and portability, as methods developed for one platform must be manually ported to be used on new platforms. Similarly, they create redundant work, as wrappers for popular tools must be rewritten for each application or platform in use.

The Common Workflow Language Command Line Tool Description is designed to provide a common standard description of grammar and semantics for invoking programs used in data-intensive fields such as Bioinformatics, Chemistry, Physics, Astronomy, and Statistics. This specification attempts to define a precise data and execution model for Command Line Tools that can be implemented on a variety of computing platforms, ranging from a single workstation to cluster, grid, cloud, and high performance computing platforms. Details related to execution of these programs not laid out in this specification are open to interpretation by the computing platform implementing this specification.

1.6 References to other specifications §

JavaScript Object Notation (JSON): http://json.org

JSON Linked Data (JSON-LD): http://json-ld.org

YAML: http://yaml.org

Avro: https://avro.apache.org/docs/1.8.1/spec.html

Internationalized Resource Identifiers (IRIs): https://tools.ietf.org/html/rfc3987

Portable Operating System Interface (POSIX.1-2008): http://pubs.opengroup.org/onlinepubs/9699919799/

Resource Description Framework (RDF): http://www.w3.org/RDF/

XDG Base Directory Specification: https://specifications.freedesktop.org/basedir-spec/basedir-spec-0.6.html

1.7 Scope §

This document describes CWL syntax, execution, and object model. It is not intended to document a specific CWL implementation; however, it may serve as a reference for the behavior of conforming implementations.

1.8 Terminology §

The terminology used to describe CWL documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a CWL implementation:

may: Conforming CWL documents and CWL implementations are permitted but not required to behave as described.

must: Conforming CWL documents and CWL implementations are required to behave as described; otherwise they are in error.

error: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it.

fatal error: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to execute the current process and may report an error.

at user option: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.

deprecated: Conforming software may implement a behavior for backwards compatibility. Portable CWL documents should not rely on deprecated behavior. Behavior marked as deprecated may be removed entirely from future revisions of the CWL specification.

1.9 Glossary §

Opaque strings: Opaque strings (or opaque identifiers, opaque values) are nonsensical values that are swapped out with a real value later in the evaluation process. Workflow and tool expressions should not rely on them nor try to parse them.

2. Data model §

2.1 Data concepts §

An object is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as fields) and where the name is a string and the value is a string, number, boolean, array, or object.

A document is a file containing a serialized object, or an array of objects.

A process is a basic unit of computation which accepts input data, performs some computation, and produces output data. Examples include CommandLineTools, Workflows, and ExpressionTools.

An input object is an object describing the inputs to an invocation of a process. The fields of the input object are referred to as "input parameters".

An output object is an object describing the output resulting from an invocation of a process. The fields of the output object are referred to as "output parameters".

An input schema describes the valid format (required fields, data types) for an input object.

An output schema describes the valid format for an output object.

Metadata is information about workflows, tools, or input items.

2.2 Syntax §

CWL documents must consist of an object or array of objects represented using JSON or YAML syntax. Upon loading, a CWL implementation must apply the preprocessing steps described in the Semantic Annotations for Linked Avro Data (SALAD) Specification. An implementation may formally validate the structure of a CWL document using SALAD schemas located at https://github.com/common-workflow-language/cwl-v1.2/

The official IANA media-type for CWL documents is application/cwl for either JSON or YAML format. For JSON formatted CWL documents, application/cwl+json can be used. For specifying a YAML formatted CWL document, one can use application/cwl+yaml but that is not an official IANA media-type yet; as of 2023-07-23 the +yaml suffix has yet to be approved.

CWL documents commonly reference other CWL documents. Each document must declare the cwlVersion of that document. Implementations must validate against the document's declared version. Implementations should allow workflows to reference documents of both newer and older CWL versions (up to the highest version of CWL supported by that implementation). Where the runtime environment or runtime behavior has changed between versions, for that portion of the execution an implementation must provide runtime environment and behavior consistent with the document's declared version. An implementation must not expose a newer feature when executing a document that specifies an older version that does not include that feature.

2.2.1 map §

Note: This section is non-normative.

type: array<ComplexType> | map<key_field, ComplexType>

The above syntax in the CWL specifications means there are two or more ways to write the given value.

Option one is an array and is the most verbose option.

Option one generic example:

some_cwl_field:
  - key_field: a_complex_type1
    field2: foo
    field3: bar
  - key_field: a_complex_type2
    field2: foo2
    field3: bar2
  - key_field: a_complex_type3

Option one specific example using Workflow.inputs:

array<InputParameter> | map<id, type | InputParameter>

inputs:
  - id: workflow_input01
    type: string
  - id: workflow_input02
    type: File
    format: http://edamontology.org/format_2572

Option two is enabled by the map<…> syntax. Instead of an array of entries we use a mapping, where one field of the ComplexType (here named key_field) becomes the key in the map, and its value is the rest of the ComplexType without the key field. If all of the other fields of the ComplexType are optional and unneeded, then we can indicate this with an empty mapping as the value: a_complex_type3: {}

Option two generic example:

some_cwl_field:
  a_complex_type1:  # this was the "key_field" from above
    field2: foo
    field3: bar
  a_complex_type2:
    field2: foo2
    field3: bar2
  a_complex_type3: {}  # we accept the default values for "field2" and "field3"

Option two specific example using Workflow.inputs:

array<InputParameter> | map<id, type | InputParameter>

inputs:
  workflow_input01:
    type: string
  workflow_input02:
    type: File
    format: http://edamontology.org/format_2572

Option two specific example using SoftwareRequirement.packages:

array<SoftwarePackage> | map<package, specs | SoftwarePackage>

hints:
  SoftwareRequirement:
    packages:
      sourmash:
        specs: [ https://doi.org/10.21105/joss.00027 ]
      screed:
        version: [ "1.0" ]
      python: {}

Sometimes we have a third and even more compact option denoted like this:

type: array<ComplexType> | map<key_field, field2 | ComplexType>

For this example, if we only need the key_field and field2 when specifying our ComplexTypes (because the other fields are optional and we are fine with their default values) then we can abbreviate.

Option three generic example:

some_cwl_field:
  a_complex_type1: foo   # we accept the default value for field3
  a_complex_type2: foo2  # we accept the default value for field3
  a_complex_type3: {}    # we accept the default values for "field2" and "field3"

Option three specific example using Workflow.inputs:

array<InputParameter> | map<id, type | InputParameter>

inputs:
  workflow_input01: string
  workflow_input02: File  # we accept the default of no File format

Option three specific example using SoftwareRequirement.packages:

array<SoftwarePackage> | map<package, specs | SoftwarePackage>

hints:
  SoftwareRequirement:
    packages:
      sourmash: [ https://doi.org/10.21105/joss.00027 ]
      python: {}

What if we want to mix options 2 and 3 for some entries? You can!

Mixed option 2 and 3 generic example:

some_cwl_field:
  my_complex_type1: foo   # we accept the default value for field3
  my_complex_type2:
    field2: foo2
    field3: bar2          # we did not accept the default value for field3
                          # so we had to use the slightly expanded syntax
  my_complex_type3: {}    # as before, we accept the default values for both
                          # "field2" and "field3"

Mixed option 2 and 3 specific example using Workflow.inputs:

array<InputParameter> | map<id, type | InputParameter>

inputs:
  workflow_input01: string
  workflow_input02:     # we use the longer way
    type: File          # because we want to specify the "format" too
    format: http://edamontology.org/format_2572

Mixed option 2 and 3 specific example using SoftwareRequirement.packages:

array<SoftwarePackage> | map<package, specs | SoftwarePackage>

hints:
  SoftwareRequirement:
    packages:
      sourmash: [ https://doi.org/10.21105/joss.00027 ]
      screed:
        specs: [ https://github.com/dib-lab/screed ]
        version: [ "1.0" ]
      python: {}

Note: The map<…> (compact) versions are optional for users; the verbose option 1 is always allowed, but for presentation reasons options 2 and 3 may be preferred by human readers. Consumers of CWL must support all three options.

The normative explanation for these variations, aimed at implementors, is in the Schema Salad specification.
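
As a non-normative illustration of how a consumer might accept all three options, the following Python sketch converts the compact map forms back into the verbose array form. The function name map_to_array and its parameters are hypothetical; the normative rules remain those of the Schema Salad specification.

```python
def map_to_array(value, key_field, predicate_field=None):
    """Convert option two/three (map) syntax into option one (array) syntax.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, list):
        return value  # option one: already the verbose array form
    entries = []
    for key, item in value.items():
        if isinstance(item, dict):
            # Option two: the map value holds the remaining fields.
            entry = {key_field: key, **item}
        elif predicate_field is not None:
            # Option three: the map value is just the abbreviated field.
            entry = {key_field: key, predicate_field: item}
        else:
            raise ValueError(f"entry {key!r} must be a mapping")
        entries.append(entry)
    return entries


# Usage with the mixed Workflow.inputs example from this section:
inputs = {
    "workflow_input01": "string",
    "workflow_input02": {
        "type": "File",
        "format": "http://edamontology.org/format_2572",
    },
}
verbose_inputs = map_to_array(inputs, key_field="id", predicate_field="type")
```

Here verbose_inputs holds the same two entries as the option one Workflow.inputs example, each with an explicit id field.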

2.3 Identifiers §

If an object contains an id field, that is used to uniquely identify the object in that document. The value of the id field must be unique over the entire document. Identifiers may be resolved relative to either the document base and/or other identifiers following the rules described in the Schema Salad specification.

An implementation may choose to only honor references to object types for which the id field is explicitly listed in this specification.

2.4 Document preprocessing §

An implementation must resolve $import and $include directives as described in the Schema Salad specification.

Another transformation defined in Schema Salad is simplification of data type definitions. Type <T> ending with ? should be transformed to [<T>, "null"]. Type <T> ending with [] should be transformed to {"type": "array", "items": <T>}.
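
The two type shorthand rules above can be sketched in Python as follows. This is an illustrative sketch rather than the Schema Salad implementation; the function name expand_type is hypothetical, and applying the rules recursively to combined suffixes such as int[]? is an assumption of this sketch.

```python
def expand_type(type_name):
    """Expand the "?" and "[]" type shorthands into their verbose forms.

    Hypothetical helper; recursion over combined suffixes is an assumption.
    """
    if not isinstance(type_name, str):
        return type_name  # already a verbose type definition
    if type_name.endswith("?"):
        # <T>? becomes [<T>, "null"]
        return [expand_type(type_name[:-1]), "null"]
    if type_name.endswith("[]"):
        # <T>[] becomes {"type": "array", "items": <T>}
        return {"type": "array", "items": expand_type(type_name[:-2])}
    return type_name
```

For example, expand_type("string?") gives ["string", "null"], and expand_type("File[]") gives {"type": "array", "items": "File"}.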

2.5 Extensions and metadata §

Input metadata (for example, a sample identifier) may be represented within a tool or workflow using input parameters which are explicitly propagated to output. Future versions of this specification may define additional facilities for working with input/output metadata.

Implementation extensions not required for correct execution (for example, fields related to GUI presentation) and metadata about the tool or workflow itself (for example, authorship for use in citations) may be provided as additional fields on any object. Such extension fields must use a namespace prefix listed in the $namespaces section of the document as described in the Schema Salad specification.

It is recommended that concepts from schema.org are used whenever possible. For the $schemas field we recommend their RDF encoding: https://schema.org/version/latest/schemaorg-current-https.rdf

Implementation extensions which modify execution semantics must be listed in the requirements field.

2.6 Packed documents §

A "packed" CWL document is one that contains multiple process objects. This makes it possible to store and transmit a Workflow together with the processes of each of its steps in a single file.

There are two methods to create packed documents: embedding and $graph. These can both appear in the same document.

"Embedding" is where the entire process object is copied into the run field of a workflow step. If the step process is a subworkflow, it can be processed recursively to embed the processes of the subworkflow steps, and so on. Embedded process objects may optionally include id fields.

A "$graph" document does not have a process object at the root. Instead, there is a $graph field which consists of a list of process objects. Each process object must have an id field. Workflow run fields cross-reference other processes in the document $graph using the id of the process object.

All process objects in a packed document must validate and execute as the cwlVersion appearing at the top level. A cwlVersion field appearing anywhere other than the top level must be ignored.

When executing a packed document, the reference to the document may include a fragment identifier. If present, the fragment identifier specifies the id of the process to execute.

If the reference to the packed document does not include a fragment identifier, the runner must choose the top-level process object as the entry point. If there is no top-level process object (as in the case of $graph) then the runner must choose the process object with an id of #main. If there is no #main object, the runner must return an error.
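
The entry point rules above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name select_entry_point is hypothetical, the document is assumed to be already loaded into a dict, identifiers are compared with any leading "#" removed, and embedded processes inside run fields are not searched.

```python
def select_entry_point(document, fragment=None):
    """Pick the process to execute from a (possibly packed) CWL document.

    Hypothetical helper; does not search processes embedded in run fields.
    """
    graph = document.get("$graph")

    def has_id(process, wanted):
        return str(process.get("id", "")).lstrip("#") == wanted

    if fragment is not None:
        # A fragment identifier names the id of the process to execute.
        candidates = graph if graph is not None else [document]
        for process in candidates:
            if has_id(process, fragment.lstrip("#")):
                return process
        raise LookupError(f"no process with id {fragment!r}")

    if graph is None:
        return document  # the top-level process object is the entry point

    for process in graph:
        if has_id(process, "main"):
            return process
    raise LookupError("no top-level process and no #main in $graph")
```

Raising LookupError stands in for "the runner must return an error"; an actual runner would report this through its own error mechanism.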

3. Execution model §

3.1 Execution concepts §

A parameter is a named symbolic input or output of a process, with an associated datatype or schema. During execution, values are assigned to parameters to make the input object or output object used for concrete process invocation.

A CommandLineTool is a process characterized by the execution of a standalone, non-interactive program which is invoked on some input, produces output, and then terminates.

A workflow is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of downstream steps to form a directed acyclic graph, and independent steps may run concurrently.

A runtime environment is the actual hardware and software environment when executing a command line tool. It includes, but is not limited to, the hardware architecture, hardware resources, operating system, software runtime (if applicable, such as the specific Python interpreter or the specific Java virtual machine), libraries, modules, packages, utilities, and data files required to run the tool.

A workflow platform is a specific hardware and software implementation capable of interpreting CWL documents and executing the processes specified by the document. The responsibilities of the workflow platform may include scheduling process invocation, setting up the necessary runtime environment, making input data available, invoking the tool process, and collecting output.

A data link is a connection from a "Source" parameter to a "Sink" parameter. A data link expresses that when a value becomes available for the source parameter, that value should be copied to the "sink" parameter. Reflecting the direction of data flow, a data link is described as "outgoing" from the source and "inbound" to the sink.

A workflow platform may choose to only implement the Command Line Tool Description part of the CWL specification.

It is intended that the workflow platform has broad leeway outside of this specification to optimize use of computing resources and enforce policies not covered by this specification. Some areas that are currently out of scope for the CWL specification but may be handled by a specific workflow platform include:

  • Data security and permissions.
  • Scheduling tool invocations on remote cluster or cloud compute nodes.
  • Using virtual machines or operating system containers to manage the runtime (except as described in DockerRequirement).
  • Using remote or distributed file systems to manage input and output files.
  • Transforming file paths.
  • Pausing, resuming or checkpointing processes or workflows.

Conforming CWL processes must not assume anything about the runtime environment or workflow platform unless explicitly declared through the use of process requirements.

3.2 Generic execution process §

The generic execution sequence of a CWL process (including workflows and command line tools) is as follows. Processes are modeled as functions that consume an input object and produce an output object.

  1. Load input object.
  2. Load, process and validate a CWL document, yielding one or more process objects. The $namespaces present in the CWL document are also used when validating and processing the input object.
  3. If there are multiple process objects (due to $graph) and which process object to start with is not specified in the input object (via a cwl:tool entry) or by any other means (like a URL fragment) then choose the process with the id of "#main" or "main".
  4. Validate the input object against the inputs schema for the process.
  5. Validate process requirements are met.
  6. Perform any further setup required by the specific process type.
  7. Execute the process.
  8. Capture results of process execution into the output object.
  9. Validate the output object against the outputs schema for the process (with the exception of ExpressionTool outputs, which are always considered valid).
  10. Report the output object to the process caller.

3.3 Requirements and hints §

A process requirement modifies the semantics or runtime environment of a process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.

A hint is similar to a requirement; however, it is not an error if an implementation cannot satisfy all hints. The implementation may report a warning if a hint cannot be satisfied.

Optionally, implementations may allow requirements to be specified in the input object document as an array of requirements under the field name cwl:requirements. If implementations allow this, then such requirements should be combined with any requirements present in the corresponding Process as if they were specified there.

Requirements specified in a parent Workflow are inherited by step processes if they are valid for that step. If the substep is a CommandLineTool, only the InlineJavascriptRequirement, SchemaDefRequirement, DockerRequirement, SoftwareRequirement, InitialWorkDirRequirement, EnvVarRequirement, ShellCommandRequirement, ResourceRequirement, LoadListingRequirement, WorkReuse, NetworkAccess, InplaceUpdateRequirement, and ToolTimeLimit requirements are valid.

As good practice, it is best to have process requirements be self-contained, such that each process can run successfully by itself.

If the same process requirement appears at different levels of the workflow, the most specific instance of the requirement is used, that is, an entry in requirements on a process implementation such as CommandLineTool will take precedence over an entry in requirements specified in a workflow step, and an entry in requirements on a workflow step takes precedence over the workflow. Entries in hints are resolved the same way.

Requirements override hints. If a process implementation provides a process requirement in hints which is also provided in requirements by an enclosing workflow or workflow step, the enclosing requirement takes precedence.
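The precedence rules in this section can be illustrated with a short sketch. It makes several simplifying assumptions that are not part of this specification: the function name `effective_requirement` is hypothetical, each level is a plain dict whose `requirements` and `hints` are lists of objects with a `class` key, and the check for which requirement classes are valid on a given step is omitted.

```python
def effective_requirement(class_name, levels):
    """Find the entry that applies for one requirement class.

    `levels` is ordered from least specific (the workflow) to most specific
    (the process implementation). Returns (entry, is_requirement), or
    (None, False) when the class is declared nowhere.
    """
    # Requirements override hints, so every level's requirements are
    # consulted before any hints are considered.
    for field in ("requirements", "hints"):
        # Within one field, the most specific level takes precedence.
        for level in reversed(levels):
            for entry in level.get(field, []):
                if entry.get("class") == class_name:
                    return entry, field == "requirements"
    return None, False
```

With a workflow requiring `coresMin: 4`, a step requiring `coresMin: 2`, and a tool that only hints `coresMin: 8`, the step-level requirement is the one selected.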

3.4 Parameter references §

Parameter references are denoted by the syntax $(...) and may be used in any field permitting the pseudo-type Expression, as specified by this document. Conforming implementations must support parameter references. Parameter references use the following subset of Javascript/ECMAScript 5.1 syntax, but they are designed to not require a Javascript engine for evaluation.

In the following BNF grammar, character classes and grammar rules are denoted in {}, - denotes exclusion from a character class, (()) denotes grouping, | denotes alternates, trailing * denotes zero or more repeats, + denotes one or more repeats, and all other characters are literal values.

symbol ::= {Unicode alphanumeric}+
singleq ::= [' (( {character - { | \ ' \} } ))* ']
doubleq ::= [" (( {character - { | \ " \} } ))* "]
index ::= [ {decimal digit}+ ]
segment ::= . {symbol} | {singleq} | {doubleq} | {index}
parameter reference ::= ( {symbol} {segment}*)

Use the following algorithm to resolve a parameter reference:

  1. Match the leading symbol as the key
  2. If the key is the special value 'null' then the value of the parameter reference is 'null'. If the key is 'null' it must be the only symbol in the parameter reference.
  3. Look up the key in the parameter context (described below) to get the current value. It is an error if the key is not found in the parameter context.
  4. If there are no subsequent segments, terminate and return current value
  5. Else, match the next segment
  6. Extract the symbol, string, or index from the segment as the key
  7. Look up the key in current value and assign as new current value.
  8. If the key is a symbol or string, the current value must be an object.
  9. If the key is an index, the current value must be an array or string.
  10. If the next key is the last key and it has the special value 'length' and the current value is an array, the value of the parameter reference is the length of the array. If the value 'length' is encountered in other contexts, normal evaluation rules apply.
  11. It is an error if the key does not match the required type, or the key is not found or out of range.
  12. Repeat steps 4-11 until no segments remain.

The root namespace is the parameter context. The following parameters must be provided:

  • inputs: The input object to the current Process.
  • self: A context-specific value. The contextual values for 'self' are documented for specific fields elsewhere in this specification. If a contextual value of 'self' is not documented for a field, it must be 'null'.
  • runtime: An object containing configuration details. Specific to the process type. An implementation may provide opaque strings for any or all fields of runtime. These must be filled in by the platform after processing the Tool but before actual execution. Parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents, except where noted otherwise.

If the value of a field has no leading or trailing non-whitespace characters around a parameter reference, the effective value of the field becomes the value of the referenced parameter, preserving the return type.
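As an illustration of the resolution algorithm, the following sketch resolves the text found inside `$(...)` without a Javascript engine. It is not a conformance-tested implementation: the name `resolve` is hypothetical, the parameter context is assumed to be a Python dict, and `\w` is used as an approximation of "Unicode alphanumeric" (it also accepts the underscore).

```python
import re

# One segment: .symbol, ['string'], ["string"], or [index].
_SEGMENT = re.compile(
    r"\.(?P<symbol>\w+)"
    r"|\['(?P<single>(?:[^'\\]|\\.)*)'\]"
    r'|\["(?P<double>(?:[^"\\]|\\.)*)"\]'
    r"|\[(?P<index>\d+)\]"
)


def resolve(reference, context):
    """Resolve a parameter reference such as 'inputs.reads[0].basename'."""
    lead = re.match(r"\w+", reference)
    if lead is None:
        raise ValueError(f"invalid parameter reference: {reference!r}")
    key, pos = lead.group(), lead.end()
    if key == "null":
        if pos != len(reference):
            raise ValueError("'null' must be the only symbol")
        return None
    if key not in context:
        raise KeyError(key)
    current = context[key]
    while pos < len(reference):
        seg = _SEGMENT.match(reference, pos)
        if seg is None:
            raise ValueError(f"invalid segment at offset {pos}")
        pos = seg.end()
        if seg.group("index") is not None:
            # An index requires an array or string as the current value.
            if not isinstance(current, (list, str)):
                raise TypeError("index applied to a non-array, non-string value")
            index = int(seg.group("index"))
            if index >= len(current):
                raise IndexError(index)
            current = current[index]
            continue
        if seg.group("symbol") is not None:
            key = seg.group("symbol")
        else:
            raw = seg.group("single")
            if raw is None:
                raw = seg.group("double")
            key = re.sub(r"\\(.)", r"\1", raw)  # drop backslash escapes
        # A trailing 'length' on an array yields the array length.
        if key == "length" and pos == len(reference) and isinstance(current, list):
            return len(current)
        # A symbol or string key requires an object as the current value.
        if not isinstance(current, dict):
            raise TypeError("key applied to a non-object value")
        if key not in current:
            raise KeyError(key)
        current = current[key]
    return current
```

For example, with an `inputs` object holding an array `reads` of two File objects, `resolve("inputs.reads[1].basename", context)` returns the second basename and `resolve("inputs.reads.length", context)` returns 2.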

3.4.1 String interpolation §

If the value of a field has non-whitespace leading or trailing characters around a parameter reference, it is subject to string interpolation. The effective value of the field is a string containing the leading characters, followed by the string value of the parameter reference, followed by the trailing characters. The string value of the parameter reference is its textual JSON representation with the following rules:

  • Strings are replaced by the literal text of the string, any escaped characters are replaced by the literal characters they represent, and there are no leading or trailing quotes.
  • Object entries are sorted by key.

Multiple parameter references may appear in a single field. This case must be treated as a string interpolation. After interpolating the first parameter reference, interpolation must be recursively applied to the trailing characters to yield the final string value.
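A minimal sketch of the string value used during interpolation follows. The helper name `interpolation_string` is hypothetical, and the compact separators are an assumption: the text above requires sorted object keys and unquoted strings but does not specify whitespace inside the JSON representation.

```python
import json


def interpolation_string(value):
    """Return the text substituted for a parameter reference inside a larger string."""
    if isinstance(value, str):
        # Strings contribute their literal text, without surrounding quotes.
        return value
    # Everything else uses its JSON text, with object entries sorted by key.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
```

For instance, `"-o " + interpolation_string(3)` gives `-o 3`, and an object value is rendered with its keys in sorted order.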

When text embedded in a CWL file represents code for another programming language, the use of $(...) (and ${...} in the case of expressions) may conflict with the syntax of that language. For example, when writing shell scripts, $(...) is used to execute a command in a subshell and replace a portion of the command line with the standard output of that command.

The following escaping rules apply. The scanner makes a single pass from start to end with 3-character lookahead. After performing a replacement, scanning resumes at the next character following the replaced substring.

  1. The substrings \$( and \${ are replaced by $( and ${ respectively. No parameter or expression evaluation interpolation occurs.
  2. A double backslash \\ is replaced by a single backslash \.
  3. A substring starting with a backslash that does not match one of the previous rules is left unchanged.
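The three escaping rules can be sketched as a single left-to-right pass. This illustration (the name `unescape` is hypothetical) handles only the escapes; an unescaped `$(` or `${` is copied through unchanged here, whereas a real scanner would evaluate it at that point.

```python
def unescape(text):
    """Apply the backslash escaping rules in one pass over the text."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith(("\\$(", "\\${"), i):
            # Rule 1: drop the backslash and keep "$(" or "${" literally.
            out.append(text[i + 1:i + 3])
            i += 3
        elif text.startswith("\\\\", i):
            # Rule 2: a double backslash becomes a single backslash.
            out.append("\\")
            i += 2
        else:
            # Rule 3 and ordinary characters: copied unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Applied to the shell fragment `echo \$(date)`, this yields `echo $(date)`, so the subshell syntax reaches the shell intact.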

3.5 Expressions (Optional) §

An expression is a fragment of Javascript/ECMAScript 5.1 code evaluated by the workflow platform to affect the inputs, outputs, or behavior of a process. In the generic execution sequence, expressions may be evaluated during step 6 (process setup), step 7 (execute process), and/or step 8 (capture output). Expressions are distinct from regular processes in that they are intended to modify the behavior of the workflow itself rather than perform the primary work of the workflow.

Expressions in CWL are an optional feature and are not required to be implemented by all consumers of CWL documents. They should be used sparingly, when there is no other way to achieve the desired outcome. Excessive use of expressions may be a signal that other refactoring of the tools or workflows would benefit the author, runtime, and users of the CWL document in question.

To declare the use of expressions, the document must include the process requirement InlineJavascriptRequirement. Expressions may be used in any field permitting the pseudo-type Expression, as specified by this document.

Expressions are denoted by the syntax $(...) or ${...}.

A code fragment wrapped in the $(...) syntax must be evaluated as an ECMAScript expression.

A code fragment wrapped in the ${...} syntax must be evaluated as an ECMAScript function body for an anonymous, zero-argument function. This means the code will be evaluated as (function() { ... })().

Expressions must return a valid JSON data type: one of null, string, number, boolean, array, object. Other return values must result in a permanentFailure. When scanning for expressions, implementations must permit any syntactically valid Javascript and must account for nested parentheses or braces and for strings that may contain parentheses or braces.

The runtime must include any code defined in the "expressionLib" field of InlineJavascriptRequirement prior to executing the actual expression.

Before executing the expression, the runtime must initialize as global variables the fields of the parameter context described above.

The effective value of the field after expression evaluation follows the same rules as parameter references discussed above. Multiple expressions may appear in a single field.

Expressions must be evaluated in an isolated context (a "sandbox") which permits no side effects to leak outside the context. Expressions also must be evaluated in Javascript strict mode.

The order in which expressions are evaluated is undefined except where otherwise noted in this document.

An implementation may choose to implement parameter references by evaluating as a Javascript expression. The results of evaluating parameter references must be identical whether implemented by Javascript evaluation or some other means.

Implementations may apply other limits, such as process isolation, timeouts, and operating system containers/jails to minimize the security risks associated with running untrusted code embedded in a CWL document.

Javascript exceptions thrown from a CWL expression must result in a permanentFailure of the CWL process.

3.6 Executing CWL documents as scripts §

By convention, a CWL document may begin with #!/usr/bin/env cwl-runner and be marked as executable (the POSIX "+x" permission bits) to enable it to be executed directly. A workflow platform may support this mode of operation; if so, it must provide cwl-runner as an alias for the platform's CWL implementation.

A CWL input object document may similarly begin with #!/usr/bin/env cwl-runner and be marked as executable. In this case, the input object must include the field cwl:tool supplying an IRI to the default CWL document that should be executed using the fields of the input object as input parameters.

The cwl-runner interface is required for conformance testing and is documented in cwl-runner.cwl.

3.7 Discovering CWL documents on a local filesystem §

To discover CWL documents look in the following locations:

For each value in the XDG_DATA_DIRS environment variable (a colon-separated list), check the ./commonwl subdirectory. If XDG_DATA_DIRS is unset or empty, then check using the default value for XDG_DATA_DIRS: /usr/local/share/:/usr/share/ (that is to say, check /usr/share/commonwl/ and /usr/local/share/commonwl/).

Then check $XDG_DATA_HOME/commonwl/.

If the XDG_DATA_HOME environment variable is unset, its default value is $HOME/.local/share (that is to say, check $HOME/.local/share/commonwl).

$XDG_DATA_HOME and $XDG_DATA_DIRS are from the XDG Base Directory Specification.
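The lookup order described in this section can be sketched as below. The function name `cwl_search_dirs` is hypothetical, the environment is passed in as a dict that is assumed to contain HOME, and an empty XDG_DATA_HOME is treated the same as an unset one, which is an assumption borrowed from the XDG convention rather than from the text above.

```python
import posixpath


def cwl_search_dirs(environ):
    """Return the directories to check for CWL documents, in lookup order."""
    data_dirs = environ.get("XDG_DATA_DIRS") or "/usr/local/share/:/usr/share/"
    dirs = [posixpath.join(d, "commonwl") for d in data_dirs.split(":") if d]
    data_home = environ.get("XDG_DATA_HOME")
    if not data_home:
        # Default value when XDG_DATA_HOME is not set.
        data_home = posixpath.join(environ["HOME"], ".local", "share")
    dirs.append(posixpath.join(data_home, "commonwl"))
    return dirs
```

A caller would typically pass `os.environ` and then probe each returned directory in turn.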

4. Running a Command §

To accommodate the enormous variety in syntax and semantics for input, runtime environment, invocation, and output of arbitrary programs, a CommandLineTool defines an "input binding" that describes how to translate abstract input parameters to a concrete program invocation, and an "output binding" that describes how to generate output parameters from program output.

4.1 Input binding §

The tool command line is built by applying command line bindings to the input object. Bindings are listed either as part of an input parameter using the inputBinding field, or separately using the arguments field of the CommandLineTool.

The algorithm to build the command line is as follows. In this algorithm, the sort key is a list consisting of one or more numeric or string elements. Strings are sorted lexicographically based on UTF-8 encoding.

  1. Collect CommandLineBinding objects from arguments. Assign a sorting key [position, i] where position is CommandLineBinding.position and i is the index in the arguments list.

  2. Collect CommandLineBinding objects from the inputs schema and associate them with values from the input object. Where the input type is a record, array, or map, recursively walk the schema and input object, collecting nested CommandLineBinding objects and associating them with values from the input object.

  3. Create a sorting key by taking the value of the position field at each level leading to each leaf binding object. If position is not specified, it is not added to the sorting key. For bindings on arrays and maps, the sorting key must include the array index or map key following the position. If and only if two bindings have the same sort key, the tie must be broken using the ordering of the field or parameter name immediately containing the leaf binding.

  4. Sort elements using the assigned sorting keys. Numeric entries sort before strings.

  5. In the sorted order, apply the rules defined in CommandLineBinding to convert bindings to actual command line elements.

  6. Insert elements from baseCommand at the beginning of the command line.
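The ordering part of this algorithm (steps 4 to 6) can be sketched as follows. The sketch assumes that the sorting keys have already been collected and that each binding has already been converted to its command line elements; both function names are hypothetical, and the tie-breaking rule from step 3 is not modeled beyond the stability of the sort.

```python
def _comparable(sort_key):
    """Tag key elements so that numbers sort before strings."""
    tagged = []
    for element in sort_key:
        if isinstance(element, (int, float)):
            tagged.append((0, element, b""))
        else:
            # Strings are ordered by their UTF-8 encoding.
            tagged.append((1, 0, element.encode("utf-8")))
    return tagged


def build_command_line(base_command, bindings):
    """bindings: list of (sort_key, elements) pairs, one per collected binding."""
    ordered = sorted(bindings, key=lambda binding: _comparable(binding[0]))
    argv = list(base_command)  # baseCommand elements always come first
    for _, elements in ordered:
        argv.extend(elements)
    return argv
```

Tagging each element avoids comparing numbers with strings directly, which Python does not permit, while still placing numeric entries ahead of string entries at the same key depth.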

4.2 Runtime environment §

All files listed in the input object must be made available in the runtime environment. The implementation may use a shared or distributed file system or transfer files via explicit download to the host. Implementations may choose not to provide access to files not explicitly specified in the input object or process requirements.

Output files produced by tool execution must be written to the designated output directory. The initial current working directory when executing the tool must be the designated output directory. The designated output directory should be empty, except for files or directories specified using InitialWorkDirRequirement.

Files may also be written to the designated temporary directory. This directory must be isolated and not shared with other processes. Any files written to the designated temporary directory may be automatically deleted by the workflow platform immediately after the tool terminates.

For compatibility, files may be written to the system temporary directory which must be located at /tmp. Because the system temporary directory may be shared with other processes on the system, files placed in the system temporary directory are not guaranteed to be deleted automatically. A tool must not use the system temporary directory as a back-channel communication with other tools. It is valid for the system temporary directory to be the same as the designated temporary directory.

When executing the tool, the tool must execute in a new, empty environment with only the environment variables described below; the child process must not inherit environment variables from the parent process except as specified or at user option.

  • HOME must be set to the designated output directory.
  • TMPDIR must be set to the designated temporary directory.
  • PATH may be inherited from the parent process, except when run in a container that provides its own PATH.
  • Variables defined by EnvVarRequirement
  • The default environment of the container, such as when using DockerRequirement
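A rough sketch of how a runner might assemble this environment outside a container is shown below. The function name and its parameters are hypothetical. Letting HOME and TMPDIR win over values from EnvVarRequirement is this sketch's reading of the "must be set" wording, not something the list above states explicitly, and container defaults are not modeled.

```python
def build_tool_environment(outdir, tmpdir, parent_env, env_defs=None, inherit_path=True):
    """Build the environment for the tool process from an empty starting point."""
    env = {}
    if inherit_path and "PATH" in parent_env:
        # PATH may be inherited; nothing else is copied from the parent.
        env["PATH"] = parent_env["PATH"]
    # Variables defined by EnvVarRequirement, given here as a plain dict.
    env.update(env_defs or {})
    # Assumption: the mandatory variables are applied last so they always hold.
    env["HOME"] = outdir
    env["TMPDIR"] = tmpdir
    return env
```

Starting from an empty dict, rather than copying the parent environment and deleting entries, makes it harder to leak a variable by accident.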

An implementation may forbid the tool from writing to any location in the runtime environment file system other than the designated temporary directory, system temporary directory, and designated output directory. An implementation may provide read-only input files, and disallow in-place update of input files. The designated temporary directory, system temporary directory and designated output directory may each reside on different mount points on different file systems.

An implementation may forbid the tool from directly accessing network resources. Correct tools must not assume any network access unless they have the 'networkAccess' field of a 'NetworkAccess' requirement set to true; even then, this does not imply a publicly routable IP address or the ability to accept inbound connections.

The runtime section available in parameter references and expressions contains the following fields. As noted earlier, an implementation may perform deferred resolution of runtime fields by providing opaque strings for any or all of the following fields; parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents.

  • runtime.outdir: an absolute path to the designated output directory
  • runtime.tmpdir: an absolute path to the designated temporary directory
  • runtime.cores: number of CPU cores reserved for the tool process
  • runtime.ram: amount of RAM in mebibytes (2**20) reserved for the tool process
  • runtime.outdirSize: reserved storage space available in the designated output directory
  • runtime.tmpdirSize: reserved storage space available in the designated temporary directory

For cores, ram, outdirSize and tmpdirSize, if an implementation can't provide the actual number of reserved resources during the expression evaluation time, it should report back the minimal requested amount.

See ResourceRequirement for details on how to describe the hardware resources required by a tool.

The standard input stream, the standard output stream, and/or the standard error stream may be redirected as described in the stdin, stdout, and stderr fields.

4.3 Execution §

Once the command line is built and the runtime environment is created, the actual tool is executed.

The standard error stream and standard output stream may be captured by platform logging facilities for storage and reporting. If there are multiple commands logically chained (e.g. echo a && echo b), implementations must capture the output of all the commands, and not only the output of the last command (i.e. the following is incorrect: echo a && echo b > captured, because the output of echo a is not included in captured).

Tools may be multithreaded or spawn child processes; however, when the parent process exits, the tool is considered finished regardless of whether any detached child processes are still running. Tools must not require any kind of console, GUI, or web based user interaction in order to start and run to completion.

The exit code of the process indicates if the process completed successfully. By convention, an exit code of zero is treated as success and non-zero exit codes are treated as failure. This may be customized by providing the fields successCodes, temporaryFailCodes, and permanentFailCodes. An implementation may choose to default unspecified non-zero exit codes to either temporaryFailure or permanentFailure.

The exit code of the process is available to expressions in outputEval as runtime.exitCode.
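The exit code handling described above can be sketched as a small classifier. The function name is hypothetical, and two choices are assumptions of this sketch: an exit code of 0 that appears in none of the lists is still treated as success, and unlisted non-zero codes default to permanentFailure (an implementation may choose temporaryFailure instead).

```python
def classify_exit_code(code, success_codes=None, temporary_fail_codes=None,
                       permanent_fail_codes=None):
    """Map a process exit code to 'success', 'temporaryFailure' or 'permanentFailure'."""
    if code in (success_codes or []):
        return "success"
    if code in (temporary_fail_codes or []):
        return "temporaryFailure"
    if code in (permanent_fail_codes or []):
        return "permanentFailure"
    # Convention for codes that no list mentions.
    return "success" if code == 0 else "permanentFailure"
```

A tool such as grep, which exits with 1 when no lines match, could be described with a successCodes value of [0, 1] so that an empty result is not reported as a failure.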

4.4 Output binding §

If the output directory contains a file named "cwl.output.json", that file must be loaded and used as the output object. In this case, the output object should still be type-checked against the outputs section, but outputBinding is ignored.

For Files and Directories, if the value of path is a relative path pattern (does not begin with a slash '/') then it is resolved relative to the output directory. If the value of the "path" is an absolute path pattern (it does begin with a slash '/') then it must refer to a path within the output directory. It is an error for "path" to refer outside the output directory.

Similarly, if a File or Directory in "cwl.output.json" contains location, it is resolved as a relative reference IRI with a base IRI representing the output directory. If location contains some other absolute IRI with a scheme supported by the implementation, the implementation may choose to accept it.

If both path and location are provided on a File or Directory in "cwl.output.json", path takes precedence.
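The path rule for entries in "cwl.output.json" can be sketched as follows. The function name is hypothetical, paths are treated as POSIX paths, and the check is purely lexical: symbolic links are not followed, which a real implementation would likely need to consider.

```python
import posixpath


def resolve_output_path(path, outdir):
    """Resolve a File or Directory path from cwl.output.json against the output directory."""
    root = posixpath.normpath(outdir)
    # join() keeps an absolute path as-is and resolves a relative one against outdir.
    candidate = posixpath.normpath(posixpath.join(root, path))
    inside = candidate == root or candidate.startswith(root.rstrip("/") + "/")
    if not inside:
        raise ValueError(f"{path!r} refers outside the output directory")
    return candidate
```

Both `results/a.txt` and `/work/out/results/a.txt` resolve to the same location when the output directory is `/work/out`, while `../escape` is rejected.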

If there is no "cwl.output.json", the output object must be generated by walking the parameters listed in outputs and applying output bindings to the tool output. Output bindings are associated with output parameters using the outputBinding field. See CommandOutputBinding for details.

5. CommandLineTool §

This defines the schema of the CWL Command Line Tool Description document.

Fields

field
required
type
description
inputs
required

Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object.

When accepting an input object, all input parameters must have a value. If an input parameter is missing from the input object, it must be assigned a value of null (or the value of default for that parameter, if provided) for the purposes of validation and evaluation of expressions.

outputs
required

Defines the parameters representing the output of the process. May be used to generate and/or validate the output object.

class
required
constant value CommandLineTool
id
optional

The unique identifier for this object.

Only useful for $graph at Process level. Should not be exposed to users in graphical or terminal user interfaces.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

requirements
optional

Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.

hints
optional

Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints; however, the implementation may report a warning.

cwlVersion
optional

CWL document version. Always required at the document root. Not required for a Process embedded inside another Process.

intent
optional
array<string>

An identifier for the type of computational operation of this Process. Especially useful for Operation, but can also be used for CommandLineTool, Workflow, or ExpressionTool.

If provided, then this must be an IRI of a concept node that represents the type of operation, preferably defined within an ontology.

For example, in the domain of bioinformatics, one can use an IRI from the EDAM Ontology's Operation concept nodes, like Alignment, or Clustering; or a more specific Operation concept like Split read mapping.

baseCommand
optional
string | array<string>

Specifies the program to execute. If an array, the first element of the array is the command to execute, and subsequent elements are mandatory command line arguments. The elements in baseCommand must appear before any command line bindings from inputBinding or arguments.

If baseCommand is not provided or is an empty array, the first element of the command line produced after processing inputBinding or arguments must be used as the program to execute.

If the program includes a path separator character it must be an absolute path, otherwise it is an error. If the program does not include a path separator, search the $PATH variable in the runtime environment of the workflow runner to find the absolute path of the executable.

arguments
optional

Command line bindings which are not directly associated with input parameters. If the value is a string, it is used as a string literal argument. If it is an Expression, the result of the evaluation is used as an argument.

stdin
optional

A path to a file whose contents must be piped into the command's standard input stream.

stderr
optional

Capture the command's standard error stream to a file written to the designated output directory.

If stderr is a string, it specifies the file name to use.

If stderr is an expression, the expression is evaluated and must return a string with the file name to use to capture stderr. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator /) it is an error.

stdout
optional

Capture the command's standard output stream to a file written to the designated output directory.

If the CommandLineTool contains logically chained commands (e.g. echo a && echo b) stdout must include the output of every command.

If stdout is a string, it specifies the file name to use.

If stdout is an expression, the expression is evaluated and must return a string with the file name to use to capture stdout. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator /) it is an error.

successCodes
optional
array<int>

Exit codes that indicate the process completed successfully.

If not specified, only exit code 0 is considered success.

temporaryFailCodes
optional
array<int>

Exit codes that indicate the process failed due to a possibly temporary condition, where executing the process with the same runtime environment and inputs may produce different results.

If not specified, no exit codes are considered temporary failure.

permanentFailCodes
optional
array<int>

Exit codes that indicate the process failed due to a permanent logic error, where executing the process with the same runtime environment and same inputs is expected to always fail. If not specified, all exit codes except 0 are considered permanent failure.
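The program lookup rule given for baseCommand in the table above can be sketched as follows. The function name `locate_program` is hypothetical, `/` is assumed to be the path separator, and the search path is passed in explicitly rather than read from the runner's environment.

```python
import os
import shutil


def locate_program(program, search_path):
    """Return the absolute path of the program named by the first command line element."""
    if "/" in program:
        # A name containing a path separator must already be absolute.
        if not os.path.isabs(program):
            raise ValueError(f"{program!r} contains a path separator but is not absolute")
        return program
    # Otherwise search the directories of the PATH-style string.
    found = shutil.which(program, path=search_path)
    if found is None:
        raise FileNotFoundError(program)
    return found
```

A runner might call this with the PATH value of the tool's runtime environment just before spawning the process.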

5.1 CommandInputParameter §

An input parameter for a CommandLineTool.

Fields

field
required
type
description
type
required

Specify valid types of data that may be assigned to this parameter.

label
optional

A short, human-readable label of this object.

secondaryFiles
optional

Only valid when type: File or is an array of items: File.

Provides a pattern or expression specifying files or directories that should be included alongside the primary file. Secondary files may be required or optional. When not explicitly specified, secondary files specified for inputs are required and outputs are optional. An implementation must include matching Files and Directories in the secondaryFiles property of the primary file. These Files and Directories must be transferred and staged alongside the primary file. An implementation may fail workflow execution if a required secondary file does not exist.

If the value is an expression, the value of self in the expression must be the primary input or output File object to which this binding applies. The basename, nameroot and nameext fields must be present in self. For CommandLineTool outputs the path field must also be present. The expression must return a filename string relative to the path to the primary File, a File or Directory object with either path or location and basename fields set, or an array consisting of strings or File or Directory objects. It is legal to reference an unchanged File or Directory object taken from input as a secondaryFile. The expression may return "null" in which case there is no secondaryFile from that expression.

To work on non-filename-preserving storage systems, portable tool descriptions should avoid constructing new values from location, but should construct relative references using basename or nameroot instead.

If a value in secondaryFiles is a string that is not an expression, it specifies that the following pattern should be applied to the path of the primary file to yield a filename relative to the primary File:

  1. If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
  2. If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
  3. Append the remainder of the string to the end of the file path.

streamable
optional

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

format
optional

Only valid when type: File or is an array of items: File.

This must be one or more IRIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

loadContents
optional

Only valid when type: File or is an array of items: File.

If true, the file (or each file in the array) must be a UTF-8 text file 64 KiB or smaller, and the implementation must read the entire contents of the file (or file array) and place it in the contents field of the File object for use by expressions. If the size of the file is greater than 64 KiB, the implementation must raise a fatal error.
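The size rule above can be sketched as follows. This is a minimal illustration, not part of the specification: the helper name `load_contents` is hypothetical, and raising `ValueError` merely stands in for the fatal error an implementation must raise.

```python
import os

MAX_CONTENTS_BYTES = 64 * 1024  # the 64 KiB limit stated for loadContents


def load_contents(path):
    """Return the UTF-8 text of the file at `path`, enforcing the 64 KiB limit.

    Hypothetical helper: ValueError stands in for the "fatal error" that an
    implementation must raise when the file is too large.
    """
    if os.path.getsize(path) > MAX_CONTENTS_BYTES:
        raise ValueError("loadContents: file is larger than 64 KiB: " + path)
    with open(path, "rb") as handle:
        data = handle.read()
    # Decoding raises UnicodeDecodeError if the file is not UTF-8 text.
    return data.decode("utf-8")
```

The returned string is what would be placed in the contents field of the File object before expressions are evaluated.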

loadListing
optional

Only valid when type: Directory or is an array of items: Directory.

Specify the desired behavior for loading the listing field of a Directory object for use by expressions.

The order of precedence for loadListing is:

  1. loadListing on an individual parameter
  2. Inherited from LoadListingRequirement
  3. By default: no_listing

default
optional

The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is null. Default values are applied before evaluating expressions (e.g. dependent valueFrom fields).
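As a rough sketch of the rule above, defaults can be filled in before any expression is evaluated. The function name `apply_defaults` and the plain-dictionary representation of parameters are assumptions made for illustration only.

```python
def apply_defaults(input_object, parameters):
    """Return a copy of `input_object` with parameter defaults filled in.

    `parameters` maps a parameter name to a dict that may carry a "default"
    key (a simplified, hypothetical view of the inputs section). A default
    applies when the parameter is missing from the input object or is null.
    """
    resolved = dict(input_object)
    for name, parameter in parameters.items():
        if "default" in parameter and resolved.get(name) is None:
            resolved[name] = parameter["default"]
    return resolved
```

For example, `apply_defaults({"threads": None}, {"threads": {"default": 4}})` yields `{"threads": 4}`, while an explicitly supplied non-null value is left untouched.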

inputBinding
optional

Describes how to turn the input parameters of a process into command line arguments.

5.1.1 CommandLineBinding §

When listed under inputBinding in the input schema, the term "value" refers to the corresponding value in the input object. For binding objects listed in CommandLineTool.arguments, the term "value" refers to the effective value after evaluating valueFrom.

The binding behavior when building the command line depends on the data type of the value. If there is a mismatch between the type described by the input schema and the effective value, such as resulting from an expression evaluation, an implementation must use the data type of the effective value.

  • string: Add prefix and the string to the command line.

  • number: Add prefix and the decimal representation to the command line.

  • boolean: If true, add prefix to the command line. If false, add nothing.

  • File: Add prefix and the value of File.path to the command line.

  • Directory: Add prefix and the value of Directory.path to the command line.

  • array: If itemSeparator is specified, add prefix and then join the array into a single string with itemSeparator separating the items. Otherwise, first add prefix, then recursively process individual elements. If the array is empty, add nothing to the command line.

  • object: Add prefix only, and recursively add object fields for which inputBinding is specified.

  • null: Add nothing.
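The per-type rules above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a reference implementation: the function names are hypothetical, File and Directory values are modeled as dictionaries with class and path keys, array elements are assumed to carry no binding of their own, `str()` stands in for "decimal representation", and record objects are left out.

```python
def _as_text(value):
    """Render a scalar, File, or Directory value as one argument string."""
    if isinstance(value, dict):
        if value.get("class") in ("File", "Directory"):
            return value["path"]
        # Record objects need per-field bindings and are outside this sketch.
        raise NotImplementedError("record objects are not handled here")
    # str() is an assumption standing in for "decimal representation".
    return str(value)


def bind(value, prefix=None, separate=True, item_separator=None):
    """Return the command line arguments contributed by one bound value."""
    if value is None or value is False:
        return []  # null and false add nothing
    lead = [] if prefix is None else [prefix]
    if value is True:
        return lead  # true adds the prefix only
    if isinstance(value, list):
        if not value:
            return []  # an empty array adds nothing
        if item_separator is None:
            # Add the prefix first, then process each element in turn.
            args = list(lead)
            for item in value:
                args.extend(bind(item))
            return args
        text = item_separator.join(_as_text(item) for item in value)
    else:
        text = _as_text(value)
    if prefix is None:
        return [text]
    # separate=True gives two arguments; separate=False concatenates them.
    return [prefix, text] if separate else [prefix + text]
```

With these assumptions, `bind([1, 2, 3], "-n", item_separator=",")` produces `["-n", "1,2,3"]` and `bind("a.txt", "-i", separate=False)` produces `["-ia.txt"]`.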

Fields

field
required
type
description
loadContents
optional

Use of loadContents in InputBinding is deprecated. Preserved for v1.0 backwards compatibility. Will be removed in CWL v2.0. Use InputParameter.loadContents instead.

position
optional

The sorting key. Default position is 0. If a CWL Parameter Reference or CWL Expression is used and if the inputBinding is associated with an input parameter, then the value of self will be the value of the input parameter. Input parameter defaults (as specified by the InputParameter.default field) must be applied before evaluating the expression. Expressions must return a single value of type int or a null.

prefix
optional

Command line prefix to add before the value.

separate
optional

If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated into a single command line argument.

itemSeparator
optional

Join the array elements into a single string with the elements separated by itemSeparator.

valueFrom
optional

If valueFrom is a constant string value, use this as the value and apply the binding rules above.

If valueFrom is an expression, evaluate the expression to yield the actual value to use to build the command line and apply the binding rules above. If the inputBinding is associated with an input parameter, the value of self in the expression will be the value of the input parameter. Input parameter defaults (as specified by the InputParameter.default field) must be applied before evaluating the expression.

If the value of the associated input parameter is null, valueFrom is not evaluated and nothing is added to the command line.

When a binding is part of the CommandLineTool.arguments field, the valueFrom field is required.

shellQuote
optional

If ShellCommandRequirement is in the requirements for the current command, this controls whether the value is quoted on the command line (default is true). Use shellQuote: false to inject metacharacters for operations such as pipes.

If shellQuote is true or not provided, the implementation must not permit interpretation of any shell metacharacters or directives.
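The effect of shellQuote can be illustrated with Python's standard `shlex.quote`. The specification does not prescribe a particular quoting mechanism, so this is only one possible approach, and `render_argument` is a hypothetical name; the distinction matters only when ShellCommandRequirement is in effect.

```python
import shlex


def render_argument(value, shell_quote=True):
    """Return `value` as it would appear in a shell command string.

    With shell_quote=True the value is quoted so that the shell treats every
    character literally; with shell_quote=False it is passed through as-is,
    which lets metacharacters such as | take effect.
    """
    return shlex.quote(value) if shell_quote else value
```

Here `render_argument("a | b")` returns `'a | b'` wrapped in single quotes, whereas `render_argument("|", shell_quote=False)` returns a bare `|` that the shell interprets as a pipe.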

5.1.2 Expression §

'Expression' is not a real type. It indicates that a field must allow runtime parameter references. If InlineJavascriptRequirement is declared and supported by the platform, the field must also allow Javascript expressions.

Symbols

symbol description
ExpressionPlaceholder

5.1.3 SecondaryFileSchema §

Secondary files are specified using the following micro-DSL:

  • If the value is a string, it is transformed to an object with two fields pattern and required
  • By default, the value of required is null (this indicates default behavior, which may be based on the context)
  • If the value ends with a question mark ? the question mark is stripped off and the value of the field required is set to False
  • The remaining value is assigned to the field pattern

For implementation details and examples, please see this section in the Schema Salad specification.

Fields

field
required
type
description
pattern
required

Provides a pattern or expression specifying files or directories that should be included alongside the primary file.

If the value is an expression, the value of self in the expression must be the primary input or output File object to which this binding applies. The basename, nameroot and nameext fields must be present in self. For CommandLineTool inputs the location field must also be present. For CommandLineTool outputs the path field must also be present. If secondary files were included on an input File object as part of the Process invocation, they must also be present in secondaryFiles on self.

The expression must return either: a filename string relative to the path to the primary File, a File or Directory object (class: File or class: Directory) with either location (for inputs) or path (for outputs) and basename fields set, or an array consisting of strings or File or Directory objects as previously described.

It is legal to use location from a File or Directory object passed in as input, including location from secondary files on self. If an expression returns a File object with the same location but a different basename as a secondary file that was passed in, the expression result takes precedence. Setting the basename with an expression this way affects the path where the secondary file will be staged to in the CommandLineTool.

The expression may return "null" in which case there is no secondary file from that expression.

To work on non-filename-preserving storage systems, portable tool descriptions should treat location as an opaque identifier and avoid constructing new values from location, but should construct relative references using basename or nameroot instead, or propagate location from defined inputs.

If a value in secondaryFiles is a string that is not an expression, it specifies that the following pattern should be applied to the path of the primary file to yield a filename relative to the primary File:

  1. If the string ends with a ? character, remove the last ? and mark the resulting secondary file as optional.
  2. If the string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
  3. Append the remainder of the string to the end of the file path.

required
optional

An implementation must not fail workflow execution if required is set to false and the expected secondary file does not exist. Default value for required field is true for secondary files on input and false for secondary files on output.
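The string form of the micro-DSL, the caret rule, and the context-dependent default for required can be sketched together. All three function names are hypothetical, and the regular expression used to drop one extension is an assumption about how "the last period and all following characters" is applied to the final path component.

```python
import re


def parse_secondary_file(value):
    """Expand a secondaryFiles string into pattern and required fields."""
    if value.endswith("?"):
        return {"pattern": value[:-1], "required": False}
    # None means "use the default for the context" (input or output).
    return {"pattern": value, "required": None}


def apply_pattern(primary_path, pattern):
    """Apply a non-expression pattern to the path of the primary file."""
    path = primary_path
    while pattern.startswith("^"):
        pattern = pattern[1:]
        # Remove the last extension, if any, from the final path component.
        path = re.sub(r"\.[^/.]+$", "", path)
    return path + pattern


def resolve_required(required, is_input):
    """Default: required on inputs, optional on outputs."""
    return is_input if required is None else required
```

For instance, `apply_pattern("reads.bam", "^.bai")` gives `reads.bai`, while `apply_pattern("reads.bam", ".bai")` gives `reads.bam.bai`.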

5.1.4 LoadListingEnum §

Specify the desired behavior for loading the listing field of a Directory object for use by expressions.

Symbols

symbol description
no_listing Do not load the directory listing.
shallow_listing Only load the top level listing, do not recurse into subdirectories.
deep_listing Load the directory listing and recursively load all subdirectories as well.
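These symbols combine with the order of precedence given for the loadListing field (the individual parameter first, then LoadListingRequirement, then no_listing). A minimal sketch, with the hypothetical name `effective_load_listing` and `None` standing for "not specified":

```python
LOAD_LISTING_VALUES = ("no_listing", "shallow_listing", "deep_listing")


def effective_load_listing(parameter_value=None, requirement_value=None):
    """Pick the loadListing mode that applies to one Directory parameter."""
    for candidate in (parameter_value, requirement_value):
        if candidate is not None:
            if candidate not in LOAD_LISTING_VALUES:
                raise ValueError("unknown loadListing value: " + str(candidate))
            return candidate
    # Neither the parameter nor the requirement set a value.
    return "no_listing"
```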

5.1.5 File §

Represents a file (or group of files when secondaryFiles is provided) that will be accessible by tools using the standard POSIX file system call API, such as open(2) and read(2).

Files are represented as objects with class of File. File objects have a number of properties that provide metadata about the file.

The location property of a File is an IRI that uniquely identifies the file. Implementations must support the file:// IRI scheme and may support other schemes such as http:// and https://. The value of location may also be a relative reference, in which case it must be resolved relative to the IRI of the document it appears in. As an alternative to location, implementations must also accept the path property on File, which must be a filesystem path available on the same host as the CWL runner (for inputs) or the runtime environment of a command line tool execution (for command line tool outputs).

If no location or path is specified, a file object must specify contents with the UTF-8 text content of the file. This is a "file literal". File literals do not correspond to external resources, but are created on disk with contents when needed for executing a tool. Where appropriate, expressions can return file literals to define new files on a runtime. The maximum size of contents is 64 kilobytes.

The basename property defines the filename on disk where the file is staged. This may differ from the resource name. If not provided, basename must be computed from the last path part of location and made available to expressions.

The secondaryFiles property is a list of File or Directory objects that must be staged in the same directory as the primary file. It is an error for file names to be duplicated in secondaryFiles.

The size property is the size in bytes of the File. It must be computed from the resource and made available to expressions. The checksum field contains a cryptographic hash of the file content for use in verifying file contents. Implementations may, at user option, enable or disable computation of the checksum field for performance or other reasons. However, the ability to compute output checksums is required to pass the CWL conformance test suite.

When executing a CommandLineTool, the files and secondary files may be staged to an arbitrary directory, but must use the value of basename for the filename. The path property must be a file path in the context of the tool execution runtime (local to the compute node, or within the executing container). All computed properties should be available to expressions. File literals also must be staged and path must be set.

When collecting CommandLineTool outputs, glob matching returns file paths (with the path property) and the derived properties. This can all be modified by outputEval. Alternately, if the file cwl.output.json is present in the output, outputBinding is ignored.

File objects in the output must provide either a location IRI or a path property in the context of the tool execution runtime (local to the compute node, or within the executing container).

When evaluating an ExpressionTool, file objects must be referenced via location (the expression tool does not have access to files on disk so path is meaningless) or as file literals. It is legal to return a file object with an existing location but a different basename. The loadContents field of ExpressionTool inputs behaves the same as on CommandLineTool inputs, however it is not meaningful on the outputs.

An ExpressionTool may forward file references from input to output by using the same value for location.

Fields

field
required
type
description
class
required
constant value File

Must be File to indicate this object describes a file.

location
optional

An IRI that identifies the file resource. This may be a relative reference, in which case it must be resolved using the base IRI of the document. The location may refer to a local or remote resource; the implementation must use the IRI to retrieve file content. If an implementation is unable to retrieve the file content stored at a remote resource (due to unsupported protocol, access denied, or other issue) it must signal an error.

If the location field is not provided, the contents field must be provided. The implementation must assign a unique identifier for the location field.

If the path field is provided but the location field is not, an implementation may assign the value of the path field to location, then follow the rules above.

path
optional

The local host path where the File is available when a CommandLineTool is executed. This field must be set by the implementation. The final path component must match the value of basename. This field must not be used in any other context. The command line tool being executed must be able to access the file at path using the POSIX open(2) syscall.

As a special case, if the path field is provided but the location field is not, an implementation may assign the value of the path field to location, and remove the path field.

If the path contains POSIX shell metacharacters (|, &, ;, <, >, (, ), $, `, \, ", ', <space>, <tab>, and <newline>) or characters not allowed for Internationalized Domain Names for Applications then implementations may terminate the process with a permanentFailure.

basename
optional

The base name of the file, that is, the name of the file without any leading directory path. The base name must not contain a slash /.

If not provided, the implementation must set this field based on the location field by taking the final path component after parsing location as an IRI. If basename is provided, it is not required to match the value from location.
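Deriving basename from location can be sketched with the standard library. The helper name `basename_from_location` is hypothetical, and whether percent-encoded characters are decoded is left out of this sketch because the text above does not state it.

```python
import posixpath
from urllib.parse import urlsplit


def basename_from_location(location):
    """Return the final path component of a location IRI or relative reference."""
    # urlsplit separates scheme, authority, query, and fragment from the path.
    path = urlsplit(location).path
    # A trailing slash is dropped so that "dir/" still yields "dir".
    return posixpath.basename(path.rstrip("/"))
```

For example, `basename_from_location("file:///data/reads.fastq")` returns `reads.fastq`; a basename supplied explicitly would take priority over this computed value.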

When this file is made available to a CommandLineTool, it must be named with basename, i.e. the final component of the path field must match basename.

dirname
optional

The name of the directory containing the file, that is, the path leading up to the final slash in the path such that dirname + '/' + basename == path.

The implementation must set this field based on the value of path prior to evaluating parameter references or expressions in a CommandLineTool document. This field must not be used in any other context.

nameroot
optional

The basename root such that nameroot + nameext == basename, and nameext is empty or begins with a period and contains at most one period. For the purposes of path splitting leading periods on the basename are ignored; a basename of .cshrc will have a nameroot of .cshrc.

The implementation must set this field automatically based on the value of basename prior to evaluating parameter references or expressions.

nameext
optional

The basename extension such that nameroot + nameext == basename, and nameext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; a basename of .cshrc will have an empty nameext.

The implementation must set this field automatically based on the value of basename prior to evaluating parameter references or expressions.
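The relationships among path, dirname, basename, nameroot, and nameext can be illustrated with Python's `posixpath` module, whose `splitext` also treats leading periods as part of the root. The function name `derive_path_fields` is hypothetical, and the sketch assumes a POSIX-style path with at least one directory component.

```python
import posixpath


def derive_path_fields(path):
    """Compute dirname, basename, nameroot, and nameext from a staged path."""
    dirname, basename = posixpath.split(path)
    # splitext keeps a leading period in the root, so ".cshrc" has no extension.
    nameroot, nameext = posixpath.splitext(basename)
    return {
        "dirname": dirname,
        "basename": basename,
        "nameroot": nameroot,
        "nameext": nameext,
    }
```

For `/data/reads.fastq.gz` this gives a nameroot of `reads.fastq` and a nameext of `.gz`, so that nameroot + nameext equals basename and nameext contains at most one period.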

checksum
optional

Optional hash code for validating file integrity. Currently, this must be the string "sha1$" followed by the hexadecimal digest of the file content computed with the SHA-1 algorithm.
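A checksum in this form can be produced with Python's `hashlib`. The function name `compute_checksum` and the chunked reading strategy are illustrative choices, not requirements of the specification.

```python
import hashlib


def compute_checksum(stream, chunk_size=1024 * 1024):
    """Return the checksum string for a binary file-like object."""
    digest = hashlib.sha1()
    # Read in chunks so that large files need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "sha1$" + digest.hexdigest()
```

A typical call would be `compute_checksum(open(path, "rb"))`, with the result stored in the checksum field of the File object.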

size
optional

Optional file size (in bytes).

secondaryFiles
optional
array<File | Directory>

A list of additional files or directories that are associated with the primary file and must be transferred alongside the primary file. Examples include indexes of the primary file, or external references which must be included when loading the primary document. A file object listed in secondaryFiles may itself include secondaryFiles for which the same rules apply.

format
optional

The format of the file: this must be an IRI of a concept node that represents the file format, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

Reasoning about format compatibility must be done by checking that an input file format is the same as, an owl:equivalentClass of, or an rdfs:subClassOf the format required by the input parameter. owl:equivalentClass is transitive with rdfs:subClassOf, e.g. if <B> owl:equivalentClass <C> and <B> rdfs:subClassOf <A> then infer <C> rdfs:subClassOf <A>.

File format ontologies may be provided in the "$schemas" metadata at the root of the document. If no ontologies are specified in $schemas, the runtime may perform exact file format matches.
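The compatibility check can be sketched as a graph search in which equivalence links are followed in both directions and subclass links only from child to parent. The function names and the representation of the ontology as lists of IRI pairs are assumptions; a real implementation would load these relations from the documents listed in $schemas.

```python
from collections import defaultdict


def build_format_graph(subclass_pairs, equivalent_pairs):
    """Build edges that lead from a format to formats it can stand in for."""
    edges = defaultdict(set)
    for child, parent in subclass_pairs:
        edges[child].add(parent)  # child is a subclass of parent
    for left, right in equivalent_pairs:
        edges[left].add(right)  # equivalence holds in both directions
        edges[right].add(left)
    return edges


def format_matches(actual, required, edges):
    """True if `actual` is the same as, equivalent to, or a subclass of `required`."""
    seen, stack = set(), [actual]
    while stack:
        node = stack.pop()
        if node == required:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return False
```

With B equivalent to C and B a subclass of A, a file of format C is accepted where A is required, but a file of format A is not accepted where C is required.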

contents
optional

File contents literal.

If neither location nor path is provided, contents must be non-null. The implementation must assign a unique identifier for the location field. When the file is staged as input to CommandLineTool, the value of contents must be written to a file.

If contents is set as a result of a Javascript expression, an entry in InitialWorkDirRequirement, or read in from cwl.output.json, there is no specified upper limit on the size of contents. Implementations may have practical limits on the size of contents based on memory and storage available to the workflow runner or other factors.

If the loadContents field of an InputParameter or OutputParameter is true, and the input or output File object location is valid, the file must be a UTF-8 text file 64 KiB or smaller, and the implementation must read the entire contents of the file and place it in the contents field. If the size of the file is greater than 64 KiB, the implementation must raise a fatal error.

5.1.5.1 Directory §

Represents a directory to present to a command line tool.

Directories are represented as objects with class of Directory. Directory objects have a number of properties that provide metadata about the directory.

The location property of a Directory is an IRI that uniquely identifies the directory. Implementations must support the file:// IRI scheme and may support other schemes such as http://. As an alternative to location, implementations must also accept the path property on Directory, which must be a filesystem path available on the same host as the CWL runner (for inputs) or the runtime environment of a command line tool execution (for command line tool outputs).

A Directory object may have a listing field. This is a list of File and Directory objects that are contained in the Directory. For each entry in listing, the basename property defines the name of the File or Subdirectory when staged to disk. If listing is not provided, the implementation must have some way of fetching the Directory listing at runtime based on the location field.

If a Directory does not have location, it is a Directory literal. A Directory literal must provide listing. Directory literals must be created on disk at runtime as needed.

The resources in a Directory literal do not need to have any implied relationship in their location. For example, a Directory listing may contain two files located on different hosts. It is the responsibility of the runtime to ensure that those files are staged to disk appropriately. Secondary files associated with files in listing must also be staged to the same Directory.

When executing a CommandLineTool, Directories must be recursively staged first and have local values of path assigned.

Directory objects in CommandLineTool output must provide either a location IRI or a path property in the context of the tool execution runtime (local to the compute node, or within the executing container).

An ExpressionTool may forward file references from input to output by using the same value for location.

Name conflicts (the same basename appearing multiple times in listing or in any entry in secondaryFiles in the listing) are a fatal error.

Fields

field
required
type
description
class
required
constant value Directory

Must be Directory to indicate this object describes a Directory.

location
optional

An IRI that identifies the directory resource. This may be a relative reference, in which case it must be resolved using the base IRI of the document. The location may refer to a local or remote resource. If the listing field is not set, the implementation must use the location IRI to retrieve directory listing. If an implementation is unable to retrieve the directory listing stored at a remote resource (due to unsupported protocol, access denied, or other issue) it must signal an error.

If the location field is not provided, the listing field must be provided. The implementation must assign a unique identifier for the location field.

If the path field is provided but the location field is not, an implementation may assign the value of the path field to location, then follow the rules above.

path
optional

The local path where the Directory is made available prior to executing a CommandLineTool. This must be set by the implementation. This field must not be used in any other context. The command line tool being executed must be able to access the directory at path using the POSIX opendir(2) syscall.

If the path contains POSIX shell metacharacters (|, &, ;, <, >, (, ), $, `, \, ", ', <space>, <tab>, and <newline>) or characters not allowed for Internationalized Domain Names for Applications then implementations may terminate the process with a permanentFailure.

basename
optional

The base name of the directory, that is, the name of the directory without any leading directory path. The base name must not contain a slash /.

If not provided, the implementation must set this field based on the location field by taking the final path component after parsing location as an IRI. If basename is provided, it is not required to match the value from location.

When this directory is made available to a CommandLineTool, it must be named with basename, i.e. the final component of the path field must match basename.

listing
optional
array<File | Directory>

List of files or subdirectories contained in this directory. The name of each file or subdirectory is determined by the basename field of each File or Directory object. It is an error if a File shares a basename with any other entry in listing. If two or more Directory objects share the same basename, this must be treated as equivalent to a single subdirectory with the listings recursively merged.
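The merging rule for listing can be sketched as follows. The function name `merge_listing` is hypothetical, entries are modeled as plain dictionaries, and `ValueError` stands in for the error the text requires; conflicts arising from secondaryFiles are not covered here.

```python
def merge_listing(listing):
    """Merge same-named Directory entries and reject any other name clash."""
    merged = {}
    for entry in listing:
        name = entry["basename"]
        if name not in merged:
            merged[name] = dict(entry)  # shallow copy, input is not modified
            continue
        existing = merged[name]
        if existing["class"] != "Directory" or entry["class"] != "Directory":
            # A File may not share its basename with any other entry.
            raise ValueError("name conflict in listing: " + name)
        existing["listing"] = existing.get("listing", []) + entry.get("listing", [])
    for entry in merged.values():
        if entry["class"] == "Directory" and "listing" in entry:
            entry["listing"] = merge_listing(entry["listing"])  # recurse
    return list(merged.values())
```

Two Directory entries named `d` containing `x` and `y` respectively come back as a single `d` whose listing holds both files.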

5.1.6 Any §

The Any type validates for any non-null value.

Symbols

symbol description
Any

5.1.7 CWLType §

Extends primitive types with the concept of a file and directory as a builtin type.

Symbols

symbol description
null no value
boolean a binary value
int 32-bit signed integer
long 64-bit signed integer
float single precision (32-bit) IEEE 754 floating-point number
double double precision (64-bit) IEEE 754 floating-point number
string Unicode character sequence
File A File object
Directory A Directory object

5.1.8 stdin §

Only valid as a type for a CommandLineTool input with no inputBinding set. stdin must not be specified at the CommandLineTool level.

The following

inputs:
  an_input_name:
    type: stdin

is equivalent to

inputs:
  an_input_name:
    type: File
    streamable: true

stdin: $(inputs.an_input_name.path)

Symbols

symbol description
stdin

5.1.9 CommandInputRecordSchema §

Fields

field
required
type
description
type
required
constant value record

Must be record

fields
optional

Defines the fields of the record.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

name
optional

The identifier for this type

inputBinding
optional

Describes how to turn this object into command line arguments.

5.1.9.1 CommandInputRecordField §

Fields

field
required
type
description
name
required

The name of the field

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

label
optional

A short, human-readable label of this object.

secondaryFiles
optional

Only valid when type: File or is an array of items: File.

Provides a pattern or expression specifying files or directories that should be included alongside the primary file. Secondary files may be required or optional. When not explicitly specified, secondary files specified for inputs are required and those specified for outputs are optional. An implementation must include matching Files and Directories in the secondaryFiles property of the primary file. These Files and Directories must be transferred and staged alongside the primary file. An implementation may fail workflow execution if a required secondary file does not exist.

If the value is an expression, the value of self in the expression must be the primary input or output File object to which this binding applies. The basename, nameroot and nameext fields must be present in self. For CommandLineTool outputs the path field must also be present. The expression must return a filename string relative to the path to the primary File, a File or Directory object with either path or location and basename fields set, or an array consisting of strings or File or Directory objects. It is legal to reference an unchanged File or Directory object taken from input as a secondaryFile. The expression may return "null" in which case there is no secondaryFile from that expression.

To work on non-filename-preserving storage systems, portable tool descriptions should avoid constructing new values from location, but should construct relative references using basename or nameroot instead.

If a value in secondaryFiles is a string that is not an expression, it specifies that the following pattern should be applied to the path of the primary file to yield a filename relative to the primary File:

  1. If the string ends with a ? character, remove the last ? and mark the resulting secondary file as optional.
  2. If the string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
  3. Append the remainder of the string to the end of the file path.

streamable
optional

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

format
optional

Only valid when type: File or is an array of items: File.

This must be one or more IRIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.

loadContents
optional

Only valid when type: File or is an array of items: File.

If true, the file (or each file in the array) must be a UTF-8 text file 64 KiB or smaller, and the implementation must read the entire contents of the file (or file array) and place it in the contents field of the File object for use by expressions. If the size of the file is greater than 64 KiB, the implementation must raise a fatal error.

loadListing
optional

Only valid when type: Directory or is an array of items: Directory.

Specify the desired behavior for loading the listing field of a Directory object for use by expressions.

The order of precedence for loadListing is:

  1. loadListing on an individual parameter
  2. Inherited from LoadListingRequirement
  3. By default: no_listing

inputBinding
optional

Describes how to turn this object into command line arguments.

5.1.9.1.1 CommandInputEnumSchema §

Fields

field
required
type
description
symbols
required
array<string>

Defines the set of valid symbols.

type
required
constant value enum

Must be enum

name
optional

The identifier for this type

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

inputBinding
optional

Describes how to turn this object into command line arguments.

5.1.9.1.2 CommandInputArraySchema §

Fields

field
required
type
description
type
required
constant value array

Must be array

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

name
optional

The identifier for this type

inputBinding
optional

Describes how to turn this object into command line arguments.

5.2 CommandOutputParameter §

An output parameter for a CommandLineTool.

Fields

field
required
type
description
type
required

Specify valid types of data that may be assigned to this parameter.

label
optional

A short, human-readable label of this object.

secondaryFiles
optional

Only valid when type: File or is an array of items: File.

Provides a pattern or expression specifying files or directories that should be included alongside the primary file. Secondary files may be required or optional. When not explicitly specified, secondary files specified for inputs are required and those specified for outputs are optional. An implementation must include matching Files and Directories in the secondaryFiles property of the primary file. These Files and Directories must be transferred and staged alongside the primary file. An implementation may fail workflow execution if a required secondary file does not exist.

If the value is an expression, the value of self in the expression must be the primary input or output File object to which this binding applies. The basename, nameroot and nameext fields must be present in self. For CommandLineTool outputs the path field must also be present. The expression must return a filename string relative to the path to the primary File, a File or Directory object with either path or location and basename fields set, or an array consisting of strings or File or Directory objects. It is legal to reference an unchanged File or Directory object taken from input as a secondaryFile. The expression may return "null" in which case there is no secondaryFile from that expression.

To work on non-filename-preserving storage systems, portable tool descriptions should avoid constructing new values from location, but should construct relative references using basename or nameroot instead.

If a value in secondaryFiles is a string that is not an expression, it specifies that the following pattern should be applied to the path of the primary file to yield a filename relative to the primary File:

  1. If string ends with ? character, remove the last ? and mark the resulting secondary file as optional.
  2. If string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
  3. Append the remainder of the string to the end of the file path.
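
As an illustration of the three steps above, the following sketch assumes a hypothetical tool that writes sample.bam and reference.tar.gz into the output directory. The output names and file names are illustrative only; the comments show the file name each pattern would yield.

```yaml
outputs:
  alignment:
    type: File
    outputBinding:
      glob: sample.bam
    secondaryFiles:
      - ".bai"      # appended as-is: sample.bam.bai
      - "^.bai?"    # trailing ? removed (optional), one extension removed: sample.bai
  archive:
    type: File
    outputBinding:
      glob: reference.tar.gz
    secondaryFiles:
      - "^^.fai"    # two extensions removed, then .fai appended: reference.fai
```
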
streamable
optional

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

id
optional

The unique identifier for this object.

format
optional

Only valid when type: File or is an array of items: File.

This is the file format that will be assigned to the output File object.

outputBinding
optional

Describes how to generate this output object based on the files produced by a CommandLineTool

5.2.1 stdout §

Only valid as a type for a CommandLineTool output with no outputBinding set.

The following

outputs:
  an_output_name:
    type: stdout

stdout: a_stdout_file

is equivalent to

outputs:
  an_output_name:
    type: File
    streamable: true
    outputBinding:
      glob: a_stdout_file

stdout: a_stdout_file

If there is no stdout name provided, a random filename will be created. For example, the following

outputs:
  an_output_name:
    type: stdout

is equivalent to

outputs:
  an_output_name:
    type: File
    streamable: true
    outputBinding:
      glob: random_stdout_filenameABCDEFG

stdout: random_stdout_filenameABCDEFG

If the CommandLineTool contains logically chained commands (e.g. echo a && echo b) stdout must include the output of every command.

Symbols

symbol
description
stdout

5.2.2 stderr §

Only valid as a type for a CommandLineTool output with no outputBinding set.

The following

outputs:
  an_output_name:
    type: stderr

stderr: a_stderr_file

is equivalent to

outputs:
  an_output_name:
    type: File
    streamable: true
    outputBinding:
      glob: a_stderr_file

stderr: a_stderr_file

If there is no stderr name provided, a random filename will be created. For example, the following

outputs:
  an_output_name:
    type: stderr

is equivalent to

outputs:
  an_output_name:
    type: File
    streamable: true
    outputBinding:
      glob: random_stderr_filenameABCDEFG

stderr: random_stderr_filenameABCDEFG

Symbols

symbol
description
stderr

5.2.3 CommandOutputRecordSchema §

Fields

field
required
type
description
type
required
constant value record

Must be record

fields
optional

Defines the fields of the record.

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

name
optional

The identifier for this type

5.2.4 CommandOutputRecordField §

Fields

field
required
type
description
name
required

The name of the field

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

label
optional

A short, human-readable label of this object.

secondaryFiles
optional

Only valid when type: File or is an array of items: File.

Provides a pattern or expression specifying files or directories that should be included alongside the primary file. Secondary files may be required or optional. When not explicitly specified, secondary files specified for inputs are required and outputs are optional. An implementation must include matching Files and Directories in the secondaryFiles property of the primary file. These Files and Directories must be transferred and staged alongside the primary file. An implementation may fail workflow execution if a required secondary file does not exist.

If the value is an expression, the value of self in the expression must be the primary input or output File object to which this binding applies. The basename, nameroot and nameext fields must be present in self. For CommandLineTool outputs the path field must also be present. The expression must return a filename string relative to the path to the primary File, a File or Directory object with either path or location and basename fields set, or an array consisting of strings or File or Directory objects. It is legal to reference an unchanged File or Directory object taken from input as a secondaryFile. The expression may return "null" in which case there is no secondaryFile from that expression.

To work on non-filename-preserving storage systems, portable tool descriptions should avoid constructing new values from location, but should construct relative references using basename or nameroot instead.

If a value in secondaryFiles is a string that is not an expression, it specifies that the following pattern should be applied to the path of the primary file to yield a filename relative to the primary File:

  1. If the string ends with a ? character, remove the last ? and mark the resulting secondary file as optional.
  2. If the string begins with one or more caret ^ characters, for each caret, remove the last file extension from the path (the last period . and all following characters). If there are no file extensions, the path is unchanged.
  3. Append the remainder of the string to the end of the file path.
streamable
optional

Only valid when type: File or is an array of items: File.

A value of true indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: false.

format
optional

Only valid when type: File or is an array of items: File.

This is the file format that will be assigned to the output File object.

outputBinding
optional

Describes how to generate this output object based on the files produced by a CommandLineTool
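
A minimal sketch of a record-typed output may help tie CommandOutputRecordSchema and CommandOutputRecordField together. The output name, record name, field names and file names below are hypothetical. The secondaryFiles entry uses the expression form and, as recommended above, builds its relative reference from nameroot rather than from location.

```yaml
outputs:
  alignment:
    type:
      type: record
      name: alignment_result          # hypothetical type name
      label: Alignment with summary
      fields:
        - name: bam
          type: File
          outputBinding:
            glob: sample.bam
          secondaryFiles:
            - pattern: $(self.nameroot).bai   # evaluates to "sample.bai"
              required: false
        - name: summary
          type: File
          outputBinding:
            glob: summary.txt
```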

5.2.4.1 CommandOutputEnumSchema §

Fields

field
required
type
description
symbols
required
array<string>

Defines the set of valid symbols.

type
required
constant value enum

Must be enum

name
optional

The identifier for this type

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

5.2.4.2 CommandOutputArraySchema §

Fields

field
required
type
description
type
required
constant value array

Must be array

label
optional

A short, human-readable label of this object.

doc
optional
string | array<string>

A documentation string for this object, or an array of strings which should be concatenated.

name
optional

The identifier for this type

5.2.4.3 CommandOutputBinding §

Describes how to generate an output parameter based on the files produced by a CommandLineTool.

The output parameter value is generated by applying these operations in the following order:

  • glob
  • loadContents
  • outputEval
  • secondaryFiles

Fields

field
required
type
description
loadContents
optional

Only valid when type: File or is an array of items: File.

If true, the file (or each file in the array) must be a UTF-8 text file 64 KiB or smaller, and the implementation must read the entire contents of the file (or file array) and place it in the contents field of the File object for use by expressions. If the size of the file is greater than 64 KiB, the implementation must raise a fatal error.

loadListing
optional

Only valid when type: Directory or is an array of items: Directory.

Specify the desired behavior for loading the listing field of a Directory object for use by expressions.

The order of precedence for loadListing is:

  1. loadListing on an individual parameter
  2. Inherited from LoadListingRequirement
  3. By default: no_listing
glob
optional

Find files or directories relative to the output directory, using POSIX glob(3) pathname matching. If an array is provided, find files or directories that match any pattern in the array. If an expression is provided, the expression must return a string or an array of strings, which will then be evaluated as one or more glob patterns. Must only match and return files/directories which actually exist.

If the value of glob is a relative path pattern (does not begin with a slash '/') then it is resolved relative to the output directory. If the value of glob is an absolute path pattern (it does begin with a slash '/') then it must refer to a path within the output directory. It is an error if any glob resolves to a path outside the output directory.

A glob may match a path within the output directory which is actually a symlink to another file. In this case, the expected behavior is for the resulting File/Directory object to take the basename (and corresponding nameroot and nameext) of the symlink. The location of the File/Directory is implementation dependent, but logically the File/Directory should have the same content as the symlink target. Platforms may stage output files/directories to cloud storage that lack the concept of a symlink. In this case file content and directories may be duplicated, or (to avoid duplication) the File/Directory location may refer to the symlink target.

It is an error if a symlink in the output directory (or any symlink in a chain of links) refers to any file or directory that is not under an input or output directory.

Implementations may shut down a container before globbing output, so globs and expressions must not assume access to the container filesystem except for declared input and output.
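
The accepted forms of glob can be sketched as follows. The input name prefix, the output names and the patterns are assumptions made for illustration; all patterns are relative, so they are resolved against the output directory.

```yaml
inputs:
  prefix: string
outputs:
  logs:
    type: File[]
    outputBinding:
      # Array form: a file is collected if it matches any of the patterns.
      glob: ["*.log", "reports/*.txt"]
  named_result:
    type: File
    outputBinding:
      # Expression form: evaluates to a string that is then used as the pattern.
      glob: $(inputs.prefix).out
```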

outputEval
optional

Evaluate an expression to generate the output value. If glob was specified, the value of self must be an array containing file objects that were matched. If no files were matched, self must be a zero length array; if a single file was matched, the value of self is an array of a single element. The exit code of the process is available in the expression as runtime.exitCode.

Additionally, if loadContents is true, the file must be a UTF-8 text file 64 KiB or smaller, and the implementation must read the entire contents of the file (or file array) and place it in the contents field of the File object for use in outputEval. If the size of the file is greater than 64 KiB, the implementation must raise a fatal error.

If a tool needs to return a large amount of structured data to the workflow, loading the output object from cwl.output.json bypasses outputEval and is not subject to the 64 KiB loadContents limit.
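
The glob, loadContents and outputEval steps can be combined as in the sketch below. It assumes a hypothetical tool that writes a single number to count.txt; the output names are illustrative. Because parseInt is a JavaScript function, the first output needs InlineJavascriptRequirement, while the second uses only a parameter reference.

```yaml
requirements:
  InlineJavascriptRequirement: {}
outputs:
  line_count:
    type: int
    outputBinding:
      glob: count.txt            # 1. self becomes an array with the matched File
      loadContents: true         # 2. self[0].contents holds the text (at most 64 KiB)
      outputEval: $(parseInt(self[0].contents))   # 3. the result is the output value
  exit_status:
    type: int
    outputBinding:
      outputEval: $(runtime.exitCode)
```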

5.3 InlineJavascriptRequirement §

Indicates that the workflow platform must support inline JavaScript expressions. If this requirement is not present, the workflow platform must not perform expression interpolation.

Fields

field
required
type
description
class
required
constant value InlineJavascriptRequirement

Always 'InlineJavascriptRequirement'

expressionLib
optional
array<string>

Additional code fragments that will also be inserted before executing the expression code. Allows for function definitions that may be called from CWL expressions.
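
As a sketch of how expressionLib might be used, the fragment below defines a helper function and calls it from an expression. The function name stem, the input reads and the --out option are hypothetical.

```yaml
requirements:
  InlineJavascriptRequirement:
    expressionLib:
      - |
        // Hypothetical helper: file name without its final extension.
        function stem(file) {
          return file.nameroot;
        }
inputs:
  reads: File
arguments:
  - prefix: --out
    valueFrom: $(stem(inputs.reads)).txt
```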

5.4 SchemaDefRequirement §

This field consists of an array of type definitions which must be used when interpreting the inputs and outputs fields. When a type field contains an IRI, the implementation must check if the type is defined in schemaDefs and use that definition. If the type is not found in schemaDefs, it is an error. The entries in schemaDefs must be processed in the order listed such that later schema definitions may refer to earlier schema definitions.

  • Type definitions are allowed for enum and record types only.
  • Type definitions may be shared by defining them in a file and then $include-ing them in the types field.
  • A file can contain a list of type definitions.

Fields

field
required
type
description
class
required
constant value SchemaDefRequirement

Always 'SchemaDefRequirement'

types
required

The list of type definitions.
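
A possible use of SchemaDefRequirement is sketched below with one enum and one record definition, where the later definition refers to the earlier one. All type, field and input names are hypothetical, and the sketch assumes that a bare type name resolves to the definition of the same name in this document.

```yaml
requirements:
  SchemaDefRequirement:
    types:
      - name: strand_type
        type: enum
        symbols: [forward, reverse]
      - name: read_group
        type: record
        fields:
          - name: sample
            type: string
          - name: strand
            type: strand_type    # refers to the earlier definition
inputs:
  group: read_group
```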

5.5 LoadListingRequirement §

Specify the desired behavior for loading the listing field of a Directory object for use by expressions.

Fields

field
required
type
description
class
required
constant value LoadListingRequirement

Always 'LoadListingRequirement'

loadListing
optional
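
For illustration, the fragment below sets a tool-wide default and overrides it for one parameter, following the order of precedence given for loadListing. The input names are hypothetical.

```yaml
requirements:
  LoadListingRequirement:
    loadListing: shallow_listing   # default for Directory objects in this tool
inputs:
  run_dir:
    type: Directory
    loadListing: deep_listing      # takes precedence for this parameter only
  scratch_dir:
    type: Directory                # inherits shallow_listing
```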

5.6 DockerRequirement §

Indicates that a workflow component should be run in a Docker or Docker-compatible (such as Singularity and udocker) container environment and specifies how to fetch or build the image.

If a CommandLineTool lists DockerRequirement under hints (or requirements), it may (or must) be run in the specified Docker container.

The platform must first acquire or install the correct Docker image as specified by dockerPull, dockerImport, dockerLoad or dockerFile.

The platform must execute the tool in the container using docker run with the appropriate Docker image and tool command line.

The workflow platform may provide input files and the designated output directory through the use of volume bind mounts. The platform should rewrite file paths in the input object to correspond to the Docker bind mounted locations. That is, the platform should rewrite values in the parameter context such as runtime.outdir, runtime.tmpdir and others to be valid paths within the container. The platform must ensure that runtime.outdir and runtime.tmpdir are distinct directories.

When running a tool contained in Docker, the workflow platform must not assume anything about the contents of the Docker container, such as the presence or absence of specific software, except to assume that the generated command line represents a valid command within the runtime environment of the container.

A container image may specify an ENTRYPOINT and/or CMD. Command line arguments will be appended after all elements of ENTRYPOINT, and will override all elements specified using CMD (in other words, CMD is only used when the CommandLineTool definition produces an empty command line).

Use of an implicit ENTRYPOINT or CMD is discouraged due to reproducibility concerns of the implicit hidden execution point (for further discussion, see https://doi.org/10.12688/f1000research.15140.1). Portable CommandLineTool wrappers in which use of a container is optional must not rely on ENTRYPOINT or CMD. CommandLineTools which do rely on ENTRYPOINT or CMD must list DockerRequirement in the requirements section.

Interaction with other requirements §

If EnvVarRequirement is specified alongside a DockerRequirement, the environment variables must be provided to Docker using --env or --env-file and interact with the container's preexisting environment as defined by Docker.

Fields

field
required
type
description
class
required
constant value DockerRequirement

Always 'DockerRequirement'

dockerPull
optional

Specify a Docker image to retrieve using docker pull. Can contain the immutable digest to ensure an exact container is used: dockerPull: ubuntu@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2

dockerLoad
optional

Specify an HTTP URL from which to download a Docker image using docker load.

dockerFile
optional

Supply the contents of a Dockerfile which will be built using docker build.

dockerImport
optional

Provide an HTTP URL from which to download and gunzip a Docker image using docker import.

dockerImageId
optional

The image id that will be used for docker run. May be a human-readable image name or the image identifier hash. May be skipped if dockerPull is specified, in which case the dockerPull image id must be used.

dockerOutputDirectory
optional

Set the designated output directory to a specific location inside the Docker container.
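
A minimal sketch of a DockerRequirement follows. The image reference is the digest example given for dockerPull; the dockerOutputDirectory path is a hypothetical choice. Listed under hints, the container may be used; moving the same block under requirements would make it mandatory.

```yaml
hints:
  DockerRequirement:
    dockerPull: ubuntu@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2
    dockerOutputDirectory: /var/spool/cwl   # hypothetical path inside the container
```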

5.7 SoftwareRequirement §

A list of software packages that should be configured in the environment of the defined process.

Fields

field
required
type
description
class
required
constant value SoftwareRequirement

Always 'SoftwareRequirement'

packages
required
array<SoftwarePackage> | map<package, specs | SoftwarePackage>

The list of software to be configured.

5.8 SoftwarePackage §

Fields

field
required
type
description
package
required

The name of the software to be made available. If the name is common, inconsistent, or otherwise ambiguous it should be combined with one or more identifiers in the specs field.

version
optional
array<string>

The (optional) versions of the software that are known to be compatible.

specs
optional
array<string>

One or more IRIs identifying resources for installing or enabling the software named in the package field. Implementations may provide resolvers which map these software identifier IRIs to some configuration action; or they can use only the name from the package field on a best effort basis.

For example, the IRI https://packages.debian.org/bowtie could be resolved with apt-get install bowtie. The IRI https://anaconda.org/bioconda/bowtie could be resolved with conda install -c bioconda bowtie.

IRIs can also be system independent and used to map to a specific software installation or selection mechanism. Using RRID as an example: https://identifiers.org/rrid/RRID:SCR_005476 could be fulfilled using the above-mentioned Debian or bioconda package, a local installation managed by Environment Modules, or any other mechanism the platform chooses. IRIs can also be from identifier sources that are discipline specific yet still system independent. As an example, the equivalent ELIXIR Tools and Data Service Registry IRI to the previous RRID example is https://bio.tools/tool/bowtie2/version/2.2.8. If supported by a given registry, implementations are encouraged to query these system independent software identifier IRIs directly for links to packaging systems.

A site specific IRI can be listed as well. For example, an academic computing cluster using Environment Modules could list the IRI https://hpc.example.edu/modules/bowtie-tbb/1.1.2 to indicate that module load bowtie-tbb/1.1.2 should be executed to make available bowtie version 1.1.2 compiled with the TBB library prior to running the accompanying Workflow or CommandLineTool. Note that the example IRI is specific to a particular institution and computing environment as the Environment Modules system does not have a common namespace or standardized naming convention.

This last example is the least portable and should only be used if mechanisms based off of the package field or more generic IRIs are unavailable or unsuitable. While harmless to other sites, site specific software IRIs should be left out of shared CWL descriptions to avoid clutter.
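
The package, version and specs fields might be combined as in this sketch. It reuses the bowtie name, the 1.1.2 version and the Debian and bioconda IRIs mentioned above purely as illustration; placing the block under hints is an assumption about how a portable description would typically declare it.

```yaml
hints:
  SoftwareRequirement:
    packages:
      - package: bowtie
        version: ["1.1.2"]
        specs:
          - https://packages.debian.org/bowtie
          - https://anaconda.org/bioconda/bowtie
```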

5.9 InitialWorkDirRequirement §

Define a list of files and subdirectories that must be staged by the workflow platform prior to executing the command line tool. Normally files are staged within the designated output directory. However, when running inside containers, files may be staged at arbitrary locations; see the discussion of Dirent.entryname. Together with DockerRequirement.dockerOutputDirectory it is possible to control the locations of both input and output files when running in containers.

Fields

field
required
type
description
class
required
constant value InitialWorkDirRequirement

Always 'InitialWorkDirRequirement'

listing
required

The list of files or subdirectories that must be staged prior to executing the command line tool.

Return type of each expression must validate as ["null", File, Directory, Dirent, {type: array, items: [File, Directory]}].

Each File or Directory that is returned by an Expression must be added to the designated output directory prior to executing the tool.

Each Dirent record that is listed or returned by an expression specifies a file to be created or staged in the designated output directory prior to executing the tool.

Expressions may return null, in which case they have no effect.

Files or Directories which are listed in the input parameters and appear in the InitialWorkDirRequirement listing must have their path set to their staged location. If the same File or Directory appears more than once in the InitialWorkDirRequirement listing, the implementation must choose exactly one value for path; how this value is chosen is undefined.

5.9.1 Dirent §

Define a file or subdirectory that must be staged to a particular place prior to executing the command line tool. May be the result of executing an expression, such as building a configuration file from a template.

Usually files are staged within the designated output directory. However, under certain circumstances, files may be staged at arbitrary locations; see the discussion of entryname.

Fields

field
required
type
description
entry
required

If the value is a string literal or an expression which evaluates to a string, a new text file must be created with the string as the file contents.

If the value is an expression that evaluates to a File or Directory object, or an array of File or Directory objects, this indicates the referenced file or directory should be added to the designated output directory prior to executing the tool.

If the value is an expression that evaluates to null, nothing is added to the designated output directory; the entry has no effect.

If the value is an expression that evaluates to some other array, number, or object not consisting of File or Directory objects, a new file must be created with the value serialized to JSON text as the file contents. The JSON serialization behavior should match the behavior of string interpolation of Parameter references.

entryname
optional

The "target" name of the file or subdirectory. If entry is a File or Directory, the entryname field overrides the value of basename of the File or Directory object.

  • Required when entry evaluates to file contents only
  • Optional when entry evaluates to a File or Directory object with a basename
  • Invalid when entry evaluates to an array of File or Directory objects.

If entryname is a relative path, it specifies a name within the designated output directory. A relative path starting with ../ or that resolves to a location above the designated output directory is an error.

If entryname is an absolute path (starts with a slash /) it is an error unless the following conditions are met:

  • DockerRequirement is present in requirements
  • The program will run inside a software container where, from the perspective of the program, the root filesystem is not shared with any other user or running program.

If the above conditions are met, entryname may specify the absolute path within the container where the file or directory must be placed.

writable
optional

If true, the File or Directory (or array of Files or Directories) declared in entry must be writable by the tool.

Changes to the file or directory must be isolated and not visible by any other CommandLineTool process. This may be implemented by making a copy of the original file or directory.

Disruptive changes to the referenced file or directory must not be allowed unless InplaceUpdateRequirement.inplaceUpdate is true.

Default false (files and directories read-only by default).

A directory marked as writable: true implies that all files and subdirectories are recursively writable as well.

If writable is false, the file may be made available using a bind mount or file system link to avoid unnecessary copying of the input file. Command line tools may receive an error on attempting to rename or delete files or directories that are not explicitly marked as writable.
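
Two common Dirent forms are sketched below: a text file generated from a string, and an input File staged under a new name and marked writable. The input names, file names and file contents are hypothetical.

```yaml
inputs:
  sample_name: string
  database: File
requirements:
  InitialWorkDirRequirement:
    listing:
      # String entry: a new text file is created, so entryname is required.
      - entryname: settings.ini
        entry: |
          [run]
          sample = $(inputs.sample_name)
      # File entry: entryname overrides basename; the tool may modify its copy.
      - entry: $(inputs.database)
        entryname: work.db
        writable: true
```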

5.10 EnvVarRequirement §

Define a list of environment variables which will be set in the execution environment of the tool. See EnvironmentDef for details.

Fields

field
required
type
description
class
required
constant value EnvVarRequirement

Always 'EnvVarRequirement'

envDef
required
array<EnvironmentDef> | map<envName, envValue | EnvironmentDef>

The list of environment variables.

5.11 EnvironmentDef §

Define an environment variable that will be set in the runtime environment by the workflow platform when executing the command line tool. May be the result of executing an expression, such as getting a parameter from input.

Fields

field
required
type
description
envName
required

The environment variable name

envValue
required

The environment variable value
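
As a sketch, envDef can be written in the map form, with the equivalent array form shown in comments. The variable names and the input sample_name are hypothetical.

```yaml
inputs:
  sample_name: string
requirements:
  EnvVarRequirement:
    envDef:
      LC_ALL: C
      SAMPLE_NAME: $(inputs.sample_name)   # value computed from an input
    # Equivalent array form:
    # envDef:
    #   - envName: LC_ALL
    #     envValue: C
    #   - envName: SAMPLE_NAME
    #     envValue: $(inputs.sample_name)
```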

5.12 ShellCommandRequirement §

Modify the behavior of CommandLineTool to generate a single string containing a shell command line. Each item in the arguments list must be joined into a string separated by single spaces and quoted to prevent interpretation by the shell, unless CommandLineBinding for that argument contains shellQuote: false. If shellQuote: false is specified, the argument is joined into the command string without quoting, which allows the use of shell metacharacters such as | for pipes.

Fields

field
required
type
description
class
required
constant value ShellCommandRequirement

Always 'ShellCommandRequirement'
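
The effect of shellQuote: false can be illustrated with a small pipeline. This is a sketch only; the input name log and the output name are hypothetical. The command is roughly equivalent to running cat on the quoted path of the input and piping the result into wc -l.

```yaml
cwlVersion: v1.2
class: CommandLineTool
requirements:
  ShellCommandRequirement: {}
inputs:
  log: File
arguments:
  - cat
  - $(inputs.log.path)     # quoted, so special characters in the path are safe
  - valueFrom: "|"
    shellQuote: false      # joined unquoted, so the shell sees a pipe
  - wc
  - "-l"
outputs:
  line_count:
    type: stdout
```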

5.13 ResourceRequirement §

Specify basic hardware resource requirements.

"min" is the minimum amount of a resource that must be reserved to schedule a job. If "min" cannot be satisfied, the job should not be run.

"max" is the maximum amount of a resource that the job shall be allocated. If a node has sufficient resources, multiple jobs may be scheduled on a single node provided each job's "max" resource requirements are met. If a job attempts to exceed its resource allocation, an implementation may deny additional resources, which may result in job failure.

If both "min" and "max" are specified, an implementation may choose to allocate any amount between "min" and "max", with the actual allocation provided in the runtime object.

If "min" is specified but "max" is not, then "max" == "min". If "max" is specified but "min" is not, then "min" == "max".

It is an error if max < min.

It is an error if the value of any of these fields is negative.

If neither "min" nor "max" is specified for a resource, use the default values below.

Fields

field
required
type
description
class
required
constant value ResourceRequirement

Always 'ResourceRequirement'

coresMin
optional

Minimum reserved number of CPU cores (default is 1).

May be a fractional value to indicate to a scheduling algorithm that one core can be allocated to multiple jobs. For example, a value of 0.25 indicates that up to 4 jobs may run in parallel on 1 core. A value of 1.25 means that up to 3 jobs can run on a 4 core system (4/1.25 = 3.2, rounded down to 3).

Processes can only share a core allocation if the sums of their ramMax, tmpdirMax, and outdirMax requests also do not exceed the capacity of the node.

Processes sharing a core must have the same level of isolation (typically a container or VM) that they would normally have.

The reported number of CPU cores reserved for the process, which is available to expressions on the CommandLineTool as runtime.cores, must be a non-zero integer, and may be calculated by rounding up the cores request to the next whole number.

Scheduling systems may allocate fractional CPU resources by setting quotas or scheduling weights. Scheduling systems that do not support fractional CPUs may round up the request to the next whole number.

coresMax
optional

Maximum reserved number of CPU cores.

See coresMin for discussion about fractional CPU requests.

ramMin
optional

Minimum reserved RAM in mebibytes (2**20) (default is 256)

May be a fractional value. If so, the actual RAM request must be rounded up to the next whole number. The reported amount of RAM reserved for the process, which is available to expressions on the CommandLineTool as runtime.ram, must be a non-zero integer.

ramMax
optional

Maximum reserved RAM in mebibytes (2**20)

See ramMin for discussion about fractional RAM requests.

tmpdirMin
optional

Minimum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) (default is 1024)

May be a fractional value. If so, the actual storage request must be rounded up to the next whole number. The reported amount of storage reserved for the process, which is available to expressions on the CommandLineTool as runtime.tmpdirSize, must be a non-zero integer.

tmpdirMax
optional

Maximum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20)

See tmpdirMin for discussion about fractional storage requests.

outdirMin
optional

Minimum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) (default is 1024)

May be a fractional value. If so, the actual storage request must be rounded up to the next whole number. The reported amount of storage reserved for the process, which is available to expressions on the CommandLineTool as runtime.outdirSize, must be a non-zero integer.

outdirMax
optional

Maximum reserved filesystem based storage for the designated output directory, in mebibytes (2**20)

See outdirMin for discussion about fractional storage requests.
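
A sketch of a ResourceRequirement follows, with the allocated core count passed to the tool. The numbers and the --threads option are hypothetical. With coresMin 2 and coresMax 4, an implementation may allocate any amount in that range and reports the actual allocation as runtime.cores.

```yaml
requirements:
  ResourceRequirement:
    coresMin: 2
    coresMax: 4
    ramMin: 4096       # MiB; ramMax is unset, so it equals ramMin
    outdirMin: 2048    # MiB
    # tmpdirMin and tmpdirMax are unset, so the default of 1024 MiB applies
arguments:
  - prefix: --threads
    valueFrom: $(runtime.cores)
```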

5.14 WorkReuse §

For implementations that support reusing output from past work (on the assumption that same code and same input produce same results), control whether to enable or disable the reuse behavior for a particular tool or step (to accommodate situations where that assumption is incorrect). A reused step is not executed but instead returns the same output as the original execution.

If WorkReuse is not specified, correct tools should assume it is enabled by default.

Fields

field
required
type
description
class
required
constant value WorkReuse

Always 'WorkReuse'

enableReuse
required
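
For instance, a tool whose output depends on something other than its code and inputs (such as the current time) might disable reuse as in this sketch.

```yaml
hints:
  WorkReuse:
    enableReuse: false
```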

5.15 NetworkAccess §

Indicate whether a process requires outgoing IPv4/IPv6 network access. Choice of IPv4 or IPv6 is implementation and site specific; correct tools must support both.

If networkAccess is false or not specified, tools must not assume network access, except for localhost (the loopback device).

If networkAccess is true, the tool must be able to make outgoing connections to network resources. Resources may be on a private subnet or the public Internet. However, implementations and sites may apply their own security policies to restrict what is accessible by the tool.

Enabling network access does not imply a publicly routable IP address or the ability to accept inbound connections.

Fields

field
required
type
description
class
required
constant value NetworkAccess

Always 'NetworkAccess'

networkAccess
required
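
A tool that downloads data while it runs could declare this as sketched below. Listing it under requirements rather than hints is an assumption about a tool that cannot work without network access.

```yaml
requirements:
  NetworkAccess:
    networkAccess: true
```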

5.16 InplaceUpdateRequirement §

If inplaceUpdate is true, then an implementation supporting this feature may permit tools to directly update files with writable: true in InitialWorkDirRequirement. That is, as an optimization, files may be destructively modified in place as opposed to copied and updated.

An implementation must ensure that only one workflow step may access a writable file at a time. It is an error if a file which is writable by one workflow step is accessed (for reading or writing) by any other workflow step running independently. However, a file which has been updated in a previous completed step may be used as input to multiple steps, provided it is read-only in every step.

Workflow steps which modify a file must produce the modified file as output. Downstream steps which further process the file must use the output of previous steps, and not refer to a common input (this is necessary for both ordering and correctness).

Workflow authors should provide this in the hints section. The intent of this feature is that workflows produce the same results whether or not InplaceUpdateRequirement is supported by the implementation, and this feature is primarily available as an optimization for particular environments.

Users and implementers should be aware that workflows that destructively modify inputs may not be repeatable or reproducible. In particular, enabling this feature implies that WorkReuse should not be enabled.

Fields

field
required
type
description
class
required
constant value InplaceUpdateRequirement

Always 'InplaceUpdateRequirement'

inplaceUpdate
required
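
The pieces described above might fit together as in this sketch: the requirement is given as a hint, the file is staged as writable, and the modified file is returned as an output so that downstream steps consume it. The input and output names are hypothetical.

```yaml
hints:
  InplaceUpdateRequirement:
    inplaceUpdate: true
requirements:
  InitialWorkDirRequirement:
    listing:
      - entry: $(inputs.index)
        writable: true           # may be modified in place instead of copied
inputs:
  index: File
outputs:
  updated_index:
    type: File
    outputBinding:
      glob: $(inputs.index.basename)
```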

5.17 ToolTimeLimit §

Set an upper limit on the execution time of a CommandLineTool. A CommandLineTool whose execution duration exceeds the time limit may be preemptively terminated and considered failed. May also be used by batch systems to make scheduling decisions. The execution duration excludes external operations, such as staging of files or pulling a Docker image, and only counts wall-time for the execution of the command line itself.

Fields

field
required
type
description
class
required
constant value ToolTimeLimit

Always 'ToolTimeLimit'

timelimit
required

The time limit, in seconds. A time limit of zero means no time limit. Negative time limits are an error.
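
As a sketch, a one-hour limit could be declared as follows; the value is illustrative.

```yaml
hints:
  ToolTimeLimit:
    timelimit: 3600   # seconds of wall-time for the command line itself
```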

5.18 CWLVersion §

Version symbols for published CWL document versions.

Symbols

symbol
description
draft-2
draft-3.dev1
draft-3.dev2
draft-3.dev3
draft-3.dev4
draft-3.dev5
draft-3
draft-4.dev1
draft-4.dev2
draft-4.dev3
v1.0.dev4
v1.0
v1.1.0-dev1
v1.1
v1.2.0-dev1
v1.2.0-dev2
v1.2.0-dev3
v1.2.0-dev4
v1.2.0-dev5
v1.2