To trace software back to its source and identify the moving parts in a complex supply chain, provenance needs to be recorded from the very beginning. Provenance is verifiable information about software artifacts that describes where, when, and how something was produced. Higher SLSA levels and more resilient integrity guarantees come with stricter provenance requirements, which call for a deeper, more technical understanding of the predicate.

This document defines the following predicate type within the in-toto attestation framework:

"predicateType": "https://slsa.dev/provenance/v0.2"

Important: Always use the above string for predicateType rather than what is in the URL bar. The predicateType URI will always resolve to the latest minor version of this specification. See parsing rules for more information.

Purpose

Describe how an artifact or set of artifacts was produced.

This predicate is the recommended way to satisfy the SLSA provenance requirements.

Model

Provenance is an attestation that some entity (builder) produced one or more software artifacts (the subject of an in-toto attestation Statement) by executing some invocation, using some other artifacts as input (materials). The invocation in turn runs the buildConfig, which is a record of what was executed. The builder is trusted to have faithfully recorded the provenance; there is no option but to trust the builder. However, the builder may have performed this operation at the request of some external, possibly untrusted entity. These untrusted parameters are captured in the invocation’s parameters and some of the materials. Finally, the build may have depended on various environmental parameters (environment) that are needed for reproducing the build but that are not under external control.

See Example for a concrete example.

[Figure: Model Diagram]

Schema

{
  // Standard attestation fields:
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [{ ... }],

  // Predicate:
  "predicateType": "https://slsa.dev/provenance/v0.2",
  "predicate": {
    "builder": {
      "id": "<URI>"
    },
    "buildType": "<URI>",
    "invocation": {
      "configSource": {
        "uri": "<URI>",
        "digest": { /* DigestSet */ },
        "entryPoint": "<STRING>"
      },
      "parameters": { /* object */ },
      "environment": { /* object */ }
    },
    "buildConfig": { /* object */ },
    "metadata": {
      "buildInvocationId": "<STRING>",
      "buildStartedOn": "<TIMESTAMP>",
      "buildFinishedOn": "<TIMESTAMP>",
      "completeness": {
        "parameters": true/false,
        "environment": true/false,
        "materials": true/false
      },
      "reproducible": true/false
    },
    "materials": [
      {
        "uri": "<URI>",
        "digest": { /* DigestSet */ }
      }
    ]
  }
}

Parsing rules

This predicate follows the in-toto attestation parsing rules. Summary:

  • Consumers MUST ignore unrecognized fields.
  • The predicateType URI includes the major version number and will always change whenever there is a backwards incompatible change.
  • Minor version changes are always backwards compatible and “monotonic.” Such changes do not update the predicateType.
  • Producers MAY add extension fields using field names that are URIs.
  • Optional fields MAY be unset or null, and should be treated equivalently. Both are equivalent to empty for object or array values.
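The parsing rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `parse_predicate` and `KNOWN_FIELDS` are hypothetical, and the sketch neither verifies signatures nor checks that required fields are present.

```python
import json

PREDICATE_TYPE = "https://slsa.dev/provenance/v0.2"

# Predicate fields from the schema above, mapped to the "empty" value that an
# unset or null field is equivalent to (None for non-container fields).
KNOWN_FIELDS = {
    "builder": None,
    "buildType": None,
    "invocation": {},
    "buildConfig": {},
    "metadata": {},
    "materials": [],
}


def parse_predicate(raw_statement: str) -> dict:
    """Decode a statement and return only the predicate fields we recognize."""
    statement = json.loads(raw_statement)
    if statement.get("predicateType") != PREDICATE_TYPE:
        raise ValueError("unexpected predicateType")
    predicate = statement.get("predicate") or {}
    parsed = {}
    for name, empty in KNOWN_FIELDS.items():
        value = predicate.get(name)
        if value is None and empty is not None:
            # Unset and null both mean "empty" for object and array values.
            value = type(empty)()
        parsed[name] = value
    # Anything not listed in KNOWN_FIELDS, including URI-named extension
    # fields, is ignored rather than rejected.
    return parsed
```

Because unknown fields are dropped instead of causing an error, a consumer written this way keeps working when a later minor version adds fields.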

Fields

NOTE: This section describes the fields within predicate. For a description of the other top-level fields, such as subject, see Statement.

builder object, required

Identifies the entity that executed the invocation, which is trusted to have correctly performed the operation and populated this provenance.

The identity MUST reflect the trust base that consumers care about. How detailed to be is a judgement call. For example, GitHub Actions supports both GitHub-hosted runners and self-hosted runners. The GitHub-hosted runner might be a single identity because it’s all GitHub from the consumer’s perspective. Meanwhile, each self-hosted runner might have its own identity because not all runners are trusted by all consumers.

Consumers MUST accept only specific (signer, builder) pairs. For example, “GitHub” can sign provenance for the “GitHub Actions” builder, and “Google” can sign provenance for the “Google Cloud Build” builder, but “GitHub” cannot sign for the “Google Cloud Build” builder.

Design rationale: The builder is distinct from the signer because one signer may generate attestations for more than one builder, as in the GitHub Actions example above. The field is required, even if it is implicit from the signer, to aid readability and debugging. It is an object to allow additional fields in the future, in case one URI is not sufficient.
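A consumer could enforce the (signer, builder) rule with a simple allow-list. The sketch below is illustrative only: `TRUSTED_PAIRS` and `is_trusted` are hypothetical names, the signer strings stand in for whatever identity signature verification yields (which is out of scope here), and the builder IDs are borrowed from the illustrative examples later in this document.

```python
# Hypothetical allow-list of (signer, builder.id) pairs that this consumer trusts.
TRUSTED_PAIRS = {
    ("GitHub", "https://github.com/Attestations/GitHubHostedActions@v1"),
    ("Google", "https://cloudbuild.googleapis.com/GoogleHostedWorker@v1"),
}


def is_trusted(signer: str, predicate: dict) -> bool:
    """Accept provenance only if this exact (signer, builder) pair is allow-listed."""
    builder = predicate.get("builder") or {}
    return (signer, builder.get("id")) in TRUSTED_PAIRS
```

Checking the pair, rather than the signer and builder separately, is what prevents one trusted signer from vouching for another organization's builder.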

builder.id string (TypeURI), required

URI indicating the builder’s identity.

buildType string (TypeURI), required

URI indicating what type of build was performed. It determines the meaning of invocation, buildConfig and materials.

invocation object, optional

Identifies the event that kicked off the build. When combined with materials, this SHOULD fully describe the build, such that re-running this invocation results in bit-for-bit identical output (if the build is reproducible).

MAY be unset/null if unknown, but this is DISCOURAGED.

invocation.configSource object, optional

Describes where the config file that kicked off the build came from. This is effectively a pointer to the source where buildConfig came from.

invocation.configSource.uri string (ResourceURI), optional

URI indicating the identity of the source of the config.

invocation.configSource.digest object (DigestSet), optional

Collection of cryptographic digests for the contents of the artifact specified by invocation.configSource.uri.

invocation.configSource.entryPoint string, optional

String identifying the entry point into the build. This is often a path to a configuration file and/or a target label within that file. The syntax and meaning are defined by buildType. For example, if the buildType were “make”, then this would reference the directory in which to run make as well as which target to use.

Consumers SHOULD accept only specific invocation.configSource.entryPoint values. For example, a policy might only allow the “release” entry point but not the “debug” entry point.

MAY be omitted if the buildType specifies a default value.

Design rationale: The entryPoint is distinct from parameters to make it easier to write secure policies without having to parse parameters.
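An entry point policy of this kind might look like the following sketch. The names `ALLOWED_ENTRY_POINTS` and `entry_point_allowed` are hypothetical, and the sketch assumes the buildType defines no default entry point, so a missing value is rejected.

```python
# Hypothetical policy: only the "release" entry point is acceptable.
ALLOWED_ENTRY_POINTS = {"release"}


def entry_point_allowed(predicate: dict) -> bool:
    """Return True only if the recorded entry point is on the allow-list."""
    invocation = predicate.get("invocation") or {}
    config_source = invocation.get("configSource") or {}
    # A missing entryPoint yields None, which is never in the allow-list.
    return config_source.get("entryPoint") in ALLOWED_ENTRY_POINTS
```

Note that the check never inspects invocation.parameters, which is the point of keeping entryPoint as a separate field.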

invocation.parameters object, optional

Collection of all external inputs that influenced the build on top of invocation.configSource. For example, if the buildType were “make”, then this might be the flags passed to make aside from the target, which is captured in invocation.configSource.entryPoint.

Consumers SHOULD accept only “safe” invocation.parameters. The simplest and safest way to achieve this is to disallow any parameters altogether.

This is an arbitrary JSON object with a schema defined by buildType.

This is considered to be incomplete unless metadata.completeness.parameters is true.
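The strictest policy described above, disallowing parameters altogether, could be sketched like this. The function name `parameters_safe` is hypothetical. Requiring the completeness claim is one reading of the text: without it, an empty parameters object does not show that no external inputs were used.

```python
def parameters_safe(predicate: dict) -> bool:
    """Accept only provenance with no parameters and a completeness claim."""
    invocation = predicate.get("invocation") or {}
    metadata = predicate.get("metadata") or {}
    completeness = metadata.get("completeness") or {}
    # Unset, null, and {} are all treated as "no parameters".
    no_parameters = not invocation.get("parameters")
    # Only an explicit true counts as a claim of completeness.
    claims_complete = completeness.get("parameters") is True
    return no_parameters and claims_complete
```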

invocation.environment object, optional

Any other builder-controlled inputs necessary for correctly evaluating the build. Usually only needed for reproducing the build but not evaluated as part of policy.

This SHOULD be minimized to only include things that are part of the public API, that cannot be recomputed from other values in the provenance, and that actually affect the evaluation of the build. For example, this might include variables that are referenced in the workflow definition, but it SHOULD NOT include a dump of all environment variables or include things like the hostname (assuming hostname is not part of the public API).

This is an arbitrary JSON object with a schema defined by buildType.

This is considered to be incomplete unless metadata.completeness.environment is true.

metadata object, optional

Other properties of the build.

metadata.buildInvocationId string, optional

Identifies this particular build invocation, which can be useful for finding associated logs or other ad-hoc analysis. The exact meaning and format are defined by builder.id; by default it is treated as opaque and case-sensitive. The value SHOULD be globally unique.

metadata.buildStartedOn string (Timestamp), optional

The timestamp of when the build started.

metadata.buildFinishedOn string (Timestamp), optional

The timestamp of when the build completed.
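As an illustration of consuming the two timestamps, the sketch below computes a build's duration. It assumes the values are RFC 3339 style strings, like those in the GitLab CI example later in this document; `parse_timestamp` and `build_duration` are hypothetical helper names.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2022-06-17T00:47:27+03:00."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def build_duration(metadata: dict) -> Optional[timedelta]:
    """Return finish minus start, or None if either timestamp is unset."""
    started = metadata.get("buildStartedOn")
    finished = metadata.get("buildFinishedOn")
    if not started or not finished:
        return None
    return parse_timestamp(finished) - parse_timestamp(started)
```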

metadata.completeness object, optional

Indicates that the builder claims certain fields in this message to be complete.

metadata.completeness.parameters boolean, optional

If true, the builder claims that invocation.parameters is complete, meaning that all external inputs are properly captured in invocation.parameters.

metadata.completeness.environment boolean, optional

If true, the builder claims that invocation.environment is complete.

metadata.completeness.materials boolean, optional

If true, the builder claims that materials is complete, usually through some controls to prevent network access. Sometimes called “hermetic”.

metadata.reproducible boolean, optional

If true, the builder claims that running invocation on materials will produce bit-for-bit identical output.

buildConfig object, optional

Lists the steps in the build. If invocation.configSource is not available, buildConfig can be used to verify information about the build.

This is an arbitrary JSON object with a schema defined by buildType.

materials array of objects, optional

The collection of artifacts that influenced the build including sources, dependencies, build tools, base images, and so on.

This is considered to be incomplete unless metadata.completeness.materials is true.
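A consumer that depends on the full list of inputs might check the completeness claim before relying on materials. The sketch below is a hypothetical policy, not a requirement of this specification: the name `materials_pinned` and the extra rule that every material must carry a digest are assumptions made for illustration.

```python
def materials_pinned(predicate: dict) -> bool:
    """True if materials is claimed complete and every entry has a digest."""
    metadata = predicate.get("metadata") or {}
    completeness = metadata.get("completeness") or {}
    # Without an explicit true, the list must be treated as incomplete.
    if completeness.get("materials") is not True:
        return False
    materials = predicate.get("materials") or []
    # Assumed extra rule: a material without a digest is not pinned.
    return all(material.get("digest") for material in materials)
```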

materials[*].uri string (ResourceURI), optional

The method by which this artifact was referenced during the build.

TODO: Should we differentiate between the “referenced” URI and the “resolved” URI, e.g. “latest” vs “3.4.1”?

TODO: Should we wrap this in a locator object to allow for extensibility, in case we add other types of URIs or other non-URI locators?

materials[*].digest object (DigestSet), optional

Collection of cryptographic digests for the contents of this artifact.
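To check a downloaded artifact against a material's digest, a consumer could do something like the following. This is a sketch under stated assumptions: it treats a DigestSet as a map from algorithm name to hex-encoded digest, as the examples in this document do, and it only looks at the sha256 entry. The function name `matches_sha256` is hypothetical.

```python
import hashlib


def matches_sha256(digest_set: dict, content: bytes) -> bool:
    """Compare the artifact bytes against the DigestSet's sha256 entry."""
    expected = (digest_set or {}).get("sha256")
    if expected is None:
        # No sha256 entry: this sketch refuses rather than falling back
        # to another algorithm.
        return False
    return hashlib.sha256(content).hexdigest() == expected.lower()
```

Refusing when the expected algorithm is absent is a deliberate choice here; a real policy would decide which algorithms it considers acceptable.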

Example

WARNING: This is just for demonstration purposes.

Suppose the builder downloaded example-1.2.3.tar.gz, extracted it, and ran make -C src foo CFLAGS=-O3, resulting in a file with hash 5678.... Then the provenance might look like this:

{
  "_type": "https://in-toto.io/Statement/v0.1",
  // Output file; name is "_" to indicate "not important".
  "subject": [{"name": "_", "digest": {"sha256": "5678..."}}],
  "predicateType": "https://slsa.dev/provenance/v0.2",
  "predicate": {
    "buildType": "https://example.com/Makefile",
    "builder": { "id": "mailto:person@example.com" },
    "invocation": {
      "configSource": {
        "uri": "https://example.com/example-1.2.3.tar.gz",
        "digest": {"sha256": "1234..."},
        "entryPoint": "src:foo"                 // target "foo" in directory "src"
      },
      "parameters": {"CFLAGS": "-O3"}           // extra args to `make`
    },
    "materials": [{
      "uri": "https://example.com/example-1.2.3.tar.gz",
      "digest": {"sha256": "1234..."}
    }]
  }
}
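A builder could assemble the invocation for this example from the make command line roughly as follows. This is a minimal sketch: the function name `make_invocation` is hypothetical, and the "directory:target" entry point syntax is taken from the example above, since the real syntax is defined by the buildType. The elided digest is kept elided.

```python
def make_invocation(source_uri: str, source_sha256: str,
                    directory: str, target: str, variables: dict) -> dict:
    """Build the invocation object for `make -C <directory> <target> VAR=value`."""
    return {
        "configSource": {
            "uri": source_uri,
            "digest": {"sha256": source_sha256},
            # Entry point syntax assumed from the example: "<directory>:<target>".
            "entryPoint": f"{directory}:{target}",
        },
        # Extra arguments to make, such as CFLAGS=-O3.
        "parameters": dict(variables),
    }


# Corresponds to: make -C src foo CFLAGS=-O3
invocation = make_invocation(
    "https://example.com/example-1.2.3.tar.gz", "1234...",
    "src", "foo", {"CFLAGS": "-O3"},
)
```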

More examples

GitHub Actions

WARNING: This is only for demonstration purposes. The GitHub Actions team has not yet reviewed or approved this design, and it is not yet implemented. Details are subject to change!

If GitHub is the one to generate provenance, and the runner is GitHub-hosted, then the builder would be as follows:

"builder": {
  "id": "https://github.com/Attestations/GitHubHostedActions@v1"
}

Self-hosted runner: Not yet supported. We need to figure out a URI scheme that represents what system hosted the runner, or perhaps add additional properties in builder.

GitHub Actions Workflow

"buildType": "https://github.com/Attestations/GitHubActionsWorkflow@v1",
"invocation": {
  "configSource": {
    "entryPoint": "build.yaml:build",
    // The git repo that contains the build.yaml referenced in the entrypoint.
    "uri": "git+https://github.com/foo/bar.git",
    // The resolved git commit hash reflecting the version of the repo used
    // for this build.
    "digest": {"sha1": "abc..."}
  },
  // The only possible user-defined parameters that can affect the build are the
  // "inputs" to a workflow_dispatch event. This is unset/null for all other
  // events.
  "parameters": {
    "inputs": { ... }
  },
  // Other variables that are required to reproduce the build and that cannot be
  // recomputed using existing information. (Documentation would explain how to
  // recompute the rest of the fields.)
  "environment": {
    // The architecture of the runner.
    "arch": "amd64",
    // Environment variables. These are always set because it is not possible
    // to know whether they were referenced or not.
    "env": {
      "GITHUB_RUN_ID": "1234",
      "GITHUB_RUN_NUMBER": "5678",
      "GITHUB_EVENT_NAME": "push"
    },
    // The context values that were referenced in the workflow definition.
    // Secrets are set to the empty string.
    "context": {
      "github": {
        "run_id": "abcd1234"
      },
      "runner": {
        "os": "Linux",
        "temp": "/tmp/tmp.iizj8l0XhS"
      }
    }
  }
},
"materials": [{
  // The git repo that contains the build.yaml referenced above.
  "uri": "git+https://github.com/foo/bar.git",
  // The resolved git commit hash reflecting the version of the repo used
  // for this build.
  "digest": {"sha1": "abc..."}
}]

GitLab CI

The GitLab CI team has implemented an artifact attestation capability in their GitLab Runner 15.1 release.

If GitLab is the one to generate provenance, and the runner is GitLab-hosted or self-hosted, then the builder would be as follows:

"builder": {
  "id": "https://gitlab.com/foo/bar/-/runners/12345678"
}

GitLab CI Job

"buildType": "https://gitlab.com/gitlab-org/gitlab-runner/-/blob/v15.1.0/PROVENANCE.md",
"invocation": {
  "configSource": {
    // the git repo that contains the GitLab CI job referenced in the entrypoint
    "uri": "https://gitlab.com/foo/bar",
    // The resolved git commit hash reflecting the version of the repo used
    // for this build.
    "digest": {
        "sha256": "abc..."
    },
    // the name of the CI job that triggered the build
    "entryPoint": "build"
  },
  // Other variables that are required to reproduce the build and that cannot be
  // recomputed using existing information. (Documentation would explain how to
  // recompute the rest of the fields.)
  "environment": {
      // Name of the GitLab runner
      "name": "hosted-gitlab-runner",
      // The runner executor
      "executor": "kubernetes",
      // The architecture on which the CI job is run
      "architecture": "amd64"
  },
  // Collection of all external inputs (CI variables) related to the job
  "parameters": {
      "CI_PIPELINE_ID": "",
      "CI_PIPELINE_URL": "",
      // All other CI variable names are listed here. Values are always represented as empty strings to avoid leaking secrets.
  }
},
"metadata": {
  "buildStartedOn": "2022-06-17T00:47:27+03:00",
  "buildFinishedOn": "2022-06-17T00:47:28+03:00",
  "completeness": {
      "parameters": true,
      "environment": true,
      "materials": false
  },
  "reproducible": false
}

Google Cloud Build

WARNING: This is only for demonstration purposes. The Google Cloud Build team has not yet reviewed or approved this design, and it is not yet implemented. Details are subject to change!

If Google is the one to generate provenance, and the worker is Google-hosted, then the builder would be as follows:

"builder": {
  "id": "https://cloudbuild.googleapis.com/GoogleHostedWorker@v1"
}

Custom worker: Not yet supported. We need to figure out a URI scheme that represents what system hosted the worker, or perhaps add additional properties in builder.

Cloud Build config-as-code

Here entryPoint references the filename from the CloudBuild BuildTrigger.

"buildType": "https://cloudbuild.googleapis.com/CloudBuildYaml@v1",
"invocation": {
  // ... in the git repo described by `materials[0]` ...
  "configSource": {
    "entryPoint": "path/to/cloudbuild.yaml",
    // The git repo that contains the cloudbuild.yaml referenced above.
    "uri": "git+https://source.developers.google.com/p/foo/r/bar",
    // The resolved git commit hash reflecting the version of the repo used
    // for this build.
    "digest": {"sha1": "abc..."}
  },
  // The only possible user-defined parameters that can affect a BuildTrigger
  // are the substitutions in the BuildTrigger.
  "parameters": {
    "substitutions": {...}
  }
},
"buildConfig": {
  // each step in the recipe corresponds to a step in the cloudbuild.yaml
  // the format of this is determined by `buildType`
  "steps": [
    {
      "image": "pkg:docker/make@sha256:244fd47e07d1004f0aed9c",
      "arguments": ["build"]
    }
  ]
},
"materials": [{
  // The git repo that contains the cloudbuild.yaml referenced above.
  "uri": "git+https://source.developers.google.com/p/foo/r/bar",
  // The resolved git commit hash reflecting the version of the repo used
  // for this build.
  "digest": {"sha1": "abc..."}
}]

Cloud Build RPC

Here we list the steps defined in a trigger or over RPC:

"buildType": "https://cloudbuild.googleapis.com/CloudBuildSteps@v1",
"invocation": {
  // Build steps were provided as an argument. No `configSource`
  "parameters": {
    // The substitutions in the build trigger.
    "substitutions": {...}
    // TODO: Any other arguments?
  }
},
"buildConfig": {
  // The steps that were performed. (Format TBD.)
  "steps": [...]
}

Explicitly run commands

WARNING: This is just a proof-of-concept. It is not yet standardized.

Execution of arbitrary commands:

"buildType": "https://example.com/ManuallyRunCommands@v1",
// There was no entry point, and the commands were run in an ad-hoc fashion.
// There is no `configSource`.
"invocation": null,
"buildConfig": {
    // The list of commands that were executed.
    "commands": [
      "tar xvf foo-1.2.3.tar.gz",
      "cd foo-1.2.3",
      "./configure --enable-some-feature",
      "make foo.zip"
    ],
    // Indicates how to parse the strings in `commands`.
    "shell": "bash"
}

Migrating from 0.1

To migrate from version 0.1 (old):

{
  "builder": old.builder,  // (unchanged)
  "buildType": old.recipe.type,
  "invocation": {
    "configSource": {
      "uri": old.materials[old.recipe.definedInMaterial].uri,
      "digest": old.materials[old.recipe.definedInMaterial].digest,
      "entryPoint": old.recipe.entryPoint
    },
    "parameters": old.recipe.arguments,
    "environment": old.recipe.environment   // (unchanged)
  },
  "buildConfig": null,  // no equivalent in 0.1
  "metadata": {
    "buildInvocationId": old.metadata.buildInvocationId,    // (unchanged)
    "buildStartedOn": old.metadata.buildStartedOn,          // (unchanged)
    "buildFinishedOn": old.metadata.buildFinishedOn,        // (unchanged)
    "completeness": {
      "parameters": old.metadata.completeness.arguments,
      "environment": old.metadata.completeness.environment, // (unchanged)
      "materials": old.metadata.completeness.materials      // (unchanged)
    },
    "reproducible": old.metadata.reproducible               // (unchanged)
  },
  "materials": old.materials  // optionally removing the configSource
}
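The mapping above could be implemented along these lines. This is a sketch, not an official converter: the function name `migrate_v01_to_v02` is hypothetical, the 0.1 field names are taken from the mapping above, and the entry point is written as entryPoint to match the 0.2 schema. It keeps materials unchanged rather than removing the configSource entry.

```python
def migrate_v01_to_v02(old: dict) -> dict:
    """Convert a 0.1 provenance predicate to the 0.2 layout."""
    recipe = old.get("recipe") or {}
    materials = old.get("materials") or []
    metadata = old.get("metadata") or {}
    completeness = metadata.get("completeness") or {}

    config_source = {"entryPoint": recipe.get("entryPoint")}
    index = recipe.get("definedInMaterial")
    if index is not None:  # index 0 is valid, so compare against None
        config_source["uri"] = materials[index].get("uri")
        config_source["digest"] = materials[index].get("digest")

    return {
        "builder": old.get("builder"),
        "buildType": recipe.get("type"),
        "invocation": {
            "configSource": config_source,
            "parameters": recipe.get("arguments"),
            "environment": recipe.get("environment"),
        },
        "buildConfig": None,  # no equivalent in 0.1
        "metadata": {
            "buildInvocationId": metadata.get("buildInvocationId"),
            "buildStartedOn": metadata.get("buildStartedOn"),
            "buildFinishedOn": metadata.get("buildFinishedOn"),
            "completeness": {
                "parameters": completeness.get("arguments"),
                "environment": completeness.get("environment"),
                "materials": completeness.get("materials"),
            },
            "reproducible": metadata.get("reproducible"),
        },
        "materials": materials,
    }
```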

Change history

  • 0.2: Refactored to aid clarity and added buildConfig. The model is unchanged.
    • Replaced definedInMaterial and entryPoint with configSource.
    • Renamed recipe to invocation.
    • Moved invocation.type to top-level buildType.
    • Renamed arguments to parameters.
    • Added buildConfig, which can be used as an alternative to configSource to validate the configuration.
  • Renamed to “slsa.dev/provenance”.
  • 0.1.1: Added metadata.buildInvocationId.
  • 0.1: Initial version, named “in-toto.io/Provenance”.