Kargo Deep Dive: Verification, Soak Times, and Reusable Promotion Tasks

In my introduction to Kargo, I covered the fundamentals: Warehouses detect new artifacts, Freight packages them into promotable units, and Stages define how changes flow through your environments. If you followed along, you have the building blocks for a basic promotion pipeline. But a production pipeline needs more than just the ability to push changes forward. It needs quality gates that prevent bad releases from reaching users, time-based controls that let deployments bake before moving downstream, and reusable promotion logic that does not require copy-pasting YAML across every Stage.

This post covers the three features that turn a basic Kargo pipeline into a production-grade one: verification with AnalysisTemplates, soak time requirements, and PromotionTasks for reusable promotion workflows. These features shipped across Kargo v1.0 through v1.4 and are all stable enough for production use.

Post-Promotion Verification

The simplest Kargo pipeline promotes Freight through Stages without checking whether the promotion actually worked. Argo CD will eventually reconcile the desired state and report health status, but that only tells you whether the Kubernetes resources are running. It does not tell you whether the application is behaving correctly. Verification closes that gap.

How Verification Works

After a successful Promotion, a Stage enters the Verifying phase. Kargo spawns an AnalysisRun based on the AnalysisTemplates referenced in the Stage's spec.verification field. The AnalysisRun executes whatever checks you have defined, whether that is running integration tests in a Job, querying Prometheus for error rates, or hitting a health endpoint. When the AnalysisRun completes successfully, the Freight is marked as verified in that Stage and becomes eligible for promotion downstream. If it fails, the Freight stays unverified and cannot move forward.

One important constraint: while a Stage is verifying, no other Promotions to that Stage will execute. This prevents a race condition where a new version could overwrite the one being tested. Verification must complete (or be manually aborted) before the Stage accepts new work.

Implicit vs Explicit Verification

If your Stage references Argo CD Applications but does not define any spec.verification, Kargo still performs a lightweight check. It waits for the referenced Applications to reach a Healthy state before marking the Freight as verified. This is implicit verification, and it provides a baseline safety net. The Application must finish syncing and all its resources (Deployments, StatefulSets, Jobs) must report healthy before the pipeline moves on.

Explicit verification gives you much more control. You define AnalysisTemplates that run specific tests against the deployed application. Here is a minimal example that runs a containerized integration test:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: integration-test
  namespace: my-project
spec:
  metrics:
  - name: integration-test
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: test-runner
                image: my-registry/integration-tests:latest
                env:
                - name: TARGET_URL
                  value: "http://my-app.my-project.svc.cluster.local"
              restartPolicy: Never
          backoffLimit: 1

This template tells Kargo to run a Job that executes your integration test suite against the application's in-cluster service URL. If the Job exits with a zero status code, verification passes. If it fails, the Freight is not verified.

Referencing AnalysisTemplates from a Stage

To wire this up, add a verification block to your Stage spec:

apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: uat
  namespace: my-project
spec:
  requestedFreight:
    - sources:
        stages:
          - test
      origin:
        kind: Warehouse
        name: my-warehouse
  promotionTemplate:
    spec:
      steps:
        - uses: git-clone
          config:
            repoURL: https://github.com/example/deploy-config.git
            checkout:
              - branch: main
                path: ./src
        - uses: kustomize-set-image
          config:
            path: ./src/environments/uat
            images:
              - image: my-registry/my-app
        - uses: git-commit
          config:
            path: ./src
            messageFromSteps:
              - kustomize-set-image
        - uses: git-push
          config:
            path: ./src
        - uses: argocd-update
          config:
            apps:
              - name: my-app-uat
  verification:
    analysisTemplates:
      - name: integration-test

After the promotion steps complete and Argo CD syncs the application, Kargo spawns the AnalysisRun. You can monitor its progress in the Kargo UI under the Stage's Verifications tab.

Querying Monitoring Systems

Job-based metrics are the most flexible option, but Kargo also supports querying monitoring systems directly through Argo Rollouts metric providers. If you run Prometheus, you can define an AnalysisTemplate that checks error rates after a promotion:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
  namespace: my-project
spec:
  metrics:
  - name: error-rate
    interval: 30s
    count: 5
    successCondition: result[0] < 0.05
    failureLimit: 2
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc.cluster.local:9090
        query: |
          sum(rate(http_requests_total{status=~"5.*",app="my-app",namespace="uat"}[5m]))
          /
          sum(rate(http_requests_total{app="my-app",namespace="uat"}[5m]))

This template queries Prometheus every 30 seconds for five iterations. If the 5xx error rate exceeds 5% in more than two of those checks, verification fails. You can combine this with the integration test template by referencing both from the same Stage:

verification:
  analysisTemplates:
    - name: integration-test
    - name: error-rate-check

Both AnalysisRuns must succeed for the Freight to be marked as verified.

Passing Dynamic Values to AnalysisTemplates

Your verification often needs context about the specific Freight being verified. Kargo supports passing arguments from the Stage to the AnalysisTemplate using expressions:

verification:
  analysisTemplates:
    - name: smoke-test
  args:
    - name: image-tag
      value: ${{ imageFrom("my-registry/my-app").Tag }}
    - name: commit-sha
      value: ${{ commitFrom("https://github.com/example/repo.git").ID }}

Note the ${{ }} expression syntax used in Stage resources. The corresponding AnalysisTemplate declares these as arguments using the standard {{ }} syntax from Argo Rollouts:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-test
  namespace: my-project
spec:
  args:
    - name: image-tag
    - name: commit-sha
  metrics:
  - name: smoke
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: smoke
                image: my-registry/smoke-tests:latest
                env:
                - name: IMAGE_TAG
                  value: "{{ args.image-tag }}"
                - name: COMMIT_SHA
                  value: "{{ args.commit-sha }}"
              restartPolicy: Never
          backoffLimit: 0

This lets your test suite know exactly which version it is testing, which is useful for reporting and for tests that need to validate version-specific behavior.

ClusterAnalysisTemplates for Shared Verification

If you have verification logic that applies across multiple projects, use a ClusterAnalysisTemplate instead. These are cluster-scoped resources that any Stage in any project can reference:

verification:
  analysisTemplates:
    - name: org-wide-security-scan
      kind: ClusterAnalysisTemplate

This is particularly useful for organization-wide security scans, compliance checks, or baseline health validations that every application must pass before promotion.

Soak Times

Verification tells you whether a deployment is working immediately after promotion. But some problems only surface after the application has been running under real traffic for a while. Memory leaks, connection pool exhaustion, and slow cache invalidation are all examples of issues that pass initial health checks but cause incidents hours later.

Soak times address this by requiring Freight to remain in a Stage for a minimum duration before it becomes eligible for downstream promotion. Even if verification passes immediately, the Freight cannot move forward until the soak period expires.

Configuring Soak Times

Soak times are configured on the downstream Stage's requestedFreight field, not on the upstream Stage itself. This makes sense because the downstream Stage is the one imposing the requirement:

apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: production
  namespace: my-project
spec:
  requestedFreight:
    - origin:
        kind: Warehouse
        name: my-warehouse
      sources:
        stages:
          - uat
        requiredSoakTime: 2h

With this configuration, Freight promoted to UAT must remain there for at least two hours before it can be promoted to production. The soak timer starts when the Freight is successfully verified in the upstream Stage. If verification takes 15 minutes and the soak time is 2 hours, the total wait is 2 hours and 15 minutes.

Valid duration formats include 180s, 30m, 48h, or combinations. Both automated and manual promotions respect soak times, so even an operator manually promoting Freight will be blocked until the period elapses. The one exception is manually approving Freight for a Stage, which bypasses both verification and soak time requirements. This is your escape hatch for emergencies.

Combining Verification and Soak Times

The real power comes from combining these features. Consider a three-stage pipeline:

sequenceDiagram
  participant W@{ "type" : "collections" } as New Freight 
  participant D@{ "type" : "queue" } as Dev
  participant S@{ "type" : "queue" } as Staging
  participant P@{ "type" : "queue" } as Production

  W->>D: Auto-promoted
  Note over D: No verification
  D->>S: Auto-promoted
  Note over S: Integration Tests
  Note over S: Prometheus Checks
  Note over S: 1h Soak
  S->>P: Manually promoted
  Note over P: Integration Tests
  Note over P: Prometheus Checks

When a new image arrives, it flows immediately to dev. Once Argo CD reports the dev application as healthy (implicit verification), it becomes eligible for staging. In staging, Kargo runs integration tests and Prometheus checks. If those pass, the soak timer starts. The Freight must remain healthy in staging for a full hour before it can be promoted to production. When someone manually triggers the production promotion, the same verification suite runs against the production deployment.

This layered approach catches different classes of failures at each stage. Compilation and startup errors are caught in dev. Integration issues are caught by verification in staging. Time-dependent issues surface during the soak period. And the production verification confirms the deployment is healthy in the final environment.

PromotionTasks: Reusable Promotion Workflows

If you look at promotion templates across multiple Stages, you will notice a pattern: the sequence of steps is usually identical, with only a few values changing between environments. The dev Stage clones the same repo, runs the same Kustomize commands, and pushes to the same branch as the production Stage. The only differences are the environment path and the Argo CD application name.

Copy-pasting this YAML across every Stage creates a maintenance burden. When you need to add a step (say, running a linter before commit), you have to update every Stage individually. PromotionTasks solve this by letting you define a reusable sequence of promotion steps that accepts parameters.

Defining a PromotionTask

A PromotionTask is a namespaced Kubernetes resource that declares variables and steps:

apiVersion: kargo.akuity.io/v1alpha1
kind: PromotionTask
metadata:
  name: kustomize-promote
  namespace: my-project
spec:
  vars:
    - name: repoURL
    - name: environment
    - name: argocdApp
    - name: targetBranch
      value: main
  steps:
    - uses: git-clone
      as: clone
      config:
        repoURL: ${{ vars.repoURL }}
        checkout:
          - branch: ${{ vars.targetBranch }}
            path: ./src
    - uses: kustomize-set-image
      as: set-image
      config:
        path: ./src/environments/${{ vars.environment }}
        images:
          - image: my-registry/my-app
    - uses: kustomize-build
      as: build
      config:
        path: ./src/environments/${{ vars.environment }}
        outPath: ./out/${{ vars.environment }}
    - uses: git-commit
      as: commit
      config:
        path: ./src
        messageFromSteps:
          - set-image
    - uses: git-push
      as: push
      config:
        path: ./src
    - uses: argocd-update
      as: sync
      config:
        apps:
          - name: ${{ vars.argocdApp }}

The vars field declares the parameters. Variables without a default value must have a value provided when the PromotionTask is referenced in a Stage, while those with a value field have a default that can be optionally overridden by the stage. Steps reference variables using the $ syntax.

Using a PromotionTask in a Stage

Instead of listing individual steps in the Stage's promotion template, you reference the task:

apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: uat
  namespace: my-project
spec:
  requestedFreight:
    - sources:
        stages:
          - test
      origin:
        kind: Warehouse
        name: my-warehouse
  promotionTemplate:
    spec:
      steps:
        - task:
            name: kustomize-promote
          vars:
            - name: repoURL
              value: https://github.com/example/deploy-config.git
            - name: environment
              value: uat
            - name: argocdApp
              value: my-app-uat
  verification:
    analysisTemplates:
      - name: integration-test

The promotion logic is now defined in one place. If you need to add a step, you update the PromotionTask and every Stage that references it picks up the change.

ClusterPromotionTasks for Cross-Project Reuse

If your organization has a standard promotion workflow that applies across multiple projects, use a ClusterPromotionTask. The syntax is identical to a PromotionTask but the resource is cluster-scoped:

apiVersion: kargo.akuity.io/v1alpha1
kind: ClusterPromotionTask
metadata:
  name: standard-kustomize-promote
spec:
  vars:
    - name: repoURL
    - name: environment
    - name: argocdApp
  steps:
    - uses: git-clone
      as: clone
      config:
        repoURL: ${{ vars.repoURL }}
        checkout:
          - branch: main
            path: ./src
    # ... remaining steps

Reference it from a Stage by specifying the kind:

steps:
  - task:
      name: standard-kustomize-promote
      kind: ClusterPromotionTask
    vars:
      - name: repoURL
        value: https://github.com/example/deploy-config.git
      - name: environment
        value: uat
      - name: argocdApp
        value: my-app-uat

This is powerful for platform teams that want to provide a standardized promotion workflow while still letting application teams customize the specifics through variables.

Task Outputs and Chaining

PromotionTask steps can reference outputs from preceding steps within the same task using task.outputs. A common pattern is opening a pull request and then waiting for it to merge:

steps:
  - uses: git-open-pr
    as: open-pr
    config:
      repoURL: ${{ vars.repoURL }}
      sourceBranch: ${{ vars.sourceBranch }}
      targetBranch: ${{ vars.targetBranch }}
  - uses: git-wait-for-pr
    as: wait-for-pr
    config:
      repoURL: ${{ vars.repoURL }}
      prNumber: ${{ task.outputs['open-pr'].pr.id }}

Tasks can also expose outputs to the parent promotion template using the compose-output step. This lets you chain multiple PromotionTasks together, with downstream tasks consuming outputs from upstream ones.

One constraint to keep in mind: PromotionTask steps cannot reference other PromotionTasks. This prevents circular dependencies and keeps the execution model straightforward. If you need composition, structure it at the promotion template level by sequencing multiple task references.

Operational Tips

A few things I have learned from running these pipelines:

Start without verification and add it incrementally. Get the basic promotion flow working first. Add implicit Argo CD health checks, then soak times, then explicit AnalysisTemplates. Each layer builds on the previous one.

Keep AnalysisTemplate Jobs fast. Verification blocks the entire Stage from accepting new Promotions. If your integration test suite takes 30 minutes, that is 30 minutes where no other version can be promoted to that Stage. Consider running a targeted smoke test for verification and leaving the full suite for CI.

Use ClusterAnalysisTemplates for cross-cutting concerns. Security scans, compliance checks, and baseline health validations are good candidates for cluster-scoped templates. Application-specific tests should stay in project-scoped AnalysisTemplates.

Set soak times based on your observability window. If your monitoring dashboards need 15 minutes of traffic to show meaningful trends, a 15-minute soak is the minimum that makes sense. On the Stage before production, err on the side of longer soak periods.

Use manual Freight approval as an emergency bypass. If you need to push a hotfix through and cannot wait for soak times, manually approving the Freight for a Stage skips both verification and soak requirements. Use this sparingly, but know it exists.

What's Next

Kargo continues to ship features at a steady pace. Version 1.7 introduced oci-download and http-download promotion steps, letting you pull OCI artifacts or remote files directly into your promotion workflows. Version 1.8 added expression-based Freight creation criteria on Warehouses, which solves a real pain point for multi-subscription Warehouses by preventing Freight from being created with incompatible artifact combinations. Most recently, v1.9 shipped live log streaming for verification runs directly in the UI, making it far easier to debug failed AnalysisRuns without leaving the Kargo dashboard.

If you are building promotion pipelines and want to go deeper, the Kargo documentation covers every built-in promotion step and its configuration options. For help designing pipelines for your organization, get in touch.