Proof capsule

The product north star for AGILAB is a portable proof capsule: a reviewable bundle that lets another operator verify what ran, where it ran, which artifacts were produced, and how the work can be replayed or handed off.

AGILAB now ships a first proof-pack layer around run_manifest.json. It is a directory of plain JSON evidence, not yet a signed .agipack archive.
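Because the proof pack is a directory of plain JSON, a reviewer can inspect it with standard tooling. The following is a minimal sketch, not part of AGILAB: the helper name summarize_evidence is hypothetical, and it assumes only that the evidence files end in .json. It reports the top-level keys of each file without assuming any particular schema.

```python
import json
from pathlib import Path


def summarize_evidence(pack_dir):
    """Map each JSON file name in pack_dir to its sorted top-level keys."""
    summary = {}
    for path in sorted(Path(pack_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Non-object documents (lists, scalars) have no keys to report.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary
```

Calling summarize_evidence("proof-pack") on an exported directory gives a quick overview of which evidence files are present before reading any of them in detail.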

Why this matters

Most AI/ML tools can track metrics, launch pipelines, or host notebooks. The harder product gap is a compact handoff object for experimental work: code, runtime context, visible UI evidence, generated artifacts, dependency state, and supply-chain evidence kept together with enough metadata to audit or rerun the work later.

AGILAB’s strongest long-term position is that handoff layer:

  • MLflow tracks experiment runs and artifacts.

  • AGILAB turns notebooks and scripts into controlled executable applications.

  • A proof capsule should preserve the evidence needed to review, compare, replay, or promote that application outside the original developer session.

Capsule contents

A complete proof capsule should contain these parts:

Layer | Capsule content | Current AGILAB building block
Execution | App path, command, runtime mode, platform, Python version, duration, success status, and failure diagnostics. | agilab first-proof --json and run_manifest.json.
Application snapshot | Stage contract, app metadata, selected settings, and safe paths needed to rerun the application. | lab_stages.toml, app settings seeds, and exported run manifests.
Notebook bridge | Imported notebook provenance or exported runnable agi-core notebook for handoff. | WORKFLOW notebook import/export and notebook export manifests.
Tracking handoff | MLflow run identifiers or exported tracking metadata when MLflow is enabled. | Optional MLflow integration and run artifact handoff.
Visible evidence | Screenshots, UI robot progress logs, failure bundles, traces, HAR, and video when captured by the validation robot. | UI robot evidence, visual baselines, and failure replay artifacts.
Artifact inventory | Output files, hashes, schema labels, summaries, and comparison metadata. | ANALYSIS artifacts, release-decision evidence, and run-diff reports.
Environment | Dependency lock information, wheel hashes, package versions, platform markers, and optional extras actually used. | Release proof, profile supply-chain scans, and package metadata.
Supply chain | SBOM, pip-audit output, PyPI provenance, GitHub release assets, and attestation references. | Release workflow SBOM, audit, trusted publishing, and provenance checks.
Human summary | A short machine-readable and human-readable conclusion: what passed, what failed, what is out of scope, and what to do next. | Adoption reports, release proof, compatibility matrix, and security checks.
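The artifact inventory layer pairs output files with hashes. As an illustration only, the sketch below walks an output directory and records a relative path, size, and SHA-256 digest per file. The function name build_inventory and the record keys path, bytes, and sha256 are assumptions made for this example, not AGILAB's manifest schema.

```python
import hashlib
from pathlib import Path


def build_inventory(root):
    """Return one record per file under root: relative path, size, SHA-256."""
    root = Path(root)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in chunks so large artifacts are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        records.append({
            "path": path.relative_to(root).as_posix(),  # hypothetical key names
            "bytes": path.stat().st_size,
            "sha256": digest.hexdigest(),
        })
    return records
```

Sorting the paths keeps the inventory stable between runs, so two inventories of the same outputs can be compared directly.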

Target CLI shape

The shipped first layer operates on a run manifest:

agilab prove ~/log/execute/flight_telemetry/run_manifest.json --output-dir proof-pack
agilab verify ~/log/execute/flight_telemetry/run_manifest.json --strict
agilab replay ~/log/execute/flight_telemetry/run_manifest.json
agilab export-lineage ~/log/execute/flight_telemetry/run_manifest.json --format all --output-dir proof-pack
agilab policy-check ~/log/execute/flight_telemetry/run_manifest.json --strict
agilab cards ~/log/execute/flight_telemetry/run_manifest.json --output-dir proof-pack
agilab metadata-store ~/log/execute/flight_telemetry/run_manifest.json --store ~/.agilab/metadata-store.json

The proof pack includes:

  • a verification report

  • a small policy report

  • OpenLineage-shaped JSON

  • RO-Crate metadata

  • OpenTelemetry-shaped trace JSON

  • a local metadata-store entry

  • model, dataset, prompt, and evaluation cards generated from available manifest evidence

Replay is safe by default: agilab replay prints the recorded command and requires --execute before launching it.
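The print-first, execute-on-request behaviour can be sketched in a few lines. This is not AGILAB's implementation: it assumes the manifest stores the recorded command as a list of arguments under a hypothetical "command" key, and the function name replay and program name replay-sketch are invented for illustration.

```python
import argparse
import json
import shlex
import subprocess


def replay(argv):
    """Print the recorded command; run it only when --execute is passed."""
    parser = argparse.ArgumentParser(prog="replay-sketch")
    parser.add_argument("manifest")
    parser.add_argument("--execute", action="store_true")
    args = parser.parse_args(argv)

    with open(args.manifest, encoding="utf-8") as handle:
        # "command" is a hypothetical field name, assumed to hold an argv list.
        command = json.load(handle)["command"]

    print("recorded command:", shlex.join(command))
    if not args.execute:
        return None  # dry run: nothing was launched
    return subprocess.run(command, check=False).returncode
```

Making execution opt-in means a reviewer can always inspect what would run before deciding whether to launch it.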

The reserved archive shape remains roadmap work:

agilab prove . --profile audit --export proof.agipack
agilab verify proof.agipack
agilab replay proof.agipack

Until a signed archive verifier exists, keep using the existing first-proof, adoption-report, and security-check commands as the entry evidence:

agilab first-proof --json --with-ui
agilab adoption-report
agilab security-check --profile shared --json
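When these commands are collected by a script rather than by hand, their output can be captured alongside the exit code. The sketch below assumes that a command run with --json writes a single JSON document to standard output; the source does not confirm that, and the helper name run_json is hypothetical.

```python
import json
import subprocess


def run_json(command):
    """Run command and parse its standard output as one JSON document."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    try:
        payload = json.loads(result.stdout)
    except json.JSONDecodeError:
        payload = None  # output was not valid JSON
    return {"command": command, "returncode": result.returncode, "payload": payload}
```

For example, run_json(["agilab", "security-check", "--profile", "shared", "--json"]) would return the exit code and, if the output parses, the decoded report. A payload of None signals that the output needs manual review.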

Roadmap boundary

The following items remain planned work, not shipped capability:

  • signed .agipack archives with detached hashes and Sigstore/SLSA references

  • transport to an external OpenLineage backend

  • native OpenTelemetry SDK/OTLP spans across UI, worker build, distributed execution, notebook export, MLflow handoff, and agent runs

  • durable ML metadata storage and query APIs

  • app-authored model/data/prompt/eval cards with domain metadata

  • richer policy-as-code, including potential OPA/Rego-compatible gates

  • capability-based sandboxing for generated code, notebooks, and agent runs

  • first-class agent eval traces and replayable scoring

  • production monitoring, drift, RBAC, secrets-backend, and tenant-isolation integrations

Adoption rule

A proof capsule is promotion evidence, not a production certification. It should make a controlled experiment reviewable and repeatable; production serving, monitoring, RBAC, multi-tenant isolation, and regulated audit trails remain responsibilities of the hardened platform AGILAB hands off to.