Proof capsule
The product north star for AGILAB is a portable proof capsule: a reviewable bundle that lets another operator verify what ran, where it ran, which artifacts were produced, and how the work can be replayed or handed off.
AGILAB now ships a first proof-pack layer around run_manifest.json. It is a
directory of plain JSON evidence, not yet a signed .agipack archive.
Why this matters
Most AI/ML tools can track metrics, launch pipelines, or host notebooks. The harder product gap is a compact handoff object for experimental work: code, runtime context, visible UI evidence, generated artifacts, dependency state, and supply-chain evidence kept together with enough metadata to audit or rerun the work later.
AGILAB’s strongest long-term position is that handoff layer:
MLflow tracks experiment runs and artifacts.
AGILAB turns notebooks and scripts into controlled executable applications.
A proof capsule should preserve the evidence needed to review, compare, replay, or promote that application outside the original developer session.
Capsule contents
A complete proof capsule should contain these parts:
| Layer | Capsule content | Current AGILAB building block |
|---|---|---|
| Execution | App path, command, runtime mode, platform, Python version, duration, success status, and failure diagnostics. | |
| Application snapshot | Stage contract, app metadata, selected settings, and safe paths needed to rerun the application. | |
| Notebook bridge | Imported notebook provenance or exported runnable | WORKFLOW notebook import/export and notebook export manifests. |
| Tracking handoff | MLflow run identifiers or exported tracking metadata when MLflow is enabled. | Optional MLflow integration and run artifact handoff. |
| Visible evidence | Screenshots, UI robot progress logs, failure bundles, traces, HAR, and video when captured by the validation robot. | UI robot evidence, visual baselines, and failure replay artifacts. |
| Artifact inventory | Output files, hashes, schema labels, summaries, and comparison metadata. | ANALYSIS artifacts, release-decision evidence, and run-diff reports. |
| Environment | Dependency lock information, wheel hashes, package versions, platform markers, and optional extras actually used. | Release proof, profile supply-chain scans, and package metadata. |
| Supply chain | SBOM, | Release workflow SBOM, audit, trusted publishing, and provenance checks. |
| Human summary | A short machine-readable and human-readable conclusion: what passed, what failed, what is out of scope, and what to do next. | Adoption reports, release proof, compatibility matrix, and security checks. |
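The artifact inventory layer can be illustrated with a small sketch that walks an output directory and records a hash for each file. This is not AGILAB's implementation: the function names and the entry keys (path, size_bytes, sha256) are assumptions chosen for illustration, and the real inventory schema may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> list[dict]:
    """List every file under root with its relative path, size, and hash.

    The keys used here are hypothetical, not AGILAB's schema.
    """
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append(
            {
                "path": path.relative_to(root).as_posix(),
                "size_bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
        )
    return entries
```

Sorting the paths and storing them relative to the root keeps the inventory stable across machines, which is what makes two inventories comparable during review.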
Target CLI shape
The shipped first layer operates on a run manifest:
agilab prove ~/log/execute/flight_telemetry/run_manifest.json --output-dir proof-pack
agilab verify ~/log/execute/flight_telemetry/run_manifest.json --strict
agilab replay ~/log/execute/flight_telemetry/run_manifest.json
agilab export-lineage ~/log/execute/flight_telemetry/run_manifest.json --format all --output-dir proof-pack
agilab policy-check ~/log/execute/flight_telemetry/run_manifest.json --strict
agilab cards ~/log/execute/flight_telemetry/run_manifest.json --output-dir proof-pack
agilab metadata-store ~/log/execute/flight_telemetry/run_manifest.json --store ~/.agilab/metadata-store.json
The proof pack includes:
a verification report
a small policy report
OpenLineage-shaped JSON
RO-Crate metadata
OpenTelemetry-shaped trace JSON
a local metadata-store entry
model, dataset, prompt, and evaluation cards generated from available manifest evidence
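As a rough illustration of what "OpenLineage-shaped" JSON could look like, the sketch below maps a run manifest to a run event with the usual top-level keys (eventType, eventTime, producer, run, job, inputs, outputs). The manifest keys read here (success, run_id, app, artifacts), the job namespace, and the producer URI are all assumptions, not AGILAB's actual format, and the result is not validated against the OpenLineage schema.

```python
import uuid
from datetime import datetime, timezone


def to_lineage_event(manifest: dict) -> dict:
    """Build an OpenLineage-shaped run event from a manifest dictionary.

    All manifest keys used below are hypothetical.
    """
    succeeded = bool(manifest.get("success", False))
    return {
        "eventType": "COMPLETE" if succeeded else "FAIL",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Placeholder producer URI, for illustration only.
        "producer": "https://example.invalid/agilab-sketch",
        # Reuse the recorded run id when present, otherwise mint one.
        "run": {"runId": manifest.get("run_id") or str(uuid.uuid4())},
        "job": {"namespace": "agilab", "name": manifest.get("app", "unknown")},
        "inputs": [],
        "outputs": [
            {"namespace": "file", "name": name}
            for name in manifest.get("artifacts", [])
        ],
    }
```

Because the event is plain JSON, it can sit in the proof pack as a file without requiring a lineage backend to be running.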
Replay is safe by default: agilab replay prints the recorded command and
requires --execute before launching it.
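The safe-by-default behavior can be sketched as a dry run that only launches the recorded command on explicit opt-in. This is a minimal illustration, not the agilab replay implementation: the replay function, its execute parameter, and the manifest's "command" key are assumptions.

```python
import json
import shlex
import subprocess
from pathlib import Path


def replay(manifest_path: Path, execute: bool = False) -> int:
    """Print the recorded command; run it only when execute is True.

    Assumes the manifest stores the command under a hypothetical
    "command" key, either as a string or as a list of arguments.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    command = manifest["command"]
    argv = shlex.split(command) if isinstance(command, str) else list(command)
    print("Recorded command:", shlex.join(argv))
    if not execute:
        print("Dry run only; nothing was launched.")
        return 0
    # Launch without a shell so the recorded arguments are passed verbatim.
    return subprocess.run(argv, check=False).returncode
```

Making execution the opt-in path means a reviewer can inspect exactly what would run before anything touches their machine.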
The reserved archive shape remains roadmap work:
agilab prove . --profile audit --export proof.agipack
agilab verify proof.agipack
agilab replay proof.agipack
Until a signed archive verifier exists, keep using the existing first-proof and adoption commands as the entry evidence:
agilab first-proof --json --with-ui
agilab adoption-report
agilab security-check --profile shared --json
Roadmap boundary
The following items remain planned work, not shipped capability:
signed .agipack archives with detached hashes and Sigstore/SLSA references
transport to an external OpenLineage backend
native OpenTelemetry SDK/OTLP spans across UI, worker build, distributed execution, notebook export, MLflow handoff, and agent runs
durable ML metadata storage and query APIs
app-authored model/data/prompt/eval cards with domain metadata
richer policy-as-code, including potential OPA/Rego-compatible gates
capability-based sandboxing for generated code, notebooks, and agent runs
first-class agent eval traces and replayable scoring
production monitoring, drift, RBAC, secrets-backend, and tenant-isolation integrations
Adoption rule
A proof capsule is promotion evidence, not a production certification. It should make a controlled experiment reviewable and repeatable; production serving, monitoring, RBAC, multi-tenant isolation, and regulated audit trails remain responsibilities of the hardened platform AGILAB hands off to.