Proof capsule ============= The product north star for AGILAB is a portable proof capsule: a reviewable bundle that lets another operator verify what ran, where it ran, which artifacts were produced, and how the work can be replayed or handed off. AGILAB now ships a first proof-pack layer around ``run_manifest.json``. It is a directory of plain JSON evidence, not yet a signed ``.agipack`` archive. Why this matters ---------------- Most AI/ML tools can track metrics, launch pipelines, or host notebooks. The harder product gap is a compact handoff object for experimental work: code, runtime context, visible UI evidence, generated artifacts, dependency state, and supply-chain evidence kept together with enough metadata to audit or rerun the work later. AGILAB's strongest long-term position is that handoff layer: * MLflow tracks experiment runs and artifacts. * AGILAB turns notebooks and scripts into controlled executable applications. * A proof capsule should preserve the evidence needed to review, compare, replay, or promote that application outside the original developer session. Capsule contents ---------------- A complete proof capsule should contain these parts: .. list-table:: :header-rows: 1 * - Layer - Capsule content - Current AGILAB building block * - Execution - App path, command, runtime mode, platform, Python version, duration, success status, and failure diagnostics. - ``agilab first-proof --json`` and ``run_manifest.json``. * - Application snapshot - Stage contract, app metadata, selected settings, and safe paths needed to rerun the application. - ``lab_stages.toml``, app settings seeds, and exported run manifests. * - Notebook bridge - Imported notebook provenance or exported runnable ``agi-core`` notebook for handoff. - WORKFLOW notebook import/export and notebook export manifests. * - Tracking handoff - MLflow run identifiers or exported tracking metadata when MLflow is enabled. - Optional MLflow integration and run artifact handoff. * - Visible evidence - Screenshots, UI robot progress logs, failure bundles, traces, HAR, and video when captured by the validation robot. - UI robot evidence, visual baselines, and failure replay artifacts. * - Artifact inventory - Output files, hashes, schema labels, summaries, and comparison metadata. - ANALYSIS artifacts, release-decision evidence, and run-diff reports. * - Environment - Dependency lock information, wheel hashes, package versions, platform markers, and optional extras actually used. - Release proof, profile supply-chain scans, and package metadata. * - Supply chain - SBOM, ``pip-audit`` output, PyPI provenance, GitHub release assets, and attestation references. - Release workflow SBOM, audit, trusted publishing, and provenance checks. * - Human summary - A short machine-readable and human-readable conclusion: what passed, what failed, what is out of scope, and what to do next. - Adoption reports, release proof, compatibility matrix, and security checks. Target CLI shape ---------------- The shipped first layer operates on a run manifest: .. code-block:: bash agilab prove ~/log/execute/flight_telemetry/run_manifest.json --output-dir proof-pack agilab verify ~/log/execute/flight_telemetry/run_manifest.json --strict agilab replay ~/log/execute/flight_telemetry/run_manifest.json agilab export-lineage ~/log/execute/flight_telemetry/run_manifest.json --format all --output-dir proof-pack agilab policy-check ~/log/execute/flight_telemetry/run_manifest.json --strict agilab cards ~/log/execute/flight_telemetry/run_manifest.json --output-dir proof-pack agilab metadata-store ~/log/execute/flight_telemetry/run_manifest.json --store ~/.agilab/metadata-store.json The proof pack includes: * a verification report * a small policy report * OpenLineage-shaped JSON * RO-Crate metadata * OpenTelemetry-shaped trace JSON * a local metadata-store entry * model, dataset, prompt, and evaluation cards generated from available manifest evidence Replay is safe by default: ``agilab replay`` prints the recorded command and requires ``--execute`` before launching it. The reserved archive shape remains roadmap work: .. code-block:: bash agilab prove . --profile audit --export proof.agipack agilab verify proof.agipack agilab replay proof.agipack Until a signed archive verifier exists, keep using the existing first-proof and adoption commands as the entry evidence: .. code-block:: bash agilab first-proof --json --with-ui agilab adoption-report agilab security-check --profile shared --json Roadmap boundary ---------------- The following items remain planned work, not shipped capability: * signed ``.agipack`` archives with detached hashes and Sigstore/SLSA references * transport to an external OpenLineage backend * native OpenTelemetry SDK/OTLP spans across UI, worker build, distributed execution, notebook export, MLflow handoff, and agent runs * durable ML metadata storage and query APIs * app-authored model/data/prompt/eval cards with domain metadata * richer policy-as-code, including potential OPA/Rego-compatible gates * capability-based sandboxing for generated code, notebooks, and agent runs * first-class agent eval traces and replayable scoring * production monitoring, drift, RBAC, secrets-backend, and tenant-isolation integrations Adoption rule ------------- A proof capsule is promotion evidence, not a production certification. It should make a controlled experiment reviewable and repeatable; production serving, monitoring, RBAC, multi-tenant isolation, and regulated audit trails remain responsibilities of the hardened platform AGILAB hands off to.