Architecture scorecard

This page records AGILAB’s architecture self-assessment from repository evidence. It is not a production MLOps certification, not a security certification, and not a multi-tenant production platform score.

Current supported score

4.7 / 5 for the evidence-first workbench architecture.

The scope is deliberately narrow: AGILAB has an excellent architecture for turning AI/ML experiments, notebooks, app runs, and agent-assisted workflows into replayable evidence. For this score, hardened shared/team use is go when explicit gates pass: strict security-check evidence, per-profile SBOM and vulnerability scans, reviewed external apps, bounded resources, and deployment-specific secrets, network, and UI controls. Multi-tenant production use remains outside this score because tenant isolation, enterprise auth, RBAC, production rollback, and regulated serving remain deployment responsibilities outside the current AGILAB core.

Executable scorecard

Run the scorecard locally:

uv --preview-features extra-build-dependencies run python tools/architecture_scorecard.py --compact

The production-readiness profile also consumes this scorecard:

uv --preview-features extra-build-dependencies run python tools/workflow_parity.py --profile production-readiness

What must stay true

Architecture dimension	Excellent means	Evidence gate
Control, payload, and evidence planes	UI, CLI, notebooks, manager runtime, worker runtime, artifacts, and proof files have clear responsibilities.	`architecture_plane_boundaries`
Runtime guardrails	Known bad states fail closed for public UI binds, cluster shares, missing manifests, notebook imports, service health, and routes.	`architecture_runtime_guardrails`
Supply chain and release proof	Release artifacts, provenance, SBOM/audit planning, and release proof are checked by tooling rather than prose.	`architecture_supply_chain_release_proof`
Remote execution hardening	Dynamic SSH command fragments such as worker paths, scheduler addresses, and PID paths are quoted by a central builder.	`architecture_remote_execution_hardening`
Capacity model trust boundary	The optional pickle capacity predictor is loaded only from the trusted resources root, world-writable files are refused, and the model hash is verified from a sidecar manifest before deserialization.	`architecture_capacity_model_trust_boundary`
Hardening gap register	The remaining reasons the architecture is not scored as a general multi-tenant production platform are recorded in a machine-readable register with evidence requirements.	`architecture_hardening_gap_register`
Claim boundary	Public wording says exactly what the architecture proves and does not promote roadmap or production-platform claims as shipped features.	`architecture_claim_boundary`

Remaining hardening register

The score is intentionally below 5 / 5. The checked gap register is stored in docs/source/data/architecture_hardening_gaps.json and covers the remaining production-hardening surfaces: tenant isolation, enterprise auth and RBAC, rollback semantics, and regulated serving. The former capacity-model hash control is kept in the register as shipped evidence so regressions are visible.

This makes the score harder to inflate accidentally. A future score increase requires moving one of those entries from conditional evidence to shipped, tested evidence and updating the register in the same change.

Score movement rule

The score can increase only when a repository check links the claim to an executable report, test, manifest, workflow, or public proof artifact. It must decrease or become conditional when the evidence is missing, advisory-only, or depends on manual trust.

Use this page as the architecture evidence index. Use Architecture in 5 minutes for the mental model, AGILab Architecture for the full stack reference, and AGILab in the MLOps Toolchain for the production boundary.