AGILab future work
This page tracks planned work only.
For current shipped capabilities, see Features.
For toolchain fit and framework comparison, see AGILab in the MLOps Toolchain.
The goal here is to rank future work, not to restate the current feature set.
Professional target
AGILab should feel professional when a new team can trust the release, complete the first proof, import or export notebook work, diagnose failures, understand the security boundary, and hand evidence to another tool or reviewer without needing the original developer.
The roadmap therefore prioritizes trust, clarity, and maintainability before adding larger product surfaces.
Professional scorecard
Use this scorecard before promoting a release, demo, app, or major feature as professional-ready.
| Area | Professional bar | Primary proof |
|---|---|---|
| Release trust | Clean install, release proof, badges, PyPI, docs, and demo text agree | Release guard and release proof |
| First-run UX | A newcomer can complete one local proof without guessing page actions | First-proof smoke and UI robot path |
| Notebook bridge | Work can enter from notebooks and leave as reusable notebooks | Import/export round-trip evidence |
| Failure clarity | Common failures are classified before raw tracebacks | Diagnostic tests and user-facing errors |
| Security boundary | Shared, public, sensitive, and production use limits are explicit | Security check and adoption docs |
| Team runtime | Cluster/share/service routes fail fast and explain remediation | Cluster/share/service health gates |
| Evidence handoff | Runs, artifacts, compatibility, and promotion decisions are portable | Evidence bundle and release decision |
| Maintainability | New features extend tested contracts instead of page-specific glue | Contract tests and pattern guardrails |
| Ecosystem quality | Published apps are named, documented, installable, runnable, and scoped | App package smoke and README checks |
Phase plan
Use phases as the product sequence. Dates can move; ordering should not move unless a higher-priority item is explicitly accepted as a risk.
| Phase | Focus | Exit gate |
|---|---|---|
| Phase 0 | Release trust and docs alignment | Clean release lane, fresh docs mirror, green badge guard, no stale public claims |
| Phase 1 | Newcomer first proof and notebook parity | Built-in and notebook first proofs install, execute, and open analysis predictably |
| Phase 2 | Diagnostics, security, and team readiness | Failures are classified; shared/team/cluster use has explicit checks and limits |
| Phase 3 | Evidence and data integration | Promotion evidence, run diff, connectors, and provenance are consumable outside the UI |
| Phase 4 | Maintainable extension model | Apps, pages, notebooks, connectors, reducers, and evidence reports follow stable contracts |
| Phase 5 | Product expansion | Multi-app DAG, operator mode, observability, and MLOps handoff build on the stable baseline |
Sequencing rules
Fix release trust before adding feature breadth.
Fix first-run UX before asking users to try clusters or service mode.
Prove notebook import parity before advertising notebook migration broadly.
Add diagnostics before broadening team and cluster validation.
Add security and supply-chain gates before shared, exposed, or sensitive use.
Standardize evidence schemas before adding dashboards.
Stabilize extension contracts before publishing more apps.
Productize multi-app DAGs only after the first-run, runtime, evidence, and contract layers are stable.
Professionalization priority order
Use this order when the goal is to make AGILab feel professional, adoptable, and maintainable rather than just richer in features.
P0. Release and runtime integrity
Goal:
every public release can be installed, launched, and validated from a clean public environment without relying on the developer checkout
Concrete items:
keep the release guard as the mandatory pre-tag path: install smoke, first-proof, security check, docs mirror check, badge freshness, dependency policy, and trusted-publisher contract
keep the imported-notebook release smoke in the mandatory release-proof profile, not as a separate best-effort demo path
keep the exported-notebook handoff smoke in the same release-proof lane so the no-lock-in claim is guarded by manifest, source-cell, runtime-role, and artifact-reference checks
require each shipped notebook sample to keep creating an installable and runnable app equivalent to its packaged example
keep PyPI, GitHub release proof, public docs, and Hugging Face demo text aligned before publication
keep the package-aware pypi-publish reuse gate healthy: detect the expected wheel/sdist artifacts before building; when PyPI already exposes them, skip the build, Trusted Publishing authentication, and upload; download the reused files back into the GitHub Release distribution bundle; and record their hashes in the release distribution evidence
fail fast on local-path, stale-worker, missing-share, or stale-app-repository states instead of silently degrading
Done means:
a clean install can run the default first proof and the guarded imported notebook project end to end
release proof points to the exact version, commands, evidence, and known limitations
no release badge, docs mirror, dependency-policy, or trusted-publisher guard is knowingly stale
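The reuse decision in the pypi-publish gate can be sketched as a pure function. This is a minimal sketch, not AGILab's workflow code: it assumes the caller has already fetched the PyPI JSON API payload for the release (https://pypi.org/pypi/<project>/<version>/json, whose urls entries carry filename and digests.sha256), and the names plan_publish and sha256_of are hypothetical.

```python
import hashlib
from pathlib import Path


def plan_publish(expected_files, pypi_payload):
    """Split expected wheel/sdist filenames into 'reuse' and 'upload' lanes.

    pypi_payload is assumed to be the parsed JSON from PyPI's
    /pypi/<project>/<version>/json endpoint; only its "urls" list is read.
    """
    published = {
        entry["filename"]: entry["digests"]["sha256"]
        for entry in pypi_payload.get("urls", [])
    }
    reuse = {name: published[name] for name in expected_files if name in published}
    upload = [name for name in expected_files if name not in published]
    # Build, Trusted Publishing auth, and upload are skipped only when nothing is missing.
    return {"skip_build": not upload, "reuse": reuse, "upload": upload}


def sha256_of(path):
    """Hash a downloaded distribution file for the release distribution evidence."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Keeping the decision separate from the network call makes it easy to test and lets the publish log state exactly which files were reused and which were uploaded.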
P1. First-run product experience
Goal:
a new user understands what to click, what will happen, and how to recover if it fails
Concrete items:
keep the landing page focused on the first proof and remove redundant call-to-action clutter
make every wizard action direct: install really installs, execute really runs, analysis opens the right result page, notebook import creates the project without asking the user to locate packaged files
keep PROJECT sidebars and advanced controls out of the default path unless they are needed for the current task
add deterministic error messages for install/run/delete/import flows and keep spinners scoped to the action that is still running
keep examples small enough to finish locally before users attempt cluster, service mode, or external app repositories
Done means:
a user can complete the built-in first proof or the notebook first proof without reading source code, finding hidden files, or guessing page actions
P2. Notebook interop and no-lock-in
Goal:
teams can enter AGILab from notebooks and leave AGILab back to notebooks without losing the useful work
Concrete items:
provide one importable notebook for every public packaged example that is suitable for notebook import
preserve explicit manager/worker role metadata while still requiring clear cell-by-cell review when metadata is missing or ambiguous
name imported projects predictably, for example flight-telemetry-from-notebook-project
keep notebook export positioned as the exit and handoff path, not just a convenience download
round-trip the stage order, code, runtime hints, artifacts, and provenance with enough fidelity for review and reuse outside the AGILab UI
Done means:
import and export are documented as a reversible adoption bridge: notebooks can become AGILab projects, and AGILab work can be handed back as notebooks when the workbench is no longer needed
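A round-trip check is one way to keep the reversible-bridge claim honest. This minimal sketch assumes a hypothetical Stage record (name, code, runtime hint) and a hypothetical agilab key in each cell's metadata; it writes an nbformat-4-shaped dict by hand rather than through the nbformat library, reads the stages back, and compares them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical stage record: the fields a reviewer needs to reuse the work."""
    name: str
    code: str
    runtime: str  # e.g. "manager" or "worker"


def export_notebook(stages):
    """Write stages as nbformat-4-shaped code cells, keeping order and hints in metadata."""
    cells = []
    for index, stage in enumerate(stages):
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            "outputs": [],
            "source": stage.code,
            "metadata": {"agilab": {"stage": stage.name, "order": index, "runtime": stage.runtime}},
        })
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


def import_notebook(notebook):
    """Rebuild stages from cells carrying the hypothetical agilab metadata, in recorded order."""
    tagged = [cell for cell in notebook["cells"] if "agilab" in cell.get("metadata", {})]
    tagged.sort(key=lambda cell: cell["metadata"]["agilab"]["order"])
    stages = []
    for cell in tagged:
        meta = cell["metadata"]["agilab"]
        source = cell["source"]
        # nbformat allows source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        stages.append(Stage(meta["stage"], source, meta["runtime"]))
    return stages
```

Asserting import_notebook(export_notebook(stages)) == stages in CI turns the no-lock-in promise into a regression test.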
P3. Security and supply-chain posture
Goal:
AGILab is safe by default for controlled R&D and explicit about what remains outside the default threat model
Concrete items:
keep public UI binding local by default and document the reverse-proxy, authentication, TLS, and network controls required for exposure
treat apps, notebooks, generated snippets, and external repositories as executable code that needs review, allowlisting, and isolation
keep secrets out of command lines, logs, committed files, generated notebooks, and release evidence
regenerate SBOM and dependency audit evidence for the actual install profiles being adopted
keep PyPI trusted publishing and action pinning as mandatory release gates
Done means:
the docs do not imply production, multi-tenant, regulated-data, or public exposure readiness without the external controls required to make that true
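Two of these defaults are easy to check mechanically. This standard-library sketch treats a bind address as safe by default only when it is loopback, and masks the values of secret-looking flags before a command line is logged. The flag names in SECRET_FLAGS and both function names are illustrative assumptions, not AGILab's actual option set.

```python
import ipaddress
import shlex

# Illustrative flag names; a real deployment would list its own secret-bearing options.
SECRET_FLAGS = {"--token", "--password", "--api-key"}


def is_local_bind(host):
    """Return True when the UI would only listen on the loopback interface."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames are not assumed to be local.
        return False


def redact_argv(argv):
    """Return a log-safe command string with secret flag values replaced by ***."""
    safe = []
    mask_next = False
    for arg in argv:
        if mask_next:
            safe.append("***")
            mask_next = False
            continue
        flag, sep, _value = arg.partition("=")
        if sep and flag in SECRET_FLAGS:
            safe.append(f"{flag}=***")  # --token=abc form
        elif arg in SECRET_FLAGS:
            safe.append(arg)
            mask_next = True  # --token abc form: mask the next argument
        else:
            safe.append(arg)
    return shlex.join(safe)
```

Redaction is a backstop; the stronger rule remains passing secrets through environment variables or a secrets backend rather than argv.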
P4. Team and cluster operation
Goal:
shared-team and cluster use is diagnosable, bounded, and repeatable
Concrete items:
make cluster share setup explicit and refuse cluster mode when no usable shared path is configured
keep SSH, SSHFS, LAN discovery, remote path, and share-sentinel diagnostics actionable from both CLI and UI
provide a small validation matrix for local, bare-metal cluster, VM-based cluster, AI Lightning, Hugging Face, and cloud targets when evidence exists
add service-health gates for long-running service mode: idle policy, unhealthy limit, restart-rate threshold, and machine-readable status
separate single-user convenience from multi-user isolation, quotas, and account policy
Done means:
a team can distinguish local failure, share failure, worker dependency failure, scheduler failure, and service-health failure without reading tracebacks first
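The service-health gate can be expressed as a pure function over recent observations, which keeps the resulting status machine-readable. This is a hedged sketch: the HealthPolicy type, its thresholds, and the state names are hypothetical defaults, not AGILab's shipped values.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthPolicy:
    """Hypothetical thresholds for a long-running service."""
    idle_timeout_s: float = 900.0     # idle policy: flag after this much inactivity
    max_unhealthy: int = 3            # consecutive failed probes before reporting unhealthy
    max_restarts: int = 5             # restart-rate threshold ...
    restart_window_s: float = 600.0   # ... within this sliding window


def evaluate(policy, now, last_activity, consecutive_unhealthy, restart_times):
    """Return a JSON status; 'state' is one of ok, idle, unhealthy, restart-storm."""
    recent = [t for t in restart_times if now - t <= policy.restart_window_s]
    if len(recent) > policy.max_restarts:
        state = "restart-storm"
    elif consecutive_unhealthy >= policy.max_unhealthy:
        state = "unhealthy"
    elif now - last_activity >= policy.idle_timeout_s:
        state = "idle"
    else:
        state = "ok"
    status = {
        "state": state,
        "idle_s": round(now - last_activity, 3),
        "consecutive_unhealthy": consecutive_unhealthy,
        "restarts_in_window": len(recent),
    }
    return json.dumps(status, sort_keys=True)
```

Checking restart rate before probe health means a crash-looping service is reported as such, rather than as an ordinary unhealthy one.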
P5. Evidence-driven MLOps bridge
Goal:
AGILab stays a workbench, but hands clean evidence to the MLOps and platform systems that own production
converge the existing evidence pieces into a portable proof capsule that can be verified, compared, replayed, and handed to another tool without relying on the original developer workspace
Concrete items:
strengthen run evidence, release decisions, run diff, artifact provenance, and compatibility profiles as first-class outputs
keep the first public proof-pack CLI layer small and verifiable: agilab prove, agilab verify, agilab replay, agilab sign, agilab export-lineage, agilab export-traces, agilab policy-check, agilab cards, and agilab metadata-store operate on run_manifest.json or .agipack evidence where appropriate, write plain JSON evidence, export a hash-verifiable .agipack archive, and support optional detached Ed25519 signatures plus local trust-policy verification before AGILab claims external Sigstore/SLSA attestation verification
introduce an Evidence Core contract that bundles the run manifest, workflow snapshot, environment lock, artifact hashes, notebook import/export manifest, optional MLflow references, policy checks, and verifier results as one portable audit surface
define the lifecycle around that contract: build, test, version, compare, promote, roll back, and archive evidence bundles with stable hashes and explicit compatibility metadata
derive a graph-shaped evidence view that links claims, code, data, models, environments, runs, notebooks, artifacts, cards, MLflow handoff, and publication targets without requiring an external graph service by default
support reviewer-oriented evidence queries such as unsupported claims, stale data sources, changed dependencies, policy failures, missing cards, and baseline-vs-candidate deltas
keep MLflow integration focused on tracking, artifacts, model registry handoff, and comparison rather than replacing AGILab execution
define promotion-ready evidence bundles for apps, imported notebooks, and cluster runs
add hooks for monitoring, drift, feature stores, orchestration engines, and serving platforms without claiming those systems are built into AGILab
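A hash-verifiable archive needs little more than a manifest of SHA-256 digests stored next to the files. This sketch uses zipfile as a stand-in container; it does not reproduce the real .agipack layout, and the manifest name hashes.json is an assumption. Detached Ed25519 signing would sign the manifest bytes and is omitted to stay within the standard library.

```python
import hashlib
import io
import json
import zipfile

MANIFEST = "hashes.json"  # hypothetical manifest entry name


def build_archive(files):
    """Pack {name: bytes} plus a sorted SHA-256 manifest into an in-memory zip."""
    hashes = {name: hashlib.sha256(data).hexdigest() for name, data in sorted(files.items())}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in sorted(files.items()):
            archive.writestr(name, data)
        archive.writestr(MANIFEST, json.dumps(hashes, sort_keys=True, indent=2))
    return buffer.getvalue()


def verify_archive(blob):
    """Return a list of problems; an empty list means every file matches its recorded hash."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        expected = json.loads(archive.read(MANIFEST))
        present = set(archive.namelist()) - {MANIFEST}
        for name in sorted(present - set(expected)):
            problems.append(f"unlisted file: {name}")
        for name, digest in sorted(expected.items()):
            if name not in present:
                problems.append(f"missing file: {name}")
            elif hashlib.sha256(archive.read(name)).hexdigest() != digest:
                problems.append(f"hash mismatch: {name}")
    return problems
```

Because the manifest is canonical JSON, signing its bytes is enough to cover every file in the archive.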
Current shipped baseline:
proof-pack directory export from a run manifest with verification report, policy report, OpenLineage-shaped JSON, RO-Crate metadata, OpenTelemetry-shaped trace JSON, local metadata-store entry, and model/dataset/prompt/eval cards
hash-verifiable .agipack archive export with optional detached Ed25519 signatures and local JSON/TOML trust-policy verification
replay is safe by default: it prints the recorded command unless the operator explicitly passes --execute
policy-as-code starts with a small JSON/TOML gate over the manifest checks instead of a full external policy engine
Excel-shaped proof preview from packaged examples: a workbook, refresh-friendly CSV folder, and JSON hash evidence, intentionally before any Office add-in or arbitrary workbook import claim
Voila-shaped notebook proof preview from packaged examples: a dashboard notebook, hide-code manifest, widget-to-args migration hints, app-view plan, static HTML preview, and JSON hash evidence, intentionally before any Voila server integration or agilab[voila] claim
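The safe-by-default replay behaviour can be sketched with argparse and subprocess. This assumes the recorded command is stored as an argv list under a hypothetical "command" key in a JSON manifest; it is not the real agilab replay implementation. Without --execute the sketch only prints the command, and with it the command runs without a shell.

```python
import argparse
import json
import shlex
import subprocess


def replay(manifest, execute=False):
    """Print the recorded command; run it only when execute is explicitly True."""
    argv = manifest["command"]  # hypothetical key holding an argv list
    print(shlex.join(argv))
    if not execute:
        return None  # dry run: nothing is executed
    # No shell, so recorded arguments are never reinterpreted.
    return subprocess.run(argv, check=False).returncode


def main(args=None):
    parser = argparse.ArgumentParser(prog="replay-sketch")
    parser.add_argument("manifest", help="path to a JSON run manifest")
    parser.add_argument("--execute", action="store_true", help="actually run the command")
    options = parser.parse_args(args)
    with open(options.manifest, encoding="utf-8") as handle:
        manifest = json.load(handle)
    return replay(manifest, execute=options.execute)
```

Making execution an opt-in flag means an operator always sees the exact command before anything runs.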
Spreadsheet adoption bridge:
keep the shipped preview dependency-light and file-based until there is user pull for deeper Excel integration
add a focused agilab excel-proof --workbook <file> --sheet <name> --out <file> command only after the preview validates the user story
write an AGILAB Evidence worksheet, refreshable CSV/parquet outputs, and a JSON evidence bundle for workbook runs
keep Office add-ins, macro execution, tenant deployment, and formula-preserving arbitrary workbook import as later product-expansion work, not current claims
Notebook dashboard adoption bridge:
keep the shipped preview dependency-light and file-based until there is user pull for deeper Voila integration
add a focused agilab notebook-proof dashboard.ipynb --app-name <name> command only after the preview validates the user story
promote stable ipywidgets to app arguments while keeping app-specific dashboard code inside the app project
keep a future agilab[voila] extra, server launch, and multi-user dashboard deployment as later product-expansion work, not current claims
Evidence Core roadmap:
define stable evidence node and edge types for claim, code, data, model, environment, run, notebook, artifact, policy, card, and publication
generate deterministic JSON, JSON-LD, or GraphML exports from proof-pack evidence so reviewers can inspect lineage and audit claims outside the UI
include source manifests, schema or ontology mappings, evidence policies, card metadata, reducer summaries, connector provenance, and imported-notebook metadata when the originating app can provide them
design inspection and verification surfaces around the existing proof commands instead of adding a second execution runtime
make the lifecycle explicit in CLI and UI language: a bundle can be built, checked, diffed, promoted, archived, or rejected without implying production certification
keep the baseline local-first and file-based; graph databases, vector indexes, message buses, or workflow-control services should remain optional adapters until the portable evidence contract is stable
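Deterministic export mostly means canonical ordering. This sketch derives a small node/edge graph from a proof-pack-like dict and serializes it with sorted nodes, sorted edges, and sorted keys, so the same evidence always yields byte-identical JSON that can be hashed. The input shape and the edge labels (used, produced, ran_in) are assumptions for illustration; only the node types come from the list above.

```python
import hashlib
import json

NODE_TYPES = {"claim", "code", "data", "model", "environment", "run",
              "notebook", "artifact", "policy", "card", "publication"}


def build_graph(evidence):
    """Turn a hypothetical {'run', 'inputs', 'outputs', 'environment'} dict into nodes and edges."""
    run_id = f"run:{evidence['run']}"
    nodes = {run_id: "run"}
    edges = set()
    for item in evidence.get("inputs", []):
        node_id = f"data:{item}"
        nodes[node_id] = "data"
        edges.add((run_id, "used", node_id))
    for item in evidence.get("outputs", []):
        node_id = f"artifact:{item}"
        nodes[node_id] = "artifact"
        edges.add((run_id, "produced", node_id))
    if "environment" in evidence:
        node_id = f"environment:{evidence['environment']}"
        nodes[node_id] = "environment"
        edges.add((run_id, "ran_in", node_id))
    unknown = set(nodes.values()) - NODE_TYPES
    if unknown:
        raise ValueError(f"unknown node types: {sorted(unknown)}")
    return nodes, edges


def export_json(nodes, edges):
    """Serialize with canonical ordering so equal graphs produce identical bytes."""
    payload = {
        "nodes": [{"id": node_id, "type": nodes[node_id]} for node_id in sorted(nodes)],
        "edges": [{"source": src, "label": label, "target": dst} for src, label, dst in sorted(edges)],
    }
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def evidence_digest(text):
    """Stable content hash of an exported graph."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Since the output is a plain file, a graph database can still import it later without becoming a requirement.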
Context-engineering gaps to close:
reusable workflow blueprints that capture stages, runtime parameters, prompt/tool settings, connector requirements, expected artifacts, evidence outputs, and smoke-test expectations
versioned prompt, tool, and agent configuration manifests that can be reviewed and replayed with the same rigor as app code
schema and ontology mapping hooks for apps that transform unstructured or semi-structured data into reviewable evidence, while keeping domain-specific extraction optional
lightweight deployment or run-profile generation that writes the commands, environment assumptions, and validation plan for local, notebook, cluster, service, or demo runs without becoming a Kubernetes platform
observability-lite evidence for execution latency, failure source, queue or worker backlog, artifact volume, token/cost metrics when an LLM is involved, and service-health state when a long-running service is used
Remaining state-of-the-art scope:
external Sigstore/SLSA attestation for signed .agipack archives and a third-party verifier path that validates provenance without a local trust root or source checkout
OpenLineage transport integration to emit events to an external lineage backend, not only write an interoperable JSON payload
native OpenTelemetry SDK/OTLP instrumentation across Streamlit actions, worker build, distributed execution, notebook export, MLflow handoff, and agent runs
durable ML metadata backend, for example SQLite/Postgres/MLMD-compatible storage, with query APIs for datasets, models, prompts, runs, artifacts, and lineage
app-declared model cards, data cards, prompt cards, and evaluation cards with domain metadata rather than evidence-only placeholders
richer policy-as-code, potentially OPA/Rego-compatible, for adoption gates, promotion gates, release gates, and sensitive-data gates
capability-based sandboxing for generated code, notebooks, and agent runs: explicit filesystem, network, secret, and subprocess scopes
first-class agent eval traces beyond the shipped local agent-run cards: prompt/tool/file timeline detail, diff evidence, replay, scoring, and safety policy results
monitoring and drift handoff adapters for production systems without turning AGILab into the production control plane
enterprise controls for shared deployments: secrets backend integration, authentication, RBAC, audit logs, and tenant isolation
State-of-the-art upgrade backlog:
| Priority | Missing capability | First shippable AGILAB contract | Done when |
|---|---|---|---|
| P0 | External proof-capsule attestation | Sigstore / SLSA provenance sidecars bound to the signed | A reviewer can verify archive integrity, signer identity, build provenance, and allowed issuer without the original source checkout or a local-only trust root. |
| P0 | Native OpenTelemetry and GenAI traces | | A run has correlated trace IDs across manager, worker, notebook, and agent events, and an OTLP collector can ingest them without a custom converter. |
| P0 | OpenLineage transport | | Marquez/OpenLineage-compatible backends can show AGILAB datasets, artifacts, and job hierarchy from emitted events, not only from local JSON files. |
| P1 | Continuous eval and LLMOps loop | | A prompt, model, agent, or notebook change can fail promotion because quality, cost, latency, safety, or regression scores moved outside policy. |
| P1 | Durable evidence metadata backend | Local SQLite first, then Postgres-compatible storage for runs, artifacts, datasets, models, prompts, cards, lineage, and policy reports | |
| P1 | Production serving handoff | KServe/Ray Serve/Seldon-style manifest export with model URI, environment, health checks, canary/rollback hints, and prediction-log evidence contract | AGILAB can hand a reproducible candidate to a serving platform while staying out of production traffic control. |
| P2 | App-authored cards | App-owned model, data, prompt, and eval card schemas with required domain fields and evidence links | Cards are no longer placeholders inferred from run manifests; apps declare meaningful review metadata. |
| P2 | Policy-as-code and capability sandboxing | OPA/Rego-compatible gates plus filesystem, network, secret, subprocess, and package capability declarations for generated code, notebooks, and agents | Unsafe execution paths are blocked before run, and the proof bundle records the policy decision and granted capabilities. |
| P2 | Enterprise shared-deployment controls | Secrets-backend integration, authentication, RBAC, audit logs, and tenant-isolation hooks | Shared deployments can be evaluated with explicit platform controls instead of relying on local-workbench assumptions. |
This backlog is intentionally adapter-first. AGILAB should produce portable contracts and evidence that platform tools can consume; it should not hide production serving, governance, or observability responsibilities inside the workbench.
Agent skills and resource evidence hardening:
keep the public agent discovery surface generated from the repo-managed skill source of truth: AGENT_SKILLS.md, llms.txt, llms-full.txt, and the Skills/Standard/Works with badges must remain generated artifacts, not hand-maintained marketing copy
add explicit compatibility metadata to each repo-managed skill: supported agents, required tools, write/network/subprocess expectations, local service assumptions, secret/environment requirements, and expected evidence outputs
close the current skill-scan hygiene gaps by replacing private absolute paths with placeholders and by documenting deliberate network or environment access in skill metadata rather than leaving it implicit in instructions
mature the skill security scanner from a deterministic local guard into a review surface: baseline or allow-list support, SARIF output, sticky PR comments, severity policy, and optional comparison with external agent-skill scanners before enforcing stronger gates
attach a resource_snapshot.json to agent runs, first-proof runs, heavy page proofs, and release evidence when resource state explains scheduling, reproducibility, or performance choices
feed resource snapshots and cluster inventory into scheduler recommendations and future autoscale decisions, while keeping autoscale behavior explicit and auditable rather than silently changing execution topology
build on the shipped agilab agent-run evidence commands (list, handoff, next, context, and lineage) by adding skill identity, skill version, resource snapshot, changed files, richer command timeline, and resulting proof-artifact links so multi-agent work can be replayed or reviewed together
publish agent-surface validation as release evidence: generated catalogs, badge freshness, changed-skill scan reports, and resource snapshot checks should be archived alongside SBOM, pip-audit, hashes, and provenance
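A resource_snapshot.json can start from what the standard library already exposes. This sketch records platform, Python version, CPU count, and disk space for a given path; the field names are assumptions, and richer data such as GPU inventory or memory pressure would need platform-specific tooling that is not shown.

```python
import json
import os
import platform
import shutil
import sys
from datetime import datetime, timezone


def resource_snapshot(path="."):
    """Collect a small, JSON-serializable description of local execution resources."""
    usage = shutil.disk_usage(path)
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "cpu_count": os.cpu_count(),  # may be None when undeterminable
        "disk_total_bytes": usage.total,
        "disk_free_bytes": usage.free,
    }


def write_snapshot(target, path="."):
    """Write the snapshot next to other run evidence; the file name is the caller's choice."""
    with open(target, "w", encoding="utf-8") as handle:
        json.dump(resource_snapshot(path), handle, indent=2, sort_keys=True)
```

A reviewer comparing two runs can then see whether a timing difference came from the code or from the machine.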
Done means:
an experiment can be reviewed, compared, promoted, or rejected from evidence that is versioned, portable, and honest about its execution environment
the same evidence can be exported as a single proof capsule without claiming production certification
P6. Extension architecture and maintainability
Goal:
new apps, pages, connectors, and workflow features follow stable patterns instead of accumulating one-off glue
Concrete items:
keep public APIs, app templates, page metadata, connector models, reducer contracts, and workflow stage contracts explicit and tested
add flow blueprint contracts for reusable workflow presets, including the app/page inputs, parameter schema, prompt/tool configuration, connector requirements, expected artifacts, and evidence smoke
use design patterns to separate UI, orchestration, runtime execution, artifacts, and evidence generation
keep prompts, agent tools, MCP-style connectors, and runtime-tunable settings versioned and reviewable instead of hidden in page state
add pattern-gated checks before new workflow or notebook-import behavior can bypass existing contracts
keep strict typing and focused tests on shared helpers that affect many apps
document deprecations with migration paths and removal dates
Done means:
future features can be added by extending clear contracts, not by duplicating page-specific or app-specific behavior
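Deprecations that always carry a migration path and removal target can be enforced by construction. This decorator is a sketch: the replacement and removal arguments are required keywords so neither can be forgotten, while the decorated function, version string, and message format are hypothetical; AGILab may use a different helper.

```python
import functools
import warnings


def deprecated(*, replacement, removal):
    """Mark a callable deprecated; the warning always names the migration path and removal target."""
    def decorate(func):
        message = (
            f"{func.__qualname__} is deprecated; use {replacement} instead. "
            f"Scheduled for removal in {removal}."
        )

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not at this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper
    return decorate


@deprecated(replacement="load_app_settings()", removal="2.0")  # hypothetical names
def read_settings(path):
    return {"path": path}
```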
P7. Ecosystem and distribution
Goal:
AGILab is easy to adopt incrementally through public packages, app packages, demos, and external repositories without locking users into one layout
Concrete items:
keep PyPI packages for publishable apps small, named consistently, and backed by trusted publishing
keep Hugging Face and public demos aligned with the same release evidence as the repository
provide clear app repository update, install, rename, and migration behavior instead of compatibility aliases for stale local copies
keep public app packaging and private app validation as separate lanes: public agi-app-* packages use PyPI, entry points, wheel/sdist metadata, and provenance checks; private or non-public apps still need a pinned validation model that does not vendor private code into the public repository
evaluate a private-side app validation manifest that records the external app repository origin, commit SHA, app path, runtime/package constraints, and expected validation commands while preserving APPS_REPOSITORY symlinks as the lightweight local-development shortcut
publish only examples that meet content-quality, install/run, README, and notebook-import criteria
Done means:
users can adopt one app, one notebook import, one demo, or the full workbench without discovering different contracts for each path
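The private-side validation manifest under evaluation could be as small as a pinned record that CI checks before cloning. This sketch uses hypothetical field names and validates shape only: a full 40-character commit SHA (so a moving branch name cannot slip in), a relative app path that stays inside the repository, and at least one validation command.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

SHA_RE = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class AppValidationManifest:
    """Hypothetical pinned record for validating a private app without vendoring it."""
    origin: str                # external repository URL
    commit: str                # full commit SHA, not a branch or tag
    app_path: str              # app location inside that repository
    constraints: tuple = ()    # e.g. ("python>=3.11",)
    commands: tuple = ()       # validation commands to run

    def problems(self):
        found = []
        if not SHA_RE.fullmatch(self.commit):
            found.append("commit must be a full 40-character lowercase SHA")
        path = PurePosixPath(self.app_path)
        if path.is_absolute() or ".." in path.parts:
            found.append("app_path must be relative and stay inside the repository")
        if not self.commands:
            found.append("at least one validation command is required")
        return found
```

Local development keeps using the APPS_REPOSITORY symlink; only CI and release checks would read the pinned record.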
Professional execution backlog
Treat this as the delivery order. Lower-priority feature work should not displace higher-priority adoption, release, and safety work unless there is an explicit product decision.
Priority 1. Clean release lane
Ship only when the public package, public docs, release proof, coverage badges, trusted publishing, Hugging Face copy, and first-proof commands all describe the same release.
Acceptance gate:
./dev release passes from a clean checkout, starting with its strict AGILAB audit/review gate before impact, PyPI, docs, typing, and badge checks
the GitHub release workflow mirrors the same audit-first validation order
./dev docs and the release proof report pass from a clean checkout
the release proof names the exact version, validation routes, and known non-certified environments
the publish workflow shows which split packages were uploaded and which were intentionally reused because their wheel/sdist artifacts were unchanged
no manual release note, README, public docs page, or demo copy contradicts the package that was published
Why first:
professional adoption starts by trusting the published artifact, not by trusting the developer machine
Priority 2. Notebook import parity
Every public example that is advertised as importable from a notebook must ship with a notebook sample, deterministic metadata, and an imported-project smoke that proves INSTALL and EXECUTE behave like the original app.
Acceptance gate:
each supported sample creates a predictably named <example>-from-notebook-project
manager/worker cell roles are explicit or force review before project creation
the release smoke keeps at least one imported notebook project in the guarded create -> install -> execute -> analysis path
Why now:
notebook import is a unique adoption bridge only if users can prove that the imported project still runs
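The naming rule and the review fallback can both be checked before project creation. This sketch assumes roles are stored under a hypothetical agilab.role key in each code cell's metadata; cells with no role, or with a value other than manager or worker, are returned for review instead of being guessed. The slug rule is also an assumption.

```python
import re

VALID_ROLES = {"manager", "worker"}


def imported_project_name(example):
    """Derive the predictable <example>-from-notebook-project name."""
    slug = re.sub(r"[^a-z0-9]+", "-", example.lower()).strip("-")
    if not slug:
        raise ValueError("example name has no usable characters")
    return f"{slug}-from-notebook-project"


def split_roles(notebook):
    """Return (roles by cell index, code-cell indices that need human review)."""
    roles, review = {}, []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        role = cell.get("metadata", {}).get("agilab", {}).get("role")
        if role in VALID_ROLES:
            roles[index] = role
        else:
            review.append(index)  # missing or ambiguous: force review
    return roles, review
```

Project creation would proceed automatically only when the review list is empty.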
Priority 3. First-run wizard contract
The default UI path must be direct: buttons perform the action they promise, and the next required user action is visible before navigation.
Acceptance gate:
built-in first proof: select demo, install, execute, and analysis are all direct and recoverable
notebook first proof: create from packaged notebook does not require the user to find a hidden file
spinners, success messages, and failure messages are scoped to the action that actually ran
Why now:
first-run confusion makes the product feel experimental even when the backend works
Priority 4. Runtime failure diagnostics
Failures must classify themselves before showing raw tracebacks.
Acceptance gate:
install/run/delete/import failures distinguish dependency, path, archive, project-state, cluster-share, worker-copy, and scheduler failures
stale local app directories and stale worker environments produce actionable remediation
corrupted archives and invalid imported notebooks fail fast with a concise cause and a safe next step
Why now:
professional users can tolerate failures; they cannot tolerate unclear failures
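Classification before traceback can start as an ordered table from exception types to a category and a safe next step. The mapping below is a hypothetical sketch covering a few of the categories named above with standard-library exception types plus one hypothetical ClusterShareError; worker-copy and scheduler failures would need AGILab-specific exception classes that are not shown, and the remediation texts are illustrative.

```python
import zipfile
from dataclasses import dataclass


class ClusterShareError(RuntimeError):
    """Hypothetical error raised when the configured cluster share is unusable."""


@dataclass(frozen=True)
class Diagnosis:
    category: str
    remediation: str


# Order matters: more specific types must come before their base classes.
RULES = [
    (ModuleNotFoundError, Diagnosis("dependency", "Reinstall the app environment, then retry INSTALL.")),
    (zipfile.BadZipFile, Diagnosis("archive", "The archive is corrupted; download or export it again.")),
    (FileNotFoundError, Diagnosis("path", "Check that the configured path exists on this machine.")),
    (ClusterShareError, Diagnosis("cluster-share", "Configure a shared path reachable by every worker.")),
]


def classify(error):
    """Return the Diagnosis of the first matching rule, or an 'unclassified' fallback."""
    for error_type, diagnosis in RULES:
        if isinstance(error, error_type):
            return diagnosis
    return Diagnosis("unclassified", "Open the full traceback and report it with the run evidence.")


def user_message(error):
    """One concise line shown before any raw traceback."""
    diagnosis = classify(error)
    return f"[{diagnosis.category}] {error}. Next step: {diagnosis.remediation}"
```

The raw traceback stays available, but the user sees the category and next step first.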
Priority 6. Cluster and team operation
Cluster mode should be a supported team workflow, not a best-effort advanced demo.
Acceptance gate:
cluster requests fail when no usable shared path exists
SSH, SSHFS, LAN discovery, remote path, and share-sentinel checks are exposed through CLI and UI diagnostics
validation evidence covers local, bare-metal cluster, VM cluster, AI Lightning, Hugging Face, and cloud targets only where each route has actually been tested
Why now:
distributed execution is a core differentiator only when setup and failure modes are operationally clear
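A share-sentinel check can prove that a path is actually shared rather than merely present: the manager writes a fresh random token, and each worker must read the same token back through its own mount. This sketch shows both sides as local functions; the sentinel file name is hypothetical, and in a real cluster the read would run on each worker (for example over SSH).

```python
import secrets
from pathlib import Path

SENTINEL = ".agilab_share_sentinel"  # hypothetical file name


def write_sentinel(share_root):
    """Manager side: write a fresh random token into the share and return it."""
    root = Path(share_root)
    if not root.is_dir():
        raise RuntimeError(f"cluster share {root} does not exist; refusing cluster mode")
    token = secrets.token_hex(16)
    (root / SENTINEL).write_text(token, encoding="utf-8")
    return token


def check_sentinel(worker_view_of_share, expected_token):
    """Worker side: True only if this mount shows the token the manager just wrote."""
    try:
        seen = (Path(worker_view_of_share) / SENTINEL).read_text(encoding="utf-8")
    except OSError:
        return False
    return seen == expected_token
```

A fresh token on every check prevents a stale local copy of the directory from passing as a live share.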
Priority 7. Evidence and promotion workflow
AGILab should make it easy to decide whether a run, app, notebook import, or cluster validation is ready to reuse, publish, or hand off.
Acceptance gate:
run evidence, release decisions, compatibility reports, run diff, artifact provenance, and supply-chain evidence share stable schemas
evidence bundle lifecycle actions are explicit: build, verify, diff, promote, archive, reject, and roll back
evidence bundles expose a graph-shaped index that links the run manifest, artifacts, notebook exports, optional MLflow references, policy results, cards, and human-readable claims
schema/ontology mappings and source manifests are captured when an app creates derived evidence from structured, semi-structured, or unstructured inputs
evidence bundles can be consumed outside AGILab by reviewers, CI, MLflow, or platform teams
external graph, vector, streaming, or workflow-control infrastructure remains optional; the local proof pack stays the portable baseline
promotion decisions state what passed, what failed, and what is out of scope
Why now:
this is the bridge between a useful workbench and professional engineering governance
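Making lifecycle actions explicit suggests a small transition table, so that, for example, a rejected bundle cannot be promoted without being rebuilt. The states and allowed moves below are an assumption drawn from the action list above (build, verify, diff, promote, archive, reject, roll back), not a published AGILab schema; diff is treated as read-only and changes no state.

```python
from dataclasses import dataclass, field

# action -> (allowed source states, resulting state); hypothetical design
TRANSITIONS = {
    "verify": ({"built"}, "verified"),
    "promote": ({"verified"}, "promoted"),
    "reject": ({"built", "verified"}, "rejected"),
    "roll_back": ({"promoted"}, "verified"),
    "archive": ({"verified", "promoted", "rejected"}, "archived"),
}


@dataclass
class Bundle:
    bundle_hash: str
    state: str = "built"  # the build action creates the bundle
    history: list = field(default_factory=list)

    def apply(self, action, reason=""):
        """Apply one lifecycle action, recording what changed and why."""
        if action == "diff":
            return self.state  # read-only comparison, no transition
        sources, target = TRANSITIONS[action]
        if self.state not in sources:
            raise ValueError(f"cannot {action} a bundle in state {self.state!r}")
        self.history.append({"action": action, "from": self.state, "to": target, "reason": reason})
        self.state = target
        return target
```

The history list gives each promotion decision a record of what passed and why, which is the information the acceptance gate asks for.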
Priority 8. Connector-backed data access
Move data access from repeated path settings to declarative connectors.
Acceptance gate:
SQL, OpenSearch/ELK, object storage, local paths, and simulation backends use connector definitions instead of page-specific path glue where practical
connector health checks stay operator-triggered and do not leak credentials
import/export provenance names the connector and artifact source
Why now:
professional workflows fail when data paths are machine-specific or invisible
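A declarative connector can name where data lives while holding only a reference to the credential, which keeps secrets out of definitions, provenance records, and health-check output. The Connector type and its fields are hypothetical; its health method does nothing unless an operator asks, and then only reports whether the referenced environment variable is set, standing in for a real probe.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """Hypothetical declarative connector: a kind, a location, and a credential reference."""
    name: str
    kind: str                 # e.g. "sql", "opensearch", "object-storage", "local"
    location: str             # DSN without password, bucket URL, or local path
    credential_env: str = ""  # name of an environment variable, never its value

    def provenance(self):
        """What import/export records carry: names and locations only, no secrets."""
        return {"connector": self.name, "kind": self.kind, "location": self.location}

    def health(self, operator_triggered=False):
        """Check readiness only when an operator asks; never echo the credential."""
        if not operator_triggered:
            return {"connector": self.name, "status": "not-checked"}
        has_secret = not self.credential_env or bool(os.environ.get(self.credential_env))
        return {"connector": self.name, "status": "ready" if has_secret else "missing-credential"}
```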
Priority 9. Extension and design-pattern guardrails
New app, page, workflow, notebook, connector, and reducer behavior should extend stable contracts instead of adding special cases.
Acceptance gate:
public app templates, page metadata, pipeline stages, notebook import roles, reducers, connectors, and evidence reports have focused tests
flow blueprints can be validated without launching the full UI, and they name their prompts, tools, connectors, expected artifacts, and evidence reports
pattern-gated checks block new workflow behavior that bypasses the shared contracts
deprecations include a migration path and removal target
Why now:
long-term maintenance depends more on repeatable patterns than on another feature page
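Validating a flow blueprint without the UI can be a plain structural check over a dictionary. The required keys mirror the acceptance gate above (prompts, tools, connectors, expected artifacts, evidence reports); the blueprint layout, the function name, and the name@version convention for pinned prompts are hypothetical.

```python
REQUIRED_LISTS = ("stages", "prompts", "tools", "connectors", "expected_artifacts", "evidence_reports")


def validate_blueprint(blueprint, known_connectors):
    """Return a list of problems; an empty list means the blueprint is structurally valid."""
    problems = []
    for key in REQUIRED_LISTS:
        if not isinstance(blueprint.get(key), list):
            problems.append(f"{key}: expected a list")
    if problems:
        return problems
    if not blueprint["stages"]:
        problems.append("stages: at least one stage is required")
    names = [stage.get("name") for stage in blueprint["stages"]]
    if len(names) != len(set(names)):
        problems.append("stages: names must be unique")
    for connector in blueprint["connectors"]:
        if connector not in known_connectors:
            problems.append(f"connectors: unknown connector {connector!r}")
    for prompt in blueprint["prompts"]:
        # Pinned versions let a run be replayed with the same prompt configuration.
        if "@" not in prompt:
            problems.append(f"prompts: {prompt!r} has no pinned version (expected name@version)")
    return problems
```

Because the check needs no UI or runtime, it can run in the same focused test suite as the other contracts.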
Priority 10. Curated app ecosystem
Publish fewer apps, but make every published app useful, named well, documented, installable, runnable, and importable when it claims notebook support.
Acceptance gate:
app packages use consistent agi-app-* names, trusted publishing, and clean metadata
private/non-public app validation has a reproducible pinned-revision option for CI and release checks, while the local APPS_REPOSITORY symlink workflow remains available for day-to-day development
example READMEs explain purpose, inputs, outputs, install/run path, notebook import status, and limitations
app repository update behavior wins over stale local copies without hidden compatibility aliases
Why now:
app quality is the most visible proof that the platform contract works
Priority 11. Multi-app DAG productization
Productize multi-app orchestration only after the release, first-run, notebook, diagnostic, and evidence layers are stable.
Acceptance gate:
WORKFLOW can show, validate, and execute a product-level DAG with persisted operator-visible state
retry, partial rerun, dependency visualization, and artifact handoff are visible in the same operator surface
the shipped two-app executable DAG remains the regression baseline before broader DAG coverage is claimed
Why later:
multi-app DAGs are high-value, but they amplify every weak contract beneath them
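Partial rerun in a multi-app DAG comes down to running the changed node and everything downstream of it, in dependency order. This sketch uses the standard library's graphlib.TopologicalSorter (Python 3.9+), which raises CycleError on cyclic graphs; the app names are hypothetical, and retry and artifact handoff are not modeled.

```python
from graphlib import TopologicalSorter


def downstream(dependencies, changed):
    """All nodes that transitively depend on any node in `changed`, including those nodes."""
    dependents = {}
    for node, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)
    affected, stack = set(), list(changed)
    while stack:
        node = stack.pop()
        if node not in affected:
            affected.add(node)
            stack.extend(dependents.get(node, ()))
    return affected


def rerun_plan(dependencies, changed):
    """Topological order restricted to the affected subgraph."""
    affected = downstream(dependencies, changed)
    order = TopologicalSorter(dependencies).static_order()
    return [node for node in order if node in affected]
```

For example, with {"ingest": set(), "features": {"ingest"}, "train": {"features"}, "report": {"train", "ingest"}}, a change to features reruns features, train, and report but leaves ingest untouched.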
Priority 12. Observability and MLOps handoff
Integrate with observability and MLOps platforms without claiming to replace them.
Acceptance gate:
MLflow remains the tracking and registry handoff path
OpenSearch/Grafana/Superset-style integrations consume AGILab evidence and telemetry instead of duplicating app logic
AGILab first emits a small, stable telemetry envelope for run latency, failure source, worker backlog, artifact volume, LLM token/cost metrics when present, and service-health state before adding dashboards
profile generators can write local, notebook, cluster, service, and demo run instructions with expected validation commands and evidence outputs
production serving, drift detection, feature stores, and enterprise governance are framed as external platform integrations
Why later:
observability is most useful after run evidence and operational status are already consistent
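A small, stable telemetry envelope could be a versioned record with optional fields, serialized as one JSON line per run so that OpenSearch-, Grafana-, or Superset-style tools can ingest it. The field names and the schema_version value are assumptions mirroring the acceptance gate above, not a published AGILab format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class TelemetryEnvelope:
    """Hypothetical per-run telemetry record; optional fields stay null when not applicable."""
    run_id: str
    latency_s: float
    failure_source: Optional[str] = None   # e.g. "dependency", "cluster-share"; None on success
    worker_backlog: int = 0
    artifact_bytes: int = 0
    llm_tokens: Optional[int] = None       # only when an LLM is involved
    llm_cost_usd: Optional[float] = None
    service_health: Optional[str] = None   # only for long-running service mode
    schema_version: str = "0.1"

    def to_json_line(self):
        """One compact, key-sorted JSON line per run."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
```

A versioned envelope lets dashboards come and go while the emitted data stays stable.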
Explicit non-priorities until the above is stable
broad public OS, GPU, cloud, or network certification without matching run evidence
production multi-tenant claims without external identity, isolation, quotas, secrets management, audit, and monitoring controls
generic dashboards that are not tied to AGILab runs, artifacts, or decisions
always-on graph/vector/message-bus services as a requirement for local proof generation before the file-based evidence contract is stable
runtime prompt/tool mutation without a versioned manifest, review trail, and replay evidence
new app publishing when the app lacks a clear purpose, deterministic first run, README, evidence, and package metadata
Feature sequencing after the professional baseline
If the goal is near-term product sequencing rather than broad idea collection, use this order after the P0-P2 professionalization gates are under control:
Multi-app DAG orchestration productization
let WORKFLOW represent one orchestrated DAG across the full workflow, not just one app-local execution view
build on the shipped multi-app DAG contract, read-only global pipeline DAG report, pending execution-plan report, read-only runner state, and persisted dispatch-state proof, plus the two-unit app dispatch smoke, operator-state report, dependency-view report, live-update payload report, operator-action execution report, and operator-UI report
Bidirectional notebook interop
build on the shipped supervisor-notebook export and analysis-page launcher metadata
add notebook-to-pipeline import maturity and optional single-kernel union-environment notebooks when stage environments are compatible
Data connector facility
make SQL, ELK, object storage, and other external data sources first-class connector targets
build on the shipped data connector facility report for SQL, OpenSearch, and object-storage definitions plus the data connector resolution report for connector-aware app/page resolution
add the shipped data connector health report for operator-gated probe planning without live public network checks
add the shipped data connector health actions report for explicit operator-triggered health probe rows
add the shipped data connector runtime adapters report for credentialed runtime bindings without materializing secrets in public evidence
add the shipped data connector UI preview report for static connector state and provenance review
add the shipped data connector live UI report for Release Decision Streamlit integration without connector network probes
add the shipped data connector app catalogs report for app-local connector catalogs across every non-template built-in app
this turns connector work into a practical data-access layer, not just path cleanup
Reduce contract adoption
AGILab already has distributed work-plan execution and an initial shared reducer contract
the public reducer benchmark now validates 8 partials / 80,000 synthetic items in 0.003s against a 5.0s target
execution_pandas_project and execution_polars_project now emit named benchmark reduce artefacts through that contract
flight_telemetry_project now emits trajectory-summary reduce artefacts through that contract
uav_queue_project now emits the same reduce_summary_worker_<id>.json artifact shape for queue metrics
uav_relay_queue_project now emits that shared queue-metrics reduce artifact shape too
weather_forecast_project now emits forecast-metrics reduce artefacts
Release Decision now surfaces benchmark, flight, weather forecast, and UAV queue-family reduce artefacts as evidence
a repository guardrail now requires every non-template built-in app to expose a reducer contract
minimal_app_project and multi_app_dag_project are the explicit template-only exemptions because they have no concrete merge output yet
future apps/templates must opt in when they produce durable worker summaries
Intent-first operator mode
valuable, but it benefits from the cleaner evidence, compatibility, and connector contracts above
Elasticity and active mesh optimization
keep the current public claim bounded: a compact Active Mesh Optimization teaching route exists, but it is centralized-policy evidence, not full decentralized MARL certification
harden the shipped route by comparing baseline versus adaptive-network outcomes, then extending the evidence to failure injection and train-then-serve handoff
use moving nodes such as aircraft, UAVs, or satellites as active agents that can adapt trajectory or routing behavior to improve network KPIs
avoid duplicating experiment tracking or model-registry concepts; the differentiator should be closed-loop execution and evidence, not another metrics UI
Why this order:
turn the shipped manifest remediation baseline and CI artifact harvest contract into external evidence import and release indexes before broader onboarding automation
build global orchestration on the shipped cross-app contract and read-only product graph plus pending execution plan instead of claiming runner behavior before it exists
keep notebook interop after the orchestration state model is clearer
stabilize contracts before standardizing distributed reduction
keep operator refinements downstream of the proof/evidence layer
keep any broader MARL claim downstream of reproducible execution, baseline/candidate comparison, failure-injection evidence, service-contract handoff, and the shared evidence contract
Streamlit-inspired AGILab views
The most promising Streamlit-style view patterns for AGILab are not generic gallery clones. They are focused application views that reinforce AGILab’s core value: orchestration, evidence, and domain-specific interaction.
1. Experiment Cockpit
Purpose:
compare runs quickly
inspect KPI summaries
open artefacts and benchmark results from one page
Suggested layout:
KPI cards on top
run filters and selectors on the left
comparison charts in the center
run table and artefact links below
Why it matters:
best value-to-effort ratio
directly useful across many AGILab apps
2. Evidence / Release View
Purpose:
decide whether a run, model, or artefact bundle is promotable
Suggested layout:
release decision banner
pass/fail gate checklist
baseline vs candidate KPI comparison
provenance and reproducibility panel
evidence bundle table
Why it matters:
strong differentiator for AGILab
aligns with evidence-driven engineering and promotion workflows
3. Scenario Playback View
Purpose:
replay a run over time
inspect state, actions, and KPI evolution together
Suggested layout:
run selector and time slider
map or network panel
current decision-state panel
KPI timeline and event log
Why it matters:
strong demonstration value
good fit with existing AGILab map/network views
4. Realtime Analytical and Geospatial Views
Purpose:
inspect dense live data without degrading interaction quality
support higher-frequency analysis for KPI, maps, and network state
Recommended direction:
use Plotly.js/WebGL first for analytical views such as KPI timelines, run comparison, monitoring, and large point clouds
use deck.gl for dense geospatial and network overlays
use Three.js only for specialized 3D mission views where depth is part of the meaning, such as orbital or spatial playback
Why it matters:
gives AGILab a practical realtime analysis layer without committing to custom low-level WebGL infrastructure
fits existing AGILab needs better than a generic “WebGL support” initiative
opens a clear path for performance gains in monitoring and playback views
5. Run Diff / Counterfactual Analysis
Purpose:
compare two runs and explain what changed in a way that is directly useful to engineers and reviewers
turn raw deltas into defensible reasoning about outcomes
Suggested scope:
input and configuration diff
topology and artefact diff
allocation and decision diff
KPI delta summary
candidate-vs-baseline narrative focused on the most material changes
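The KPI part of the candidate-vs-baseline narrative can be sketched as a threshold on relative change, with results ranked by magnitude. This is a minimal illustration and does not reproduce the behaviour of tools/run_diff_evidence_report.py. The 5% materiality threshold, the function names, and the prompt wording are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiDelta:
    name: str
    baseline: float
    candidate: float

    @property
    def relative_change(self) -> float:
        # Relative change against the baseline; a zero baseline makes any
        # non-zero candidate an unbounded change.
        if self.baseline == 0:
            return float("inf") if self.candidate != 0 else 0.0
        return (self.candidate - self.baseline) / abs(self.baseline)


def material_deltas(baseline: dict, candidate: dict, threshold: float = 0.05) -> list:
    """KPIs present in both runs whose relative change exceeds `threshold`,
    largest change first."""
    shared = sorted(baseline.keys() & candidate.keys())
    deltas = [KpiDelta(k, baseline[k], candidate[k]) for k in shared]
    material = [d for d in deltas if abs(d.relative_change) > threshold]
    return sorted(material, key=lambda d: abs(d.relative_change), reverse=True)


def counterfactual_prompts(deltas: list) -> list:
    # One review question per material delta, to be answered from the
    # configuration, topology, and allocation diffs.
    return [
        f"Which input, topology, or allocation change moved {d.name} "
        f"from {d.baseline} to {d.candidate}?"
        for d in deltas
    ]
```

Ranking by magnitude puts the most material changes at the top of the narrative. Small fluctuations below the threshold stay out of the reviewer's way.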
Current shipped baseline:
agilab.run_diff_evidence.v1 defines a first no-execution run-diff evidence contract for public review
tools/run_diff_evidence_report.py --compact compares static baseline/candidate KPI checks, run manifests, and artifact rows, then emits counterfactual prompts for material deltas
the KPI evidence bundle includes this as run_diff_evidence_report_contract and verifies zero command, live-execution, and network-probe counts
tools/revision_traceability_report.py --compact validates agilab.revision_traceability.v1 and fingerprints repository HEAD, AGI core package versions, and built-in app manifests without invoking git commands or querying networks
tools/public_certification_profile_report.py --compact validates agilab.public_certification_profile.v1 and turns the compatibility matrix into a bounded_public_evidence certification profile without production or third-party certification claims
tools/supply_chain_attestation_report.py --compact validates agilab.supply_chain_attestation.v1 and fingerprints package metadata, lockfile, license, bundled AGI core versions, and built-in app manifests without formal supply-chain attestation claims
tools/ci_artifact_harvest_report.py --compact now defines the no-network external-machine attachment contract for run manifests, KPI bundles, compatibility reports, and promotion decisions
Release Decision can import ci_artifact_harvest.json, display harvested artifact status/checksum/provenance rows, block invalid harvests, and export ci_artifact_harvest_summary plus ci_artifact_harvest_evidence inside promotion_decision.json
tools/github_actions_artifact_index.py --archive converts downloaded GitHub Actions artifact ZIPs into a harvest-compatible artifact_index.json, and its opt-in --live-github path can query/download workflow-run artifacts when credentials are available
tools/ci_provider_artifact_index.py --provider gitlab_ci --archive converts downloaded GitLab CI or generic provider artifact ZIPs into the same harvest-compatible artifact_index.json without querying live provider APIs
the same tool supports opt-in --live-gitlab for credentialed GitLab CI pipeline artifact queries/downloads
tools/compatibility_report.py --artifact-index can derive per-release compatibility status from those downloaded artifact indexes or from ci_artifact_harvest.json summaries
the pypi-publish release workflow includes a release-evidence job that uploads sample external evidence, retrieves it through the live GitHub Actions artifact API with --live-github, and validates the resulting artifact index through the harvest and compatibility reports before publish jobs proceed
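The archive-to-index step described above takes a downloaded ZIP and produces checksummed rows. It can be illustrated with the standard library alone. This is a sketch under stated assumptions: the function name, the row fields, and the schema tag are hypothetical. It does not reproduce the actual artifact_index.json format produced by tools/github_actions_artifact_index.py.

```python
import hashlib
import zipfile
from pathlib import Path


def index_artifact_zip(zip_path, provider: str = "github_actions") -> dict:
    """Hypothetical sketch: describe every file in a downloaded CI artifact ZIP
    with its size and SHA-256 checksum. Reads a local file only; no network."""
    rows = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            digest = hashlib.sha256(archive.read(info.filename)).hexdigest()
            rows.append({"path": info.filename, "size": info.file_size, "sha256": digest})
    return {
        "schema": "artifact_index.sketch",  # hypothetical schema tag
        "provider": provider,
        "source_archive": Path(zip_path).name,
        "artifacts": sorted(rows, key=lambda row: row["path"]),
    }
```

Sorting rows by path keeps the index deterministic, so two harvests of the same archive produce byte-identical JSON.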
Remaining scope:
add richer domain-specific explanations for allocation, topology, and decision deltas
run non-GitHub live provider API harvests in credentialed operator CI
Why it matters:
high value for debugging, review, and evidence-driven engineering
fits AGILab better than generic BI dashboards because it stays tied to runs, artefacts, and orchestration decisions
creates a strong bridge between experimentation and promotion workflows
6. Multi-app DAG orchestration
Purpose:
extend orchestration from one app flow to DAGs that span multiple apps
make inter-app dependencies explicit instead of hiding them in manual glue
Current shipped baseline:
agilab.multi_app_dag.v1 defines the first portable cross-app DAG contract
docs/source/data/multi_app_dag_sample.json links uav_queue_project to uav_relay_queue_project through the explicit queue_metrics handoff
docs/source/data/multi_app_dag_portfolio_sample.json broadens the contract-only sample suite across flight_telemetry_project, weather_forecast_project, execution_pandas_project, and execution_polars_project
tools/multi_app_dag_report.py --compact validates schema, checked-in app nodes, acyclic dependencies, docs references, artifact handoffs, and the two-sample DAG suite
the KPI evidence bundle includes this as multi_app_dag_report_contract
the multi-app DAG report family now covers execution planning, persisted dispatch state, real two-app app-entry smoke execution, operator state, dependency views, live-update payloads, operator actions, and static operator UI proof for the checked-in queue_baseline -> relay_followup contract
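Several of the checks attributed to tools/multi_app_dag_report.py map naturally onto a topological sort: acyclic dependencies and explicit artifact handoffs. The sketch below uses Python's standard graphlib module. The dictionary shape for units is an assumption and is not the agilab.multi_app_dag.v1 schema.

```python
from graphlib import CycleError, TopologicalSorter


def plan_dag(units: dict) -> list:
    """Order DAG units so every artifact is produced before it is consumed.

    `units` maps a unit id to {"app": ..., "produces": [...], "consumes": [...]}.
    Raises ValueError for duplicate producers or unknown inputs, and
    graphlib.CycleError (a ValueError subclass) for cyclic dependencies.
    """
    producers = {}
    for unit, spec in units.items():
        for artifact in spec.get("produces", []):
            if artifact in producers:
                raise ValueError(f"{artifact!r} is produced by both {producers[artifact]} and {unit}")
            producers[artifact] = unit

    graph = {}
    for unit, spec in units.items():
        missing = [a for a in spec.get("consumes", []) if a not in producers]
        if missing:
            raise ValueError(f"{unit} consumes {missing}, which no unit produces")
        # Each unit depends on the producers of the artifacts it consumes.
        graph[unit] = {producers[a] for a in spec.get("consumes", [])}

    return list(TopologicalSorter(graph).static_order())


sample = {
    "relay_followup": {"app": "uav_relay_queue_project", "consumes": ["queue_metrics"]},
    "queue_baseline": {"app": "uav_queue_project", "produces": ["queue_metrics"]},
}
print(plan_dag(sample))  # ['queue_baseline', 'relay_followup']
```

Deriving edges from declared artifacts, rather than listing them by hand, is what makes the inter-app dependency explicit instead of hidden in manual glue.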
Remaining scope:
no open report-driven contract gap remains for the shipped two-app executable DAG baseline or the broader contract-only sample suite
future work is broader app coverage, placement in the live product surface, external validation, and production hardening
Why it matters:
the contract closes the first bridge between app-local execution and a product-wide orchestrated workflow
the remaining work is scale and hardening rather than missing public evidence for the shipped two-app baseline
7. Multi-app DAG orchestration productization
Purpose:
turn the checked-in multi-app DAG, execution plan, read-only runner state, and persisted dispatch-state proof into live app execution with persisted operator-visible status
Current shipped baseline:
tools/global_pipeline_dag_report.py --compact assembles one read-only product-level graph from docs/source/data/multi_app_dag_sample.json
the graph expands uav_queue_project and uav_relay_queue_project through their checked-in pipeline_view.dot files
the graph preserves the cross-app queue_metrics artifact edge and reports app nodes, app-local stage nodes, app-local edges, and execution order
tools/global_pipeline_execution_plan_report.py --compact converts the graph into ordered runnable units in pending/not_executed state, marks queue_baseline ready, marks relay_followup blocked on queue_metrics, and records provenance for the DAG and each app-local pipeline view
tools/global_pipeline_runner_state_report.py --compact projects the plan into read-only runner state, marks queue_baseline as runnable, marks relay_followup as blocked, and records transition, retry, partial-rerun, operator-message, and provenance metadata without executing apps
the WORKFLOW page now includes an expanded Workflow graph surface with these capabilities:
choose project workflow or multi-app DAG scope
edit steps, created outputs, and used outputs through selector-driven workspace drafts and read-only summaries
validate the plan without hand-editing docs files
reset the persisted preview state
show readiness KPIs and optional graph and output details
preview the next ready step without claiming live app execution
tools/global_pipeline_dispatch_state_report.py --compact writes and reads back a persisted dispatch-state JSON proof, records queue_baseline completion, publishes queue_metrics, marks relay_followup runnable, and preserves timestamps, retry counters, partial-rerun flags, operator messages, and provenance without executing apps
tools/global_pipeline_app_dispatch_smoke_report.py --compact executes queue_baseline and relay_followup through the real checked-in uav_queue_project and uav_relay_queue_project manager/worker entries, writes the actual queue_metrics, relay_metrics, and reducer artifacts, and persists them in dispatch-state JSON
tools/global_pipeline_operator_state_report.py --compact reads that persisted full-DAG dispatch state and exposes completed unit state, queue-to-relay handoffs, available artifacts, and retry/partial-rerun action rows for future operator flows
tools/global_pipeline_dependency_view_report.py --compact reads the operator-state proof and exposes upstream/downstream dependency visualization for queue_baseline -> relay_followup, including the queue_metrics edge, producer/consumer apps, adjacency lists, and artifact-flow rows
tools/global_pipeline_live_state_updates_report.py --compact reads the dependency view and emits deterministic live orchestration-state updates for graph-ready, unit-state, artifact-state, dependency-state, and operator-action refresh payloads; this is an update contract, not a streaming service or UI renderer
tools/global_pipeline_operator_actions_report.py --compact reads the live-update payloads, accepts queue_baseline:retry and relay_followup:partial_rerun, replays the corresponding queue and relay app entries, and persists action outcomes plus output artifacts
tools/global_pipeline_operator_ui_report.py --compact reads the action outcomes and renders status, unit-card, dependency-graph, update-timeline, action-control, and artifact-table components into a static HTML proof
the compact KPI bundle includes these as global_pipeline_dag_report_contract, global_pipeline_execution_plan_report_contract, global_pipeline_runner_state_report_contract, global_pipeline_dispatch_state_report_contract, global_pipeline_app_dispatch_smoke_report_contract, global_pipeline_operator_state_report_contract, global_pipeline_dependency_view_report_contract, global_pipeline_live_state_updates_report_contract, global_pipeline_operator_actions_report_contract, and global_pipeline_operator_ui_report_contract
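The execution-plan, runner-state, and dispatch-state reports record a blocked-to-runnable transition for queue_baseline -> relay_followup. That transition can be pictured as a small persisted state machine. This is a minimal sketch under stated assumptions. The function names, status strings, and JSON layout are hypothetical and do not match the format written by tools/global_pipeline_dispatch_state_report.py. The sketch also omits timestamps, operator messages, and provenance.

```python
import json
from pathlib import Path


def initial_state(order: list, consumes: dict) -> dict:
    """Units with no inputs start runnable; every other unit starts blocked."""
    return {
        "units": {
            unit: {"status": "blocked" if consumes.get(unit) else "runnable", "retries": 0}
            for unit in order
        },
        "artifacts": [],
    }


def complete_unit(state: dict, unit: str, produced: list, consumes: dict) -> dict:
    """Record a completed unit, publish its artifacts, and unblock consumers."""
    state["units"][unit]["status"] = "completed"
    for artifact in produced:
        if artifact not in state["artifacts"]:
            state["artifacts"].append(artifact)
    available = set(state["artifacts"])
    for other, info in state["units"].items():
        if info["status"] == "blocked" and set(consumes.get(other, [])) <= available:
            info["status"] = "runnable"
    return state


def save_state(state: dict, path) -> None:
    # Plain JSON keeps the state diffable and readable by other tools.
    Path(path).write_text(json.dumps(state, indent=2, sort_keys=True))


def load_state(path) -> dict:
    return json.loads(Path(path).read_text())
```

Persisting after every transition is what makes retry and partial rerun possible. A restarted runner reloads the file instead of recomputing which units already finished.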
Remaining scope for this item:
no open report-driven contract gap remains for the multi-app DAG runner/UI baseline; future work is product hardening, placement, and broader external validation
Why it matters:
the report gives AGILab a clearer product story than isolated per-app pipelines without overclaiming execution
live UI state is still needed before the orchestration layer is fully visible to operators and reviewers
8. Bidirectional notebook interop
Purpose:
complete the bridge between notebooks and AGILab pipelines without hiding per-stage runtime constraints
Current shipped baseline:
WORKFLOW can already export a supervisor notebook that preserves stage provenance, runtime metadata, and per-stage execution context
exported notebooks can include related analysis-page launcher helpers when an app declares them
tools/notebook_pipeline_import_report.py --compact now validates the first notebook-to-pipeline import contract from a checked-in .ipynb; it preserves markdown context, code cells, import hints, execution-count metadata, and artifact references as not_executed_import pipeline-stage evidence, writes a richer lab_stages.toml preview, and feeds the existing WORKFLOW upload path
tools/notebook_roundtrip_report.py --compact validates lab_stages.toml -> supervisor notebook -> import -> lab_stages preview preservation for saved stage description, prompt, model, code, runtime, import hints, and artifact references
tools/notebook_union_environment_report.py --compact validates a single-kernel union notebook candidate only for compatible runpy / current-kernel stages and records supervisor_notebook_required for mixed-runtime or mixed-environment pipelines
this is intentionally not the same thing as flattening a multi-venv pipeline into one notebook kernel
packaged examples now include a dependency-light Voila-shaped notebook proof preview that records widget-to-args hints, a hide-code manifest, an app-view plan, and evidence hashes without launching a Voila server
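The import direction described above can be sketched with plain dictionary handling of nbformat 4 JSON. It captures markdown context, code cells, import hints, and execution counts as not_executed_import evidence. This is an illustration under stated assumptions: the function names and stage fields are hypothetical, and artifact-reference detection is omitted. This is not the logic of tools/notebook_pipeline_import_report.py.

```python
def _import_hints(source: str) -> list:
    """Top-level module names from `import x` / `from x import y` lines."""
    hints = set()
    for line in source.splitlines():
        parts = line.split()
        if len(parts) > 1 and parts[0] in ("import", "from"):
            hints.add(parts[1].split(".")[0].rstrip(","))
    return sorted(hints)


def notebook_to_stage_preview(notebook: dict) -> list:
    """Turn nbformat-4 notebook JSON into not-executed stage records.

    Markdown cells preceding a code cell become that stage's description.
    Nothing is executed; execution counts are copied as metadata only.
    """
    stages, context = [], []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a string or a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            context.append(source)
        elif cell.get("cell_type") == "code" and source.strip():
            stages.append({
                "description": "\n\n".join(context),
                "code": source,
                "import_hints": _import_hints(source),
                "execution_count": cell.get("execution_count"),
                "status": "not_executed_import",
            })
            context = []
    return stages
```

A notebook read with `json.load` can be passed directly to `notebook_to_stage_preview`. Empty code cells are skipped so they do not become empty stages.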
Suggested scope:
harden notebook-to-pipeline import beyond the initial report and upload path, including broader edge cases for exported supervisor notebooks
make notebook-native analysis surfaces or Voilà-style packaging possible without duplicating the current apps-pages logic blindly
preserve enough provenance so the notebook remains explainable
Why it matters:
reduces the gap between exploratory notebook work and reproducible product workflows
gives teams a practical adoption bridge instead of a one-way migration story
Logging modernization
Purpose:
improve developer and operator logging without breaking compatibility across Streamlit, workers, subprocesses, and distributed services
Recommended direction:
keep Python stdlib logging plus AgiLogger as the canonical runtime logging contract
add real child logger support, structured JSON output, and stable context fields such as app id, host, worker, and run id
keep the current colorized human console output as the default local developer mode
treat loguru as an optional choice only for isolated helper scripts or local tools that do not need full stdlib logging interoperability
do not plan a repo-wide migration to loguru unless stdlib logging becomes a demonstrated blocker for AGILab runtime requirements
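Structured context on top of stdlib logging needs no new dependency: a Formatter subclass and a LoggerAdapter are enough. The sketch below is a minimal illustration under stated assumptions. It assumes a parent logger named agilab and the context keys shown. It does not use or reproduce AgiLogger, and the colorized human console handler would stay alongside it.

```python
import json
import logging

# Hypothetical field names; the roadmap only names app id, host, worker, and run id.
CONTEXT_FIELDS = ("app_id", "host", "worker", "run_id")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with stable context fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key in CONTEXT_FIELDS:
            # Missing context is emitted as null so the schema never changes shape.
            payload[key] = getattr(record, key, None)
        return json.dumps(payload, sort_keys=True)


def context_logger(name: str, **context) -> logging.LoggerAdapter:
    """Child of the 'agilab' logger that stamps `context` onto every record."""
    return logging.LoggerAdapter(logging.getLogger("agilab").getChild(name), context)
```

Because the child logger propagates to its parent, the JSON handler is attached once. Third-party libraries that already log through stdlib keep working unchanged.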
Why it matters:
AGILab already spans third-party libraries and multi-process surfaces that integrate naturally with stdlib logging
the real missing capability is structured context and better logger hierarchy, not a new logging syntax
this keeps the logging contract stable while still making observability stronger
Backend observability and audit architecture
AGILab should keep application-specific interaction inside the product and move generic observability, search, and fleet-level monitoring into tools designed for that job.
1. Elastic or OpenSearch + Grafana
Best when:
engineering operations and observability are the main priority
Good for:
run health
worker load
stage latency
failures and alerts
SLA-style monitoring
Why it matters:
strongest near-term operational value
clean split between AGILab interaction and backend observability
2. OpenSearch + OpenSearch Dashboards
Best when:
auditability, search, and historical traceability are the main priority
Good for:
log exploration
artefact traceability
historical run search
saved audit dashboards
Why it matters:
lowest friction for Kibana-like usage patterns
3. Postgres + Superset
Best when:
structured KPI analytics and management reporting are the main priority
Good for:
curated dashboards
cross-project reporting
evidence trend analysis
management-facing analytics
Why it matters:
stronger fit than Elastic-native tools for BI-style reporting
Connectors and integration
Connectors should appear explicitly in the roadmap because they are not just implementation detail. They determine how AGILab reaches external systems, resolves artefacts, and keeps app workflows portable.
Audience bridge strategy
The highest-leverage audience bridge is a Quarto / R / notebook bridge, not an R-native worker rewrite. AGILab should stay the reproducible execution and evidence engine while bridges let each community consume that evidence in its normal workflow.
The dependency-light bridge MVP baseline now exposes these commands:
Quarto / R report bridge: agilab export quarto and agilab run quarto
read-only MCP evidence server and agent evidence cards: agilab mcp serve --read-only, agilab agent-run list, agilab agent-run handoff, agilab agent-run next, agilab agent-run context, agilab agent-run lineage, agilab agent-run compare, and agilab agent-run validate
Hugging Face Docker Space exporter: agilab export hf-space
MLflow JSON handoff: agilab export mlflow and agilab import mlflow
VS Code / devcontainer onboarding: agilab init vscode
DuckDB SQL bridge: agilab run duckdb
Airflow / Dagster handoff exporters: agilab export airflow-dag and agilab export dagster-job
The current R-stage smoke app remains the payload-plane proof for external Rscript execution. Remaining roadmap work is to deepen each bridge with community-native packages, richer artifact previews, and production handoff polish while keeping R-native worker changes out of shared core until the app-local contract proves broader value.
See Audience bridges for the detailed bridge ranking, MVP scopes, and implementation order.
1. Connector framework hardening
Purpose:
make connector-backed workflows more predictable and portable
Focus areas:
path portability
artefact resolution
stable source and target contracts
less app-specific path glue
clearer connector diagnostics
Why it matters:
reduces friction across apps
makes automation more reusable
lowers the gap between conceptual workflows and executable stages
Connector integration change request
The concrete change request behind this roadmap item is to replace repeated raw path settings in app_settings.toml with references to reusable connector definition files.
Current problem:
pages such as view_maps_network rely on many low-level path keys
the same path logic is repeated across settings files
defaults are more machine-specific than they should be
page code must interpret too many raw path parameters directly
Proposed direction:
introduce a declarative Connector model
store connector definitions in plain-text TOML files
let app_settings.toml reference those connector files instead of embedding all path details inline
Completed baseline:
tools/data_connector_facility_report.py --compact validates first-class SQL, OpenSearch, and object-storage connector definitions without network probes
tools/data_connector_resolution_report.py --compact resolves connector IDs from an app-settings-style sample, validates connector-aware app/page resolution, and preserves legacy_path_fallback rows for migration
tools/data_connector_health_report.py --compact plans SQL, OpenSearch, and object-storage health/status probes behind operator opt-in while keeping public evidence in health_probe_plan_only mode
tools/data_connector_health_actions_report.py --compact exposes those probes as operator-triggered action rows in operator_trigger_contract_only mode
tools/data_connector_runtime_adapters_report.py --compact binds SQL, OpenSearch, and object-storage connectors to runtime adapter operations while deferring credential values to the operator runtime
tools/data_connector_live_endpoint_smoke_report.py --compact adds the operator-gated live endpoint smoke contract and validates the execution path with a local SQLite endpoint
tools/data_connector_ui_preview_report.py --compact renders connector state, page bindings, legacy fallbacks, and health-boundary provenance as static JSON+HTML evidence
tools/data_connector_live_ui_report.py --compact wires connector state and connector-derived provenance into the Release Decision Streamlit page in streamlit_render_contract_only mode
tools/data_connector_view_surface_report.py --compact verifies the connector-aware Release Decision panels for state/provenance, health boundary, import/export provenance, and external artifact traceability in connector_view_surface_contract_only mode
tools/data_connector_app_catalogs_report.py --compact validates app-local connector catalogs referenced from built-in app_settings.toml files
First connector model:
id
kind
label
description
base
subpath
globs
preferred_file_ext
metadata
Recommended file placement:
next to the app settings, for example src/connectors/*.toml
Recommended resolution rule:
explicit query parameters
current session-state widget values
explicit page-level overrides in app_settings.toml
connector references in app_settings.toml
legacy raw path keys
code-level defaults
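The resolution rule is a first-match lookup over ordered layers. The compatibility rule, where connector references win over legacy raw path keys, falls out of the ordering. The following is a minimal sketch under stated assumptions. It assumes each layer is already loaded into a plain mapping. The layer names and the connector: value prefix in the test are hypothetical.

```python
from typing import Any, Mapping, Optional, Tuple

# Highest priority first, mirroring the recommended resolution rule.
RESOLUTION_ORDER = (
    "query_params",
    "session_state",
    "page_overrides",
    "connector_refs",
    "legacy_paths",
    "defaults",
)


def resolve_setting(key: str, layers: Mapping[str, Mapping[str, Any]]) -> Tuple[Optional[Any], Optional[str]]:
    """Return (value, layer_name) from the first layer that defines `key`.

    Reporting the winning layer keeps the result debuggable: a page can show
    whether a path came from a connector or a legacy raw path key.
    """
    for layer_name in RESOLUTION_ORDER:
        layer = layers.get(layer_name, {})
        if layer.get(key) is not None:
            return layer[key], layer_name
    return None, None
```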
Compatibility rule:
keep legacy raw path keys working in phase 1
let connector references win when both are defined
Expected impact:
view_maps_network is the primary beneficiary
Remaining scope:
run the opt-in smoke against real credentialed operator endpoints
Distributed execution and reduction
AGILab already ships real distributed execution primitives, but the product surface is not yet a fully migrated generic map/reduce layer.
Current state:
apps can build explicit distribution plans
workers execute partitioned plans locally or on Dask-backed clusters
agi_node.reduction defines a shared reducer contract with partial inputs, merge semantics, validation hooks, and a standard reduce artefact schema
tools/reduce_contract_benchmark.py --json validates 8 partials / 80,000 synthetic items in 0.003s against a 5.0s target
execution_pandas_project, execution_polars_project, flight_telemetry_project, weather_forecast_project, uav_queue_project, and uav_relay_queue_project write worker-scoped reduce_summary_worker_<id>.json artefacts through the shared contract
Release Decision surfaces those reduce artefacts with schema validation, reducer name, partial count, artifact path, benchmark row/source/execution fields, flight row/aircraft/speed fields, weather forecast MAE/RMSE/MAPE fields, and UAV queue-family packet/PDR fields when present
aggregation outside the migrated benchmark, flight, weather, and UAV queue-family apps is still mostly app-specific
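The merge-semantics point behind the shared reducer contract can be shown with a mean: merge sufficient statistics (count and total), not per-worker results. This sketch is illustrative only. Partial, merge, and write_reduce_artifact are hypothetical names, not the agi_node.reduction API. Only the reduce_summary_worker_<id>.json file name comes from this page.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Partial:
    """One worker's partial result (hypothetical shape)."""

    worker_id: int
    count: int
    total: float


def validate(partial: Partial) -> None:
    # Validation hook: reject partials that cannot be merged meaningfully.
    if partial.count < 0:
        raise ValueError(f"worker {partial.worker_id}: negative count")


def merge(partials: list) -> dict:
    """Merge by summing counts and totals, then derive the mean.

    The sums do not depend on which worker finishes first (up to floating-point
    rounding); averaging per-worker means would be wrong for unequal partitions.
    """
    for partial in partials:
        validate(partial)
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return {"partial_count": len(partials), "count": count, "mean": total / count if count else None}


def write_reduce_artifact(summary: dict, out_dir, worker_id: int, reducer: str) -> Path:
    path = Path(out_dir) / f"reduce_summary_worker_{worker_id}.json"
    path.write_text(json.dumps({"reducer": reducer, **summary}, sort_keys=True))
    return path
```

With partitions of 2 items (total 10.0) and 6 items (total 6.0), the merged mean is 16.0 / 8 = 2.0. Naively averaging the per-worker means 5.0 and 1.0 would give 3.0.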
Current guardrail:
all non-template built-in apps now expose a reducer contract
minimal_app_project is template-only and intentionally exempt because its worker hooks are placeholders with no concrete merge output
multi_app_dag_project is template-preview only and intentionally exempt because it demonstrates cross-app DAG contracts rather than a concrete worker merge output
future apps/templates must add reduction.py, emit reduce_summary_worker_<id>.json, and export a *_REDUCE_CONTRACT once they produce durable worker summaries
docs should avoid describing AGILab as a full generic map/reduce mechanism beyond the explicit contract and migrated apps
1. Reduce contract adoption
Purpose:
move the current distributed work-plan execution model onto the shared reusable aggregation contract
Focus areas:
reducer adoption in public apps
user-visible reduce artefacts in analysis views
user-visible evidence that a distributed run was merged successfully
Why it matters:
makes the product claim honest and specific
reduces repeated merge logic across apps
improves reviewability of distributed results
gives AGILab a clearer story than “Dask-backed execution exists somewhere in the stack”
Completed slices:
execution_pandas_project and execution_polars_project now emit named reduce_summary_worker_<id>.json ReduceArtifact files from worker results
flight_telemetry_project now emits worker-scoped reduce_summary_worker_<id>.json ReduceArtifact files for trajectory summary metrics
uav_queue_project now emits worker-scoped reduce_summary_worker_<id>.json ReduceArtifact files for queue summary metrics
uav_relay_queue_project now emits worker-scoped reduce_summary_worker_<id>.json ReduceArtifact files for relay queue summary metrics
weather_forecast_project now emits worker-scoped reduce_summary_worker_<id>.json ReduceArtifact files for forecast quality metrics
Release Decision now discovers reduce_summary_worker_*.json files, parses each one with ReduceArtifact.from_dict, displays reducer evidence, and flags invalid JSON
a repository guardrail now fails if a non-template built-in app lacks a reducer contract or worker-scoped artifact writer
minimal_app_project and multi_app_dag_project are documented as template-only rather than counted as reducer migration gaps
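The discovery step described above finds every reduce_summary_worker_*.json, parses it, and flags invalid JSON rather than failing the whole page. It can be sketched with pathlib and json. Assumptions: the required-key set and row layout are hypothetical. The real page parses files with ReduceArtifact.from_dict, which this sketch does not use.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"reducer", "partial_count"}  # hypothetical minimum schema


def scan_reduce_artifacts(root) -> list:
    """Find worker reduce artifacts under `root` and flag the unusable ones."""
    rows = []
    for path in sorted(Path(root).rglob("reduce_summary_worker_*.json")):
        row = {"path": str(path), "valid": False, "error": None, "reducer": None}
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            row["error"] = f"invalid JSON: {exc.msg}"
        else:
            if not isinstance(data, dict):
                row["error"] = "not a JSON object"
            else:
                missing = sorted(REQUIRED_KEYS - data.keys())
                row["valid"] = not missing
                row["error"] = f"missing keys: {missing}" if missing else None
                row["reducer"] = data.get("reducer")
        rows.append(row)
    return rows
```

Returning one row per file, valid or not, lets a review page show broken evidence next to good evidence instead of silently dropping it.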
Next concrete change request:
keep future public apps/templates aligned with the shared reducer contract as they gain concrete merge semantics
extend the surfaced reducer evidence as more non-benchmark apps adopt the same artifact contract
Compatibility rule:
keep current app-owned aggregation working in phase 1
let apps opt into the shared reducer contract incrementally
Expected impact:
cleaner public positioning for distributed execution
easier regression testing of distributed apps
a better foundation for future run-diff and evidence views
PROJECT must expose connector references clearly enough to stay debuggable
WORKFLOW should remain unchanged in phase 1
Suggested implementation phases:
core connector model, parser, resolver, and validation
connector-aware default resolution in apps-pages
connector preview and navigation support in PROJECT
optional connector references in WORKFLOW only if needed later
Acceptance target:
connectors can replace path groups in app_settings.toml
existing apps still work without migration
connector definitions remain plain-text and git-friendly
2. Data connector facility
Purpose:
connect AGILab cleanly to external data systems and storage backends
Typical targets:
SQL databases
Elasticsearch or OpenSearch
ELK-backed data sources
object storage
GitHub or GitLab
simulation backends
shared data repositories
Why it matters:
expands AGILab beyond local file-driven workflows
makes observability, reporting, and traceability easier to industrialize
Current shipped baseline:
tools/data_connector_facility_report.py --compact validates agilab.data_connector_facility.v1 against docs/source/data/data_connectors_sample.toml
the sample covers SQL, OpenSearch/ELK, and object-storage connector definitions with kind-specific required fields; the current object-storage contract covers AWS S3/S3-compatible stores, Azure Blob Storage, and Google Cloud Storage
remote credentials are represented as env: references, and the report runs in contract_validation_only mode without live network probes
tools/data_connector_resolution_report.py --compact validates agilab.data_connector_resolution.v1 against docs/source/data/data_connector_app_settings_sample.toml
connector-aware app/page resolution now resolves catalog IDs from app settings while preserving legacy_path_fallback rows for raw-path migration
tools/data_connector_health_report.py --compact validates agilab.data_connector_health.v1 and plans connector health/status probes behind operator opt-in without executing network checks
tools/data_connector_health_actions_report.py --compact validates agilab.data_connector_health_actions.v1 and exposes operator-triggered health probe action rows without executing network checks
tools/data_connector_runtime_adapters_report.py --compact validates agilab.data_connector_runtime_adapters.v1 and binds credentialed connector adapters to runtime operations while deferring credential values
tools/data_connector_live_endpoint_smoke_report.py --compact validates agilab.data_connector_live_endpoint_smoke.v1, keeps default public evidence in live_endpoint_smoke_plan_only mode, and proves the opt-in execution path with a local SQLite endpoint without opening external networks
tools/data_connector_ui_preview_report.py --compact validates agilab.data_connector_ui_preview.v1 and renders static connector state plus connector-derived provenance as JSON+HTML preview evidence
tools/data_connector_live_ui_report.py --compact validates agilab.data_connector_live_ui.v1 and wires connector state plus connector-derived provenance into the Release Decision Streamlit page without opening connector networks
tools/data_connector_view_surface_report.py --compact validates agilab.data_connector_view_surface.v1 and checks the Release Decision connector state/provenance panel, health/status boundary, import/export provenance panel, and external artifact traceability panel without opening connector networks
tools/data_connector_app_catalogs_report.py --compact validates agilab.data_connector_app_catalogs.v1 for app-local connector catalogs across every non-template built-in app
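The env: reference convention lets evidence record which secret a connector needs without ever containing it. The following is a minimal sketch under stated assumptions. It assumes a connector dictionary with a credentials table. The function names and output fields are hypothetical. Only the env: prefix and the health_probe_plan_only mode name come from this page.

```python
import os
from typing import Mapping


def plan_credential(reference: str, environ: Mapping[str, str] = os.environ) -> dict:
    """Describe an env: credential reference without ever reading its value out."""
    if not reference.startswith("env:"):
        raise ValueError("credentials must be env: references, not literal secrets")
    variable = reference[len("env:"):]
    return {"reference": reference, "variable": variable, "available": variable in environ}


def health_probe_plan(connector: dict, environ: Mapping[str, str] = os.environ) -> dict:
    """Plan, but do not run, a health probe for one connector definition."""
    credentials = {
        name: plan_credential(ref, environ)
        for name, ref in connector.get("credentials", {}).items()
    }
    return {
        "connector": connector["id"],
        "kind": connector["kind"],
        "mode": "health_probe_plan_only",
        "credentials": credentials,
        "ready_for_operator_probe": all(c["available"] for c in credentials.values()),
        "network_probes_executed": 0,  # planning never opens a connection
    }
```

The plan records only whether each variable is set. Public evidence can therefore show readiness without ever serializing a secret value.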
Remaining scope:
run the opt-in smoke against real credentialed SQL/OpenSearch/object-storage endpoints in operator environments
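The split between default plan-only evidence and the opt-in execution path can be illustrated with the standard-library sqlite3 module. This is a hedged sketch: `smoke_endpoint`, its `live` flag, the row fields, and the `opt_in_execution` mode name are hypothetical; only the `live_endpoint_smoke_plan_only` mode name comes from the shipped report. In an operator environment, a credentialed SQL/OpenSearch/object-storage endpoint would replace the in-memory database.

```python
import sqlite3


def smoke_endpoint(database: str, live: bool = False) -> dict:
    """Hypothetical smoke check: describe the probe unless live=True."""
    row = {"endpoint": database, "probe": "SELECT 1"}
    if not live:
        # Default public evidence: record the planned probe, execute nothing.
        row.update(mode="live_endpoint_smoke_plan_only", executed=False)
        return row
    # Opt-in path: a local SQLite database exercises real execution
    # without opening any external network connection.
    conn = sqlite3.connect(database)
    try:
        (value,) = conn.execute(row["probe"]).fetchone()
    finally:
        conn.close()
    # "opt_in_execution" is an illustrative mode name, not a documented one.
    row.update(mode="opt_in_execution", executed=True, ok=value == 1)
    return row


plan = smoke_endpoint(":memory:")
live = smoke_endpoint(":memory:", live=True)
```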
3. Connector-aware views
Purpose:
move the shipped static connector state and connector-derived provenance preview into the live UI pages
Typical views:
import or export provenance panel
connector health/status panel
external artefact traceability panel
Current shipped baseline:
tools/data_connector_view_surface_report.py --compact validates agilab.data_connector_view_surface.v1 in connector_view_surface_contract_only mode
the report verifies four Release Decision surfaces: connector state/provenance, connector health/status boundary, import/export provenance, and external artifact traceability
the evidence reads local page source plus the connector live-UI render contract, uses the existing Streamlit recorder, and keeps command execution and network probes at zero
the KPI evidence bundle includes this as data_connector_view_surface_report_contract
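A contract-only check of this kind can be approximated by a static scan of page source for the four required surfaces. This sketch is illustrative only: the `REQUIRED_SURFACES` markers, the `check_view_surfaces` helper, and the result fields are hypothetical, and the shipped report additionally uses the live-UI render contract and the existing Streamlit recorder, which this sketch does not reproduce.

```python
# Hypothetical text markers for the four Release Decision surfaces.
REQUIRED_SURFACES = {
    "connector_state_provenance": "Connector state",
    "health_status_boundary": "Connector health",
    "import_export_provenance": "Import/export provenance",
    "external_artifact_traceability": "External artifact traceability",
}


def check_view_surfaces(page_source: str) -> dict:
    """Static check: which required panels does the page source mention?"""
    found = {name: marker in page_source for name, marker in REQUIRED_SURFACES.items()}
    return {
        "mode": "connector_view_surface_contract_only",
        "surfaces": found,
        "passed": all(found.values()),
        # The check only reads text, so nothing is executed or probed.
        "commands_executed": 0,
        "network_probes": 0,
    }


page = '''
st.subheader("Connector state")
st.subheader("Connector health")
st.subheader("Import/export provenance")
'''
report = check_view_surfaces(page)
```

Here `report["passed"]` is False because the traceability panel is missing; reading source text rather than rendering live data is what keeps command execution and network probes at zero.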
Remaining scope:
move the same pattern beyond Release Decision as additional live UI pages need connector-aware panels
run live connector health/status actions only in credentialed operator environments
Why it matters:
makes integrations visible and debuggable
gives users confidence about what data came from where
4. DeepWiki/Open-style repository knowledge layer
Purpose:
make the AGILab codebase easier to explore, onboard, and explain
provide a generated code wiki and Q&A layer across repositories
Recommended scope:
start with controlled local deployments before publishing hosted search
index each repository separately
include code, docs source, runbooks, and pyproject.toml
exclude generated artefacts, virtualenvs, build/, dist/, and runtime shares
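The include/exclude scope above can be expressed as a single path filter. This is a minimal sketch under assumptions: the `EXCLUDED_DIRS` and `INCLUDED_SUFFIXES` sets, the `.egg-info` rule standing in for "generated artefacts", and the `should_index` helper are hypothetical choices, not AGILab's actual rules.

```python
from pathlib import PurePosixPath

# Hypothetical rule set derived from the recommended scope.
EXCLUDED_DIRS = {".venv", "venv", "build", "dist", "__pycache__"}
INCLUDED_SUFFIXES = {".py", ".md", ".rst", ".toml"}


def should_index(path: str, runtime_shares: tuple = ()) -> bool:
    """Return True if a repository-relative path belongs in the index."""
    p = PurePosixPath(path)
    # Skip virtualenvs, build outputs, distributions, and generated metadata.
    if any(part in EXCLUDED_DIRS or part.endswith(".egg-info") for part in p.parts[:-1]):
        return False
    # Skip runtime shares passed in by the caller (e.g. mounted data dirs).
    if any(p.is_relative_to(share) for share in runtime_shares):
        return False
    # Code, docs source, runbooks, and pyproject.toml match these suffixes.
    return p.suffix in INCLUDED_SUFFIXES
```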
Guardrail:
treat the generated wiki as an exploration aid, not as the source of truth
keep official product and operator documentation in versioned docs and runbooks
Current shipped baseline:
tools/repository_knowledge_report.py --compact validates agilab.repository_knowledge_index.v1 in repository_knowledge_static_index mode
the report indexes local code, tools, root tests, official docs, root runbooks, and package/app manifests with SHA-256 fingerprints and lightweight outlines, and it emits deterministic file, line, size, kind, and suffix statistics plus ratio, top-category, and largest-file summaries
generated artifacts, virtualenvs, build outputs, and distributions are excluded by contract
the report emits stable onboarding query seeds while explicitly keeping the generated index as an exploration aid and versioned docs as the source of truth
the KPI evidence bundle includes this as repository_knowledge_report_contract
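The fingerprint and statistics part of such a static index can be sketched with hashlib and collections.Counter. This is a hedged illustration: `index_files` and its output fields are hypothetical and do not reproduce the agilab.repository_knowledge_index.v1 schema; it works on an in-memory path-to-bytes mapping so the example stays self-contained.

```python
import hashlib
from collections import Counter
from pathlib import PurePosixPath


def index_files(files: dict) -> dict:
    """Build a deterministic static index from a path -> bytes mapping."""
    entries = []
    for path in sorted(files):  # sorted paths keep the output deterministic
        data = files[path]
        entries.append({
            "path": path,
            "sha256": hashlib.sha256(data).hexdigest(),
            "size": len(data),
            # Count a final line even when it lacks a trailing newline.
            "lines": data.count(b"\n") + (1 if data and not data.endswith(b"\n") else 0),
            "suffix": PurePosixPath(path).suffix,
        })
    suffixes = Counter(entry["suffix"] for entry in entries)
    # Ties on size are broken by path so the result never depends on order.
    largest = max(entries, key=lambda e: (e["size"], e["path"]), default=None)
    return {
        "files": entries,
        "suffix_counts": dict(sorted(suffixes.items())),
        "largest_file": largest["path"] if largest else None,
    }


index = index_files({
    "README.md": b"# AGILab\n",
    "src/app.py": b"print('hi')\nprint('bye')\n",
})
```

Because the fingerprint changes whenever file content changes, comparing two indexes shows which files a generated wiki page may be stale against, without treating the index itself as documentation.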
Remaining scope:
connect this static index to a generated wiki or Q&A service in controlled deployments
extend indexing to external app repositories under the same source-of-truth guardrail
Why it matters:
reduces time spent rediscovering cross-cutting implementation details
helps new contributors navigate AGILab’s multi-repo, multi-app structure
complements agent workflows with repository-level context and diagrams
Decision guidance
Use this rule of thumb:
if the goal is professionalization, use the ordered list from Professionalization priority order first
if the professional baseline is already under control and the goal is feature sequencing, use Feature sequencing after the professional baseline
choose Experiment Cockpit if the next need is better daily usability for engineers comparing runs
choose Evidence / Release View if the next need is promotion readiness and defensible evidence
choose Scenario Playback View if the next need is time-based explanation and demonstration
choose Realtime Analytical and Geospatial Views if the next need is denser live analysis, faster interaction, and higher-volume visual playback
choose Run Diff / Counterfactual Analysis if the next need is faster debugging, clearer run review, and defensible explanation of KPI changes
choose Multi-app DAG orchestration if the next need is broader app coverage beyond the shipped two-app dependency contract
choose Multi-app DAG orchestration productization if the next need is to execute the shipped product-visible graph in WORKFLOW
choose Bidirectional notebook interop if the next need is a stronger bridge between exploratory notebooks and AGILab-managed workflows
choose Elastic/OpenSearch + Grafana if the next need is operations and observability
choose OpenSearch + OpenSearch Dashboards if the next need is audit and historical search
choose Postgres + Superset if the next need is curated KPI analytics
choose Connector framework hardening and the data connector facility if the next need is portability, SQL/ELK/data-system access, and reliable artefact flow
choose Pinned private-app validation if the next need is CI/release reproducibility for non-public apps without publishing or vendoring their code
choose DeepWiki/Open-style repository knowledge layer if the next need is faster codebase onboarding, architecture discovery, and repository Q&A without turning generated content into official docs
Final consolidated poll
Use these paths, because they serve different purposes:
Quick popularity signal in GitHub Discussions
Create or answer a poll: https://github.com/ThalesGroup/agilab/discussions/new?category=polls
Browse existing poll discussions: https://github.com/ThalesGroup/agilab/discussions/categories/polls
Structured roadmap vote in GitHub Issues
Submit a vote: https://github.com/ThalesGroup/agilab/issues/new?template=roadmap-vote.yml
Browse submitted votes: https://github.com/ThalesGroup/agilab/issues?q=is%3Aissue+in%3Atitle+"[Roadmap+vote]"
Open roadmap discussion in Issues
Central roadmap thread: https://github.com/ThalesGroup/agilab/issues/2
Use this thread if you want visible engineering discussion in the normal issue workflow.
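If the "Browse submitted votes" search link is pasted into a tool that rewrites quotation marks, the same query can be rebuilt with full percent-encoding using the standard library. A small sketch; the query text mirrors that link, and nothing here calls the GitHub API.

```python
from urllib.parse import urlencode

# Issue search: issues whose title contains the exact phrase "[Roadmap vote]".
query = 'is:issue in:title "[Roadmap vote]"'
url = "https://github.com/ThalesGroup/agilab/issues?" + urlencode({"q": query})
print(url)
```

urlencode applies quote_plus, so spaces become + and the colons, quotes, and brackets are percent-encoded.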
Current candidate priorities
P0 release and runtime integrity
P1 first-run product experience
P2 notebook interop and no-lock-in
P3 security and supply-chain posture
P4 team and cluster operation
P5 pinned private-app validation for non-public app CI and release checks
Multi-app DAG orchestration productization, once the professional baseline is stable
Data connector facility and connector-aware views, once first-run and evidence paths are predictable
If the roadmap label is not visible yet in GitHub, the issue form still works. The repository workflow will create or update that label on the next successful run.
Reference URLs
Streamlit gallery: https://streamlit.io/gallery
st.metric: https://docs.streamlit.io/develop/api-reference/data/st.metric
st.fragment: https://docs.streamlit.io/develop/api-reference/execution-flow/st.fragment
st.pydeck_chart: https://docs.streamlit.io/develop/api-reference/charts/st.pydeck_chart
OpenSearch FAQ: https://docs.opensearch.org/faq/
AWS OpenSearch background: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/rename.html
Grafana Elasticsearch datasource: https://grafana.com/docs/grafana/latest/datasources/elasticsearch/
Superset Elasticsearch support: https://superset.apache.org/docs/databases/supported/elasticsearch/
Metabase data sources: https://www.metabase.com/data-sources/
Comment template for issues/2