AGILab future work

This page tracks planned work only.

The goal here is to rank future work, not to restate the current feature set.

Professional target

AGILab should feel professional when a new team can trust the release, complete the first proof, import or export notebook work, diagnose failures, understand the security boundary, and hand evidence to another tool or reviewer without needing the original developer.

The roadmap therefore prioritizes trust, clarity, and maintainability before adding larger product surfaces.

Professional scorecard

Use this scorecard before promoting a release, demo, app, or major feature as professional-ready.

Area | Professional bar | Primary proof
Release trust | Clean install, release proof, badges, PyPI, docs, and demo text agree | Release guard and release proof
First-run UX | A newcomer can complete one local proof without guessing page actions | First-proof smoke and UI robot path
Notebook bridge | Work can enter from notebooks and leave as reusable notebooks | Import/export round-trip evidence
Failure clarity | Common failures are classified before raw tracebacks | Diagnostic tests and user-facing errors
Security boundary | Shared, public, sensitive, and production use limits are explicit | Security check and adoption docs
Team runtime | Cluster/share/service routes fail fast and explain remediation | Cluster/share/service health gates
Evidence handoff | Runs, artifacts, compatibility, and promotion decisions are portable | Evidence bundle and release decision
Maintainability | New features extend tested contracts instead of page-specific glue | Contract tests and pattern guardrails
Ecosystem quality | Published apps are named, documented, installable, runnable, and scoped | App package smoke and README checks

Phase plan

Use phases as the product sequence. Dates can move; the ordering should not change unless deferring a higher-priority item is explicitly accepted as a risk.

Phase | Focus | Exit gate
Phase 0 | Release trust and docs alignment | Clean release lane, fresh docs mirror, green badge guard, no stale public claims
Phase 1 | Newcomer first proof and notebook parity | Built-in and notebook first proofs install, execute, and open analysis predictably
Phase 2 | Diagnostics, security, and team readiness | Failures are classified; shared/team/cluster use has explicit checks and limits
Phase 3 | Evidence and data integration | Promotion evidence, run diff, connectors, and provenance are consumable outside the UI
Phase 4 | Maintainable extension model | Apps, pages, notebooks, connectors, reducers, and evidence reports follow stable contracts
Phase 5 | Product expansion | Multi-app DAG, operator mode, observability, and MLOps handoff build on the stable baseline

Sequencing rules

  • Fix release trust before adding feature breadth.

  • Fix first-run UX before asking users to try clusters or service mode.

  • Prove notebook import parity before advertising notebook migration broadly.

  • Add diagnostics before broadening team and cluster validation.

  • Add security and supply-chain gates before shared, exposed, or sensitive use.

  • Standardize evidence schemas before adding dashboards.

  • Stabilize extension contracts before publishing more apps.

  • Productize multi-app DAGs only after the first-run, runtime, evidence, and contract layers are stable.

Professionalization priority order

Use this order when the goal is to make AGILab feel professional, adoptable, and maintainable rather than just richer in features.

P0. Release and runtime integrity

Goal:

  • every public release can be installed, launched, and validated from a clean public environment without relying on the developer checkout

Concrete items:

  • keep the release guard as the mandatory pre-tag path: install smoke, first-proof, security check, docs mirror check, badge freshness, dependency policy, and trusted-publisher contract

  • keep the imported-notebook release smoke in the mandatory release-proof profile, not as a separate best-effort demo path

  • keep the exported-notebook handoff smoke in the same release-proof lane so the no-lock-in claim is guarded by manifest, source-cell, runtime-role, and artifact-reference checks

  • require each shipped notebook sample to keep creating an installable and runnable app equivalent to its packaged example

  • keep PyPI, GitHub release proof, public docs, and Hugging Face demo text aligned before publication

  • keep the package-aware pypi-publish reuse gate healthy: detect the expected wheel/sdist artifacts before build; when PyPI already exposes them, skip build, Trusted Publishing auth, and upload; download the reused files back into the GitHub Release distribution bundle; and record their hashes in release distribution evidence

  • fail fast on local-path, stale-worker, missing-share, or stale-app-repository states instead of silently degrading
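The hash-recording step of the reuse gate can be illustrated with a minimal sketch. The function names, the evidence file layout, and the `artifacts` key are assumptions made for illustration; they are not the actual release tooling.

```python
import hashlib
import json
from pathlib import Path


def record_distribution_hashes(dist_dir: Path) -> dict[str, str]:
    """Return {filename: sha256 hex digest} for wheel and sdist files in dist_dir."""
    hashes: dict[str, str] = {}
    for path in sorted(dist_dir.iterdir()):
        # Only wheel and sdist artifacts count as distribution evidence here.
        if path.name.endswith((".whl", ".tar.gz")):
            hashes[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def write_evidence(dist_dir: Path, out_file: Path) -> None:
    """Write the hash map as sorted, indented JSON so repeated runs diff cleanly."""
    # "artifacts" is a hypothetical key, not a documented evidence schema.
    evidence = {"artifacts": record_distribution_hashes(dist_dir)}
    out_file.write_text(json.dumps(evidence, indent=2, sort_keys=True) + "\n")
```

The same function works for freshly built and reused files, so the evidence does not depend on which path produced the artifact.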

Done means:

  • a clean install can run the default first proof and the guarded imported notebook project end to end

  • release proof points to the exact version, commands, evidence, and known limitations

  • no release badge, docs mirror, dependency-policy, or trusted-publisher guard is knowingly stale

P1. First-run product experience

Goal:

  • a new user understands what to click, what will happen, and how to recover if it fails

Concrete items:

  • keep the landing page focused on the first proof and remove redundant call-to-action clutter

  • make every wizard action direct: install really installs, execute really runs, analysis opens the right result page, notebook import creates the project without asking the user to locate packaged files

  • keep PROJECT sidebars and advanced controls out of the default path unless they are needed for the current task

  • add deterministic error messages for install/run/delete/import flows and keep spinners scoped to the action that is still running

  • keep examples small enough to finish locally before users attempt cluster, service mode, or external app repositories

Done means:

  • a user can complete the built-in first proof or the notebook first proof without reading source code, finding hidden files, or guessing page actions

P2. Notebook interop and no-lock-in

Goal:

  • teams can enter AGILab from notebooks and leave AGILab back to notebooks without losing the useful work

Concrete items:

  • provide one importable notebook for every public packaged example that is suitable for notebook import

  • preserve explicit manager/worker role metadata while still requiring clear cell-by-cell review when metadata is missing or ambiguous

  • name imported projects predictably, for example flight-telemetry-from-notebook-project

  • keep notebook export positioned as the exit and handoff path, not just a convenience download

  • round-trip the stage order, code, runtime hints, artifacts, and provenance well enough for review and reuse outside the AGILab UI
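Predictable naming for imported projects could look like the sketch below. The helper name and the slug normalization rule are assumptions; only the `-from-notebook-project` suffix comes from the example above.

```python
import re

# Suffix taken from the roadmap example flight-telemetry-from-notebook-project.
SUFFIX = "-from-notebook-project"


def imported_project_name(example_name: str) -> str:
    """Derive a deterministic project name from an example name (hypothetical helper)."""
    # Assumed rule: lowercase, and collapse anything outside [a-z0-9] into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", example_name.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a project name from {example_name!r}")
    return slug + SUFFIX
```

Because the result depends only on the input string, two users importing the same sample get the same project name.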

Done means:

  • import and export are documented as a reversible adoption bridge: notebooks can become AGILab projects, and AGILab work can be handed back as notebooks when the workbench is no longer needed

P3. Security and supply-chain posture

Goal:

  • AGILab is safe by default for controlled R&D and explicit about what remains outside the default threat model

Concrete items:

  • keep public UI binding local by default and document the reverse-proxy, authentication, TLS, and network controls required for exposure

  • treat apps, notebooks, generated snippets, and external repositories as executable code that needs review, allowlisting, and isolation

  • keep secrets out of command lines, logs, committed files, generated notebooks, and release evidence

  • regenerate SBOM and dependency audit evidence for the actual install profiles being adopted

  • keep PyPI trusted publishing and action pinning as mandatory release gates

Done means:

  • the docs do not imply production, multi-tenant, regulated-data, or public exposure readiness without the external controls required to make that true

P4. Team and cluster operation

Goal:

  • shared-team and cluster use is diagnosable, bounded, and repeatable

Concrete items:

  • make cluster share setup explicit and refuse cluster mode when no usable shared path is configured

  • keep SSH, SSHFS, LAN discovery, remote path, and share-sentinel diagnostics actionable from both CLI and UI

  • provide a small validation matrix for local, bare-metal cluster, VM-based cluster, AI Lightning, Hugging Face, and cloud targets when evidence exists

  • add service-health gates for long-running service mode: idle policy, unhealthy limit, restart-rate threshold, and machine-readable status

  • separate single-user convenience from multi-user isolation, quotas, and account policy
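A service-health gate over the idle policy, unhealthy limit, and restart-rate threshold might be sketched as follows. All class names, field names, and reason strings are hypothetical; the sketch only shows how the three limits can produce one machine-readable status.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HealthPolicy:
    max_idle_seconds: float
    max_unhealthy_checks: int
    max_restarts_per_hour: int


@dataclass(frozen=True)
class ServiceState:
    idle_seconds: float
    consecutive_unhealthy: int
    restarts_last_hour: int


def evaluate(policy: HealthPolicy, state: ServiceState) -> dict:
    """Return a JSON-serializable status listing every limit the service violates."""
    reasons = []
    if state.idle_seconds > policy.max_idle_seconds:
        reasons.append("idle-policy-exceeded")
    if state.consecutive_unhealthy >= policy.max_unhealthy_checks:
        reasons.append("unhealthy-limit-reached")
    if state.restarts_last_hour > policy.max_restarts_per_hour:
        reasons.append("restart-rate-exceeded")
    return {"healthy": not reasons, "reasons": reasons, "state": asdict(state)}
```

Returning every violated limit, rather than stopping at the first one, lets an operator see the full picture in a single status read.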

Done means:

  • a team can distinguish local failure, share failure, worker dependency failure, scheduler failure, and service-health failure without reading tracebacks first

P5. Evidence-driven MLOps bridge

Goal:

  • AGILab stays a workbench, but hands clean evidence to the MLOps and platform systems that own production

  • converge the existing evidence pieces into a portable proof capsule that can be verified, compared, replayed, and handed to another tool without relying on the original developer workspace

Concrete items:

  • strengthen run evidence, release decisions, run diff, artifact provenance, and compatibility profiles as first-class outputs

  • keep the first public proof-pack CLI layer small and verifiable: agilab prove, agilab verify, agilab replay, agilab sign, agilab export-lineage, agilab export-traces, agilab policy-check, agilab cards, and agilab metadata-store operate on run_manifest.json or .agipack evidence where appropriate, write plain JSON evidence, and export a hash-verifiable .agipack archive; they support optional detached Ed25519 signatures plus local trust-policy verification, which must be in place before AGILab claims external Sigstore/SLSA attestation verification

  • introduce an Evidence Core contract that bundles the run manifest, workflow snapshot, environment lock, artifact hashes, notebook import/export manifest, optional MLflow references, policy checks, and verifier results as one portable audit surface

  • define the lifecycle around that contract: build, test, version, compare, promote, roll back, and archive evidence bundles with stable hashes and explicit compatibility metadata

  • derive a graph-shaped evidence view that links claims, code, data, models, environments, runs, notebooks, artifacts, cards, MLflow handoff, and publication targets without requiring an external graph service by default

  • support reviewer-oriented evidence queries such as unsupported claims, stale data sources, changed dependencies, policy failures, missing cards, and baseline-vs-candidate deltas

  • keep MLflow integration focused on tracking, artifacts, model registry handoff, and comparison rather than replacing AGILab execution

  • define promotion-ready evidence bundles for apps, imported notebooks, and cluster runs

  • add hooks for monitoring, drift, feature stores, orchestration engines, and serving platforms without claiming those systems are built into AGILab

Current shipped baseline:

  • proof-pack directory export from a run manifest with verification report, policy report, OpenLineage-shaped JSON, RO-Crate metadata, OpenTelemetry-shaped trace JSON, local metadata-store entry, and model/dataset/prompt/eval cards

  • hash-verifiable .agipack archive export with optional detached Ed25519 signatures and local JSON/TOML trust-policy verification

  • replay is safe by default: it prints the recorded command unless the operator explicitly passes --execute

  • policy-as-code starts with a small JSON/TOML gate over the manifest checks instead of a full external policy engine

  • Excel-shaped proof preview from packaged examples: a workbook, refresh-friendly CSV folder, and JSON hash evidence, intentionally before any Office add-in or arbitrary workbook import claim

  • Voila-shaped notebook proof preview from packaged examples: a dashboard notebook, hide-code manifest, widget-to-args migration hints, app-view plan, static HTML preview, and JSON hash evidence, intentionally before any Voila server integration or agilab[voila] claim
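The safe-by-default replay behavior described above can be illustrated with a small sketch. This is not the agilab replay implementation; the program name and function names are hypothetical, and only the --execute flag is taken from the text.

```python
import argparse
import shlex
import subprocess


def build_parser() -> argparse.ArgumentParser:
    # "replay-sketch" is a placeholder program name for this illustration.
    parser = argparse.ArgumentParser(prog="replay-sketch")
    parser.add_argument(
        "--execute",
        action="store_true",
        help="actually run the recorded command instead of printing it",
    )
    return parser


def replay(recorded_command: list[str], argv: list[str]) -> int:
    """Print the recorded command by default; run it only when --execute is passed."""
    args = build_parser().parse_args(argv)
    if not args.execute:
        # Dry run: show the command with shell-safe quoting and do nothing else.
        print(shlex.join(recorded_command))
        return 0
    # The command is passed as an argument list, so no shell interpolation occurs.
    return subprocess.run(recorded_command, check=False).returncode
```

Making execution opt-in means a reviewer can inspect a proof pack from an untrusted source without running anything by accident.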

Spreadsheet adoption bridge:

  • keep the shipped preview dependency-light and file-based until there is user pull for deeper Excel integration

  • add a focused agilab excel-proof --workbook <file> --sheet <name> --out <file> command only after the preview validates the user story

  • write an AGILAB Evidence worksheet, refreshable CSV/parquet outputs, and a JSON evidence bundle for workbook runs

  • keep Office add-ins, macro execution, tenant deployment, and formula-preserving arbitrary workbook import as later product-expansion work, not current claims

Notebook dashboard adoption bridge:

  • keep the shipped preview dependency-light and file-based until there is user pull for deeper Voila integration

  • add a focused agilab notebook-proof dashboard.ipynb --app-name <name> command only after the preview validates the user story

  • promote stable ipywidgets to app arguments while keeping app-specific dashboard code inside the app project

  • keep a future agilab[voila] extra, server launch, and multi-user dashboard deployment as later product-expansion work, not current claims

Evidence Core roadmap:

  • define stable evidence node and edge types for claim, code, data, model, environment, run, notebook, artifact, policy, card, and publication

  • generate deterministic JSON, JSON-LD, or GraphML exports from proof-pack evidence so reviewers can inspect lineage and audit claims outside the UI

  • include source manifests, schema or ontology mappings, evidence policies, card metadata, reducer summaries, connector provenance, and imported-notebook metadata when the originating app can provide them

  • design inspection and verification surfaces around the existing proof commands instead of adding a second execution runtime

  • make the lifecycle explicit in CLI and UI language: a bundle can be built, checked, diffed, promoted, archived, or rejected without implying production certification

  • keep the baseline local-first and file-based; graph databases, vector indexes, message buses, or workflow-control services should remain optional adapters until the portable evidence contract is stable
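A local, file-based evidence graph with a deterministic JSON export and one reviewer query could be sketched like this. The node types reuse names from the list above; the `supported_by` relation, the identifier format, and the function names are assumptions.

```python
import json


def export_graph(nodes: dict[str, str], edges: list[tuple[str, str, str]]) -> str:
    """Serialize nodes and edges in sorted order so the same graph always yields the same text."""
    payload = {
        "nodes": [
            {"id": node_id, "type": node_type}
            for node_id, node_type in sorted(nodes.items())
        ],
        "edges": [
            {"source": source, "relation": relation, "target": target}
            for source, relation, target in sorted(edges)
        ],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


def unsupported_claims(nodes: dict[str, str], edges: list[tuple[str, str, str]]) -> list[str]:
    """Return claim nodes that have no outgoing 'supported_by' edge (hypothetical relation name)."""
    supported = {source for source, relation, _ in edges if relation == "supported_by"}
    return sorted(
        node_id
        for node_id, node_type in nodes.items()
        if node_type == "claim" and node_id not in supported
    )
```

Sorting before serialization keeps the export byte-stable, which is what allows it to be hashed and diffed like any other evidence file.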

Context-engineering gaps to close:

  • reusable workflow blueprints that capture stages, runtime parameters, prompt/tool settings, connector requirements, expected artifacts, evidence outputs, and smoke-test expectations

  • versioned prompt, tool, and agent configuration manifests that can be reviewed and replayed with the same rigor as app code

  • schema and ontology mapping hooks for apps that transform unstructured or semi-structured data into reviewable evidence, while keeping domain-specific extraction optional

  • lightweight deployment or run-profile generation that writes the commands, environment assumptions, and validation plan for local, notebook, cluster, service, or demo runs without becoming a Kubernetes platform

  • observability-lite evidence for execution latency, failure source, queue or worker backlog, artifact volume, token/cost metrics when an LLM is involved, and service-health state when a long-running service is used

Remaining state-of-the-art scope:

  • external Sigstore/SLSA attestation for signed .agipack archives and a third-party verifier path that validates provenance without a local trust root or source checkout

  • OpenLineage transport integration to emit events to an external lineage backend, not only write an interoperable JSON payload

  • native OpenTelemetry SDK/OTLP instrumentation across Streamlit actions, worker build, distributed execution, notebook export, MLflow handoff, and agent runs

  • durable ML metadata backend, for example SQLite/Postgres/MLMD-compatible storage, with query APIs for datasets, models, prompts, runs, artifacts, and lineage

  • app-declared model cards, data cards, prompt cards, and evaluation cards with domain metadata rather than evidence-only placeholders

  • richer policy-as-code, potentially OPA/Rego-compatible, for adoption gates, promotion gates, release gates, and sensitive-data gates

  • capability-based sandboxing for generated code, notebooks, and agent runs: explicit filesystem, network, secret, and subprocess scopes

  • first-class agent eval traces beyond the shipped local agent-run cards: prompt/tool/file timeline detail, diff evidence, replay, scoring, and safety policy results

  • monitoring and drift handoff adapters for production systems without turning AGILab into the production control plane

  • enterprise controls for shared deployments: secrets backend integration, authentication, RBAC, audit logs, and tenant isolation

State-of-the-art upgrade backlog:

Priority | Missing capability | First shippable AGILAB contract | Done when
P0 | External proof-capsule attestation | Sigstore / SLSA provenance sidecars bound to the signed .agipack signature and trust-policy verification path | A reviewer can verify archive integrity, signer identity, build provenance, and allowed issuer without the original source checkout or a local-only trust root.
P0 | Native OpenTelemetry and GenAI traces | agilab export-traces --otlp-endpoint ... plus SDK spans for UI actions, worker build, distributed execution, notebook export, MLflow handoff, and agent runs | A run has correlated trace IDs across manager, worker, notebook, and agent events, and an OTLP collector can ingest them without a custom converter.
P0 | OpenLineage transport | agilab emit-lineage --backend openlineage --url ... using START, RUNNING, COMPLETE, and FAIL events with stable job, run, dataset, and parent-run facets | Marquez/OpenLineage-compatible backends can show AGILAB datasets, artifacts, and job hierarchy from emitted events, not only from local JSON files.
P1 | Continuous eval and LLMOps loop | eval_manifest.json, app-declared scorer contracts, baseline/candidate comparisons, optional MLflow GenAI evaluation handoff, and promotion gates | A prompt, model, agent, or notebook change can fail promotion because quality, cost, latency, safety, or regression scores moved outside policy.
P1 | Durable evidence metadata backend | Local SQLite first, then Postgres-compatible storage for runs, artifacts, datasets, models, prompts, cards, lineage, and policy reports | agilab metadata-store query ... can answer reviewer questions across runs without scanning ad hoc proof folders.
P1 | Production serving handoff | KServe/Ray Serve/Seldon-style manifest export with model URI, environment, health checks, canary/rollback hints, and prediction-log evidence contract | AGILAB can hand a reproducible candidate to a serving platform while staying out of production traffic control.
P2 | App-authored cards | App-owned model, data, prompt, and eval card schemas with required domain fields and evidence links | Cards are no longer placeholders inferred from run manifests; apps declare meaningful review metadata.
P2 | Policy-as-code and capability sandboxing | OPA/Rego-compatible gates plus filesystem, network, secret, subprocess, and package capability declarations for generated code, notebooks, and agents | Unsafe execution paths are blocked before run, and the proof bundle records the policy decision and granted capabilities.
P2 | Enterprise shared-deployment controls | Secrets-backend integration, authentication, RBAC, audit logs, and tenant-isolation hooks | Shared deployments can be evaluated with explicit platform controls instead of relying on local-workbench assumptions.

This backlog is intentionally adapter-first. AGILAB should produce portable contracts and evidence that platform tools can consume; it should not hide production serving, governance, or observability responsibilities inside the workbench.

Agent skills and resource evidence hardening:

  • keep the public agent discovery surface generated from the repo-managed skill source of truth: AGENT_SKILLS.md, llms.txt, llms-full.txt, and the Skills / Standard / Works with badges must remain generated artifacts, not hand-maintained marketing copy

  • add explicit compatibility metadata to each repo-managed skill: supported agents, required tools, write/network/subprocess expectations, local service assumptions, secret/environment requirements, and expected evidence outputs

  • close the current skill-scan hygiene gaps by replacing private absolute paths with placeholders and by documenting deliberate network or environment access in skill metadata rather than leaving it implicit in instructions

  • mature the skill security scanner from a deterministic local guard into a review surface: baseline or allow-list support, SARIF output, sticky PR comments, severity policy, and optional comparison with external agent-skill scanners before enforcing stronger gates

  • attach a resource_snapshot.json to agent runs, first-proof runs, heavy page proofs, and release evidence when resource state explains scheduling, reproducibility, or performance choices

  • feed resource snapshots and cluster inventory into scheduler recommendations and future autoscale decisions, while keeping autoscale behavior explicit and auditable rather than silently changing execution topology

  • build on the shipped agilab agent-run evidence commands (list, handoff, next, context, and lineage) by adding skill identity, skill version, resource snapshot, changed files, richer command timeline, and resulting proof-artifact links so multi-agent work can be replayed or reviewed together

  • publish agent-surface validation as release evidence: generated catalogs, badge freshness, changed-skill scan reports, and resource snapshot checks should be archived alongside SBOM, pip-audit, hashes, and provenance
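A resource snapshot can be captured with the standard library alone. The field names below are assumptions, since the roadmap does not define the resource_snapshot.json schema; the sketch only shows the kind of state such a file could record.

```python
import json
import os
import platform
import shutil
from datetime import datetime, timezone
from pathlib import Path


def collect_resource_snapshot(workdir: Path) -> dict:
    """Gather basic host facts that can explain scheduling or performance choices."""
    usage = shutil.disk_usage(workdir)
    # Field names are illustrative, not a documented schema.
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "platform": platform.platform(),
        "python_version": platform.python_version(),
        "cpu_count": os.cpu_count(),
        "disk_free_bytes": usage.free,
        "disk_total_bytes": usage.total,
    }


def write_resource_snapshot(workdir: Path) -> Path:
    """Write resource_snapshot.json next to the run outputs and return its path."""
    target = workdir / "resource_snapshot.json"
    snapshot = collect_resource_snapshot(workdir)
    target.write_text(json.dumps(snapshot, indent=2, sort_keys=True) + "\n")
    return target
```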

Done means:

  • an experiment can be reviewed, compared, promoted, or rejected from evidence that is versioned, portable, and honest about its execution environment

  • the same evidence can be exported as a single proof capsule without claiming production certification

P6. Extension architecture and maintainability

Goal:

  • new apps, pages, connectors, and workflow features follow stable patterns instead of accumulating one-off glue

Concrete items:

  • keep public APIs, app templates, page metadata, connector models, reducer contracts, and workflow stage contracts explicit and tested

  • add flow blueprint contracts for reusable workflow presets, including the app/page inputs, parameter schema, prompt/tool configuration, connector requirements, expected artifacts, and evidence smoke test

  • use design patterns to separate UI, orchestration, runtime execution, artifacts, and evidence generation

  • keep prompts, agent tools, MCP-style connectors, and runtime-tunable settings versioned and reviewable instead of hidden in page state

  • add pattern-gated checks before new workflow or notebook-import behavior can bypass existing contracts

  • keep strict typing and focused tests on shared helpers that affect many apps

  • document deprecations with migration paths and removal dates

Done means:

  • future features can be added by extending clear contracts, not by duplicating page-specific or app-specific behavior

P7. Ecosystem and distribution

Goal:

  • AGILab is easy to adopt incrementally through public packages, app packages, demos, and external repositories without locking users into one layout

Concrete items:

  • keep PyPI packages for publishable apps small, named consistently, and backed by trusted publishing

  • keep Hugging Face and public demos aligned with the same release evidence as the repository

  • provide clear app repository update, install, rename, and migration behavior instead of compatibility aliases for stale local copies

  • keep public app packaging and private app validation as separate lanes: public agi-app-* packages use PyPI, entry points, wheel/sdist metadata, and provenance checks; private or non-public apps still need a pinned validation model that does not vendor private code into the public repository

  • evaluate a private-side app validation manifest that records the external app repository origin, commit SHA, app path, runtime/package constraints, and expected validation commands while preserving APPS_REPOSITORY symlinks as the lightweight local-development shortcut

  • publish only examples that meet content-quality, install/run, README, and notebook-import criteria

Done means:

  • users can adopt one app, one notebook import, one demo, or the full workbench without discovering different contracts for each path

Professional execution backlog

Treat this as the delivery order. Lower-priority feature work should not displace higher-priority adoption, release, and safety work unless there is an explicit product decision.

Priority 1. Clean release lane

Ship only when the public package, public docs, release proof, coverage badges, trusted publishing, Hugging Face copy, and first-proof commands all describe the same release.

Acceptance gate:

  • ./dev release passes from a clean checkout, starting with its strict AGILAB audit/review gate before impact, PyPI, docs, typing, and badge checks

  • the GitHub release workflow mirrors the same audit-first validation order

  • ./dev docs and the release proof report pass from a clean checkout

  • the release proof names the exact version, validation routes, and known non-certified environments

  • the publish workflow shows which split packages were uploaded and which were intentionally reused because their wheel/sdist artifacts were unchanged

  • no manual release note, README, public docs page, or demo copy contradicts the package that was published

Why first:

  • professional adoption starts by trusting the published artifact, not by trusting the developer machine

Priority 2. Notebook import parity

Every public example that is advertised as importable from a notebook must ship with a notebook sample, deterministic metadata, and an imported-project smoke that proves INSTALL and EXECUTE behave like the original app.

Acceptance gate:

  • each supported sample creates a predictably named <example>-from-notebook-project

  • manager/worker cell roles are explicit or force review before project creation

  • the release smoke keeps at least one imported notebook project in the guarded create -> install -> execute -> analysis path

Why now:

  • notebook import is a unique adoption bridge only if users can prove that the imported project still runs

Priority 3. First-run wizard contract

The default UI path must be direct: buttons perform the action they promise, and the next required user action is visible before navigation.

Acceptance gate:

  • built-in first proof: select demo, install, execute, and analysis are all direct and recoverable

  • notebook first proof: create from packaged notebook does not require the user to find a hidden file

  • spinners, success messages, and failure messages are scoped to the action that actually ran

Why now:

  • first-run confusion makes the product feel experimental even when the backend works

Priority 4. Runtime failure diagnostics

Failures must classify themselves before showing raw tracebacks.

Acceptance gate:

  • install/run/delete/import failures distinguish dependency, path, archive, project-state, cluster-share, worker-copy, and scheduler failures

  • stale local app directories and stale worker environments produce actionable remediation

  • corrupted archives and invalid imported notebooks fail fast with a concise cause and a safe next step
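Classifying a failure before showing the traceback could start from a small rule table, as in the sketch below. It covers only three of the categories listed above, and the rule table, function name, and remediation texts are assumptions.

```python
import tarfile
import zipfile

# (exception types, category, safe next step). The wording is illustrative only.
_RULES = [
    (ModuleNotFoundError, "dependency", "Reinstall the app environment, then retry."),
    (FileNotFoundError, "path", "Check that the configured path exists on this machine."),
    (
        (zipfile.BadZipFile, tarfile.ReadError),
        "archive",
        "Download or export the archive again; the file appears to be corrupted.",
    ),
]


def classify_failure(exc: BaseException) -> dict[str, str]:
    """Map an exception to a category, a concise cause, and a safe next step."""
    for exc_types, category, next_step in _RULES:
        if isinstance(exc, exc_types):
            return {
                "category": category,
                "cause": str(exc) or type(exc).__name__,
                "next_step": next_step,
            }
    # Anything unmatched stays visible as unclassified instead of being guessed at.
    return {
        "category": "unclassified",
        "cause": str(exc) or type(exc).__name__,
        "next_step": "Open the full traceback and report it.",
    }
```

A UI or CLI would show the classified summary first and keep the raw traceback available as a secondary detail.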

Why now:

  • professional users can tolerate failures; they cannot tolerate unclear failures

Priority 5. Security and shared-use hardening

Keep controlled local R&D easy, but give hardened shared-team use a clear go decision once its explicit controls pass.

Acceptance gate:

  • public bind, external app repositories, notebook import, generated code, secrets, cluster accounts, and service mode each have an explicit guard or operator checklist

  • security checks can emit machine-readable results for the selected install profile

  • the docs continue to reject standalone production, public, multi-tenant, or regulated-data claims without external hardening

Why now:

  • adoption grows only if the boundary between safe default use and hardened use stays explicit

Priority 6. Cluster and team operation

Cluster mode should be a supported team workflow, not a best-effort advanced demo.

Acceptance gate:

  • cluster requests fail when no usable shared path exists

  • SSH, SSHFS, LAN discovery, remote path, and share-sentinel checks are exposed through CLI and UI diagnostics

  • validation evidence covers local, bare-metal cluster, VM cluster, AI Lightning, Hugging Face, and cloud targets only where each route has actually been tested
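Refusing cluster mode when no usable shared path exists can be sketched as a precondition check. The exception class, the function name, and the probe file name are hypothetical, and "usable" is assumed here to mean an existing, writable directory.

```python
from pathlib import Path
from typing import Optional


class ShareNotUsableError(RuntimeError):
    """Raised when cluster mode is requested without a usable shared path."""


def require_usable_share(share_path: Optional[str]) -> Path:
    """Return the shared path if it is usable, otherwise fail fast with the reason."""
    if not share_path:
        raise ShareNotUsableError("cluster mode requires a shared path, but none is configured")
    path = Path(share_path)
    if not path.is_dir():
        raise ShareNotUsableError(f"shared path {path} does not exist or is not a directory")
    # Probe writability with a temporary sentinel file (the name is an assumption).
    sentinel = path / ".share-sentinel-check"
    try:
        sentinel.write_text("ok")
        sentinel.unlink()
    except OSError as exc:
        raise ShareNotUsableError(f"shared path {path} is not writable: {exc}") from exc
    return path
```

Each failure message names a distinct cause, so the operator can tell a missing configuration apart from a missing mount or a permission problem.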

Why now:

  • distributed execution is a core differentiator only when setup and failure modes are operationally clear

Priority 7. Evidence and promotion workflow

AGILab should make it easy to decide whether a run, app, notebook import, or cluster validation is ready to reuse, publish, or hand off.

Acceptance gate:

  • run evidence, release decisions, compatibility reports, run diff, artifact provenance, and supply-chain evidence share stable schemas

  • evidence bundle lifecycle actions are explicit: build, verify, diff, promote, archive, reject, and roll back

  • evidence bundles expose a graph-shaped index that links the run manifest, artifacts, notebook exports, optional MLflow references, policy results, cards, and human-readable claims

  • schema/ontology mappings and source manifests are captured when an app creates derived evidence from structured, semi-structured, or unstructured inputs

  • evidence bundles can be consumed outside AGILab by reviewers, CI, MLflow, or platform teams

  • external graph, vector, streaming, or workflow-control infrastructure remains optional; the local proof pack stays the portable baseline

  • promotion decisions state what passed, what failed, and what is out of scope
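Explicit lifecycle actions can be modeled as a small state machine. The state names and allowed transitions below are one possible assumption, not a defined contract; diff is treated as a read-only comparison and therefore has no state of its own.

```python
# Allowed next states for each bundle state (hypothetical transition table).
ALLOWED = {
    "built": {"verified", "rejected"},
    "verified": {"promoted", "rejected", "archived"},
    "promoted": {"rolled_back", "archived"},
    "rolled_back": {"archived"},
    "rejected": {"archived"},
    "archived": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if current not in ALLOWED:
        raise ValueError(f"unknown bundle state: {current!r}")
    if target not in ALLOWED[current]:
        allowed = sorted(ALLOWED[current]) or ["<none>"]
        raise ValueError(
            f"cannot move bundle from {current!r} to {target!r}; allowed: {', '.join(allowed)}"
        )
    return target
```

In this table a bundle cannot be promoted without first being verified, which mirrors the requirement that promotion decisions state what passed.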

Why now:

  • this is the bridge between a useful workbench and professional engineering governance

Priority 8. Connector-backed data access

Move data access from repeated path settings to declarative connectors.

Acceptance gate:

  • SQL, OpenSearch/ELK, object storage, local paths, and simulation backends use connector definitions instead of page-specific path glue where practical

  • connector health checks stay operator-triggered and do not leak credentials

  • import/export provenance names the connector and artifact source
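A declarative connector definition that never exposes its credential in logs or provenance could be sketched as follows. The class and field names are assumptions and do not describe the existing connector models.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connector:
    name: str
    kind: str  # for example "sql", "opensearch", "object-storage", or "local-path"
    uri: str
    # repr=False keeps the secret out of repr(), and so out of most log lines.
    secret: str = field(default="", repr=False)

    def describe(self) -> dict[str, str]:
        """Return a provenance-safe description that names the connector but masks the secret."""
        return {
            "name": self.name,
            "kind": self.kind,
            "uri": self.uri,
            "secret": "***" if self.secret else "",
        }
```

Provenance records would store the output of describe(), so the connector is named while the credential stays out of evidence files.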

Why now:

  • professional workflows fail when data paths are machine-specific or invisible

Priority 9. Extension and design-pattern guardrails

New app, page, workflow, notebook, connector, and reducer behavior should extend stable contracts instead of adding special cases.

Acceptance gate:

  • public app templates, page metadata, pipeline stages, notebook import roles, reducers, connectors, and evidence reports have focused tests

  • flow blueprints can be validated without launching the full UI, and they name their prompts, tools, connectors, expected artifacts, and evidence reports

  • pattern-gated checks block new workflow behavior that bypasses the shared contracts

  • deprecations include a migration path and removal target
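A deprecation that carries its migration path and removal target can be expressed as a decorator. The decorator and the function names open_project and load_project are hypothetical, as is the removal version.

```python
import functools
import warnings


def deprecated(replacement: str, removal: str):
    """Mark a callable as deprecated, naming its replacement and planned removal target."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead. "
                f"Planned removal: {removal}.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not at this wrapper
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical usage: both names and the version are placeholders.
@deprecated(replacement="load_project", removal="1.0.0")
def open_project(name: str) -> str:
    return name
```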

Why now:

  • long-term maintenance depends more on repeatable patterns than on another feature page

Priority 10. Curated app ecosystem

Publish fewer apps, but make every published app useful, named well, documented, installable, runnable, and importable when it claims notebook support.

Acceptance gate:

  • app packages use consistent agi-app-* names, trusted publishing, and clean metadata

  • private/non-public app validation has a reproducible pinned-revision option for CI and release checks, while the local APPS_REPOSITORY symlink workflow remains available for day-to-day development

  • example READMEs explain purpose, inputs, outputs, install/run path, notebook import status, and limitations

  • app repository updates take precedence over stale local copies, with no hidden compatibility aliases

Why now:

  • app quality is the most visible proof that the platform contract works

Priority 11. Multi-app DAG productization

Productize multi-app orchestration only after the release, first-run, notebook, diagnostic, and evidence layers are stable.

Acceptance gate:

  • WORKFLOW can show, validate, and execute a product-level DAG with persisted operator-visible state

  • retry, partial rerun, dependency visualization, and artifact handoff are visible in the same operator surface

  • the shipped two-app executable DAG remains the regression baseline before broader DAG coverage is claimed

Why later:

  • multi-app DAGs are high-value, but they amplify every weak contract beneath them

Priority 12. Observability and MLOps handoff

Integrate with observability and MLOps platforms without claiming to replace them.

Acceptance gate:

  • MLflow remains the tracking and registry handoff path

  • OpenSearch/Grafana/Superset-style integrations consume AGILab evidence and telemetry instead of duplicating app logic

  • AGILab first emits a small, stable telemetry envelope for run latency, failure source, worker backlog, artifact volume, LLM token/cost metrics when present, and service-health state before adding dashboards

  • profile generators can write local, notebook, cluster, service, and demo run instructions with expected validation commands and evidence outputs

  • production serving, drift detection, feature stores, and enterprise governance are framed as external platform integrations
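
A small telemetry envelope of the kind listed in the acceptance gate could look like the sketch below. The class name `TelemetryEnvelope`, every field name, and the schema string are assumptions made for illustration; the shipped envelope may differ.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class TelemetryEnvelope:
    """Hypothetical envelope; field names are illustrative only."""

    run_id: str
    run_latency_s: float
    failure_source: Optional[str]  # None when the run succeeded
    worker_backlog: int
    artifact_bytes: int
    service_health: str  # e.g. "healthy", "degraded", "unknown"
    llm_tokens: Optional[int] = None  # only set when an LLM was used
    llm_cost_usd: Optional[float] = None

    def to_json(self) -> str:
        """Serialize, dropping optional fields that are not present."""
        payload = {key: value for key, value in asdict(self).items() if value is not None}
        payload["schema"] = "example.telemetry_envelope.v0"  # hypothetical schema id
        return json.dumps(payload, sort_keys=True)


envelope = TelemetryEnvelope(
    run_id="run-001",
    run_latency_s=12.5,
    failure_source=None,
    worker_backlog=0,
    artifact_bytes=2048,
    service_health="healthy",
)
encoded = envelope.to_json()
```

Keeping the envelope flat and omitting absent LLM fields lets dashboards consume it without knowing which apps use an LLM.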

Why later:

  • observability is most useful after run evidence and operational status are already consistent

Explicit non-priorities until the above is stable

  • broad public OS, GPU, cloud, or network certification without matching run evidence

  • production multi-tenant claims without external identity, isolation, quotas, secrets management, audit, and monitoring controls

  • generic dashboards that are not tied to AGILab runs, artifacts, or decisions

  • always-on graph/vector/message-bus services as a requirement for local proof generation before the file-based evidence contract is stable

  • runtime prompt/tool mutation without a versioned manifest, review trail, and replay evidence

  • new app publishing when the app lacks a clear purpose, deterministic first run, README, evidence, and package metadata

Feature sequencing after the professional baseline

If the goal is near-term product sequencing rather than broad idea collection, use this order after the P0-P2 professionalization gates are under control:

  1. Multi-app DAG orchestration productization

    • let WORKFLOW represent one orchestrated DAG across the full workflow, not just one app-local execution view

    • build on the shipped multi-app DAG contract and its report family: the read-only global pipeline DAG report, pending execution-plan report, read-only runner state, persisted dispatch-state proof, two-unit app dispatch smoke, and the operator-state, dependency-view, live-update payload, operator-action execution, and operator-UI reports

  2. Bidirectional notebook interop

    • build on the shipped supervisor-notebook export and analysis-page launcher metadata

    • add notebook-to-pipeline import maturity and optional single-kernel union-environment notebooks when stage environments are compatible

  3. Data connector facility

    • make SQL, ELK, object storage, and other external data sources first-class connector targets

    • build on the shipped data connector facility report for SQL, OpenSearch, and object-storage definitions plus the data connector resolution report for connector-aware app/page resolution

    • build on the shipped data connector health report for operator-gated probe planning without live public network checks

    • build on the shipped data connector health actions report for explicit operator-triggered health probe rows

    • build on the shipped data connector runtime adapters report for credentialed runtime bindings without materializing secrets in public evidence

    • build on the shipped data connector UI preview report for static connector state and provenance review

    • build on the shipped data connector live UI report for Release Decision Streamlit integration without connector network probes

    • build on the shipped data connector app catalogs report for app-local connector catalogs across every non-template built-in app

    • this turns connector work into a practical data-access layer, not just path cleanup

  4. Reduce contract adoption

    • AGILab already has distributed work-plan execution and an initial shared reducer contract

    • the public reducer benchmark now validates 8 partials / 80,000 synthetic items in 0.003s against a 5.0s target

    • execution_pandas_project and execution_polars_project now emit named benchmark reduce artefacts through that contract

    • flight_telemetry_project now emits trajectory-summary reduce artefacts through that contract

    • uav_queue_project now emits the same reduce_summary_worker_<id>.json artifact shape for queue metrics

    • uav_relay_queue_project now emits that shared queue-metrics reduce artifact shape too

    • weather_forecast_project now emits forecast-metrics reduce artefacts

    • Release Decision now surfaces benchmark, flight, weather forecast, and UAV queue-family reduce artefacts as evidence

    • a repository guardrail now requires every non-template built-in app to expose a reducer contract

    • minimal_app_project and multi_app_dag_project are the explicit template-only exemptions because they have no concrete merge output yet

    • future apps/templates must opt in when they produce durable worker summaries

  5. Intent-first operator mode

    • valuable, but it benefits from the cleaner evidence, compatibility, and connector contracts above

  6. Elasticity and active mesh optimization

    • keep the current public claim bounded: a compact Active Mesh Optimization teaching route exists, but it is centralized-policy evidence, not full decentralized MARL certification

    • harden the shipped route by comparing baseline versus adaptive-network outcomes, then extending the evidence to failure injection and train-then-serve handoff

    • use moving nodes such as aircraft, UAVs, or satellites as active agents that can adapt trajectory or routing behavior to improve network KPIs

    • avoid duplicating experiment tracking or model-registry concepts; the differentiator should be closed-loop execution and evidence, not another metrics UI

Why this order:

  • turn the shipped manifest remediation baseline and CI artifact harvest contract into external evidence import and release indexes before broader onboarding automation

  • build global orchestration on the shipped cross-app contract and read-only product graph plus pending execution plan instead of claiming runner behavior before it exists

  • keep notebook interop after the orchestration state model is clearer

  • stabilize contracts before standardizing distributed reduction

  • keep operator refinements downstream of the proof/evidence layer

  • keep any broader MARL claim downstream of reproducible execution, baseline/candidate comparison, failure-injection evidence, service-contract handoff, and the shared evidence contract

Streamlit-inspired AGILab views

The most promising Streamlit-style view patterns for AGILab are not generic gallery clones. They are focused application views that reinforce AGILab’s core value: orchestration, evidence, and domain-specific interaction.

1. Experiment Cockpit

Purpose:

  • compare runs quickly

  • inspect KPI summaries

  • open artefacts and benchmark results from one page

Suggested layout:

  • KPI cards on top

  • run filters and selectors on the left

  • comparison charts in the center

  • run table and artefact links below

Why it matters:

  • best value-to-effort ratio

  • directly useful across many AGILab apps

2. Evidence / Release View

Purpose:

  • decide whether a run, model, or artefact bundle is promotable

Suggested layout:

  • release decision banner

  • pass/fail gate checklist

  • baseline vs candidate KPI comparison

  • provenance and reproducibility panel

  • evidence bundle table

Why it matters:

  • strong differentiator for AGILab

  • aligns with evidence-driven engineering and promotion workflows

3. Scenario Playback View

Purpose:

  • replay a run over time

  • inspect state, actions, and KPI evolution together

Suggested layout:

  • run selector and time slider

  • map or network panel

  • current decision-state panel

  • KPI timeline and event log

Why it matters:

  • strong demonstration value

  • good fit with existing AGILab map/network views

4. Realtime Analytical and Geospatial Views

Purpose:

  • inspect dense live data without degrading interaction quality

  • support higher-frequency analysis for KPI, maps, and network state

Recommended direction:

  • use Plotly.js/WebGL first for analytical views such as KPI timelines, run comparison, monitoring, and large point clouds

  • use deck.gl for dense geospatial and network overlays

  • use Three.js only for specialized 3D mission views where depth is part of the meaning, such as orbital or spatial playback

Why it matters:

  • gives AGILab a practical realtime analysis layer without committing to custom low-level WebGL infrastructure

  • fits existing AGILab needs better than a generic “WebGL support” initiative

  • opens a clear path for performance gains in monitoring and playback views

5. Run Diff / Counterfactual Analysis

Purpose:

  • compare two runs and explain what changed in a way that is directly useful to engineers and reviewers

  • turn raw deltas into defensible reasoning about outcomes

Suggested scope:

  • input and configuration diff

  • topology and artefact diff

  • allocation and decision diff

  • KPI delta summary

  • candidate-vs-baseline narrative focused on the most material changes
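
The KPI delta summary in this scope can be sketched as a pure function over two KPI dictionaries. The function name `kpi_deltas`, the 5% materiality threshold, and the sample KPI names are assumptions for illustration, not the shipped run-diff contract.

```python
from __future__ import annotations


def kpi_deltas(
    baseline: dict[str, float],
    candidate: dict[str, float],
    material_threshold: float = 0.05,  # assumed cut-off for a "material" relative change
) -> list[dict]:
    """Return one row per KPI, most material relative change first."""
    rows = []
    for name in sorted(set(baseline) | set(candidate)):
        before = baseline.get(name)
        after = candidate.get(name)
        if before is None or after is None:
            # A KPI present on one side only is always worth reviewing.
            rows.append({"kpi": name, "status": "missing", "relative": None, "material": True})
            continue
        delta = after - before
        relative = delta / abs(before) if before else None
        material = relative is None or abs(relative) >= material_threshold
        rows.append(
            {"kpi": name, "status": "compared", "delta": delta, "relative": relative, "material": material}
        )

    def sort_key(row: dict) -> tuple:
        magnitude = float("inf") if row["relative"] is None else abs(row["relative"])
        return (not row["material"], -magnitude, row["kpi"])

    return sorted(rows, key=sort_key)


rows = kpi_deltas(
    baseline={"latency_s": 10.0, "pdr": 0.90},
    candidate={"latency_s": 8.0, "pdr": 0.91, "jitter_ms": 3.0},
)
```

Sorting material rows first gives the candidate-vs-baseline narrative a natural starting point.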

Current shipped baseline:

  • agilab.run_diff_evidence.v1 defines a first no-execution run-diff evidence contract for public review

  • tools/run_diff_evidence_report.py --compact compares static baseline/candidate KPI checks, run manifests, and artifact rows, then emits counterfactual prompts for material deltas

  • the KPI evidence bundle includes this as run_diff_evidence_report_contract and verifies zero command, live-execution, and network-probe counts

  • tools/revision_traceability_report.py --compact validates agilab.revision_traceability.v1 and fingerprints repository HEAD, AGI core package versions, and built-in app manifests without invoking git commands or querying networks

  • tools/public_certification_profile_report.py --compact validates agilab.public_certification_profile.v1 and turns the compatibility matrix into a bounded_public_evidence certification profile without production or third-party certification claims

  • tools/supply_chain_attestation_report.py --compact validates agilab.supply_chain_attestation.v1 and fingerprints package metadata, lockfile, license, bundled AGI core versions, and built-in app manifests without formal supply-chain attestation claims

  • tools/ci_artifact_harvest_report.py --compact now defines the no-network external-machine attachment contract for run manifests, KPI bundles, compatibility reports, and promotion decisions

  • Release Decision can import ci_artifact_harvest.json, display harvested artifact status/checksum/provenance rows, block invalid harvests, and export ci_artifact_harvest_summary plus ci_artifact_harvest_evidence inside promotion_decision.json

  • tools/github_actions_artifact_index.py --archive converts downloaded GitHub Actions artifact ZIPs into a harvest-compatible artifact_index.json, and its opt-in --live-github path can query/download workflow-run artifacts when credentials are available

  • tools/ci_provider_artifact_index.py --provider gitlab_ci --archive converts downloaded GitLab CI or generic provider artifact ZIPs into the same harvest-compatible artifact_index.json without querying live provider APIs

  • the same tool supports opt-in --live-gitlab for credentialed GitLab CI pipeline artifact queries/downloads

  • tools/compatibility_report.py --artifact-index can derive per-release compatibility status from those downloaded artifact indexes or from ci_artifact_harvest.json summaries

  • the pypi-publish release workflow includes a release-evidence job that uploads sample external evidence, retrieves it through the live GitHub Actions artifact API with --live-github, and validates the resulting artifact index through the harvest and compatibility reports before publish jobs proceed
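
The archive-to-index step described above can be illustrated with the standard library alone. This is a sketch under stated assumptions: `index_artifact_zip` and the row keys `path`, `size`, and `sha256` are hypothetical and do not reproduce the real artifact_index.json schema.

```python
from __future__ import annotations

import hashlib
import io
import zipfile


def index_artifact_zip(archive: bytes, archive_name: str) -> dict:
    """List each file in a downloaded artifact ZIP with its SHA-256 checksum."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(archive)) as bundle:
        for info in sorted(bundle.infolist(), key=lambda item: item.filename):
            if info.is_dir():
                continue
            digest = hashlib.sha256(bundle.read(info)).hexdigest()
            rows.append({"path": info.filename, "size": info.file_size, "sha256": digest})
    return {"archive": archive_name, "artifacts": rows}


# Build a tiny in-memory archive so the sketch runs without network or disk access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr("run_manifest.json", '{"run_id": "demo"}')
    bundle.writestr("kpi/summary.json", '{"passed": true}')

index = index_artifact_zip(buffer.getvalue(), "demo-artifacts.zip")
```

Because the index is derived only from downloaded bytes, the same function works for any CI provider's artifact ZIP.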

Remaining scope:

  • add richer domain-specific explanations for allocation, topology, and decision deltas

  • run non-GitHub live provider API harvests in credentialed operator CI

Why it matters:

  • high value for debugging, review, and evidence-driven engineering

  • fits AGILab better than generic BI dashboards because it stays tied to runs, artefacts, and orchestration decisions

  • creates a strong bridge between experimentation and promotion workflows

6. Multi-app DAG orchestration

Purpose:

  • extend orchestration from one app flow to DAGs that span multiple apps

  • make inter-app dependencies explicit instead of hiding them in manual glue

Current shipped baseline:

  • agilab.multi_app_dag.v1 defines the first portable cross-app DAG contract

  • docs/source/data/multi_app_dag_sample.json links uav_queue_project to uav_relay_queue_project through the explicit queue_metrics handoff

  • docs/source/data/multi_app_dag_portfolio_sample.json broadens the contract-only sample suite across flight_telemetry_project, weather_forecast_project, execution_pandas_project, and execution_polars_project

  • tools/multi_app_dag_report.py --compact validates schema, checked-in app nodes, acyclic dependencies, docs references, artifact handoffs, and the two-sample DAG suite

  • the KPI evidence bundle includes this as multi_app_dag_report_contract

  • the multi-app DAG report family now covers execution planning, persisted dispatch state, real two-app app-entry smoke execution, operator state, dependency views, live-update payloads, operator actions, and static operator UI proof for the checked-in queue_baseline -> relay_followup contract
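
The acyclicity and artifact-handoff checks named above can be sketched with the standard-library `graphlib` module. The dictionary layout (`produces`, `from`, `to`, `artifact`) and the function name `validate_dag` are assumptions for illustration, not the agilab.multi_app_dag.v1 schema; the node and artifact names come from the checked-in sample described in this section.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def validate_dag(nodes: dict[str, dict], edges: list[dict]) -> dict:
    """Check that edges reference known nodes, handoffs are produced, and the graph is acyclic."""
    errors = []
    dependencies: dict[str, set] = {name: set() for name in nodes}
    for edge in edges:
        src, dst, artifact = edge["from"], edge["to"], edge["artifact"]
        if src not in nodes or dst not in nodes:
            errors.append(f"unknown node in edge {src} -> {dst}")
            continue
        if artifact not in nodes[src].get("produces", []):
            errors.append(f"{src} does not produce {artifact}")
        dependencies[dst].add(src)
    order: list[str] = []
    try:
        order = list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        errors.append(f"cycle detected: {exc.args[1]}")
    return {"ok": not errors, "errors": errors, "execution_order": order}


nodes = {
    "queue_baseline": {"app": "uav_queue_project", "produces": ["queue_metrics"]},
    "relay_followup": {"app": "uav_relay_queue_project", "produces": ["relay_metrics"]},
}
edges = [{"from": "queue_baseline", "to": "relay_followup", "artifact": "queue_metrics"}]

report = validate_dag(nodes, edges)
cyclic = validate_dag(
    nodes,
    edges + [{"from": "relay_followup", "to": "queue_baseline", "artifact": "relay_metrics"}],
)
```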

Remaining scope:

  • no open report-driven contract gap remains for the shipped two-app executable DAG baseline or the broader contract-only sample suite

  • future work is broader app coverage, placement in the live product surface, external validation, and production hardening

Why it matters:

  • the contract closes the first bridge between app-local execution and a product-wide orchestrated workflow

  • the remaining work is scale and hardening rather than missing public evidence for the shipped two-app baseline

7. Multi-app DAG orchestration productization

Purpose:

  • turn the checked-in multi-app DAG, execution plan, read-only runner state, and persisted dispatch-state proof into live app execution with persisted operator-visible status

Current shipped baseline:

  • tools/global_pipeline_dag_report.py --compact assembles one read-only product-level graph from docs/source/data/multi_app_dag_sample.json

  • the graph expands uav_queue_project and uav_relay_queue_project through their checked-in pipeline_view.dot files

  • the graph preserves the cross-app queue_metrics artifact edge and reports app nodes, app-local stage nodes, app-local edges, and execution order

  • tools/global_pipeline_execution_plan_report.py --compact converts the graph into ordered runnable units in pending/not_executed state, marks queue_baseline ready, marks relay_followup blocked on queue_metrics, and records provenance for the DAG and each app-local pipeline view

  • tools/global_pipeline_runner_state_report.py --compact projects the plan into read-only runner state, marks queue_baseline as runnable, marks relay_followup as blocked, and records transition, retry, partial-rerun, operator-message, and provenance metadata without executing apps

  • the WORKFLOW page now includes an expanded Workflow graph surface. It can choose project workflow or multi-app DAG scope; edit steps, created outputs, and used outputs through selector-driven workspace drafts and read-only summaries; validate the plan without hand-editing docs files; reset the persisted preview state; show readiness KPIs plus optional graph and output details; and preview the next ready step without claiming live app execution

  • tools/global_pipeline_dispatch_state_report.py --compact writes and reads back a persisted dispatch-state JSON proof, records queue_baseline completion, publishes queue_metrics, marks relay_followup runnable, and preserves timestamps, retry counters, partial-rerun flags, operator messages, and provenance without executing apps

  • tools/global_pipeline_app_dispatch_smoke_report.py --compact executes queue_baseline and relay_followup through the real checked-in uav_queue_project and uav_relay_queue_project manager/worker entries, writes the actual queue_metrics, relay_metrics, and reducer artifacts, and persists them in dispatch-state JSON

  • tools/global_pipeline_operator_state_report.py --compact reads that persisted full-DAG dispatch state and exposes completed unit state, queue-to-relay handoffs, available artifacts, and retry/partial-rerun action rows for future operator flows

  • tools/global_pipeline_dependency_view_report.py --compact reads the operator-state proof and exposes upstream/downstream dependency visualization for queue_baseline -> relay_followup, including the queue_metrics edge, producer/consumer apps, adjacency lists, and artifact-flow rows

  • tools/global_pipeline_live_state_updates_report.py --compact reads the dependency view and emits deterministic live orchestration-state updates for graph-ready, unit-state, artifact-state, dependency-state, and operator-action refresh payloads; this is an update contract, not a streaming service or UI renderer

  • tools/global_pipeline_operator_actions_report.py --compact reads the live-update payloads, accepts queue_baseline:retry and relay_followup:partial_rerun, replays the corresponding queue and relay app entries, and persists action outcomes plus output artifacts

  • tools/global_pipeline_operator_ui_report.py --compact reads the action outcomes and renders status, unit-card, dependency-graph, update-timeline, action-control, and artifact-table components into a static HTML proof

  • the compact KPI bundle includes these as global_pipeline_dag_report_contract, global_pipeline_execution_plan_report_contract, global_pipeline_runner_state_report_contract, global_pipeline_dispatch_state_report_contract, global_pipeline_app_dispatch_smoke_report_contract, global_pipeline_operator_state_report_contract, global_pipeline_dependency_view_report_contract, global_pipeline_live_state_updates_report_contract, global_pipeline_operator_actions_report_contract, and global_pipeline_operator_ui_report_contract
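
The ready/blocked projection and the dispatch-state transition described in this baseline can be sketched without executing any app. The function name `project_states`, the unit keys `needs` and `publishes`, and the state labels are assumptions for illustration; the unit and artifact names come from the checked-in two-app sample.

```python
from __future__ import annotations

import json


def project_states(units: list[dict], completed: set[str], available: set[str]) -> dict[str, str]:
    """Mark each unit completed, runnable, or blocked on missing artifacts."""
    states = {}
    for unit in units:
        if unit["id"] in completed:
            states[unit["id"]] = "completed"
        elif set(unit["needs"]) <= available:
            states[unit["id"]] = "runnable"
        else:
            states[unit["id"]] = "blocked"
    return states


units = [
    {"id": "queue_baseline", "needs": [], "publishes": ["queue_metrics"]},
    {"id": "relay_followup", "needs": ["queue_metrics"], "publishes": ["relay_metrics"]},
]

before = project_states(units, completed=set(), available=set())

# Record queue_baseline completion and publish its artifacts, without running any app.
completed = {"queue_baseline"}
available = {artifact for unit in units if unit["id"] in completed for artifact in unit["publishes"]}
after = project_states(units, completed, available)

# Round-trip through JSON to mimic a persisted dispatch-state proof.
persisted = json.loads(json.dumps({"states": after, "artifacts": sorted(available)}))
```

Deriving state from completed units and available artifacts keeps the projection deterministic, which is what makes a read-back proof meaningful.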

Remaining scope for this item:

  • no open report-driven contract gap remains for the multi-app DAG runner/UI baseline; future work is product hardening, placement, and broader external validation

Why it matters:

  • the report gives AGILab a clearer product story than isolated per-app pipelines without overclaiming execution

  • live UI state is still needed before the orchestration layer is fully visible to operators and reviewers

8. Bidirectional notebook interop

Purpose:

  • complete the bridge between notebooks and AGILab pipelines without hiding per-stage runtime constraints

Current shipped baseline:

  • WORKFLOW can already export a supervisor notebook that preserves stage provenance, runtime metadata, and per-stage execution context

  • exported notebooks can include related analysis-page launcher helpers when an app declares them

  • tools/notebook_pipeline_import_report.py --compact now validates the first notebook-to-pipeline import contract from a checked-in .ipynb; it preserves markdown context, code cells, import hints, execution-count metadata, and artifact references as not_executed_import pipeline-stage evidence, writes a richer lab_stages.toml preview, and feeds the existing WORKFLOW upload path

  • tools/notebook_roundtrip_report.py --compact validates lab_stages.toml -> supervisor notebook -> import -> lab_stages preview preservation for saved stage description, prompt, model, code, runtime, import hints, and artifact references

  • tools/notebook_union_environment_report.py --compact validates a single-kernel union notebook candidate only for compatible runpy / current-kernel stages and records supervisor_notebook_required for mixed runtime or mixed-environment pipelines

  • this is intentionally not the same thing as flattening a multi-venv pipeline into one notebook kernel

  • packaged examples now include a dependency-light Voilà-shaped notebook proof preview that records widget-to-args hints, a hide-code manifest, an app-view plan, and evidence hashes without launching a Voilà server
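
The no-execution import and the single-kernel eligibility rule in this baseline can be illustrated as follows. This is a sketch, not the shipped tools: `import_notebook_cells`, `notebook_mode`, the row keys, and the stage keys `runtime` and `env` are hypothetical, and treating runpy and current-kernel stages as mutually compatible is an assumption. Only the nbformat v4 cell layout and the status strings not_executed_import and supervisor_notebook_required are taken as given.

```python
from __future__ import annotations

import json


def import_notebook_cells(notebook_json: str) -> list[dict]:
    """Turn nbformat-v4 cells into stage evidence rows without executing anything."""
    notebook = json.loads(notebook_json)
    rows = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") not in {"markdown", "code"}:
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        rows.append(
            {
                "index": index,
                "role": "context" if cell["cell_type"] == "markdown" else "stage_code",
                "source": text,
                "execution_count": cell.get("execution_count"),
                "status": "not_executed_import",
            }
        )
    return rows


def notebook_mode(stages: list[dict]) -> str:
    """Allow one kernel only when all stages share a compatible runtime and one environment."""
    runtimes = {stage["runtime"] for stage in stages}
    environments = {stage["env"] for stage in stages}
    compatible = runtimes <= {"runpy", "current-kernel"} and len(environments) == 1
    return "single_kernel_union_candidate" if compatible else "supervisor_notebook_required"


sample = json.dumps(
    {
        "nbformat": 4,
        "cells": [
            {"cell_type": "markdown", "source": ["# Load data\n", "Read the trace file."]},
            {"cell_type": "code", "source": "import math\nprint(math.pi)", "execution_count": 3, "outputs": []},
            {"cell_type": "raw", "source": "ignored"},
        ],
    }
)
rows = import_notebook_cells(sample)
same_env = notebook_mode([{"runtime": "runpy", "env": "shared"}, {"runtime": "current-kernel", "env": "shared"}])
mixed_env = notebook_mode([{"runtime": "runpy", "env": "env_a"}, {"runtime": "runpy", "env": "env_b"}])
```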

Suggested scope:

  • harden notebook-to-pipeline import beyond the initial report and upload path, including broader edge cases for exported supervisor notebooks

  • make notebook-native analysis surfaces or Voilà-style packaging possible without duplicating the current apps-pages logic blindly

  • preserve enough provenance so the notebook remains explainable

Why it matters:

  • reduces the gap between exploratory notebook work and reproducible product workflows

  • gives teams a practical adoption bridge instead of a one-way migration story

Logging modernization

Purpose:

  • improve developer and operator logging without breaking compatibility across Streamlit, workers, subprocesses, and distributed services

Recommended direction:

  • keep Python stdlib logging plus AgiLogger as the canonical runtime logging contract

  • add real child logger support, structured JSON output, and stable context fields such as app id, host, worker, and run id

  • keep the current colorized human console output as the default local developer mode

  • treat loguru as an optional choice only for isolated helper scripts or local tools that do not need full stdlib logging interoperability

  • do not plan a repo-wide migration to loguru unless stdlib logging becomes a demonstrated blocker for AGILab runtime requirements
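
The recommended direction can be sketched with stdlib logging alone: a JSON formatter, a child logger, and a `LoggerAdapter` that injects the stable context fields. The logger name `agilab_sketch`, the `JsonFormatter` class, and the exact field names are assumptions for illustration; AgiLogger itself is not modeled here.

```python
from __future__ import annotations

import io
import json
import logging

CONTEXT_FIELDS = ("app_id", "host", "worker", "run_id")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with stable context fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in CONTEXT_FIELDS:
            payload[field] = getattr(record, field, None)
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

root = logging.getLogger("agilab_sketch")  # hypothetical logger name
root.setLevel(logging.INFO)
root.addHandler(handler)
root.propagate = False  # keep the sketch's output out of the global root logger

# The child logger reuses the parent's handler; the adapter injects run context.
worker_log = logging.LoggerAdapter(
    root.getChild("worker"),
    {"app_id": "demo_app", "host": "localhost", "worker": 0, "run_id": "run-001"},
)
worker_log.info("stage finished in %.1fs", 1.5)

record = json.loads(stream.getvalue())
```

Swapping the formatter back to a colorized human one would restore the local developer mode without touching any call site.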

Why it matters:

  • AGILab already spans third-party libraries and multi-process surfaces that integrate naturally with stdlib logging

  • the real missing capability is structured context and better logger hierarchy, not a new logging syntax

  • this keeps the logging contract stable while still making observability stronger

Backend observability and audit architecture

AGILab should keep application-specific interaction inside the product and move generic observability, search, and fleet-level monitoring into tools designed for that job.

1. Elastic or OpenSearch + Grafana

Best when:

  • engineering operations and observability are the main priority

Good for:

  • run health

  • worker load

  • stage latency

  • failures and alerts

  • SLA-style monitoring

Why it matters:

  • strongest near-term operational value

  • clean split between AGILab interaction and backend observability

2. OpenSearch + OpenSearch Dashboards

Best when:

  • auditability, search, and historical traceability are the main priority

Good for:

  • log exploration

  • artefact traceability

  • historical run search

  • saved audit dashboards

Why it matters:

  • lowest friction for Kibana-like usage patterns

3. Postgres + Superset

Best when:

  • structured KPI analytics and management reporting are the main priority

Good for:

  • curated dashboards

  • cross-project reporting

  • evidence trend analysis

  • management-facing analytics

Why it matters:

  • stronger fit than Elastic-native tools for BI-style reporting

Connectors and integration

Connectors should appear explicitly in the roadmap because they are not just an implementation detail. They determine how AGILab reaches external systems, resolves artefacts, and keeps app workflows portable.

Audience bridge strategy

The highest-leverage audience bridge is a Quarto / R / notebook bridge, not an R-native worker rewrite. AGILab should stay the reproducible execution and evidence engine while bridges let each community consume that evidence in its normal workflow.

The dependency-light bridge MVP baseline now exposes these commands:

  1. Quarto / R report bridge: agilab export quarto and agilab run quarto

  2. read-only MCP evidence server and agent evidence cards: agilab mcp serve --read-only, agilab agent-run list, agilab agent-run handoff, agilab agent-run next, agilab agent-run context, agilab agent-run lineage, agilab agent-run compare, and agilab agent-run validate

  3. Hugging Face Docker Space exporter: agilab export hf-space

  4. MLflow JSON handoff: agilab export mlflow and agilab import mlflow

  5. VS Code / devcontainer onboarding: agilab init vscode

  6. DuckDB SQL bridge: agilab run duckdb

  7. Airflow / Dagster handoff exporters: agilab export airflow-dag and agilab export dagster-job

The current R-stage smoke app remains the payload-plane proof for external Rscript execution. Remaining roadmap work is to deepen each bridge with community-native packages, richer artifact previews, and production handoff polish while keeping R-native worker changes out of shared core until the app-local contract proves broader value.

See Audience bridges for the detailed bridge ranking, MVP scopes, and implementation order.

1. Connector framework hardening

Purpose:

  • make connector-backed workflows more predictable and portable

Focus areas:

  • path portability

  • artefact resolution

  • stable source and target contracts

  • less app-specific path glue

  • clearer connector diagnostics

Why it matters:

  • reduces friction across apps

  • makes automation more reusable

  • lowers the gap between conceptual workflows and executable stages

Connector integration change request

The concrete change request behind this roadmap item is to replace repeated raw path settings in app_settings.toml with references to reusable connector definition files.

Current problem:

  • pages such as view_maps_network rely on many low-level path keys

  • the same path logic is repeated across settings files

  • defaults are more machine-specific than they should be

  • page code must interpret too many raw path parameters directly

Proposed direction:

  • introduce a declarative Connector model

  • store connector definitions in plain-text TOML files

  • let app_settings.toml reference those connector files instead of embedding all path details inline

Completed baseline:

  • tools/data_connector_facility_report.py --compact validates first-class SQL, OpenSearch, and object-storage connector definitions without network probes

  • tools/data_connector_resolution_report.py --compact resolves connector IDs from an app-settings-style sample, validates connector-aware app/page resolution, and preserves legacy_path_fallback rows for migration

  • tools/data_connector_health_report.py --compact plans SQL, OpenSearch, and object-storage health/status probes behind operator opt-in while keeping public evidence in health_probe_plan_only mode

  • tools/data_connector_health_actions_report.py --compact exposes those probes as operator-triggered action rows in operator_trigger_contract_only mode

  • tools/data_connector_runtime_adapters_report.py --compact binds SQL, OpenSearch, and object-storage connectors to runtime adapter operations while deferring credential values to the operator runtime

  • tools/data_connector_live_endpoint_smoke_report.py --compact adds the operator-gated live endpoint smoke contract and validates the execution path with a local SQLite endpoint

  • tools/data_connector_ui_preview_report.py --compact renders connector state, page bindings, legacy fallbacks, and health-boundary provenance as static JSON+HTML evidence

  • tools/data_connector_live_ui_report.py --compact wires connector state and connector-derived provenance into the Release Decision Streamlit page in streamlit_render_contract_only mode

  • tools/data_connector_view_surface_report.py --compact verifies the connector-aware Release Decision panels for state/provenance, health boundary, import/export provenance, and external artifact traceability in connector_view_surface_contract_only mode

  • tools/data_connector_app_catalogs_report.py --compact validates app-local connector catalogs referenced from built-in app_settings.toml files

First connector model:

  • id

  • kind

  • label

  • description

  • base

  • subpath

  • globs

  • preferred_file_ext

  • metadata

Recommended file placement:

  • next to the app settings

  • for example src/connectors/*.toml

Recommended resolution rule:

  1. explicit query parameters

  2. current session-state widget values

  3. explicit page-level overrides in app_settings.toml

  4. connector references in app_settings.toml

  5. legacy raw path keys

  6. code-level defaults
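
The resolution rule is a first-match lookup across ordered layers, which can be sketched as below. The layer names, the `resolve_setting` function, and the sample keys and paths are hypothetical; only the precedence order comes from the list above.

```python
from __future__ import annotations

# Highest-priority layer first, mirroring the numbered rule above.
PRECEDENCE = (
    "query_params",
    "session_state",
    "page_overrides",
    "connector_refs",
    "legacy_path_keys",
    "code_defaults",
)


def resolve_setting(key: str, layers: dict[str, dict]) -> tuple[str, str]:
    """Return (value, winning_layer) using the first layer that defines the key."""
    for layer in PRECEDENCE:
        value = layers.get(layer, {}).get(key)
        if value not in (None, ""):
            return value, layer
    raise KeyError(f"no layer defines {key!r}")


layers = {
    "connector_refs": {"traces_dir": "~/data/traces/demo"},
    "legacy_path_keys": {"traces_dir": "/home/dev/old_traces"},
    "code_defaults": {"traces_dir": "./traces", "topology_file": "./topology.json"},
}

from_connector = resolve_setting("traces_dir", layers)
from_default = resolve_setting("topology_file", layers)
overridden = resolve_setting("traces_dir", {**layers, "query_params": {"traces_dir": "/tmp/run42"}})
```

Returning the winning layer alongside the value makes it easy to report legacy fallbacks during migration.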

Compatibility rule:

  • keep legacy raw path keys working in phase 1

  • let connector references win when both are defined

Expected impact:

  • view_maps_network is the primary beneficiary

Remaining scope:

  • run the opt-in smoke against real credentialed operator endpoints

Distributed execution and reduction

AGILab already ships real distributed execution primitives, but the product surface is not yet a fully migrated generic map/reduce layer.

Current state:

  • apps can build explicit distribution plans

  • workers execute partitioned plans locally or on Dask-backed clusters

  • agi_node.reduction defines a shared reducer contract with partial inputs, merge semantics, validation hooks, and a standard reduce artefact schema

  • tools/reduce_contract_benchmark.py --json validates 8 partials / 80,000 synthetic items in 0.003s against a 5.0s target

  • execution_pandas_project, execution_polars_project, flight_telemetry_project, weather_forecast_project, uav_queue_project, and uav_relay_queue_project write worker-scoped reduce_summary_worker_<id>.json artefacts through the shared contract

  • Release Decision surfaces those reduce artefacts with schema validation, reducer name, partial count, artifact path, benchmark row/source/execution fields, flight row/aircraft/speed fields, weather forecast MAE/RMSE/MAPE fields, and UAV queue-family packet/PDR fields when present

  • aggregation outside the migrated benchmark, flight, weather, and UAV queue-family apps is still mostly app-specific
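
A reducer contract with partial inputs, merge semantics, and a validation hook can be sketched as follows. This does not reproduce agi_node.reduction: `SketchReduceContract`, `merge_counts`, `require_fields`, and the output keys are hypothetical, and the sample merely mirrors the 8-partial / 80,000-item shape of the benchmark.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SketchReduceContract:
    """Hypothetical contract: validate every partial, then fold them with merge."""

    name: str
    merge: Callable[[dict, dict], dict]
    validate: Callable[[dict], None]

    def reduce(self, partials: list[dict]) -> dict:
        if not partials:
            raise ValueError("at least one partial is required")
        for partial in partials:
            self.validate(partial)
        merged = partials[0]
        for partial in partials[1:]:
            merged = self.merge(merged, partial)
        return {"reducer": self.name, "partial_count": len(partials), "payload": merged}


def merge_counts(left: dict, right: dict) -> dict:
    """Sum item counts and keep the worst latency seen so far."""
    return {
        "items": left["items"] + right["items"],
        "max_latency_s": max(left["max_latency_s"], right["max_latency_s"]),
    }


def require_fields(partial: dict) -> None:
    """Validation hook: reject partials that lack a required key."""
    for key in ("items", "max_latency_s"):
        if key not in partial:
            raise ValueError(f"partial is missing {key!r}")


contract = SketchReduceContract(name="demo_counts", merge=merge_counts, validate=require_fields)
partials = [{"items": 10_000, "max_latency_s": float(worker)} for worker in range(1, 9)]
artifact = contract.reduce(partials)
```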

Current guardrail:

  • all non-template built-in apps now expose a reducer contract

  • minimal_app_project is template-only and intentionally exempt because its worker hooks are placeholders with no concrete merge output

  • multi_app_dag_project is template-preview only and intentionally exempt because it demonstrates cross-app DAG contracts rather than a concrete worker merge output

  • future apps/templates must add reduction.py, emit reduce_summary_worker_<id>.json, and export a *_REDUCE_CONTRACT once they produce durable worker summaries

  • docs should avoid describing AGILab as a full generic map/reduce mechanism beyond the explicit contract and migrated apps

1. Reduce contract adoption

Purpose:

  • move the current distributed work-plan execution model onto the shared reusable aggregation contract

Focus areas:

  • reducer adoption in public apps

  • user-visible reduce artefacts in analysis views

  • user-visible evidence that a distributed run was merged successfully

Why it matters:

  • makes the product claim honest and specific

  • reduces repeated merge logic across apps

  • improves reviewability of distributed results

  • gives AGILab a clearer story than “Dask-backed execution exists somewhere in the stack”

Completed slices:

  • execution_pandas_project and execution_polars_project now emit named reduce_summary_worker_<id>.json ReduceArtifact files from worker results

  • flight_telemetry_project now emits worker-scoped reduce_summary_worker_<id>.json ReduceArtifact files for trajectory summary metrics

  • uav_queue_project now emits worker-scoped reduce_summary_worker_<id>.json ReduceArtifact files for queue summary metrics

  • uav_relay_queue_project now emits worker-scoped reduce_summary_worker_<id>.json ReduceArtifact files for relay queue summary metrics

  • weather_forecast_project now emits worker-scoped reduce_summary_worker_<id>.json ReduceArtifact files for forecast quality metrics

  • Release Decision now discovers reduce_summary_worker_*.json, parses it with ReduceArtifact.from_dict, displays reducer evidence, and flags invalid JSON

  • a repository guardrail now fails if a non-template built-in app lacks a reducer contract or worker-scoped artifact writer

  • minimal_app_project and multi_app_dag_project are documented as template-only rather than counted as reducer migration gaps
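The discovery step described for Release Decision can be sketched as follows. This minimal version uses only the standard library and plain dictionaries; it does not call ReduceArtifact.from_dict, whose exact signature is not shown here, and the collect_reduce_summaries helper is a hypothetical name.

```python
import json
from pathlib import Path


def collect_reduce_summaries(run_dir: Path) -> tuple[list[dict], list[str]]:
    """Return (parsed summaries, names of files that failed to parse)."""
    summaries: list[dict] = []
    invalid: list[str] = []
    for path in sorted(run_dir.glob("reduce_summary_worker_*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # Flag the file instead of aborting the whole evidence view.
            invalid.append(path.name)
            continue
        if not isinstance(data, dict):
            invalid.append(path.name)
            continue
        summaries.append({"file": path.name, **data})
    return summaries, invalid
```

Returning the invalid file names separately lets a page show valid reducer evidence and still surface broken artefacts to the reviewer.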

Next concrete change request:

  • keep future public apps/templates aligned with the shared reducer contract as they gain concrete merge semantics

  • extend the surfaced reducer evidence as more non-benchmark apps adopt the same artifact contract

Compatibility rule:

  • keep current app-owned aggregation working in phase 1

  • let apps opt into the shared reducer contract incrementally

Expected impact:

  • cleaner public positioning for distributed execution

  • easier regression testing of distributed apps

  • a better foundation for future run-diff and evidence views

  • PROJECT must expose connector references clearly enough to stay debuggable

  • WORKFLOW should remain unchanged in phase 1

Suggested implementation phases:

  1. core connector model, parser, resolver, and validation

  2. connector-aware default resolution in apps-pages

  3. connector preview and navigation support in PROJECT

  4. optional connector references in WORKFLOW only if needed later

Acceptance target:

  • connectors can replace path groups in app_settings.toml

  • existing apps still work without migration

  • connector definitions remain plain-text and git-friendly
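The acceptance target above implies a resolver that accepts connector references while leaving raw paths untouched. A minimal sketch, assuming a hypothetical "connector:" prefix for references and a catalog held as a plain dictionary (neither is confirmed by the shipped tooling):

```python
def resolve_data_source(setting: str, catalog: dict[str, dict]) -> dict:
    """Resolve a setting to a connector entry, or fall back to a raw path."""
    prefix = "connector:"  # hypothetical reference syntax
    if setting.startswith(prefix):
        connector_id = setting[len(prefix):]
        if connector_id not in catalog:
            # Fail fast so a typo in a connector ID is not treated as a path.
            raise KeyError(f"unknown connector id: {connector_id}")
        return {"mode": "connector", "id": connector_id, **catalog[connector_id]}
    # Existing apps keep working: anything else is treated as a legacy path.
    return {"mode": "legacy_path_fallback", "path": setting}
```

For example, resolve_data_source("data/flights", {}) returns a legacy_path_fallback row, so an app that has not migrated needs no changes.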

2. Data connector facility

Purpose:

  • connect AGILab cleanly to external data systems and storage backends

Typical targets:

  • SQL databases

  • Elasticsearch or OpenSearch

  • ELK-backed data sources

  • object storage

  • GitHub or GitLab

  • simulation backends

  • shared data repositories

Why it matters:

  • expands AGILab beyond local file-driven workflows

  • makes observability, reporting, and traceability easier to industrialize

Current shipped baseline:

  • tools/data_connector_facility_report.py --compact validates agilab.data_connector_facility.v1 against docs/source/data/data_connectors_sample.toml

  • the sample covers SQL, OpenSearch/ELK, and object-storage connector definitions with kind-specific required fields; the current object-storage contract covers AWS S3/S3-compatible stores, Azure Blob Storage, and Google Cloud Storage

  • remote credentials are represented as env: references and the report runs in contract_validation_only mode without live network probes

  • tools/data_connector_resolution_report.py --compact validates agilab.data_connector_resolution.v1 against docs/source/data/data_connector_app_settings_sample.toml

  • connector-aware app/page resolution now resolves catalog IDs from app settings while preserving legacy_path_fallback rows for raw-path migration

  • tools/data_connector_health_report.py --compact validates agilab.data_connector_health.v1 and plans connector health/status probes behind operator opt-in without executing network checks

  • tools/data_connector_health_actions_report.py --compact validates agilab.data_connector_health_actions.v1 and exposes operator-triggered health probe action rows without executing network checks

  • tools/data_connector_runtime_adapters_report.py --compact validates agilab.data_connector_runtime_adapters.v1 and binds credentialed connector adapters to runtime operations while deferring credential values

  • tools/data_connector_live_endpoint_smoke_report.py --compact validates agilab.data_connector_live_endpoint_smoke.v1, keeps default public evidence in live_endpoint_smoke_plan_only mode, and proves the opt-in execution path with a local SQLite endpoint without opening external networks

  • tools/data_connector_ui_preview_report.py --compact validates agilab.data_connector_ui_preview.v1 and renders static connector state plus connector-derived provenance as JSON+HTML preview evidence

  • tools/data_connector_live_ui_report.py --compact validates agilab.data_connector_live_ui.v1 and wires connector state plus connector-derived provenance into the Release Decision Streamlit page without opening connector networks

  • tools/data_connector_view_surface_report.py --compact validates agilab.data_connector_view_surface.v1 and checks the Release Decision connector state/provenance panel, health/status boundary, import/export provenance panel, and external artifact traceability panel without opening connector networks

  • tools/data_connector_app_catalogs_report.py --compact validates agilab.data_connector_app_catalogs.v1 for app-local connector catalogs across every non-template built-in app
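Two ideas from the baseline above, kind-specific required fields and credentials held as env: references, can be illustrated with a short validation sketch. The field names per kind and both helper functions are assumptions made for this example, not the agilab.data_connector_facility.v1 schema.

```python
import os

# Hypothetical required fields per connector kind.
REQUIRED_FIELDS = {
    "sql": {"uri", "credentials"},
    "opensearch": {"url", "index", "credentials"},
    "object_storage": {"provider", "bucket", "credentials"},
}


def validate_connector(definition: dict) -> list[str]:
    """Return a list of problems; an empty list means the definition passes."""
    kind = definition.get("kind")
    if kind not in REQUIRED_FIELDS:
        return [f"unknown kind: {kind!r}"]
    problems = [
        f"missing field: {name}"
        for name in sorted(REQUIRED_FIELDS[kind] - set(definition))
    ]
    credentials = definition.get("credentials")
    if credentials is not None and not str(credentials).startswith("env:"):
        problems.append("credentials must be an env: reference")
    return problems


def resolve_credentials(reference: str, environ=os.environ) -> str:
    """Look up the secret only at runtime; the definition never stores it."""
    variable = reference.removeprefix("env:")
    return environ[variable]
```

Validation never touches the network or the secret itself, which matches the contract-validation-only mode described above; resolve_credentials would only run in an operator environment where the variable is set.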

Remaining scope:

  • run the opt-in smoke against real credentialed SQL/OpenSearch/object-storage endpoints in operator environments

3. Connector-aware views

Purpose:

  • move the shipped static connector state and connector-derived provenance preview into the live UI pages

Typical views:

  • import or export provenance panel

  • connector health/status panel

  • external artefact traceability panel

Current shipped baseline:

  • tools/data_connector_view_surface_report.py --compact validates agilab.data_connector_view_surface.v1 in connector_view_surface_contract_only mode

  • the report verifies four Release Decision surfaces: connector state/provenance, connector health/status boundary, import/export provenance, and external artifact traceability

  • the evidence reads local page source plus the connector live-UI render contract, uses the existing Streamlit recorder, and keeps command execution and network probes at zero

  • the KPI evidence bundle includes this as data_connector_view_surface_report_contract

Remaining scope:

  • move the same pattern beyond Release Decision as additional live UI pages need connector-aware panels

  • run live connector health/status actions only in credentialed operator environments

Why it matters:

  • makes integrations visible and debuggable

  • gives users confidence about what data came from where

4. DeepWiki/Open-style repository knowledge layer

Purpose:

  • make the AGILab codebase easier to explore and explain, and make contributor onboarding faster

  • provide a generated code wiki and Q&A layer across repositories

Recommended scope:

  • start with controlled local deployments before publishing hosted search

  • index each repository separately

  • include code, docs source, runbooks, and pyproject.toml

  • exclude generated artefacts, virtualenvs, build/, dist/, and runtime shares

Guardrail:

  • treat the generated wiki as an exploration aid, not as the source of truth

  • keep official product and operator documentation in versioned docs and runbooks

Current shipped baseline:

  • tools/repository_knowledge_report.py --compact validates agilab.repository_knowledge_index.v1 in repository_knowledge_static_index mode

  • the report indexes local code, tools, root tests, official docs, root runbooks, and package/app manifests; each entry carries a SHA-256 fingerprint and a lightweight outline, and the report adds deterministic statistics (file, line, size, kind, and suffix) plus ratio, top-category, and largest-file summaries

  • generated artifacts, virtualenvs, build outputs, and distributions are excluded by contract

  • the report emits stable onboarding query seeds while explicitly keeping the generated index as an exploration aid and versioned docs as the source of truth

  • the KPI evidence bundle includes this as repository_knowledge_report_contract
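A static index with fingerprints and contract-level exclusions can be sketched with the standard library alone. The excluded directory names and the record fields below are illustrative assumptions, not the agilab.repository_knowledge_index.v1 schema.

```python
import hashlib
from pathlib import Path

# Directory names excluded from the index (illustrative list).
EXCLUDED_DIRS = {".venv", "venv", "build", "dist", "__pycache__", ".git"}


def index_repository(root: Path) -> list[dict]:
    """Return one deterministic record per indexed file, sorted by path."""
    records = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and anything located under an excluded directory.
        if not path.is_file() or EXCLUDED_DIRS.intersection(relative.parts):
            continue
        content = path.read_bytes()
        records.append(
            {
                "path": relative.as_posix(),
                "suffix": path.suffix,
                "size": len(content),
                "lines": content.count(b"\n"),
                "sha256": hashlib.sha256(content).hexdigest(),
            }
        )
    return records
```

Sorting the paths and hashing raw bytes keeps the output stable across runs, so two indexes of the same checkout can be compared directly.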

Remaining scope:

  • connect this static index to a generated wiki or Q&A service in controlled deployments

  • extend indexing to external app repositories under the same source-of-truth guardrail

Why it matters:

  • reduces time spent rediscovering cross-cutting implementation details

  • helps new contributors navigate AGILab’s multi-repo, multi-app structure

  • complements agent workflows with repository-level context and diagrams

Decision guidance

Use this rule of thumb:

  • if the goal is professionalization, use the ordered list from Professionalization priority order first

  • if the professional baseline is already under control and the goal is feature sequencing, use Feature sequencing after the professional baseline

  • choose Experiment Cockpit if the next need is better daily usability for engineers comparing runs

  • choose Evidence / Release View if the next need is promotion readiness and defensible evidence

  • choose Scenario Playback View if the next need is time-based explanation and demonstration

  • choose Realtime Analytical and Geospatial Views if the next need is denser live analysis, faster interaction, and higher-volume visual playback

  • choose Run Diff / Counterfactual Analysis if the next need is faster debugging, clearer run review, and defensible explanation of KPI changes

  • choose Multi-app DAG orchestration if the next need is broader app coverage beyond the shipped two-app dependency contract

  • choose Multi-app DAG orchestration productization if the next need is to execute the shipped product-visible graph in WORKFLOW

  • choose Bidirectional notebook interop if the next need is a stronger bridge between exploratory notebooks and AGILab-managed workflows

  • choose Elastic/OpenSearch + Grafana if the next need is operations and observability

  • choose OpenSearch + OpenSearch Dashboards if the next need is audit and historical search

  • choose Postgres + Superset if the next need is curated KPI analytics

  • choose Connector framework hardening and the data connector facility if the next need is portability, SQL/ELK/data-system access, and reliable artefact flow

  • choose Pinned private-app validation if the next need is CI/release reproducibility for non-public apps without publishing or vendoring their code

  • choose DeepWiki/Open-style repository knowledge layer if the next need is faster codebase onboarding, architecture discovery, and repository Q&A without turning generated content into official docs

Final consolidated poll

Use these paths together, because they serve different purposes:

  1. Quick popularity signal in GitHub Discussions

  2. Structured roadmap vote in GitHub Issues

  3. Open roadmap discussion in Issues

Comment template for issues/2

Vote: <one option>
Why: <why this matters now>
Expected value: <product / engineering / user impact>
Constraints or dependencies: <blocking items, staffing, sequencing>

Current candidate priorities

  • P0 release and runtime integrity

  • P1 first-run product experience

  • P2 notebook interop and no-lock-in

  • P3 security and supply-chain posture

  • P4 team and cluster operation

  • P5 pinned private-app validation for non-public app CI and release checks

  • Multi-app DAG orchestration productization, once the professional baseline is stable

  • Data connector facility and connector-aware views, once first-run and evidence paths are predictable

If the roadmap label is not visible yet in GitHub, the issue form still works. The repository workflow will create or update that label on the next successful run.

Reference URLs