Data Connectors
AGILAB data connectors are a lightweight contract for external data systems. They let an app or evidence report reference a named data source without hard-coding local paths, credentials, or provider-specific client details in the app code.
The current public contract is intentionally conservative:
connector definitions live in plain-text TOML catalogs
credentials are referenced through environment variables, never embedded
public evidence validates contracts without opening external networks
live probes stay operator-triggered and optional
legacy raw paths can remain available while apps migrate to connector IDs
This is not a second experiment tracker, model registry, or storage UI. It is the data-access contract around AGILAB workflows.
Connector Maturity Levels
Use these labels consistently when reading connector evidence:
| Level | What AGILAB proves | What remains outside the proof |
|---|---|---|
| Local proof | A deterministic local connector such as SQLite produces query results, artifact hashes, and JSON evidence without network access. | Behavior of the eventual production database, account policy, or network path. |
| Contract proof | TOML connector definitions, app/page references, credential-reference shape, and runtime dependency mapping are valid. | Real endpoint reachability, IAM, firewall rules, quota, latency, or billing. |
| Emulator proof | Account-free local emulators match the expected adapter shape for S3, Azure Blob, GCS, or search endpoints. | Real cloud control-plane behavior and managed-service differences. |
| Operator-triggered live check | An explicit user action probes a real endpoint in a prepared environment. | General certification for every region, tenant, credential, or network policy. |
Catalog Shape
The public sample catalog is:
Each connector is a [[connectors]] TOML entry with a stable id, a
kind, a human label, and kind-specific fields.
Supported public kinds are:
| Kind | Typical target | Contract boundary |
|---|---|---|
|  | read-only warehouse or local SQLite proof | validates URI, driver, and |
|  | OpenSearch / ELK index | validates URL, index, and credential reference |
|  | artifact prefixes in cloud object storage | validates provider, bucket/container, prefix, and credential reference |
Object Storage Providers
Object-storage connectors currently support these providers:
| Provider | Target URI shape | Runtime dependency | Credential hint |
|---|---|---|---|
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
The s3 provider also accepts the aliases aws_s3, amazon_s3, and
s3_compatible. The runtime dependency column describes what an operator
environment needs for live probes; those packages are not required for the
default public contract-validation evidence.
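Alias handling can be sketched as a normalization step applied when a connector is loaded. The s3 aliases come from the text above. The other canonical provider names (azure_blob, gcs) are hypothetical placeholders, and the exact-match lookup is an assumption.

```python
# s3 aliases are taken from the documentation; "azure_blob" and "gcs"
# are hypothetical placeholder names for the other providers.
PROVIDER_ALIASES = {
    "s3": "s3",
    "aws_s3": "s3",
    "amazon_s3": "s3",
    "s3_compatible": "s3",
}
KNOWN_PROVIDERS = {"s3", "azure_blob", "gcs"}


def canonical_provider(name: str) -> str:
    """Map a provider name or alias to one canonical provider id."""
    provider = PROVIDER_ALIASES.get(name, name)
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unsupported object-storage provider: {name!r}")
    return provider


print(canonical_provider("amazon_s3"))  # s3
```

Normalizing at load time means runtime adapters only ever see one spelling per provider, regardless of which alias a catalog author chose.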
SQLite Database Proof
Use the packaged SQLite preview when you need a concrete database demo that works on every local machine without a server, Docker, network access, or secrets:
uv --preview-features extra-build-dependencies run python src/agilab/examples/sqlite_connector_proof/preview_sqlite_connector_proof.py --output-dir /tmp/agilab-sqlite-proof
The preview writes:
/tmp/agilab-sqlite-proof/sqlite_connector_proof.db
/tmp/agilab-sqlite-proof/promotion_candidates.csv
/tmp/agilab-sqlite-proof/database_evidence.json
Read database_evidence.json first. It records a sql connector with the
sqlite driver, query_mode = "read_only", a schema hash, a parameterized
query hash, row count, result hash, and artifact hashes. This proves the AGILAB
database boundary before replacing the local URI with Postgres, a warehouse, or
another operator-managed SQL source.
Local Artifact Lane Contract
Use the artifact-lane contract when work enters AGILAB as files rather than a database or cloud connector. It is designed for simple, reviewable handoffs such as a data-analyst bundle with raw files, cleaned tables, aggregates, plots, and a report, or a document-ingestion lane with PDFs, Markdown outputs, and processed originals.
python3 tools/data_artifact_lane_contract.py --profile data-analysis --root <bundle> --check --json
For document ingestion lanes whose folders are not under one root, map the roles explicitly:
python3 tools/data_artifact_lane_contract.py \
--profile document-ingestion \
--dir input=/path/to/incoming \
--dir output=/path/to/markdown \
--dir done=/path/to/done \
--check --json
The report uses schema agilab.data_artifact_lane_contract.v1. It records
the profile, role directories, required artifact rules, matched artifacts,
sizes, SHA-256 hashes, and missing-directory or missing-artifact issues. This
proves the local file handoff is present and inspectable. It does not prove
data correctness, OCR quality, business interpretation, privacy compliance, or
background-service liveness.
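The kind of record the lane report keeps can be illustrated with a minimal sketch that hashes matching files per role and collects missing-directory and missing-artifact issues. The schema string and the input/output/done role names come from the text above. The glob-pattern rules, the report keys, and the file names are hypothetical, and this is not the code of tools/data_artifact_lane_contract.py.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lane_report(profile: str, roles: dict[str, Path], patterns: dict[str, str]) -> dict:
    """Hash every matching file per role and list missing directories or artifacts."""
    report = {"schema": "agilab.data_artifact_lane_contract.v1", "profile": profile,
              "roles": {}, "issues": []}
    for role, directory in roles.items():
        if not directory.is_dir():
            report["issues"].append(f"missing directory for role {role!r}: {directory}")
            continue
        # Hypothetical rule: each role may declare one glob pattern (default: any file).
        matches = sorted(p for p in directory.glob(patterns.get(role, "*")) if p.is_file())
        if not matches:
            report["issues"].append(f"no artifacts matched for role {role!r}")
        report["roles"][role] = [
            {"path": p.name, "size": p.stat().st_size, "sha256": sha256_file(p)}
            for p in matches
        ]
    return report


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "incoming").mkdir()
    (root / "incoming" / "memo.pdf").write_bytes(b"%PDF-1.4 demo")
    (root / "markdown").mkdir()
    report = lane_report(
        "document-ingestion",
        {"input": root / "incoming", "output": root / "markdown", "done": root / "done"},
        {"input": "*.pdf", "output": "*.md"},
    )

print(json.dumps(report["issues"], indent=2))
```

In the demo, the empty output directory and the absent done directory both surface as issues. That matches the boundary described above: the report shows what is present and what is missing, not whether the content is correct.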
Account-Free Cloud Emulator Validation
Use the cloud-emulators profile when you need AWS/Azure/GCP connector
confidence without owning cloud accounts:
uv --preview-features extra-build-dependencies run python tools/data_connector_cloud_emulator_report.py --compact
uv --preview-features extra-build-dependencies run python tools/workflow_parity.py --profile cloud-emulators
The profile validates the sample emulator catalog against the same connector facility and runtime-adapter contracts used by real cloud targets. It covers:
| Cloud target | Account-free emulator | Local endpoint | What is proven |
|---|---|---|---|
| AWS S3 / S3-compatible storage | MinIO |  | provider aliasing, bucket/prefix target shape, |
| Azure Blob Storage | Azurite |  | account/container target shape, |
| Google Cloud Storage | fake-gcs-server |  |  |
| Search-index wiring | local OpenSearch or Elasticsearch |  | URL/index contract and explicit credential boundary |
This gives API-contract and emulator-compatible validation only. It does not prove real IAM, cloud firewall rules, private endpoints, regional behavior, quota, or billing. Those remain opt-in live smoke checks in a real operator environment with real credentials.
Credential Rule
Remote connectors must use auth_ref = "env:NAME". The value points to an
environment variable name, not to the credential itself.
Examples:
auth_ref = "env:AWS_PROFILE"
auth_ref = "env:AZURE_STORAGE_CONNECTION_STRING"
auth_ref = "env:GOOGLE_APPLICATION_CREDENTIALS"
The reports deliberately avoid materializing credential values. If a connector contains a raw secret-like value, the facility report marks the catalog invalid.
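The credential rule can be sketched as a small validator that checks the env:NAME shape, flags secret-like values, and reports whether the variable exists without copying its value into evidence. Only the env:NAME shape comes from the documentation. The variable-name pattern and the secret-detection heuristics are hypothetical and much simpler than a real scanner.

```python
import os
import re

# "env:NAME" is the documented shape; the NAME pattern itself is an assumption.
AUTH_REF = re.compile(r"^env:([A-Za-z_][A-Za-z0-9_]*)$")

# Hypothetical heuristics for values that look like inline secrets.
SECRET_LIKE = (
    re.compile(r"AKIA[0-9A-Z]{16}"),                     # AWS access key id shape
    re.compile(r"AccountKey=", re.IGNORECASE),           # Azure connection-string fragment
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),   # PEM private key header
)


def check_auth_ref(connector: dict) -> list[str]:
    """Return contract issues for a remote connector without echoing any value."""
    issues = []
    for key, value in connector.items():
        if isinstance(value, str) and any(p.search(value) for p in SECRET_LIKE):
            issues.append(f"{key}: looks like an embedded secret")
    ref = connector.get("auth_ref")
    if ref is None:
        issues.append("auth_ref: missing")
    elif not AUTH_REF.match(ref):
        issues.append("auth_ref: expected the shape 'env:NAME'")
    return issues


def env_var_is_set(auth_ref: str) -> bool:
    """Check that the referenced variable exists; its value is never read into a report."""
    match = AUTH_REF.match(auth_ref)
    if match is None:
        raise ValueError("auth_ref must have the shape 'env:NAME'")
    return match.group(1) in os.environ


print(check_auth_ref({"id": "bucket", "auth_ref": "env:AWS_PROFILE"}))  # []
print(len(check_auth_ref({"id": "bucket", "auth_ref": "AccountKey=abc123"})))  # 2
```

Error messages name the offending key but never repeat its value, so the report stays safe to publish even when it catches a leaked secret.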
Evidence Reports
The public checks are contract-first:
uv --preview-features extra-build-dependencies run python tools/data_connector_facility_report.py --compact
uv --preview-features extra-build-dependencies run python tools/data_connector_resolution_report.py --compact
uv --preview-features extra-build-dependencies run python tools/data_connector_health_report.py --compact
uv --preview-features extra-build-dependencies run python tools/data_connector_health_actions_report.py --compact
uv --preview-features extra-build-dependencies run python tools/data_connector_runtime_adapters_report.py --compact
Use the live endpoint smoke only when you intentionally want to prove the operator-triggered execution path. The default public mode remains network-free.
How To Read The Boundary
The facility report proves the catalog is structurally valid.
The resolution report proves app/page settings can refer to connector IDs while preserving legacy fallback paths.
The health report plans status probes but does not execute them by default.
The health_actions report exposes explicit operator-triggered probe actions.
The runtime_adapters report maps each connector to the dependency and operation a runtime would need when an operator opts in.
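The resolution behavior can be sketched as a function that prefers a connector ID and falls back to a legacy raw path while an app migrates. The settings keys connector_id and data_path are hypothetical names chosen for illustration, not AGILAB's app settings schema.

```python
from pathlib import Path


def resolve_data_source(settings: dict, catalog: dict[str, dict]) -> dict:
    """Prefer a connector ID; fall back to a legacy raw path while an app migrates."""
    connector_id = settings.get("connector_id")  # hypothetical settings key
    if connector_id:
        if connector_id not in catalog:
            # Fail loudly instead of silently using an old path.
            raise KeyError(f"unknown connector id: {connector_id!r}")
        return {"source": "connector", "connector": catalog[connector_id]}
    legacy_path = settings.get("data_path")  # hypothetical legacy settings key
    if legacy_path:
        return {"source": "legacy_path", "path": Path(legacy_path).expanduser()}
    raise ValueError("settings reference neither a connector id nor a legacy path")


catalog = {"local_sqlite_demo": {"id": "local_sqlite_demo", "kind": "sql"}}
print(resolve_data_source({"connector_id": "local_sqlite_demo"}, catalog)["source"])  # connector
print(resolve_data_source({"data_path": "~/data/runs.csv"}, catalog)["source"])  # legacy_path
```

An unknown ID raises instead of quietly using the legacy path, so a typo in app settings is not masked by an old hard-coded location.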
This keeps the first adoption path simple: a new user can run AGILAB without cloud credentials, while an operator can still see exactly which connector, dependency, and environment variable will be needed before enabling live access.