Data Connectors

AGILAB data connectors are a lightweight contract for external data systems. They let an app or evidence report reference a named data source without hard-coding local paths, credentials, or provider-specific client details in the app code.

The current public contract is intentionally conservative:

  • connector definitions live in plain-text TOML catalogs

  • credentials are referenced through environment variables, never embedded

  • public evidence validates contracts without opening external network connections

  • live probes stay operator-triggered and optional

  • legacy raw paths can remain available while apps migrate to connector IDs

This is not a second experiment tracker, model registry, or storage UI. It is the data-access contract around AGILAB workflows.

Catalog Shape

The public sample catalog is:

Each connector is a [[connectors]] TOML entry with a stable id, a kind, a human label, and kind-specific fields.

Supported public kinds are:

| Kind | Typical target | Contract boundary |
| --- | --- | --- |
| sql | read-only warehouse or local SQLite proof | validates URI, driver, and query_mode = "read_only" |
| opensearch | OpenSearch / ELK index | validates URL, index, and credential reference |
| object_storage | artifact prefixes in cloud object storage | validates provider, bucket/container, prefix, and credential reference |
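The contract boundaries for the three kinds can be pictured as a per-kind field check. The sketch below is an illustration only: the field names (uri, driver, url, index, bucket, container, prefix, auth_ref) are assumed from the descriptions above, and contract_errors is a hypothetical function, not the facility report's implementation.

```python
# Required fields per kind, mirroring the contract boundaries described above.
# The exact field names are assumptions made for illustration.
REQUIRED_FIELDS = {
    "sql": ("uri", "driver", "query_mode"),
    "opensearch": ("url", "index", "auth_ref"),
    "object_storage": ("provider", "prefix", "auth_ref"),
}


def contract_errors(connector: dict) -> list[str]:
    """Return a list of contract problems; an empty list means the entry passes."""
    kind = connector.get("kind")
    if kind not in REQUIRED_FIELDS:
        return [f"unsupported kind: {kind!r}"]
    errors = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS[kind]
        if name not in connector
    ]
    # SQL connectors are only valid in read-only mode.
    if kind == "sql" and connector.get("query_mode", "read_only") != "read_only":
        errors.append('query_mode must be "read_only"')
    # Object storage needs a bucket (S3, GCS) or a container (Azure Blob).
    if kind == "object_storage" and not ({"bucket", "container"} & connector.keys()):
        errors.append("missing field: bucket or container")
    return errors
```

For example, a sql entry with query_mode = "read_write" yields one error, while an object_storage entry that names a container instead of a bucket passes.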

Object Storage Providers

Object-storage connectors currently support these providers:

| Provider | Target URI shape | Runtime dependency | Credential hint |
| --- | --- | --- | --- |
| s3 | s3://bucket/prefix | boto3 | AWS_PROFILE or AWS access-key/session environment |
| azure_blob | azure_blob://account/container/prefix | azure-storage-blob | AZURE_STORAGE_CONNECTION_STRING or Azure identity environment |
| gcs | gs://bucket/prefix | google-cloud-storage | GOOGLE_APPLICATION_CREDENTIALS or application-default credentials |

The s3 provider also accepts the aliases aws_s3, amazon_s3, and s3_compatible. The runtime dependency column describes what an operator environment needs for live probes; those packages are not required for the default public contract-validation evidence.
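A short sketch of how provider aliasing and target URI shapes might fit together follows. The alias set, URI shapes, and package names come from this section; the function names and the keyword arguments (bucket, account, container, prefix) are hypothetical.

```python
# Aliases accepted for the s3 provider, as listed above.
S3_ALIASES = {"aws_s3", "amazon_s3", "s3_compatible"}

# Target URI shapes per canonical provider.
URI_TEMPLATES = {
    "s3": "s3://{bucket}/{prefix}",
    "azure_blob": "azure_blob://{account}/{container}/{prefix}",
    "gcs": "gs://{bucket}/{prefix}",
}

# Packages an operator environment needs for live probes only.
RUNTIME_DEPENDENCY = {
    "s3": "boto3",
    "azure_blob": "azure-storage-blob",
    "gcs": "google-cloud-storage",
}


def canonical_provider(name: str) -> str:
    """Map an alias to its canonical provider, rejecting unknown names."""
    if name in S3_ALIASES:
        return "s3"
    if name not in URI_TEMPLATES:
        raise ValueError(f"unsupported object-storage provider: {name!r}")
    return name


def target_uri(provider: str, **parts: str) -> str:
    """Build the target URI for a provider from its named parts."""
    return URI_TEMPLATES[canonical_provider(provider)].format(**parts)


print(target_uri("aws_s3", bucket="example-bucket", prefix="artifacts/run-1"))
```

Resolving aliases before any lookup keeps the dependency and URI tables keyed by a single canonical name per provider.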

Account-Free Cloud Emulator Validation

Use the cloud-emulators profile when you need confidence in the AWS, Azure, and GCP connectors without owning cloud accounts:

uv --preview-features extra-build-dependencies run python tools/data_connector_cloud_emulator_report.py --compact
uv --preview-features extra-build-dependencies run python tools/workflow_parity.py --profile cloud-emulators

The profile validates the sample emulator catalog against the same connector facility and runtime-adapter contracts used by real cloud targets. It covers:

| Cloud target | Account-free emulator | Local endpoint | What is proven |
| --- | --- | --- | --- |
| AWS S3 / S3-compatible storage | MinIO | http://127.0.0.1:9000 | provider aliasing, bucket/prefix target shape, boto3 dependency |
| Azure Blob Storage | Azurite | http://127.0.0.1:10000/devstoreaccount1 | account/container target shape, azure-storage-blob dependency |
| Google Cloud Storage | fake-gcs-server | http://127.0.0.1:4443 | gs:// target shape, google-cloud-storage dependency |
| Search-index wiring | local OpenSearch or Elasticsearch | http://127.0.0.1:9200 | URL/index contract and explicit credential boundary |

This gives API-contract and emulator-compatible validation only. It does not prove real IAM, cloud firewall rules, private endpoints, regional behavior, quota, or billing. Those remain opt-in live smoke checks in a real operator environment with real credentials.
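One way to keep an emulator catalog honest about that boundary is to check that every endpoint it names is a loopback address. The sketch below is a hypothetical guard, not part of the cloud-emulators profile; it inspects the URL only and never opens a connection or resolves DNS.

```python
import ipaddress
from urllib.parse import urlsplit

# Local endpoints of the emulators covered by the profile.
EMULATOR_ENDPOINTS = {
    "MinIO": "http://127.0.0.1:9000",
    "Azurite": "http://127.0.0.1:10000/devstoreaccount1",
    "fake-gcs-server": "http://127.0.0.1:4443",
    "OpenSearch": "http://127.0.0.1:9200",
}


def is_loopback_endpoint(url: str) -> bool:
    """Return True when the URL host is a loopback address or 'localhost'."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A DNS name: treated as non-local without resolving it.
        return False


for name, url in EMULATOR_ENDPOINTS.items():
    print(name, is_loopback_endpoint(url))
```

A real cloud endpoint fails this check, which is a cheap signal that a catalog entry has left emulator territory and needs the opt-in live path instead.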

Credential Rule

Remote connectors must use auth_ref = "env:NAME". The value points to an environment variable name, not to the credential itself.

Examples:

auth_ref = "env:AWS_PROFILE"
auth_ref = "env:AZURE_STORAGE_CONNECTION_STRING"
auth_ref = "env:GOOGLE_APPLICATION_CREDENTIALS"

The reports deliberately avoid materializing credential values. If a connector contains a raw secret-like value, the facility report marks the catalog invalid.

Evidence Reports

The public checks are contract-first:

uv --preview-features extra-build-dependencies run python tools/data_connector_facility_report.py --compact
uv --preview-features extra-build-dependencies run python tools/data_connector_resolution_report.py --compact
uv --preview-features extra-build-dependencies run python tools/data_connector_health_report.py --compact
uv --preview-features extra-build-dependencies run python tools/data_connector_health_actions_report.py --compact
uv --preview-features extra-build-dependencies run python tools/data_connector_runtime_adapters_report.py --compact

Use the live endpoint smoke check only when you intentionally want to prove the operator-triggered execution path. The default public mode remains network-free.
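To run the five contract-first reports in one pass, a small wrapper along these lines could help. It is a sketch under assumptions: it must be run from the root of an AGILAB checkout with uv installed, and it assumes a non-zero exit code signals a failing report, which this page does not state.

```python
import subprocess

# Report names and flags exactly as listed in the commands above.
REPORTS = ["facility", "resolution", "health", "health_actions", "runtime_adapters"]
UV_PREFIX = ["uv", "--preview-features", "extra-build-dependencies", "run", "python"]


def report_command(name: str) -> list[str]:
    """Build the argv for one report script."""
    return [*UV_PREFIX, f"tools/data_connector_{name}_report.py", "--compact"]


def run_all_reports() -> dict[str, int]:
    """Run each report in order and collect exit codes (needs an AGILAB checkout)."""
    return {
        name: subprocess.run(report_command(name), check=False).returncode
        for name in REPORTS
    }
```

Calling run_all_reports() returns a mapping from report name to exit code, so a caller can decide how to treat failures.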

How To Read The Boundary

  • facility proves the catalog is structurally valid.

  • resolution proves app/page settings can refer to connector IDs while preserving legacy fallback paths.

  • health plans status probes but does not execute them by default.

  • health_actions exposes explicit operator-triggered probe actions.

  • runtime_adapters maps each connector to the dependency and operation a runtime would need when an operator opts in.

This keeps the first adoption path simple: a new user can run AGILAB without cloud credentials, while an operator can still see exactly which connector, dependency, and environment variable will be needed before enabling live access.
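The resolution and health layers described above can be sketched as two small functions: one that prefers a connector ID but falls back to a legacy raw path, and one that plans a probe without executing it. The settings keys connector_id and data_path, the returned dictionaries, and both function names are assumptions for illustration.

```python
def resolve_data_source(settings: dict, catalog: dict[str, dict]) -> dict:
    """Resolve app settings to a connector, falling back to a legacy raw path."""
    connector_id = settings.get("connector_id")
    if connector_id is not None:
        if connector_id not in catalog:
            raise KeyError(f"unknown connector id: {connector_id}")
        return {"source": "connector", "connector": catalog[connector_id]}
    if "data_path" in settings:
        # Legacy fallback: apps that have not migrated keep working.
        return {"source": "legacy_path", "path": settings["data_path"]}
    raise ValueError("settings name neither a connector id nor a legacy path")


def plan_probe(connector: dict, operator_opt_in: bool = False) -> dict:
    """Describe a status probe; planning never performs network access."""
    return {
        "connector_id": connector["id"],
        "status": "ready_to_probe" if operator_opt_in else "planned",
        "executed": False,
    }


catalog = {"artifact_store": {"id": "artifact_store", "kind": "object_storage"}}
print(resolve_data_source({"connector_id": "artifact_store"}, catalog)["source"])
print(resolve_data_source({"data_path": "/data/local"}, catalog)["source"])
print(plan_probe(catalog["artifact_store"]))
```

Keeping planning separate from execution means the default path stays network-free even when an operator has opted in; a distinct, explicit action would be needed to run the probe.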