Execution Playground
The built-in execution playground is the quickest way to show what AGILAB adds on top of a plain dataframe benchmark.
Instead of only comparing libraries, AGILAB compares execution models on the same workload and keeps the whole orchestration path visible.
What is included
Two built-in projects ship the same synthetic workload:
- `execution_pandas_project`
- `execution_polars_project`
They both read the same generated CSV dataset under `execution_playground/dataset` and produce grouped benchmark outputs.
The difference is the worker path:
- `ExecutionPandasWorker` extends `PandasWorker`
- `ExecutionPolarsWorker` extends `PolarsWorker`
That lets AGILAB expose not only timing differences, but also the execution style behind them.
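The class relationship can be pictured with a minimal sketch. The base classes below are stand-ins that only mimic the worker families named above; the real AGILAB bases expose a much richer API:

```python
# Illustrative stand-ins for AGILAB's worker families (assumed shapes,
# not the real base-class APIs).
class PandasWorker:
    """Stand-in for the process-oriented pandas worker family."""
    engine = "pandas"

class PolarsWorker:
    """Stand-in for the in-process threaded polars worker family."""
    engine = "polars"

class ExecutionPandasWorker(PandasWorker):
    """Playground worker that inherits the process-oriented execution path."""

class ExecutionPolarsWorker(PolarsWorker):
    """Playground worker that inherits the threaded execution path."""
```

Because each playground worker only subclasses its family, the orchestration path stays identical and the execution style is the single variable that changes.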
Where you see it in the UI
The two apps are run through the normal AGILAB pages. The benchmark value comes from the fact that the same UI flow can drive two different worker families without changing the orchestration path.
The benchmark appears in the normal PROJECT -> ORCHESTRATE flow rather than in a separate one-off demo script.
Why this example matters
Many benchmark demos stop at:
- pandas vs polars
- local vs distributed
- Python vs compiled
AGILAB goes one step further:
- same workload
- same orchestration flow
- same benchmark UI
- different worker/runtime path
This makes it easier to answer the practical question:
Did performance improve because of the library, or because of the execution model?
What the benchmark shows
For this example, the public message is intentionally simple:
- `PandasWorker` highlights a process-oriented worker path
- `PolarsWorker` highlights an in-process threaded worker path
The benchmark results in ORCHESTRATE then let you compare timings while the rest of AGILAB still shows:
- install state
- distribution plan
- generated snippets
- exported outputs
Measured local benchmark
The repository ships a reproducible benchmark helper:
uv --preview-features extra-build-dependencies run python tools/benchmark_execution_playground.py --repeats 3 --warmups 1 --worker-counts 1,2,4,8 --rows-per-file 100000 --compute-passes 32 --n-partitions 16
The helper now resolves its built-in app paths from the script location, so it can be launched from any working directory inside or outside the repo root.
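That path resolution can be sketched as follows. The function name and repository layout here are illustrative assumptions, not the helper's actual code:

```python
from pathlib import Path

def resolve_builtin_app(script_path: str, app_name: str) -> Path:
    """Resolve a built-in app directory relative to the helper script itself,
    so the helper works regardless of the caller's working directory.

    Assumed layout: <repo>/tools/<script> and <repo>/src/agilab/apps/builtin/.
    """
    repo_root = Path(script_path).resolve().parent.parent  # tools/ -> repo root
    return repo_root / "src" / "agilab" / "apps" / "builtin" / app_name

# Example with a hypothetical script location:
app_dir = resolve_builtin_app(
    "/repo/tools/benchmark_execution_playground.py",
    "execution_pandas_project",
)
```

Anchoring on the script's own location (rather than `Path.cwd()`) is what makes the helper launchable from any working directory.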
Median results from a local run on macOS / Python 3.13.9 with 16 partitions,
100000 rows per file, and 32 compute passes:
These numbers are instructive because the heavier mixed workload separates "more workers" from "better fit":

- the pandas process-oriented path is only slightly ahead in local parallel mode at 1 worker (1.772s), then gets worse as the worker count rises (2.157s at 8 workers)
- the polars threaded path improves at 1-2 workers (1.520s, 1.436s) and then converges back toward its steady state (1.564s at 8 workers)
- AGILAB therefore shows both the execution model and worker-count scaling on the same reproducible workload
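The worker-count trend can be reproduced from the quoted medians (values copied from this run; the `scaling` helper itself is illustrative):

```python
# Median seconds per worker count, copied from the local run above.
pandas_path = {1: 1.772, 8: 2.157}            # process-oriented path
polars_path = {1: 1.520, 2: 1.436, 8: 1.564}  # threaded path

def scaling(medians: dict[int, float]) -> dict[int, float]:
    """Speedup relative to 1 worker (>1.0 means the extra workers paid off)."""
    base = medians[1]
    return {workers: round(base / seconds, 3) for workers, seconds in medians.items()}

print(scaling(pandas_path))  # 8 workers end up slower than 1
print(scaling(polars_path))  # small win at 2 workers, roughly flat by 8
```

The point is visible immediately: adding workers to the process-oriented path pushes its speedup below 1.0, while the threaded path gains a little and then plateaus.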
Raw benchmark artifacts are versioned under:
docs/source/data/execution_playground_benchmark.json
2-node 16-mode matrix
The repository also ships a second helper that benchmarks the full 16-mode matrix on 2 Macs over SSH:
uv --preview-features extra-build-dependencies run python tools/benchmark_execution_mode_matrix.py --remote-host <remote-macos-ip> --scheduler-host <local-macos-ip> --rows-per-file 100000 --compute-passes 32 --n-partitions 16 --repeats 2
--remote-host accepts either host or user@host. If you pass only a
host or IP, the helper defaults to agi@<host> for both the SSH probe/setup
steps and the dataset rsync step.
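That defaulting rule can be sketched in a few lines (`agi` is the default user named above; the function name is illustrative):

```python
def resolve_ssh_target(remote_host: str, default_user: str = "agi") -> str:
    """Return user@host, defaulting the user when only a host or IP is given."""
    return remote_host if "@" in remote_host else f"{default_user}@{remote_host}"

print(resolve_ssh_target("192.168.1.20"))      # -> agi@192.168.1.20
print(resolve_ssh_target("ops@192.168.1.20"))  # -> ops@192.168.1.20
```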
This run uses:

- 1 local macOS ARM scheduler/worker
- 1 remote macOS ARM worker over SSH
- the same 16 partitions, 100000 rows per file, and 32 compute passes
Mode families
The 16 modes split into 4 families:
- 0-3: local CPU modes
- 4-7: 2-node Dask modes
- 8-11: local modes with the RAPIDS bit requested
- 12-15: 2-node Dask modes with the RAPIDS bit requested
The compact code column uses the order r d c p:
- `r` = RAPIDS requested
- `d` = Dask / cluster topology
- `c` = Cython requested
- `p` = pool/process path requested
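Under that convention, a mode number is just a 4-bit field read in `r d c p` order, so it can be decoded mechanically. This is a sketch inferred from the mode families above, not the helper's actual code:

```python
def mode_code(mode: int) -> str:
    """Decode a mode number (0-15) into the compact r/d/c/p code,
    using '_' for bits that are not requested."""
    letters = "rdcp"  # bit 8 = rapids, 4 = dask, 2 = cython, 1 = pool
    return "".join(
        letter if mode & (1 << (3 - i)) else "_"
        for i, letter in enumerate(letters)
    )

for m in (0, 4, 5, 15):
    print(m, mode_code(m))
# 0 -> ____, 4 -> _d__, 5 -> _d_p, 15 -> rdcp
```

The codes referenced later in this page follow directly: mode 4 is the `_d__` family and mode 5 is the `_d_p` family.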
In the versioned benchmark artifacts shipped with the repository, the r...
and rd... modes are still CPU-only because neither node exposed NVIDIA
tooling on that capture. The helper still reports RAPIDS requests explicitly,
and on other hardware it can mark local-only RAPIDS rows as GPU-accelerated
even if the remote node stays CPU-only.
How to read the matrix quickly
- Ignore rows 8-15 for performance interpretation in the versioned capture below: they keep the RAPIDS bit visible, but they are still CPU-only there.
- Read the matrix by families, not by isolated rows:
  - local Python/Cython baseline: 0-2
  - local pool/process family: 1-3
  - 2-node Dask family: 4-7
- Compare each family back to mode 0 (`____`) to see whether the execution model is buying you anything.
Compact map of the 16 execution modes grouped by topology and runtime family.
execution_pandas_project
Use this app when you want the benchmark to read as a process-oriented baseline.
- Worker family: `ExecutionPandasWorker` over `PandasWorker`
- Story to tell: how far a process/pool/Dask path goes on the same workload
- What to inspect in AGILAB: install/distribution state in ORCHESTRATE, then the benchmark table and exported artifacts for the `_d__` family
- Practical reading: this app is the easiest way to show that "more workers" does not automatically beat the local path unless the execution model fits
| mode | label | topology | median_seconds |
|---|---|---|---|
| 0 | python | local only | 0.885 |
| 1 | pool of process | local only | 0.585 |
| 2 | cython | local only | 0.910 |
| 3 | pool and cython | local only | 0.575 |
| 4 | dask | 2-node cluster (1 local + 1 remote macOS worker) | 0.540 |
| 5 | dask and pool | 2-node cluster (1 local + 1 remote macOS worker) | 0.613 |
| 6 | dask and cython | 2-node cluster (1 local + 1 remote macOS worker) | 0.561 |
| 7 | dask and pool and cython | 2-node cluster (1 local + 1 remote macOS worker) | 0.583 |
| 8 | rapids | local only | 0.860 |
| 9 | rapids and pool | local only | 0.585 |
| 10 | rapids and cython | local only | 0.885 |
| 11 | rapids and pool and cython | local only | 0.575 |
| 12 | rapids and dask | 2-node cluster (1 local + 1 remote macOS worker) | 0.586 |
| 13 | rapids and dask and pool | 2-node cluster (1 local + 1 remote macOS worker) | 0.596 |
| 14 | rapids and dask and cython | 2-node cluster (1 local + 1 remote macOS worker) | 0.589 |
| 15 | rapids and dask and pool and cython | 2-node cluster (1 local + 1 remote macOS worker) | 0.588 |
execution_polars_project
Use this app when you want the benchmark to read as an in-process threaded path with a different scaling profile.
- Worker family: `ExecutionPolarsWorker` over `PolarsWorker`
- Story to tell: the same workload can prefer a lighter in-process path over a heavier process-oriented topology
- What to inspect in AGILAB: the same ORCHESTRATE > Benchmark results table, but with attention on the `_d_p` family and how it differs from the pandas app
- Practical reading: this app is the clearest proof that AGILAB is benchmarking execution models, not only dataframe libraries
| mode | label | topology | median_seconds |
|---|---|---|---|
| 0 | python | local only | 0.885 |
| 1 | pool of process | local only | 0.430 |
| 2 | cython | local only | 0.900 |
| 3 | pool and cython | local only | 0.445 |
| 4 | dask | 2-node cluster (1 local + 1 remote macOS worker) | 0.307 |
| 5 | dask and pool | 2-node cluster (1 local + 1 remote macOS worker) | 0.262 |
| 6 | dask and cython | 2-node cluster (1 local + 1 remote macOS worker) | 0.307 |
| 7 | dask and pool and cython | 2-node cluster (1 local + 1 remote macOS worker) | 0.304 |
| 8 | rapids | local only | 0.875 |
| 9 | rapids and pool | local only | 0.430 |
| 10 | rapids and cython | local only | 0.895 |
| 11 | rapids and pool and cython | local only | 0.440 |
| 12 | rapids and dask | 2-node cluster (1 local + 1 remote macOS worker) | 0.310 |
| 13 | rapids and dask and pool | 2-node cluster (1 local + 1 remote macOS worker) | 0.305 |
| 14 | rapids and dask and cython | 2-node cluster (1 local + 1 remote macOS worker) | 0.306 |
| 15 | rapids and dask and pool and cython | 2-node cluster (1 local + 1 remote macOS worker) | 0.336 |
What the matrix adds
This second benchmark makes four extra points visible:

- the heavier scalar tail now separates the plain local Python/Cython family, the local pool family, and the 2-node Dask family much more clearly
- the best mode is not the same for the two worker designs: `_d__` for `execution_pandas_project` and `_d_p` for `execution_polars_project`, so a 2-node Dask topology can win for one execution model and not for another
- requesting RAPIDS on hardware without NVIDIA tooling does not create a fake speedup: AGILAB still reports the run honestly as CPU-only
- local-only RAPIDS rows and 2-node RAPIDS rows are reported independently, so GPU availability follows the topology that actually ran
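The "best mode differs per worker design" point can be checked directly against the two tables. The medians below are copied from the versioned capture (rows 8-15 omitted because they were CPU-only there):

```python
# median_seconds per mode, copied from the two tables above (modes 0-7)
pandas_medians = {0: 0.885, 1: 0.585, 2: 0.910, 3: 0.575,
                  4: 0.540, 5: 0.613, 6: 0.561, 7: 0.583}
polars_medians = {0: 0.885, 1: 0.430, 2: 0.900, 3: 0.445,
                  4: 0.307, 5: 0.262, 6: 0.307, 7: 0.304}

# Find the fastest mode for each worker design.
best_pandas = min(pandas_medians, key=pandas_medians.get)
best_polars = min(polars_medians, key=polars_medians.get)
print(best_pandas, best_polars)  # 4 (_d__) vs 5 (_d_p)
```

Same workload, same orchestration path, different winning mode: that asymmetry is the whole argument of the matrix.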
Raw matrix artifacts are versioned under:
- docs/source/data/execution_mode_matrix_benchmark.json
- docs/source/data/execution_mode_matrix_benchmark.csv
- docs/source/data/execution_pandas_project_mode_matrix.csv
- docs/source/data/execution_polars_project_mode_matrix.csv
How to run it
Launch AGILAB:
uv --preview-features extra-build-dependencies run streamlit run src/agilab/About_agilab.py
1. In PROJECT, select `src/agilab/apps/builtin/execution_pandas_project`.
2. In ORCHESTRATE, run INSTALL once, then EXECUTE. Enable Benchmark all modes when you want AGILAB to compare execution paths.
3. Repeat with `src/agilab/apps/builtin/execution_polars_project`.
4. Compare the benchmark table in ORCHESTRATE > Benchmark results and the generated outputs.
What to look for
This example is useful when you want to demonstrate that AGILAB makes three things explicit:
- the workload
- the orchestration path
- the execution model
That is why this example is a better public teaser than a raw benchmark chart: it keeps the result, the runtime path, and the reproducible workflow together.