Development History¶

This document chronicles the evolution of py-launch-lab — the problems we set out to solve, the approaches that worked, the ones that didn't, and the iterative discoveries that shaped the final implementation.

Phase 1: Initial Report System Overhaul¶

The Starting Point¶

The original report system generated per-launcher sections in a static HTML file. Each launcher (python, uv, venv-direct, etc.) had its own table. The report had no filtering, no sorting, and no way to identify which results were unexpected.

What We Changed¶

Eight major features were implemented in a single pass:

1. Command Line Column¶

Problem: The report showed scenario IDs and launcher names, but not the actual command that was executed.

Solution: Added a command_line: list[str] | None field to the ScenarioResult Pydantic model. The runner (runner.py) populates it with the exact cmd list it executed. The HTML report converts absolute paths to project-relative paths for readability.

# Before: /home/user/project/.cache/matrix_venv/Scripts/python.exe
# After:  .cache/matrix_venv/Scripts/python.exe
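The path shortening can be sketched roughly as follows. This is a minimal illustration, not the report's actual code: the helper names to_project_relative and display_command are hypothetical, and arguments that are not under the project root are assumed to pass through unchanged.

```python
from pathlib import PurePath


def to_project_relative(arg: str, project_root: str) -> str:
    """Return arg relative to project_root, or arg unchanged if it is not inside it."""
    try:
        return PurePath(arg).relative_to(project_root).as_posix()
    except ValueError:
        # Not under the project root (or not a path at all, e.g. "-c").
        return arg


def display_command(cmd: list[str], project_root: str) -> list[str]:
    """Shorten every argument of a command line for display in the report."""
    return [to_project_relative(arg, project_root) for arg in cmd]


cmd = ["/home/user/project/.cache/matrix_venv/Scripts/python.exe", "-c", "pass"]
print(display_command(cmd, "/home/user/project"))
```

Only the displayed value is shortened; the stored command_line keeps the exact arguments that were executed.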

2. Anomaly Highlighting with Explanation Bubbles¶

Problem: Results were displayed without context — the reader had no way to know if a result was expected or surprising.

Solution: Created expectations.py with per-scenario expected behaviours and a check_expectations() function that produces Anomaly objects. The HTML report highlights anomaly rows with a red tint and shows expandable detail rows with:

- Which fields deviate
- Expected vs actual values
- Technical explanation
- Link to upstream issues
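A minimal sketch of how such an expectation check might look. Only the names check_expectations and Anomaly come from the project; the field names, the EXPECTED table layout, and the function signature are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    scenario_id: str
    field: str          # which result field deviates
    expected: object    # value the expectation table asked for
    actual: object      # value the run produced
    explanation: str = ""
    issue: str | None = None


# Hypothetical expectation table: scenario ID -> {field: expected value}.
EXPECTED = {
    "venv-gui-entrypoint": {"console_window": False},
}


def check_expectations(scenario_id: str, actual: dict, expected: dict = EXPECTED) -> list[Anomaly]:
    """Compare one scenario's actual values against its expected values."""
    anomalies = []
    for field, want in expected.get(scenario_id, {}).items():
        got = actual.get(field)
        if got != want:
            anomalies.append(Anomaly(scenario_id, field, want, got))
    return anomalies


print(check_expectations("venv-gui-entrypoint", {"console_window": True}))
```

Scenarios without an entry in the table produce no anomalies in this sketch, so an unknown scenario is never highlighted by accident.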

3. Single Unified Table¶

Problem: Per-launcher sections made it hard to compare results across different launchers.

Solution: Removed the per-launcher split entirely. All 20 scenarios appear in one table, sorted by scenario ID. A "Launcher" column lets you see the launcher for each row.

4. Column Filters¶

Problem: With 20 rows in one table, users need to filter by launcher, subsystem, etc.

Solution: Added a filter row below the header with dropdown selectors for enum columns (Launcher, PE Subsystem, Console Window, etc.) and text inputs for free-form columns. Filtering is done entirely in client-side JavaScript.

5. Sortable Column Headers¶

Clicking any column header sorts the table by that column, toggling between ascending and descending order. Sorting runs in client-side JavaScript, so there is no server round-trip.

6. Verbose Logging¶

Problem: task report was a black box — no output about what it was doing.

Solution: Added _setup_logging(verbose=True) to the CLI. The report builder now emits detailed logs:

14:23:01 [launch_lab.html_report] INFO: HTML report builder starting
14:23:01 [launch_lab.html_report] INFO:   JSON source dir : E:\...\artifacts\json
14:23:01 [launch_lab.html_report] INFO:   Output dir      : E:\...\artifacts\html
14:23:01 [launch_lab.html_report] INFO:   Force rebuild   : True
14:23:01 [launch_lab.html_report] INFO: Loading JSON artifacts ...
14:23:01 [launch_lab.html_report] INFO:   Loaded 20 results
14:23:01 [launch_lab.html_report] INFO:   3 scenarios have anomalies
14:23:01 [launch_lab.html_report] INFO: Attempting to generate AI summary via Ollama …
14:23:15 [launch_lab.html_report] INFO: Ollama summary generated successfully (1423 chars).
14:23:15 [launch_lab.html_report] INFO: Report written to artifacts\html\report.html
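A log format like the one above can be produced with the standard logging module. The sketch below is an assumption about how _setup_logging might be written, not the project's actual code; the format string is reverse-engineered from the sample output.

```python
import logging

# Assumed format, matching lines such as:
# 14:23:01 [launch_lab.html_report] INFO: Loading JSON artifacts ...
LOG_FORMAT = "%(asctime)s [%(name)s] %(levelname)s: %(message)s"
DATE_FORMAT = "%H:%M:%S"


def setup_logging(verbose: bool = False) -> None:
    """Configure root logging; verbose=True enables INFO-level messages."""
    logging.basicConfig(
        level=logging.INFO if verbose else logging.WARNING,
        format=LOG_FORMAT,
        datefmt=DATE_FORMAT,
        force=True,  # replace any handlers installed earlier (Python 3.8+)
    )


setup_logging(verbose=True)
logging.getLogger("launch_lab.html_report").info("HTML report builder starting")
```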

7. `--force` Flag¶

Problem: The report builder skipped regeneration if the output file was newer than all JSON inputs. During development, you often want to force a rebuild.

Solution: Added --force flag to the CLI. Also updated taskfile.yaml to forward a FORCE variable: task report FORCE=1.
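The skip-unless-stale rule and its --force override can be sketched as below. The function name needs_rebuild and its signature are hypothetical; the comparison follows the description above (rebuild unless the output is newer than every JSON input).

```python
from pathlib import Path


def needs_rebuild(output: Path, inputs: list[Path], force: bool = False) -> bool:
    """Decide whether the report must be regenerated."""
    if force or not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    # Stale if any input was modified at or after the time the report was written.
    return any(src.stat().st_mtime >= out_mtime for src in inputs)
```

With this shape, the CLI flag only has to be passed through as force=True; the modification-time logic stays untouched.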

8. Ollama AI Integration¶

Problem: Interpreting 20 scenario results requires domain expertise.

Solution: Optionally call a local Ollama instance to generate a natural-language summary. The summary is inserted at the top of the HTML report. Configuration via environment variables:

OLLAMA_MODEL (default: llama3.2)
OLLAMA_HOST (default: http://localhost:11434)

The report builder shells out to curl for this call, which avoids adding a requests dependency.
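A rough sketch of what a curl-based call to Ollama could look like. The function names, the 60-second timeout, and the error handling are assumptions; the /api/generate endpoint with "stream": false and the "response" field of the reply are part of Ollama's documented HTTP API.

```python
from __future__ import annotations

import json
import os
import subprocess
from collections.abc import Mapping


def build_ollama_curl(prompt: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Build the curl command line for a non-streaming Ollama completion."""
    host = env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    model = env.get("OLLAMA_MODEL", "llama3.2")
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return ["curl", "--silent", "--fail", "--max-time", "60",
            f"{host}/api/generate", "-d", payload]


def summarize(prompt: str) -> str | None:
    """Return the model's summary, or None if Ollama or curl is unavailable."""
    try:
        proc = subprocess.run(build_ollama_curl(prompt),
                              capture_output=True, text=True, check=True)
        return json.loads(proc.stdout).get("response")
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError):
        return None  # the summary is optional, so failures are non-fatal
```

Returning None on any failure keeps the feature optional: the report can still be written without a summary when no local Ollama instance is running.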

Phase 2: Fixing N/A Values¶

The Problem¶

After the initial overhaul, many scenarios showed "N/A" for Console Window and Visible Window in the report. Affected scenarios:

All uv / uvx / uvw scenarios
All shim scenarios
Most venv scenarios

Root Cause¶

The original _try_keepalive_detection() function only handled Python-like executables. It checked:

if _is_python_like(exe):
    return [exe, "-c", "import time; time.sleep(10)"]
return None  # ← everything else returned None

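As a result, uv, uvx, uvw, pyshim-win, and venv entry-point wrappers all exited before the runner's first detection pass (the pipe-free "Phase 1" detection, not Phase 1 of this history) could observe them, and no keepalive was attempted to hold them open.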

The Fix¶

Expanded keepalive coverage with per-executable-type strategies:

from pathlib import Path

def _build_keepalive_cmd(exe: str) -> list[str] | None:
    """Return a command that keeps a process of this kind alive for 10 s, or None."""
    sleep_code = "import time; time.sleep(10)"
    if _is_python_like(exe):
        return [exe, "-c", sleep_code]
    if _is_uv_like(exe):
        return [exe, "run", "python", "-c", sleep_code]
    if _is_shim_like(exe):
        return [exe, "--hide-console", "--", "python", "-c", sleep_code]
    # Venv wrappers: fall back to the sibling python.exe in the same directory.
    sibling_python = Path(exe).with_name("python.exe")
    if sibling_python.is_file():
        return [str(sibling_python), "-c", sleep_code]
    return None

Also added PE-subsystem inference as a final fallback for any remaining None values — if we know the PE subsystem, we can infer console behaviour.
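The inference fallback might look roughly like this. The function name and the "CUI"/"GUI" string encoding are assumptions for illustration; the rule itself (CUI implies a console, GUI implies none, and only when nothing was observed) follows the description above.

```python
from __future__ import annotations


def infer_console_window(observed: bool | None, pe_subsystem: str | None) -> bool | None:
    """Fill in a missing console_window value from the PE subsystem."""
    if observed is not None:
        return observed  # a real observation always wins over inference
    if pe_subsystem == "CUI":
        return True
    if pe_subsystem == "GUI":
        return False
    return None  # unknown subsystem: leave the value unset
```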

Result¶

All 20 scenarios now have populated Console Window and Visible Window values.

Phase 3: The lab-window-gui.exe Console Detection Bug¶

The Problem¶

After fixing N/A values, the report showed venv-gui-entrypoint (.cache/matrix_venv/Scripts/lab-window-gui.exe) with Console Window = No.

The user knew this was wrong — launching lab-window-gui.exe absolutely opens a terminal window (visible as a brief flash on the desktop).

Investigation¶

lab-window-gui.exe is a pip/uv-generated GUI entry-point wrapper:

- Wrapper PE subsystem: GUI ← this is correct
- Expected behaviour: No console window ← this is the ideal behaviour

But in practice, a console DOES appear. Why?

The Discovery¶

The wrapper internally launches the venv's pythonw.exe to call the entry-point function. In a standard python -m venv, pythonw.exe is a genuine GUI binary — no console.

But in a uv venv, pythonw.exe is a CUI trampoline. Its PE subsystem is IMAGE_SUBSYSTEM_WINDOWS_CUI (value 3) instead of IMAGE_SUBSYSTEM_WINDOWS_GUI (value 2).
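The subsystem value can be read directly from the file header. The sketch below is an illustration, not the project's inspection code: the function name is hypothetical, and the offsets follow the PE/COFF layout (e_lfanew at 0x3C, a 4-byte signature, a 20-byte COFF header, then the Subsystem field at offset 68 of the optional header).

```python
from __future__ import annotations

import struct

IMAGE_SUBSYSTEM_WINDOWS_GUI = 2
IMAGE_SUBSYSTEM_WINDOWS_CUI = 3


def read_pe_subsystem(path: str) -> int | None:
    """Return the Subsystem field of a PE file, or None if it is not a PE file."""
    with open(path, "rb") as f:
        dos_header = f.read(64)
        if len(dos_header) < 64 or dos_header[:2] != b"MZ":
            return None
        (pe_offset,) = struct.unpack_from("<I", dos_header, 0x3C)
        f.seek(pe_offset)
        if f.read(4) != b"PE\0\0":
            return None
        # Skip the COFF header (20 bytes) and the first 68 bytes of the
        # optional header; Subsystem sits there in both PE32 and PE32+.
        f.seek(pe_offset + 4 + 20 + 68)
        raw = f.read(2)
        if len(raw) < 2:
            return None
        return struct.unpack("<H", raw)[0]
```

Run against a venv's pythonw.exe, a result of 3 rather than 2 would indicate the CUI trampoline described above.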

When the GUI wrapper spawns this CUI pythonw.exe, Windows allocates a console window for the child process. This console window briefly appears and then disappears when the script completes — the "terminal flash".

This is astral-sh/uv#9781, under active investigation at joelvaneenwyk/uv#1 with a fix in progress at joelvaneenwyk/uv#2.

Why Direct Detection Missed It¶

detect_console_host(lab_gui_pid) returned False because conhost.exe is a child of pythonw.exe, not lab-window-gui.exe. The process tree query only returns direct children.
Inference from PE subsystem said "GUI → no console" because the wrapper is GUI. But it's the child that determines console allocation.
Keepalive detection used the sibling python.exe (CUI), which correctly showed a console. But this was for the sibling, not for what lab-window-gui.exe actually does.
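The first point, a query limited to direct children, can be illustrated with a small sketch. It assumes a hypothetical parent_of mapping (PID to parent PID) that has already been collected from a process snapshot; the PIDs are made up.

```python
from collections import defaultdict


def descendants(root_pid: int, parent_of: dict[int, int]) -> set[int]:
    """Return every PID below root_pid, not just its direct children."""
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)
    seen: set[int] = set()
    stack = [root_pid]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# lab-window-gui.exe (100) -> pythonw.exe (200) -> conhost.exe (300)
parent_of = {200: 100, 300: 200}
print(descendants(100, parent_of))
```

A direct-children query on PID 100 would return only 200, which is how the conhost.exe grandchild went unnoticed. Walking the whole subtree still depends on the snapshot being taken while those processes are alive.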

The Fix¶

Added _detect_child_python_subsystem():

Check if the executable is a venv wrapper (not python, not uv, not shim; has a sibling python.exe)
If it's a GUI wrapper → inspect the PE of pythonw.exe in the same Scripts/ directory
If it's a CUI wrapper → inspect the PE of python.exe
Return the child's PE subsystem

Then in run_scenario(), apply the override:

child_sub = _detect_child_python_subsystem(cmd[0])
if pe_subsystem == "GUI" and child_sub == "CUI":
    console_window = True  # CUI child WILL allocate a console
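The interpreter selection and the override can be sketched together as below. Both helper names are hypothetical, and the sketch assumes the caller has already established that the executable is a venv wrapper (step 1 above).

```python
from __future__ import annotations

from pathlib import Path


def child_interpreter_for(wrapper: str, wrapper_subsystem: str) -> Path | None:
    """Pick the sibling interpreter a venv wrapper is expected to launch."""
    name = "pythonw.exe" if wrapper_subsystem == "GUI" else "python.exe"
    candidate = Path(wrapper).with_name(name)
    return candidate if candidate.is_file() else None


def apply_child_override(console_window: bool | None,
                         wrapper_subsystem: str,
                         child_subsystem: str | None) -> bool | None:
    """A GUI wrapper that spawns a CUI child is reported as opening a console."""
    if wrapper_subsystem == "GUI" and child_subsystem == "CUI":
        return True
    return console_window
```

Because both functions only look at files and already-known values, their result does not depend on when they are called.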

Result¶

The report now correctly shows:

| Scenario | PE Subsystem | Console Window | Anomaly? |
| --- | --- | --- | --- |
| `venv-gui-entrypoint` | GUI | Yes | Yes (uv#9781) |
| `venv-dual-gui-entrypoint` | GUI | Yes | Yes (uv#9781) |
| `venv-pythonw-script-py` | CUI | Yes | Yes (uv#9781) |

All three anomalies are correctly flagged with explanations linking to the upstream uv issue.

Key Lessons Learned¶

1. "What the binary says" vs "What actually happens"¶

The PE subsystem tells you what a binary claims to be. But what actually happens on screen depends on the entire process tree. A GUI binary can cause a console to appear if it spawns a CUI child.

2. You can't test window creation with pipes¶

Any test harness that redirects stdout/stderr via pipes will never see a console window. The pipe handles satisfy the CUI process's console requirement without allocating a visible window. Phase 1 (no pipes) was essential.

3. Process tree queries are point-in-time¶

CreateToolhelp32Snapshot gives you the state at the instant you call it. If a process exits 10 ms later, the snapshot won't reflect that. If a process hasn't spawned yet, the snapshot won't show it. Multiple detection strategies (direct, keepalive, PE inspection) provide overlapping coverage.
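One way to layer overlapping strategies is to try them in order and keep the first definite answer. This is a generic sketch: the first_known helper, the strategy names, and their ordering are assumptions, not the runner's actual control flow.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional, Tuple

Probe = Callable[[], Optional[bool]]


def first_known(strategies: Iterable[Tuple[str, Probe]]) -> tuple[Optional[str], Optional[bool]]:
    """Run probes in order; return (name, result) of the first that is not None."""
    for name, probe in strategies:
        result = probe()
        if result is not None:
            return name, result
    return None, None


# Placeholder probes: direct detection missed the process, keepalive saw a console.
strategies = [
    ("direct", lambda: None),
    ("keepalive", lambda: True),
    ("pe-inference", lambda: False),
]
print(first_known(strategies))
```

Recording which strategy produced the value makes it easier to judge later how much to trust a given cell in the report.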

4. Deterministic checks beat timing-based checks¶

The child PE subsystem override is deterministic — it reads files on disk, not transient process state. It's the most reliable detection method. If we know the child interpreter is CUI, we know a console will appear. No timing, no race conditions.

5. Upstream bugs surface as test failures¶

The entire uv pythonw CUI trampoline issue (uv#9781) was discovered because our tests reported unexpected console behaviour. This is exactly what a conformance lab is supposed to find.

6. The keepalive trick is essential but imperfect¶

Re-launching an executable with sleep(10) is a great hack for detection, but it tests "what would this executable do if kept alive", not "what does the original command do". For most scenarios these are equivalent, but be aware of the distinction.

Timeline¶

| Phase | Focus | Scenarios Affected | Outcome |
| --- | --- | --- | --- |
| 1 | Report overhaul | All 20 | 8 features implemented |
| 2 | N/A value fix | ~12 scenarios | All values populated |
| 3 | Console detection fix | 3 venv scenarios | Correct anomaly detection |

Files Changed¶

| File | Changes |
| --- | --- |
| `src/launch_lab/models.py` | Added `command_line` field |
| `src/launch_lab/expectations.py` | New file: expected behaviours and anomaly checker |
| `src/launch_lab/html_report.py` | Complete rewrite: unified table, filters, anomalies, Ollama |
| `src/launch_lab/runner.py` | Keepalive strategies, child PE detection, inference fallback |
| `src/launch_lab/cli.py` | `--force` flag, verbose logging setup |
| `taskfile.yaml` | FORCE variable forwarding |
| `tests/test_html_report.py` | Updated for new API, anomaly/filter tests |
