Autonomous Engineering Pipeline

An autonomous engineering pipeline in Python that takes a ticket from input to tested, reviewed code with no human in the execution loop. The architecture behaves more like a compiler for LLM behavior than a conventional agent framework: LLMs handle the parts that require interpretation, and deterministic code handles everything that can be computed. The pipeline runs inside Docker against a ~100k-line TypeScript monorepo. For certain ticket shapes, the pipeline now commits code without an LLM authoring any source code. The run archive spans ~1,000 runs, almost all of them re-runs of the same handful of ticket families to shake out non-determinism. All posts document real runs, real failures, and the architectural decisions behind them. The project is still R&D.

Architecture principles

Data Path Principle
Every fact an LLM copies is a fact it can hallucinate. Trust hierarchy (top to bottom): human chat-authored design > machine-extracted data > model-generated constraints > model narrative. Facts and decisions are never mixed.
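A minimal sketch of how the trust hierarchy could settle a conflict between two sources for the same manifest field, assuming each claim is tagged with its origin. The `Trust`, `Claim`, and `resolve` names, the field names, and the file paths are all hypothetical; only the ordering comes from the principle above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    """Trust levels from the Data Path Principle; a higher value wins a conflict."""

    MODEL_NARRATIVE = 1
    MODEL_CONSTRAINT = 2
    MACHINE_EXTRACTED = 3
    HUMAN_DESIGN = 4


@dataclass(frozen=True)
class Claim:
    key: str  # hypothetical manifest field name, e.g. "target_file"
    value: str
    source: Trust


def resolve(claims: list[Claim]) -> dict[str, Claim]:
    """For each field, keep the claim from the most trusted source (the first one wins ties)."""
    winners: dict[str, Claim] = {}
    for claim in claims:
        current = winners.get(claim.key)
        if current is None or claim.source > current.source:
            winners[claim.key] = claim
    return winners


claims = [
    Claim("target_file", "src/auth/login.ts", Trust.MODEL_NARRATIVE),
    Claim("target_file", "src/auth/session.ts", Trust.MACHINE_EXTRACTED),
    Claim("token_ttl", "3600", Trust.MODEL_CONSTRAINT),
]
resolved = resolve(claims)
print(resolved["target_file"].value)  # src/auth/session.ts
```

Ranking with an `IntEnum` keeps the hierarchy in one place, so a model-generated value cannot displace an extracted one for the same field.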
Lego Instructions Principle
Manifest quality determines the outcome more than builder quality does. A good manifest has every piece in the box (the Synthesizer populates all facts), unambiguous steps (a single source of truth per field), and no impossible assemblies (a feasibility gate rejects unsatisfiable criteria before any builder runs).
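One way to picture the "every piece in the box" and "no impossible assemblies" checks is a gate that runs before any builder. This sketch assumes a manifest made of steps with Synthesizer-filled facts plus acceptance criteria that name the symbols they depend on; all names are hypothetical, and checking for unknown symbols is only a simple stand-in for a real satisfiability check.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ManifestStep:
    symbol: str
    file_path: Optional[str] = None  # expected to be filled by the Synthesizer
    signature: Optional[str] = None  # expected to be filled by the Synthesizer


@dataclass
class Criterion:
    description: str
    requires_symbols: list[str] = field(default_factory=list)


def gate(steps: list[ManifestStep], criteria: list[Criterion], known_symbols: set[str]) -> list[str]:
    """Return every problem found; an empty list means builders may run.

    known_symbols: symbols that exist in the codebase or that the plan creates.
    """
    problems: list[str] = []
    for step in steps:
        # "Every piece in the box": a builder must never have to look a fact up.
        for attr in ("file_path", "signature"):
            if getattr(step, attr) is None:
                problems.append(f"{step.symbol}: missing {attr}")
    for criterion in criteria:
        # "No impossible assemblies": a criterion over an unknown symbol cannot be satisfied.
        missing = [s for s in criterion.requires_symbols if s not in known_symbols]
        if missing:
            problems.append(f"unsatisfiable: {criterion.description} (unknown: {', '.join(missing)})")
    return problems


steps = [
    ManifestStep("createSession", "src/auth/session.ts", "createSession(userId: string): Session"),
    ManifestStep("revokeSession"),
]
criteria = [Criterion("revoked sessions are rejected", ["revokeSession", "validateSession"])]
for problem in gate(steps, criteria, known_symbols={"createSession", "revokeSession"}):
    print(problem)
```

Returning the full list of problems, rather than stopping at the first, gives the upstream stage everything it needs to fix the manifest in one pass.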
Brain layer / builders boundary
Builders only execute. If a builder is searching, locating, or deciding, brain work has leaked downstream, which means the manifest was incomplete. The fix is always to push that decision back upstream.
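A sketch of the boundary from the builder's side, assuming a manifest step is a plain dict with hypothetical `file_path`, `symbol`, and `edit` keys. The builder has no search capability; a missing fact raises an error that names the gap, so the fix is made upstream instead of patched in place.

```python
class ManifestIncomplete(Exception):
    """Raised by a builder instead of searching; the fix belongs upstream."""

    def __init__(self, step_id: str, missing: str) -> None:
        super().__init__(f"step {step_id}: manifest lacks {missing!r}; fix upstream")
        self.step_id = step_id
        self.missing = missing


REQUIRED_KEYS = ("file_path", "symbol", "edit")  # hypothetical manifest step schema


def build_step(step: dict) -> str:
    """Execute one manifest step; every input must already be in the manifest."""
    for key in REQUIRED_KEYS:
        if not step.get(key):
            # No grep, no guessing: a missing fact is brain work leaking downstream.
            raise ManifestIncomplete(step.get("id", "?"), key)
    # Stand-in for the real edit: describe what would be applied.
    return f"apply {step['edit']!r} to {step['symbol']} in {step['file_path']}"


try:
    build_step({"id": "s2", "symbol": "revokeSession", "edit": "add expiry check"})
except ManifestIncomplete as err:
    print(err)  # step s2: manifest lacks 'file_path'; fix upstream
```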

Pipeline stages

Grounding chat
Pre-autonomous stage. The operator confirms intent, exclusions, and design decisions in a structured exchange grounded against the registry. Output is a machine-readable handoff the downstream pipeline reads as ground truth.
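A possible shape for the handoff, assuming it is serialized as JSON. The field names and the `GroundingHandoff` class are hypothetical; the point is that the handoff is parsed once into an immutable object that later stages read but cannot modify.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundingHandoff:
    """Operator-confirmed intent, exclusions, and design decisions (hypothetical schema)."""

    intent: str
    exclusions: tuple[str, ...]
    design_decisions: tuple[str, ...]

    @classmethod
    def from_json(cls, text: str) -> "GroundingHandoff":
        data = json.loads(text)
        # Tuples plus frozen=True: downstream stages can read the handoff but not rewrite it.
        return cls(
            intent=data["intent"],
            exclusions=tuple(data.get("exclusions", [])),
            design_decisions=tuple(data.get("design_decisions", [])),
        )


handoff = GroundingHandoff.from_json(
    '{"intent": "add session expiry",'
    ' "exclusions": ["no schema migration"],'
    ' "design_decisions": ["expiry is checked in the session service"]}'
)
print(handoff.exclusions)  # ('no schema migration',)
```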
Synthesizer
Populates the manifest with file paths, symbols, signatures, and dependency resolutions from the registry, Tree-Sitter, and language server before any builder runs.
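A sketch of the Synthesizer's contract, with a plain dict standing in for the registry, Tree-Sitter, and language-server lookups (those integrations are not shown, and all names and data here are hypothetical). Each fact records where it came from, and anything that cannot be resolved is reported before any builder runs rather than left for a builder to guess.

```python
from dataclasses import dataclass

# Stand-in for the registry, Tree-Sitter, and language-server lookups:
# a plain dict of already-extracted facts, keyed by symbol (hypothetical data).
REGISTRY = {
    "createSession": {
        "file_path": "src/auth/session.ts",
        "signature": "createSession(userId: string): Session",
    },
}


@dataclass(frozen=True)
class Fact:
    value: str
    provenance: str  # which extractor produced the value; never a model


def synthesize(symbols: list[str], registry: dict) -> tuple[dict, list[str]]:
    """Fill each symbol's facts from extracted data and report what could not be resolved."""
    facts: dict[str, dict[str, Fact]] = {}
    unresolved: list[str] = []
    for symbol in symbols:
        entry = registry.get(symbol)
        if entry is None:
            unresolved.append(symbol)  # surfaced before any builder runs, never guessed
            continue
        facts[symbol] = {key: Fact(value, "registry") for key, value in entry.items()}
    return facts, unresolved


facts, unresolved = synthesize(["createSession", "revokeSession"], REGISTRY)
print(facts["createSession"]["file_path"].value)  # src/auth/session.ts
print(unresolved)  # ['revokeSession']
```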
Planner
Reads the ticket and codebase, produces a structured manifest of what to build and which symbols to change.
Coder
Implements the code change within the constraints the manifest sets.
Test Writer
Deterministic materializer that renders structured test plans to TypeScript; it replaced an earlier LLM stage in Era 2. It also generates behavioral oracle tests derived from manifest operations and validates them with mutation testing.
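A minimal sketch of a deterministic materializer, assuming a test plan is a list of cases with a title, a TypeScript call expression, and a JSON-serializable expected value, and assuming Jest-style globals (`describe`, `it`, `expect`). The plan format and function names are hypothetical, and the sketch does not cover oracle-test generation or mutation testing.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedCase:
    name: str         # test title
    call: str         # TypeScript expression under test
    expected: object  # JSON-serializable expected value


def render_test_file(module: str, symbol: str, cases: list[PlannedCase]) -> str:
    """Render a structured test plan as Jest-style TypeScript; the same plan always gives the same text."""
    # json.dumps yields valid TypeScript literals for strings, numbers, booleans, null, arrays, and objects.
    lines = [
        f"import {{ {symbol} }} from {json.dumps(module)};",
        "",
        f"describe({json.dumps(symbol)}, () => {{",
    ]
    for case in cases:
        lines += [
            f"  it({json.dumps(case.name)}, () => {{",
            f"    expect({case.call}).toEqual({json.dumps(case.expected)});",
            "  });",
        ]
    lines.append("});")
    return "\n".join(lines) + "\n"


plan = [PlannedCase("is expired after the ttl", "isExpired(0, 3600, 7200)", True)]
print(render_test_file("./session", "isExpired", plan), end="")
```

Because rendering is plain string assembly, the same plan always yields byte-identical output, a property an LLM test-writing stage does not guarantee.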
Debugger router
The Debugger router classifies each failure and dispatches it to the stage that owns the fix: code-wrong failures to the Coder, manifest-, test-, or ticket-wrong failures to an operator correction surface, and failures from four of the six end-of-ticket gates to a fail-safe Coder repair. It is the only stage that can redirect the pipeline, and it diagnoses failures without inheriting the Coder's reasoning.
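The dispatch half of the router can be sketched as a lookup table, assuming classification has already produced a verdict or the name of a failing gate. The verdict and route names are hypothetical, the gate names are placeholders because the six gates are not named here, and sending the remaining two gates to `HALT` is purely an assumption.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    CODE_WRONG = "code_wrong"
    MANIFEST_WRONG = "manifest_wrong"
    TEST_WRONG = "test_wrong"
    TICKET_WRONG = "ticket_wrong"


class Route(Enum):
    CODER = "coder"
    OPERATOR_CORRECTION = "operator_correction"
    CODER_REPAIR = "coder_repair"  # fail-safe repair for end-of-ticket gate failures
    HALT = "halt"  # assumption: what the other two gates do is not specified


ROUTES = {
    Verdict.CODE_WRONG: Route.CODER,
    Verdict.MANIFEST_WRONG: Route.OPERATOR_CORRECTION,
    Verdict.TEST_WRONG: Route.OPERATOR_CORRECTION,
    Verdict.TICKET_WRONG: Route.OPERATOR_CORRECTION,
}

# Placeholder names: four of the six end-of-ticket gates route to Coder repair.
REPAIRABLE_GATES = {"gate_1", "gate_2", "gate_3", "gate_4"}


def route(verdict: Optional[Verdict] = None, failed_gate: Optional[str] = None) -> Route:
    """Dispatch a classified failure to the stage that owns the fix."""
    if failed_gate is not None:
        return Route.CODER_REPAIR if failed_gate in REPAIRABLE_GATES else Route.HALT
    if verdict is None:
        raise ValueError("route() needs a verdict or a failed gate")
    return ROUTES[verdict]


print(route(Verdict.TICKET_WRONG))  # Route.OPERATOR_CORRECTION
```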
Reviewer
Checks scope and correctness before the ticket closes. Split into Test Reviewer and Impl Reviewer in Era 4.


Go deeper

Why a Warning Is Worse Than a Hard Stop

When the pipeline detects zero test files, logging a warning and continuing produces output that looks correct, and no downstream gate can catch the problem.
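A sketch of the hard-stop version of that check, assuming the pipeline has the list of files changed for the ticket and that test files follow a hypothetical `*.test.ts` / `*.spec.ts` naming convention. Raising stops the run where the problem is detected, instead of letting later gates pass with nothing to check.

```python
class PipelineHalt(RuntimeError):
    """Stops the run outright; unlike a logged warning, no later stage can proceed past it."""


TEST_SUFFIXES = (".test.ts", ".spec.ts")  # hypothetical naming convention


def require_test_files(changed_files: list[str]) -> list[str]:
    """Return the test files in a change set, or halt if there are none."""
    tests = [path for path in changed_files if path.endswith(TEST_SUFFIXES)]
    if not tests:
        # The warning version (log and return []) lets every later gate "pass"
        # vacuously, because there is nothing for them to run.
        raise PipelineHalt(f"zero test files among {len(changed_files)} changed files")
    return tests


print(require_test_files(["src/auth/session.ts", "src/auth/session.test.ts"]))
# ['src/auth/session.test.ts']
```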