The Lego Instructions: An Architectural Principle for AI Coding Agents


Most AI coding agents improvise. A ticket arrives, the agent reads it, opens files it guesses are relevant, writes code, runs tests, and if the tests fail, tries again. That loop has a name in the popular imagination (“autonomous”) and a track record that is roughly what you would expect from an engineer who treats every ticket as a fresh puzzle to explore.

There is a different shape available. Instead of improvising, follow instructions. Instead of exploring, execute.

Three properties of a Lego instruction set

Every piece is in the box. Step 14 never says “find a blue 2×4 somewhere in your collection.” It tells you which bag it came from, what colour it is, what shape. The builder never hunts.

Every step is unambiguous. The instruction shows the exact piece, the exact position, the exact orientation. The builder never picks between two plausible interpretations.

No assembly is impossible. The instruction designer verified fit before publishing. No step asks you to connect two pieces that physically cannot click together.

Those three properties map one-to-one onto an agent pipeline.

[Figure: two-column mapping diagram. Left column lists the three properties of a Lego instruction set: every piece in the box, unambiguous steps, no impossible assemblies. Right column lists their pipeline equivalents: a Synthesizer populates facts from the registry, the language server, and Tree-Sitter; each manifest field has a single source of truth; a feasibility gate rejects unsatisfiable criteria before any builder runs. Arrows connect each matched pair.]

Mapped to the pipeline

Every piece in the box becomes: a Synthesizer populates the manifest with file paths, symbols, signatures, mock targets, and dependency resolutions (drawn from a symbol registry, a Tree-Sitter parse, and a language server); none of this is requested from the Planner. The builder does not search.
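A minimal sketch of what "every piece in the box" means in code. The field names here are illustrative, not the pipeline's actual schema: the point is that every fact the builder will touch is resolved before the builder runs, so its only lookup is against the manifest, never the repo.

```typescript
// Hypothetical manifest slice emitted by the Synthesizer.
// Field names are illustrative; they are not the pipeline's real schema.
interface BuildManifest {
  // Files the builder will edit or read, fully resolved up front.
  files: { path: string; role: "edit" | "read" }[];
  // Symbols with signatures and defining files, from the registry.
  symbols: { name: string; signature: string; definedIn: string }[];
  // Mock targets, each pinned to the module it is imported from.
  mockTargets: { symbol: string; importedFrom: string }[];
}

// The builder consults the manifest; it never searches the repo.
function pieceInBox(manifest: BuildManifest, symbol: string): boolean {
  return manifest.symbols.some((s) => s.name === symbol);
}
```

If `pieceInBox` returns false, that is a Synthesizer bug to fix upstream, not a cue for the builder to go hunting.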

Unambiguous steps becomes: each manifest field has a single source of truth. The Coder never chooses between two plausible interpretations of a field, because only one value is present.

No impossible assemblies becomes: a feasibility gate resolves each acceptance criterion against the registry and the language server before any builder runs. Criteria that no test could satisfy on the current source topology (for example, “assert that foo does not call bar when both live in the same module and are therefore unmockable”) are rejected at manifest-validate time, with a structured alternative, before a single token of builder work is spent.
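The same-module mock example above can be sketched as a gate check. This is an assumed shape, not the pipeline's real implementation: the criterion type, field names, and the structured alternative are all illustrative.

```typescript
// Hypothetical feasibility check for one criterion kind: "assert that
// caller does not call callee". If both symbols live in the same
// module, the call cannot be intercepted by mocking the import, so no
// test could satisfy the criterion on the current source topology.
interface Criterion {
  kind: "assert-no-call";
  caller: { symbol: string; module: string };
  callee: { symbol: string; module: string };
}

type GateResult =
  | { ok: true }
  | { ok: false; reason: string; alternative: string };

function feasibilityGate(c: Criterion): GateResult {
  if (c.caller.module === c.callee.module) {
    return {
      ok: false,
      reason:
        `${c.callee.symbol} is defined in the same module as ` +
        `${c.caller.symbol} and cannot be mocked via import interception`,
      alternative: "assert on observable output instead of the call",
    };
  }
  return { ok: true };
}
```

Because the gate returns a structured alternative rather than a bare rejection, the upstream stage can repair the plan before any builder tokens are spent.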

The rule that falls out: if a builder is searching, locating, or deciding, that is brain work leaking downstream. The manifest was incomplete. The fix is to push the work back up to the stage with the context to answer deterministically, not to prompt the builder harder.

Why it matters

The popular alternatives point the other way. AutoGPT-style loops hand the model a goal and let it improvise its way to an answer, which is the “find a blue piece somewhere” architecture scaled to a whole project. Devin’s pitch of an autonomous software engineer is structurally closer to “hire a smart builder” than “give the builder good instructions,” which is a reasonable bet if the bottleneck is builder quality.

Measured on 248 real pipeline runs against a TypeScript fixture, the bottleneck is manifest quality, not builder quality.

A better builder cannot follow an instruction that names a piece that is not in the box.

One note on timing

The 248 runs post was written while the feasibility gate was still on paper. That post and two others described it as “an active design, not a shipped subsystem.” It has since shipped. The measured outcome of the delivered architecture belongs in that post’s follow-up. This post is about the principle.


The pipeline runs on real tickets against a ~100k line TypeScript monorepo. Still R&D.