The Decision an AI Coding Agent Can't Make Alone: Operator-Grounded Intent Capture
A deterministic resolver passed every unit test, then failed 40% of runs. The fix was operator confirmation, not a smarter algorithm.
A new pre-autonomous chat stage lets the operator ground design decisions in the registry before the pipeline runs, adding a new top level to the trust hierarchy.
Seven pipeline runs, one ticket, four architectural eras. Per-test cost dropped from $0.385 to $0.074 by replacing LLM guesswork with structural derivation.
What shipped in the three weeks after the 248-run hallucination ceiling: removing the LLM from computable decisions and validating everything else.
Bernoulli model predicted 36% first-pass success across 248 pipeline runs. Measured: 21%. The gap explains why per-field hallucination fixes have a ceiling.