The language server has a startup cost. One Coder retry costs more than that startup, in tokens, every single time.
That asymmetry is why it ended up as an architectural requirement and not an optional optimisation. The economics are not close.
What the pipeline used before
The first version of the symbol indexer used regular expressions.
It worked well enough to get started. Function definitions, class declarations, export statements: regex can extract those reliably from well-formatted code. For a proof of concept, it was sufficient.
But it was brittle from the start and I knew it. When I was learning to program, a teacher told me something that stuck:
> If you are solving a problem with regular expressions, you now have two problems.
The regex indexer was always a starting point, not a destination. Minified files, unusual formatting, nested constructs. Any of those could break a regex pattern silently, producing missing symbols or wrong line numbers with no error to investigate.
Tree-Sitter replaced it. Tree-Sitter is a fast incremental syntax parser that produces a proper syntax tree rather than pattern-matching against text. It handles formatting variations, nested structures, and edge cases that would require an ever-growing set of regex patches to cover. It is also error-tolerant: it can parse files with syntax errors and still extract the symbols it can identify. The results go into a registry: a record per symbol with its location, type, and metadata. When the Planner reads a ticket, it queries the registry for a map of what exists and where, reasons about which symbols are relevant, and passes precise details for those specific symbols to the downstream agents.
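The registry's shape can be sketched as a record per symbol plus a name index. This is a minimal in-memory sketch; the real registry is persistent, and the field and class names here are illustrative, not the pipeline's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SymbolRecord:
    """One registry entry, extracted from the Tree-Sitter syntax tree."""
    name: str   # e.g. "getAllPosts"
    kind: str   # "function", "class", "export", ...
    path: str   # file the symbol is defined in
    line: int   # line of the definition
    metadata: dict = field(default_factory=dict)  # domain tags, content hash, ...

class SymbolRegistry:
    """Name index over symbol records."""
    def __init__(self):
        self._by_name: dict[str, list[SymbolRecord]] = {}

    def add(self, record: SymbolRecord) -> None:
        self._by_name.setdefault(record.name, []).append(record)

    def lookup(self, name: str) -> list[SymbolRecord]:
        """Answers "what exists and where" -- not "what calls this"."""
        return self._by_name.get(name, [])

registry = SymbolRegistry()
registry.add(SymbolRecord("getAllPosts", "function", "lib/posts.ts", 12))
registry.lookup("getAllPosts")[0].path  # 'lib/posts.ts'
```

The lookup is deliberately dumb: definitions in, locations out. Everything the Planner needs for additive tickets, nothing more.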
The registry is persistent and queryable, giving the agents a live index of the codebase without reading entire files. For most tickets (additive changes, new endpoints, new functions), that is sufficient.
Where Tree-Sitter stops
Tree-Sitter is a syntax parser. It reads one file’s text and produces a syntax tree. It has no concept of what a name resolves to across files.
It can tell you where getAllPosts is defined. It cannot tell you where it is called. It can extract Post as a parameter annotation. It cannot tell you what Post resolves to, or whether a change to it breaks three other files.
This is not a gap in the implementation. It is a property of what Tree-Sitter is. A syntax parser does not do semantic analysis. That requires a language server.
The question that breaks the registry approach
The Planner’s job, when decomposing an interface-changing ticket, is to determine blast radius: which other parts of the codebase depend on the symbol being changed, and which of those need to change too.
For “add a comments endpoint”, an additive change with no existing interface touched, the registry is sufficient. The Planner looks at what exists, decides what to create, produces a manifest. The check passes cleanly.
For “change how getAllPosts returns data”, the Planner must know what calls getAllPosts to determine whether an “update callers” sub-ticket is needed. Without that, it either under-scopes and leaves callers broken, or over-scopes and touches files it should not. Neither is acceptable.
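The decision the Planner makes from a references result reduces to a pure function. A sketch, where the per-reference `uri` field follows the LSP Location shape and the helper name is illustrative:

```python
def caller_files(references: list[dict], changed_uri: str) -> list[str]:
    """Collect every file that references the symbol, other than the file
    being changed. A non-empty result means the ticket needs an
    "update callers" sub-ticket scoped to exactly these files."""
    return sorted({ref["uri"] for ref in references if ref["uri"] != changed_uri})

refs = [
    {"uri": "file:///app/lib/posts.ts"},   # the definition itself
    {"uri": "file:///app/lib/search.ts"},  # a caller in another file
]
caller_files(refs, "file:///app/lib/posts.ts")  # ['file:///app/lib/search.ts']
```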
textDocument/references answers this directly. Send the server the file and position of a symbol, and get back every location in the codebase that references it. Nothing else does this reliably across barrel re-exports, TypeScript path aliases, and index files.
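On the wire this is a JSON-RPC request with Content-Length framing, per the LSP base protocol; positions are zero-based. A sketch (the file path and position are illustrative):

```python
import json

def references_request(request_id: int, uri: str, line: int, character: int) -> bytes:
    """Frame a textDocument/references request for the LSP wire protocol:
    a Content-Length header, a blank line, then the JSON-RPC body."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/references",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
            "context": {"includeDeclaration": False},
        },
    }).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body

msg = references_request(1, "file:///app/lib/posts.ts", 11, 16)
```

The server replies with a list of Location objects, a uri plus a range per reference. That list is the blast-radius input.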
The economic case
A Coder retry cycle costs more than it appears to. A missed import breaks the type checker. Tests fail. The pipeline re-runs the full Coder prompt: reconstructed context, full generation cycle, another test run, potentially a Debugger pass. On a moderately complex ticket that is 20,000 to 30,000 additional input tokens and another 30 to 60 seconds of wall clock time.
A language server diagnostic fix for the same import error: one local process request, zero tokens, one targeted patch, one test re-run. The test re-run is unavoidable. Everything else disappears.
The blast-radius result from the first pre-LSP run on a pagination ticket is concrete. The Planner missed that search.ts was a caller of getAllPosts. The Coder changed the return type. Eleven typecheck errors. A Debugger pass. 92,618 tokens. 122 seconds.
The same ticket with LSP active: no search.ts errors. Typecheck clean. The blast-radius failure did not occur.
| | Pre-LSP | Post-LSP |
|---|---|---|
| Result | APPROVE | APPROVE |
| search.ts typecheck errors | 11 | 0 |
| lib/posts.ts modified | no | yes |
| Debugger fired for | blast-radius type mismatch | logic bug, unrelated to LSP |
| Tokens | 92,618 | 95,170 |
| Runtime | 122s | 97s |
The token counts are similar because both runs had a Debugger cycle, just for different reasons. In the pre-LSP run, the Debugger fixed a structural failure caused by a missed blast-radius caller. In the post-LSP run, it fixed a logic error: a validation check placed after a clamping function, so it never fired. That is a different category of failure, one that requires understanding call flow, not just knowing which files were touched. LSP cannot prevent those. That is still the Debugger’s job.
The blast-radius overhead in the pre-LSP run was 33,364 extra input tokens and 74 extra seconds, the cost of missing one caller. That overhead does not appear in the post-LSP run.
The pre-LSP Debugger spent its cycles on structural failures that LSP now prevents. The post-LSP Debugger spent its cycle on something it is actually for.
What changes
The language server starts at boot, not lazily. Decomposition needs reference data before any manifest is produced, so the server has to be ready at preflight, not on first use.
A diagnostic pass runs between every Coder write and the test suite. The server checks the modified files for type and import errors before the tests run. Structural problems get caught and fixed before they reach the test suite. The test suite handles logic failures. The diagnostic pass handles structural ones.
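The ordering of that pass can be sketched as a gate function. `get_diagnostics` and `apply_fix` are stand-ins for the real language-server hooks; the names and the retry budget are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Diagnostic:
    path: str
    line: int
    message: str
    severity: str  # "error" or "warning"

def pre_test_gate(modified_files, get_diagnostics, apply_fix, max_passes=3):
    """Between the Coder's write and the test suite: pull diagnostics for the
    modified files, apply targeted patches for structural errors (types,
    imports), and repeat until clean or the budget runs out. No model tokens
    are spent; only a structurally clean set of files reaches the tests."""
    for _ in range(max_passes):
        errors = [d for f in modified_files for d in get_diagnostics(f)
                  if d.severity == "error"]
        if not errors:
            return True   # clean: hand off to the test suite
        for d in errors:
            apply_fix(d)  # targeted patch, not a Coder re-run
    return False          # still broken after the budget: escalate
```

The return value is the point: the test suite only ever sees files this gate has already passed.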
They are different problems and should not share the same retry budget.
The Debugger is now reserved for genuine logic failures. Import and type errors that previously reached it no longer do.
What does not change
Tree-Sitter and the registry stay. Bulk indexing, domain tagging, content hashing, the context assembler. None of that changes. The registry is still how the Planner navigates the codebase at the start of every ticket. The language server extends it; it does not replace it.
The pre-write syntax gate also stays. Tree-Sitter is faster for checking whether a file is structurally valid before writing it to disk. That check runs in the same process, no network hop, sub-millisecond. Using the language server for that would be slower with no benefit.
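The shape of that gate, with Python's own ast.parse standing in for Tree-Sitter so the sketch runs without a compiled grammar (the real gate would parse with Tree-Sitter and check the tree for error nodes):

```python
import ast

def syntax_gate(source: str) -> bool:
    """Pre-write structural check: a file that fails to parse never reaches
    disk. Runs in-process, no network hop. ast.parse is a stand-in here;
    the actual pipeline parses the TypeScript source with Tree-Sitter."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

syntax_gate("def ok():\n    return 1\n")  # True
syntax_gate("def broken(:\n")             # False
```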
The test runner stays. LSP diagnostics are a pre-test filter. They catch structural problems early. They do not verify behaviour. That is still the test suite’s job.
What this does not solve
Reference lookup prevents the specific failure where a caller is missed during blast-radius analysis. It does not prevent logic errors, scope misjudgements, or multi-file side effects that do not manifest as type errors. Those are still the Reviewer’s problem to catch. The fixture for this work is a TypeScript blog API, which is small and well-structured. Whether the same approach holds on a larger or messier codebase is not yet tested.
The registry answers “what exists and where.” The language server answers “what calls this and what does it resolve to.” Both are needed. The right question was never which one to use. It was when the second one becomes worth its startup cost. The first retry that hit a missed caller answered it.
The pipeline runs inside Docker on real tickets. Numbers are from actual runs against a TypeScript blog API fixture. Still R&D.