Erik Perttu

Ho Chi Minh City, Vietnam

Start Here

This page is a summary index. For technical detail, start with the linked essays below.

Public profiles: linkedin.com/in/erikperttu, github.com/erikperttu, x.com/ErikPerttu.

Opus vs Sonnet: Deterministic Gates Beat Model Tier in Code Fixes
The clearest evidence that the gates, not the model, decide correctness: four Claude tiers hit four different failure modes on the same bug, and each was caught by a different deterministic gate.
Raising the Floor under the Coder: Generalizing an AI Coding Agent across Ticket Types
How the pipeline went from one zero-Coder ticket type to six, and why four architectural moves did most of the work.
From LLM Author to LLM Reviewer: An AI Coding Agent Authors a Production Feature With Zero LLM Code Generation
How the pipeline's structural floor rose high enough to commit a multi-layer feature without any LLM source-code authoring. The current state of the system.
From LLM Luck to Structurally Guaranteed: One Ticket Across Four Architectural Eras
The architectural overview: four eras, one ticket, cost per run dropping from $0.385 to $0.074.
Prompt Rules Are Advisory; Validators Are Binding
Why structural validation matters more than instructions alone.
What the Symbol Registry Stores, and How It Stays Fresh
The registry that keeps agent context precise across a changing codebase.
The Lego Instructions: An Architectural Principle for AI Coding Agents
Why manifest quality determines outcome more than builder quality: the principle behind the Synthesizer's role.
Stop Asking the Model What the Code Already Knows
Why every field a Planner emits that the codebase already knows is a dice roll, and how machine extraction eliminates it.
Correct Code, Wrong File: How the Write Gate Contains Scope Creep
How write scope is constrained so the right code lands in the right place.
What Calls This Function? Why AI Coding Agents Need a Language Server
Why the pipeline uses language-server context instead of loading whole files.

Profile

I build autonomous engineering pipelines hands-on, backed by years of leading engineering teams that shipped production software.

I'm a Swedish engineer based in Ho Chi Minh City. Shipping software to a million users teaches you where systems fail, not in theory, but at 3am when something breaks and people are depending on it.

Hard stops, agent isolation, TDD enforcement: all come from shipping real software and living with the consequences.

Projects

Autonomous Engineering Pipeline

February 2026 – Present

An autonomous engineering pipeline written in Python that takes a ticket from input to tested, reviewed, committed code with no human in the execution loop. The architecture behaves more like a compiler for LLM behavior than a conventional agent framework: LLMs handle the parts that require interpretation, and non-LLM code handles everything that can be computed. The split isn't specific to coding; coding is simply the domain with a large enough codebase and enough runs to prove it on.

A grounding chat stage runs before autonomous execution begins. The operator confirms intent, exclusions, and design decisions in a structured exchange grounded against the symbol registry. The output is a structured handoff that the downstream pipeline reads as the source of truth, not as text to re-interpret.
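A minimal sketch of what "a structured handoff, not text to re-interpret" can look like, assuming a hypothetical `GroundingHandoff` record and a registry modeled as a plain set of symbol names. The field names are illustrative, not the pipeline's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GroundingHandoff:
    """Hypothetical shape of the operator-confirmed handoff (illustrative only)."""

    intent: str
    exclusions: tuple[str, ...] = ()
    design_decisions: tuple[str, ...] = ()
    # Existing registry symbols the chat referred to; each must resolve.
    symbols: tuple[str, ...] = ()

    def to_json(self) -> str:
        # Serialized once; downstream stages parse fields instead of re-reading prose.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "GroundingHandoff":
        data = json.loads(raw)
        return cls(
            intent=data["intent"],
            exclusions=tuple(data["exclusions"]),
            design_decisions=tuple(data["design_decisions"]),
            symbols=tuple(data["symbols"]),
        )


def ungrounded_symbols(handoff: GroundingHandoff, registry: set[str]) -> list[str]:
    """Return referenced symbols that the registry does not contain."""
    return [s for s in handoff.symbols if s not in registry]
```

A handoff that mentions `Cart.totl` against a registry containing `Cart` and `Cart.total` would be rejected before any autonomous stage runs, rather than leaving a later agent to guess what was meant.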

Specialist agents then run in sequence with no shared memory or reasoning between stages. Each receives a structured artifact from the previous stage and produces one in return. Tests are generated by a deterministic Test Writer before the Coder runs and are confirmed failing before implementation starts. Behavioral oracle tests are derived from the same manifest operations that write the implementation and run below the route-test boundary, where behavior is actually observable. Two end-of-ticket gates prove the oracles are non-vacuous: confirm-RED (the oracle must fail before the change exists) and a mutation gate (a derived wrong-version mutant, applied to the committed code, must be killed by the oracle before the ticket closes).

Built on a symbol registry and language server integration. A Synthesizer stage populates the manifest with machine-extracted file paths, symbols, signatures, and dependency resolutions before any builder runs. The registry is machine-derived from the codebase: every fact is extracted, not guessed, which eliminates those fields as hallucination surfaces. Agents receive precise codebase context without reading entire files. Proven on TypeScript and Python codebases, including a ~100k-line TypeScript monorepo, with tickets run sequentially and no reset between runs.
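To make "extracted, not guessed" concrete, here is a small sketch that derives registry records from Python source using the standard-library `ast` module. This is a stand-in technique: the pipeline itself lists Tree-Sitter and LSP, and `SymbolRecord` and `extract_symbols` are hypothetical names.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class SymbolRecord:
    path: str
    qualname: str
    kind: str        # "function" or "class"
    line: int
    signature: str   # empty for classes


def extract_symbols(path: str, source: str) -> list[SymbolRecord]:
    """Walk classes and defs (including nested ones) and record facts from the AST."""
    tree = ast.parse(source, filename=path)
    records: list[SymbolRecord] = []

    def visit(node: ast.AST, prefix: str) -> None:
        # Only descends through class and function bodies; defs inside
        # if/try blocks are skipped in this simplified sketch.
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                qual = prefix + child.name
                sig = f"({ast.unparse(child.args)})"
                if child.returns is not None:
                    sig += f" -> {ast.unparse(child.returns)}"
                records.append(SymbolRecord(path, qual, "function", child.lineno, sig))
                visit(child, qual + ".")
            elif isinstance(child, ast.ClassDef):
                qual = prefix + child.name
                records.append(SymbolRecord(path, qual, "class", child.lineno, ""))
                visit(child, qual + ".")

    visit(tree, "")
    return records
```

Every signature and line number comes from the parser, so an agent handed these records has nothing to invent; re-running the extraction after each commit is one straightforward way to keep such a registry fresh.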

Every run produces a full artifact chain: per-stage LLM reasoning traces, structured manifests, Coder operations, Reviewer verdicts, and per-stage cost and token data. The archive spans ~1,300 runs since February 2026.

Empirical validation includes a 248-run reliability campaign that exposed a hallucination ceiling and drove architectural changes instead of field-by-field prompt patches.

Cost trajectory is documented across successive architectural refinements, reducing per-successful-run cost from $0.385 to $0.074 while raising consistency through structural controls.

Safety is structural, not advisory. Hard stops, a write gate that contains scope to the manifest, and full per-stage trace logging are enforced at the architecture level. Validated across ~1,300 runs on the target codebase.
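A minimal sketch of a manifest-scoped write gate, assuming the manifest is a list of repository-relative paths. `WriteGate` and `WriteRejected` are hypothetical names; the point is that the check is code, so an out-of-scope write fails regardless of what the prompt said.

```python
from pathlib import PurePosixPath


class WriteRejected(Exception):
    """Raised when a write falls outside the manifest's scope."""


def normalize(path: str) -> PurePosixPath:
    p = PurePosixPath(path)  # collapses "./" and duplicate slashes, keeps ".."
    if p.is_absolute() or ".." in p.parts:
        raise WriteRejected(f"path escapes the repository root: {path}")
    return p


class WriteGate:
    def __init__(self, manifest_paths: list[str]) -> None:
        self.allowed = frozenset(normalize(p) for p in manifest_paths)

    def check(self, path: str) -> PurePosixPath:
        """Return the normalized path if allowed; otherwise refuse the write."""
        p = normalize(path)
        if p not in self.allowed:
            raise WriteRejected(f"{path} is not in the manifest")
        return p
```

Rejecting any `..` component outright is deliberately strict: it is simpler to reason about than resolving paths and then re-checking containment.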

The Debugger router classifies each failure and dispatches it to the stage that owns the fix; it is the only stage that can change where the pipeline goes next. Code failures route to the Coder with a corrective brief. Upstream specification failures surface to the operator as a structured correction request. Four of the six end-of-ticket gates route failures to a fail-safe repair pass that re-runs the gate authoritatively instead of terminating. Gates verify what changed rather than absolute health, so pre-existing problems the change did not introduce do not fail the ticket.
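The routing rule and the "verify what changed" check can be sketched as follows. The failure labels, the `Route` values, and the four gate names in `REPAIRABLE_GATES` are hypothetical placeholders, and treating the other gates as halting is an assumption drawn from "instead of terminating".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    CODER = "coder"        # code failure: retry the Coder with a corrective brief
    OPERATOR = "operator"  # upstream spec failure: structured correction request
    REPAIR = "repair"      # repairable gate: fail-safe pass, then re-run the gate
    HALT = "halt"          # assumption: non-repairable gates stop the ticket


# Hypothetical placeholder names for four of six end-of-ticket gates.
REPAIRABLE_GATES = frozenset({"gate_a", "gate_b", "gate_c", "gate_d"})


@dataclass(frozen=True)
class Failure:
    kind: str                   # hypothetical labels: "code", "spec", or "gate"
    gate: Optional[str] = None


def route(failure: Failure) -> Route:
    if failure.kind == "spec":
        return Route.OPERATOR
    if failure.kind == "gate":
        return Route.REPAIR if failure.gate in REPAIRABLE_GATES else Route.HALT
    return Route.CODER


def introduced_failures(baseline: set[str], after: set[str]) -> set[str]:
    """Delta check: only failures absent before the change count against it."""
    return after - baseline


print(introduced_failures({"test_legacy"}, {"test_legacy", "test_new"}))  # {'test_new'}
```

Keeping the router a pure function of the failure record makes every routing decision reproducible from the trace logs.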

Each stage has its own model configuration. Model, effort level, and LLM vendor are set independently per agent. Designed to work with any LLM provider.
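A sketch of per-stage model configuration, assuming a simple mapping from stage name to an immutable config with a fallback default. The vendor and model strings are placeholders, not real provider identifiers, and the stage names are only examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageModelConfig:
    vendor: str             # placeholder vendor identifier
    model: str              # placeholder model identifier
    effort: str = "medium"  # hypothetical effort levels: "low" / "medium" / "high"


DEFAULT = StageModelConfig(vendor="vendor-a", model="model-large")

# Each stage is configured independently; nothing forces a single vendor.
PIPELINE_CONFIG: dict[str, StageModelConfig] = {
    "planner": StageModelConfig("vendor-a", "model-large", effort="high"),
    "coder": StageModelConfig("vendor-b", "model-medium"),
    "reviewer": StageModelConfig("vendor-a", "model-small", effort="low"),
}


def config_for(stage: str) -> StageModelConfig:
    """Look up a stage's config, falling back to the default."""
    return PIPELINE_CONFIG.get(stage, DEFAULT)
```

Because stages exchange structured artifacts rather than shared memory, swapping one stage's vendor does not require changing its neighbors.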

The pipeline itself is developed under TDD.

Multiple ticket types now execute end-to-end with zero LLM source-code authoring, running on codebase structure alone.

Still R&D.

Reinforcement Learning, VizDoom

June 2025 – August 2025

Implemented DQN, REINFORCE, and PPO from scratch using PyTorch. The agent was constrained to first-person visual input only; full game state was available but deliberately excluded. Switched to Stable-Baselines3 for parallel training once rollout collection became the bottleneck.
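As a reminder of what REINFORCE computes, here is its core arithmetic in plain Python: discounted returns computed backwards over an episode, then the policy-gradient surrogate loss. In a PyTorch implementation the log-probabilities would be tensors and the loss would be backpropagated; this float-only version is an illustrative sketch, not the original code.

```python
def discounted_returns(rewards: list[float], gamma: float = 0.99) -> list[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs: list[float], rewards: list[float], gamma: float = 0.99) -> float:
    """Surrogate loss -sum_t log pi(a_t | s_t) * G_t; minimizing it ascends expected return."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Practical implementations usually subtract a baseline or normalize the returns to reduce variance; that step is omitted here.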

Experience

Head of Engineering

January 2017 – Present

Leading technical strategy for Vietnam's largest education review platform, 1M+ MAU. Built and led the engineering team from the ground up: sourced, interviewed, hired, and mentored every engineer. Responsible for technical culture, career development, and engineering standards across the department. Partner directly with the C-suite to translate business goals into technical roadmaps.

Completed a cloud migration, cutting latency by 90% and infrastructure costs by 50%
Re-engineered core search logic, increasing user engagement by 50%
Built a payment gateway covering MoMo, ZaloPay, and credit cards
Built internal marketing automation tools that tripled lead generation efficiency
Engineered a real-time testing platform handling thousands of concurrent users for large-scale student competitions

Technical Lead & Project Manager

January 2016 – January 2017

Technical management for a financial software firm, bridging European stakeholders and a local engineering team in Vietnam.

Led and mentored local developers delivering high-performance, low-latency financial applications in C#/.NET for the banking sector. Worked directly with the CEO on operational reporting and resource allocation. Established development processes including technology selection, time estimation, and quality control. Led technical screening and hiring to scale the team.

Personally architected a backend solution connecting multiple disparate financial systems.

Software Developer

November 2012 – April 2015

Backend development for a fast-paced Swedish tech startup. Built core backend systems for web applications in Python, taking full ownership of feature lifecycles from estimation to deployment. Autonomous environment, high standards, early foundation in scalable architecture and clean code.

Technical Skills

Leadership: Engineering Leadership, Team Building, Hiring & Mentoring, Technical Strategy, Engineering Culture
AI & Agents: Large Language Models (LLM), AI Agents, Multi-Agent Systems, Autonomous Engineering Pipelines, AI Agent Architecture, Code Generation, LLM Hallucination Mitigation, Deterministic Enforcement, Mutation Testing, PyTorch, Test-Driven Development (TDD)
Code Intelligence: Language Server Protocol (LSP), Tree-Sitter, Abstract Syntax Tree (AST), Symbol Registry Design, Static Analysis
Languages & Frameworks: Python, Go (Golang), TypeScript, Node.js, PHP, C
Cloud & Infrastructure: Amazon Web Services (AWS), Docker, CI/CD Pipelines, Elasticsearch, Git, GitHub
Data: SQLite, Relational Database Design, MySQL

Languages

English: Bilingual
Swedish: Native
Vietnamese: Beginner