🛠️ IndieKit

AI Coding Does Not Need More Model Hype. It Needs Control Systems.

2026-06-14 · 28 分钟阅读 · AI Coding, Codex, Loop Engineering, AI Agents, Engineering, Open Source

中文版本

Over the past few weeks, one idea has become increasingly clear to me:

AI coding is no longer limited only by model capability.

Models still differ. Some reason better. Some write cleaner code. Some hold context more reliably. But for many real engineering tasks, today’s models are already good enough to do useful work.

The harder question is:

Once an AI agent can keep writing code, changing code, and running tests, how do we manage the work it produces?

That is why I built codex-orchestrator.

It is not another tool that tries to make the model write more code. It is a control layer around Codex App: split work, isolate it in worktrees, persist task state, wake up on a heartbeat, detect stuck sessions, review completed branches, merge clean work, clean up after workers, and keep moving through a roadmap.

It started as a practical problem

I did not start by trying to build a product.

I was working on a large multi-module rewrite. The project had frontend apps, backend services, mobile surfaces, reporting, payments, printing, sync, permissions, and device-facing flows.

A single Codex session worked well for small tasks. But as the project grew, a different class of problem appeared:

So the first version of codex-orchestrator was not a CLI. It was a set of rules.

Each delegated task should run in its own worktree and branch. A worker should only commit to its own branch. It should not merge, push, or clean up its worktree. The orchestrator session should monitor, review, merge, push, and clean up.

That worked well enough to reveal the next problem: rules inside chat are not durable.

Chat context gets compressed. Task IDs go stale. Heartbeats can be missed. Worker state and Git state can disagree. So the project slowly grew from a Codex skill into a small Go helper CLI.

Treat worker branches as reviewable delivery units

“Multi-agent coding” often sounds like hiring a few extra AI employees.

I now think that is the wrong mental model.

The better model is this: every worker branch is a reviewable, rejectable, recoverable delivery unit.

When a worker says it is finished, I do not want to trust the sentence. I want to inspect the unit:

That checklist is often more valuable than asking the model to be a little smarter.

In real projects, the most dangerous failure is not always bad code generation. It is overclaiming evidence:

This is why codex-orchestrator has an evidence-label discipline. The labels are not decoration. They are part of the safety model.

Why a ledger matters

At first, I thought the orchestrator session could simply remember what was going on.

That did not scale.

In a long-running loop, state spreads across many surfaces:

If that state only lives in chat, it eventually becomes unreliable.

So codex-orchestrator uses a local .codex-orchestrator/ledger.json.

The ledger does not replace Codex App. It gives the outer loop a durable local task record. A task can move from pending setup, to active, to completed-unreviewed, to accepted, merged, pushed, and cleaned.

This is not about making AI write code. It is about answering a very practical question:

When I wake up tomorrow morning, how do I know what actually happened overnight?

Heartbeat is useful, but it is not magic

I used to think that adding a heartbeat would make the loop unattended.

Reality was more humbling.

Codex App can wake a thread on a schedule, but that is not the same as an operating-system-level daemon. The machine may sleep. The app may not dispatch the automation. The thread may not get a wakeup.

So I now describe this more conservatively:

That may sound less exciting, but it is more honest.

Reliable engineering systems are not built by declaring “fully autonomous.” They are built by naming boundaries, then adding observation, logs, alerts, and recovery paths around those boundaries.

The status page is not a dashboard. It is a visibility layer.

One lesson surprised me: without a status page, the human operator feels blind.

The loop may be running, but you cannot quickly tell whether it is:

Early on, I relied on command output and chat summaries. That was not enough.

So codex-orchestrator now writes status.md and status.html. They are not meant to be fancy dashboards. They are meant to answer the operational questions:

The goal is simple: reduce the anxiety of not knowing what the agent system is doing.

The biggest trap: dispatching just because capacity exists

One early mistake was very tempting.

If the system allows two concurrent workers and only one is active, should it dispatch another?

Not necessarily.

“Safe to run” and “worth running now” are different things.

If the current feature package is Customer Checkout, the orchestrator should not randomly dispatch a Staff/RBAC task, then a KDS task, then a reporting task just because those tasks do not conflict at the file level.

Each task may be individually reviewable, but the daily progress report will look scattered.

So I changed the rule: feature package first.

This was one of the most important product lessons:

AI orchestration is not “open more sessions.” It is “advance a feature loop.”

What I mean by Loop Engineering

I do not think Loop Engineering means writing a while loop around an agent.

It means designing the outer system around the agent:

Codex App remains the executor. It reads code, edits files, runs tests, and commits branches.

codex-orchestrator provides the operating discipline around that executor. It makes the work reviewable, recoverable, rejectable, mergeable, and cleanable.

What it is not

I do not want to overstate the project.

codex-orchestrator is not:

The accurate description is smaller and more useful:

It is a Codex App-first engineering orchestration workflow, plus a local helper.

The skill teaches Codex App how to run the workflow. The helper provides ledger state, status pages, preflight checks, routines, policy/eval checks, and self-update support.

Codex App still creates the sessions, reads the code, writes the code, and runs the worker tasks.

As models converge, control systems matter more

My current intuition is that model differences will still matter, but the leverage will increasingly move outward.

When models are all capable enough to write code, the differentiator becomes:

That is why I am increasingly confident in this direction.

codex-orchestrator is not a bet on one model. It is a bet on a pattern:

AI coding needs control systems.

The model is the engine.

Real projects still need steering, instruments, brakes, logs, and maintenance procedures.

What comes next

I want to keep improving four areas.

First, the status page should become more human-readable. Less raw machine state, more direct answers: what is happening, why are we waiting, why are we not dispatching, and where is human input required?

Second, failures should become fixtures. If we see pendingWorktreeId not recorded, availableSlots encouraging a bad dispatch, or local proof being overclaimed as direct proof, that should become a policy/eval case.

Third, external model review should become a natural checkpoint. Small slices may not need it. But when a feature package closes, generating a review pack for another model is useful.

Fourth, I want to keep testing it on real projects. Demos rarely expose the hard problems: missed heartbeats, setup races, stale ledgers, misleading status pages, documentation drift, and evidence overclaims.

Closing

codex-orchestrator did not start as an open-source product idea.

It started as a set of engineering habits I needed in order to use Codex App on a real project without losing control of the codebase.

First it was a prompt pattern.

Then it became a skill.

Then it gained a helper, a ledger, a status page, routines, policy checks, eval fixtures, and self-update.

But the core idea has stayed the same:

Do not just make AI write more code. Put AI work into an observable, reviewable, recoverable engineering loop.

If you use Codex App for real projects rather than one-off edits, you can try it here:

github.com/indiekitai/codex-orchestrator