It is Tuesday afternoon. I send a Telegram message that says, in roughly these words, "the timestamp on the task card is off by an hour, fix it." Six minutes later, a PR is open, CI is green, the fix is merged, the dashboard has reloaded, and the timestamp is correct. I didn't touch a keyboard.
This is my favorite thing I've ever built. I call it auto-coding. It is not a chatbot wrapper. It is not "AI pair programming." It is a factory.
Email and text go in. Features come out. The middle is a state machine where every column loads a different prompt.
The factory metaphor is not a metaphor
When you build cars, you don't have one giant worker who casts the engine block, mounts the chassis, paints the body, and signs the inspection report. You have a line. Each station is small, scoped, has the right tool, and hands off cleanly to the next station.
Software is the same. Auto-coding works because I treat my task board the way Toyota treats an assembly line: as a sequence of small, opinionated stations, each one good at exactly one thing.
Inside Symphony, my orchestrator, the same handful of agents move from station to station, but at every station they get a different prompt. Same model, same access, completely different operating mode. A Planner at Planning does not know how to write Rails code. A Reviewer at Needs Review doesn't write code at all; it looks for things that should not be in this PR. A Release worker at Merging / Deploying doesn't argue about design.
The state machine you see on the task board is the factory. Each column is a station with its own job description.
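To make "the board is a state machine" concrete, here is a minimal sketch. The column names come from this article; the transition table itself is my assumption about how the edges could look, not Symphony's actual configuration. In particular, I assume any active station can send a card to Blocked or Refused, and that a Blocked card re-enters at Planning once a human clears it.

```python
from enum import Enum


class Column(Enum):
    """Board columns; each working column is a station with its own prompt."""
    PLANNING = "Planning"
    DEVELOPMENT = "Development"
    NEEDS_REVIEW = "Needs Review"
    CHANGES_REQUESTED = "Changes Requested"
    READY_TO_MERGE = "Ready to Merge"
    MERGING_DEPLOYING = "Merging / Deploying"
    DONE = "Done"
    BLOCKED = "Blocked"
    REFUSED = "Refused"


# Hypothetical happy path plus the review loop (an assumption, not Symphony's config).
TRANSITIONS: dict[Column, set[Column]] = {
    Column.PLANNING: {Column.DEVELOPMENT},
    Column.DEVELOPMENT: {Column.NEEDS_REVIEW},
    Column.NEEDS_REVIEW: {Column.READY_TO_MERGE, Column.CHANGES_REQUESTED},
    Column.CHANGES_REQUESTED: {Column.NEEDS_REVIEW},
    Column.READY_TO_MERGE: {Column.MERGING_DEPLOYING},
    Column.MERGING_DEPLOYING: {Column.DONE},
    Column.BLOCKED: {Column.PLANNING},  # assumed re-entry once unblocked
}

TERMINAL = {Column.DONE, Column.REFUSED}
SIDE_EXITS = {Column.BLOCKED, Column.REFUSED}


def can_move(src: Column, dst: Column) -> bool:
    """Return True if a card may move from src to dst."""
    if src == dst or src in TERMINAL:
        return False  # no self-loops; Done and Refused are final
    if dst in SIDE_EXITS:
        return True  # any active station can block or refuse a card
    return dst in TRANSITIONS.get(src, set())
```

Keeping Blocked and Refused as side exits, rather than edges listed per station, is what lets every station fail the same way.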
Figure: the stations, each of which loads a different prompt. Side exits: external dependency → Blocked; wrong, unsafe, or duplicate → Refused.
One prompt per station
This is the trick that makes the whole thing work. A general-purpose AI agent is a generalist on a good day and a confused intern on a bad one. A station-specific agent is sharp.
Planning
Read the inbound email or Telegram message. Identify the actual product. Inspect the current code and task state. Decide: already exists, partial, missing, unsafe, or wrong product. Write acceptance criteria like a PM. Do not implement anything.
Development
Pick up a fully scoped card. Implement it in an isolated worktree. Run the focused tests. Open a PR with a structured handoff comment. Stop. You are not allowed to argue about scope.
Needs Review
Read the diff. Check correctness, product fit, scope drift, tests, security, deploy risk. Distinguish "this works" from "this should ship." Send things back to Changes Requested with specific feedback when they fail the bar.
Changes Requested
Read the review context. Make the requested fixes. Don't expand scope. Return the card to Needs Review.
Ready to Merge
Verify CI is actually green. Verify the PR has no unresolved comments. Verify the branch is up to date with main. Hand off cleanly.
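The Ready to Merge checklist is mechanical enough to write down as code. This is a sketch, assuming a hypothetical `PullRequestStatus` record that you would fill in from whatever your Git host reports; the field names are mine, not any real API.

```python
from dataclasses import dataclass


@dataclass
class PullRequestStatus:
    """Hypothetical snapshot of a PR, populated from your Git host."""
    ci_conclusions: list[str]   # one entry per CI check, e.g. "success"
    unresolved_comments: int    # review threads still open
    commits_behind_main: int    # how far the branch lags main


def merge_blockers(pr: PullRequestStatus) -> list[str]:
    """Return the reasons this PR cannot merge yet; empty means hand off."""
    reasons = []
    # "Actually green": zero reported checks is not the same as passing checks.
    if not pr.ci_conclusions:
        reasons.append("no CI checks reported")
    elif any(c != "success" for c in pr.ci_conclusions):
        reasons.append("CI is not green")
    if pr.unresolved_comments:
        reasons.append(f"{pr.unresolved_comments} unresolved comment(s)")
    if pr.commits_behind_main:
        reasons.append("branch is behind main")
    return reasons
```

Returning reasons instead of a bare boolean gives the card something specific to record when it bounces.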
Merging / Deploying
Merge. Watch the deploy. Run smoke checks. Record evidence. Move to Done only when production actually agrees.
Each prompt is two or three paragraphs of instructions, hard rules, and red lines. When a card lands in a column, the orchestrator picks the right worker and hands it the right prompt. The agent is not becoming a PM or a reviewer; it is being briefed as one for the duration of this card.
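The briefing step amounts to a lookup table plus string assembly. A minimal sketch, assuming a hypothetical `STATION_PROMPTS` table keyed by column name; the prompt text is condensed from the station descriptions above, and the structure is illustration, not Symphony's code. Note that the model and its access never change here, only the text it is handed.

```python
from dataclasses import dataclass

# Hypothetical per-station briefings, condensed from the station list.
STATION_PROMPTS: dict[str, str] = {
    "Planning": "You are a PM. Write acceptance criteria. Do not implement anything.",
    "Development": "Implement the scoped card in a worktree, run focused tests, "
                   "open a PR. Do not argue about scope.",
    "Needs Review": "Review the diff for correctness, scope drift, tests, "
                    "security, and deploy risk.",
    "Changes Requested": "Make the requested fixes only. Return the card to Needs Review.",
    "Ready to Merge": "Verify CI, unresolved comments, and branch freshness. Hand off cleanly.",
    "Merging / Deploying": "Merge, watch the deploy, run smoke checks, record evidence.",
}


@dataclass
class Card:
    title: str
    column: str
    body: str


def brief(card: Card) -> str:
    """Build the full prompt for whichever station currently holds the card."""
    try:
        station_prompt = STATION_PROMPTS[card.column]
    except KeyError:
        # Done, Blocked, Refused: no worker should pick these up.
        raise ValueError(f"no worker for column {card.column!r}") from None
    return f"{station_prompt}\n\nCard: {card.title}\n\n{card.body}"
```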
The input is just email and text
This is the part that still feels like cheating. The interface is not a dashboard. It's not Jira. It's not a custom UI I had to design. It's the inbox.
Product ideas arrive at product@. Support and incidents arrive at the support inbox. Operator-level nudges arrive on Telegram. The Planning prompt reads them, decides what's real, and shapes the task. Within minutes, a card is on the board, a coder has picked it up, and a PR is in flight.
Email is a great interface for AI work. It's universal. Every human, every service, every CI system knows how to send one. The message has structure (subject, body, from, attachments). The thread tracks state. And the part nobody appreciates: it doesn't require me to remember any URLs.
Most of the time, I'm not "operating" the factory. I'm just answering my email.
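The structure mentioned above (subject, body, sender, attachments) is directly available from Python's standard `email` package. Here is a sketch of turning a raw inbound message into a draft card; the `DraftCard` shape is hypothetical, and in this setup the Planning prompt, not this parser, decides what the message actually means.

```python
from dataclasses import dataclass, field
from email import message_from_string
from email.policy import default


@dataclass
class DraftCard:
    """Hypothetical raw input handed to the Planning station."""
    sender: str
    subject: str
    body: str
    attachments: list[str] = field(default_factory=list)


def draft_from_email(raw: str) -> DraftCard:
    """Parse a raw RFC 5322 message into the fields Planning will read."""
    # policy=default yields an EmailMessage, which has get_body()/iter_attachments().
    msg = message_from_string(raw, policy=default)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    names = [part.get_filename() or "unnamed" for part in msg.iter_attachments()]
    return DraftCard(
        sender=str(msg["From"] or ""),
        subject=str(msg["Subject"] or ""),
        body=body.strip(),
        attachments=names,
    )
```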
Why this is my favorite thing
Three reasons.
It collapses the loop
A normal feature cycle is days. Idea → spec → ticket → standup → estimate → assign → implement → review → merge → deploy → verify → close. Auto-coding compresses that into the kind of flow you get when one really good engineer is having a good morning. Six minutes is real. I've watched it happen dozens of times. The orchestrator runs on a five-minute tick, and most cards make it through several stations per tick.
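"Several stations per tick" can be read as: on each tick, keep pushing a card forward until it stops making progress. A sketch under stated assumptions: `advance` is a hypothetical callback that runs one station and returns the card's new column, the stop columns are my choice, and the per-card step cap exists only to stop a review ping-pong from eating the whole tick.

```python
import time
from typing import Callable

# Columns where a card stops moving until something external changes (assumption).
STOP_COLUMNS = {"Done", "Blocked", "Refused"}
TICK_SECONDS = 5 * 60  # the five-minute tick


def run_tick(
    cards: dict[str, str],
    advance: Callable[[str, str], str],
    max_steps_per_card: int = 10,
) -> dict[str, str]:
    """Push every card through as many stations as it can in one tick.

    cards maps card id -> current column. advance(card_id, column) runs
    that column's station and returns the column the card ends up in.
    """
    result = {}
    for card_id, column in cards.items():
        for _ in range(max_steps_per_card):  # cap review ping-pong per tick
            if column in STOP_COLUMNS:
                break
            next_column = advance(card_id, column)
            if next_column == column:
                break  # no progress this tick (e.g. still waiting on CI)
            column = next_column
        result[card_id] = column
    return result


def run_forever(
    load: Callable[[], dict[str, str]],
    save: Callable[[dict[str, str]], None],
    advance: Callable[[str, str], str],
) -> None:
    """Hypothetical outer loop: load the board, run one tick, sleep."""
    while True:
        save(run_tick(load(), advance))
        time.sleep(TICK_SECONDS)
```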
It makes the work inspectable
Every card carries its history. Which prompt picked it up. What the planner decided. What the coder did. What the reviewer flagged. What the release worker verified. You can roll back any decision, audit any change, and see why a card got refused if it did. Nothing important lives only in chat.
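Most of that inspectability comes from one rule: stations append to a card's history and never overwrite it. A minimal sketch, assuming a hypothetical `CardEvent` record; the field names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CardEvent:
    """One immutable entry in a card's audit trail."""
    at: datetime
    station: str   # which station's prompt handled the card
    decision: str  # e.g. "accepted", "changes requested", "refused"
    note: str      # the reasoning, in the station's own words


@dataclass
class Card:
    title: str
    column: str
    history: list[CardEvent] = field(default_factory=list)

    def record(self, decision: str, note: str, to_column: str) -> None:
        """Append an event for the current station, then move the card."""
        self.history.append(
            CardEvent(datetime.now(timezone.utc), self.column, decision, note)
        )
        self.column = to_column

    def why(self, station: str) -> list[str]:
        """Answer 'what did this station decide, and why?'"""
        return [f"{e.decision}: {e.note}" for e in self.history if e.station == station]
```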
It is honest about its limits
The factory does not pretend to be autonomous. Blocked exists for a reason. So does Refused. So does the human-review gate at Ready to Merge for anything risky. The orchestrator's job is not to remove me from the loop. It's to remove me from the tedious parts of the loop so I can show up at the decisions that matter.
What this isn't
- It isn't "give an AI access to your repo and pray." Each station has a tight prompt, a defined input, and a defined output. Without that scaffolding, you get confused interns with sudo.
- It isn't a chatbot. There is no chat window. There is a board, an inbox, and a clock.
- It isn't replacing engineers. It replaces the thirty minutes between "good idea" and "first commit" for the boring 80% of asks. The interesting 20% still gets the human treatment.
- It isn't fragile. When something breaks, a card moves to Blocked and a human gets a Telegram ping. Failure is a state, not a crash.
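"Failure is a state, not a crash" translates into a deliberately broad exception handler around each station. A sketch, assuming a hypothetical `station` callable that returns the next column and a hypothetical `notify` hook standing in for the Telegram ping; no real bot API is called here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Card:
    id: str
    column: str
    blocked_reason: str = ""


def run_station(
    card: Card,
    station: Callable[[Card], str],
    notify: Callable[[str], None],
) -> Card:
    """Run one station; any exception becomes a Blocked card plus a ping.

    station(card) returns the next column. notify(text) is a hypothetical
    hook, e.g. a Telegram message to the operator.
    """
    try:
        card.column = station(card)
    except Exception as exc:  # deliberately broad: nothing escapes the tick
        card.blocked_reason = f"{card.column}: {type(exc).__name__}: {exc}"
        card.column = "Blocked"
        notify(f"Card {card.id} blocked ({card.blocked_reason})")
    return card
```

The reason string captures the station the card was in before it moved, so the human who gets pinged knows where to look.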
The takeaway
If you remember one thing from this article, make it this: the leverage is not in the model. It's in the factory you build around the model. A general-purpose LLM that does nothing is a $20/month tab in your browser. A general-purpose LLM that lives inside a state machine where every column loads a different prompt, reads from your real systems, and writes real diffs that go through real CI is a teammate.
This is what I'm building for businesses now. Not demos. Not chatbots. A factory.
The next feature you ship, the most boring possible version, could be a Telegram message away. That's still the part that feels like cheating.