---
## The Idea
I'm a cloud architect by day, and I have a hobby project: a Lua training mod for Street Fighter III: 3rd Strike running on FightCade (a MAME-based emulator). The mod adds training tools that don't exist in the original game — parry training, juggle air timers, tech throw indicators, and more.
The problem is time. After work, I have maybe an hour or two to make progress before I need to sleep. But Claude Code is sitting there, capable of reading code, writing code, and doing research — all without me watching. So I asked myself: can I let it keep working while I sleep?
The answer, after a bit of setup, is yes. But the more interesting question wasn't "can AI agents work overnight?" — it was "how do I stay in control while they do?"
---
## The Architecture
The core challenge with AI agents working autonomously is context. Each agent starts cold — it doesn't know what the previous one did, what decisions were made, or what state the codebase is in. I solved this with a single shared file: `OVERNIGHT_STATE.md`.
Think of it as a baton in a relay race. Every subagent:
1. Reads `OVERNIGHT_STATE.md` first to understand the current state and task
2. Does its assigned work
3. Writes its results and any blockers back into `OVERNIGHT_STATE.md` before stopping
No agent finishes without updating the baton. This way, even if Claude Code restarts or a session expires, the next agent can pick up exactly where things left off.
### The A/B/C Pattern
I split each task into three roles:
**Agent A — Research**
Reads the existing code, searches GitHub for reference implementations, and documents what it finds. It doesn't change anything — it just produces a detailed analysis: relevant file paths, line numbers, how the existing code is structured, and a proposed implementation plan.
**Agent B — Implementation**
Takes Agent A's analysis and writes the actual code. Strict scope limits — it only touches the files and lines A identified. No exploring, no refactoring, no scope creep.
**Agent C — Verification**
Reads Agent B's output independently and checks for logical errors, inconsistencies with the existing codebase, and anything that would prevent the code from running. It doesn't fix anything — it produces a pass/fail report with specific issues listed.
This separation matters. An agent that writes code and reviews its own work will miss things. Agent C comes in fresh, with no attachment to the implementation choices Agent B made.
---
## Human in the Loop — By Design
This is the part I spent the most time thinking about. Autonomous agents are only useful if you can trust what they produce. And trust has to be earned gradually.
### Start with a dry run
Before I let any agent touch real code, I ran a dry run: Agent A only, research task, no file modifications at all. The goal was just to verify that the state-passing mechanism worked — that an agent could read `OVERNIGHT_STATE.md`, do meaningful work, and write useful output back.
It worked. That dry run gave me enough confidence to move to the next step.
### Escalate task risk slowly
I didn't start with a feature. My first real overnight task was updating the README — adding documentation for a feature that already existed and was already tested. Low risk: worst case, the text is awkward and I rewrite it in the morning. No broken code, no untested logic.
Only after that succeeded did I move to actual code: the 720 Input Trainer feature. By then I had two overnight runs worth of evidence that the system was reliable.
The progression was deliberate:
| Night | Task | Risk level |
|-------|------|------------|
| 0 | Dry run (research only, no file changes) | None |
| 1 | README update (documentation only) | Low |
| 2 | New feature implementation | Medium |
### Pre-written constraints, not runtime decisions
The guard rails aren't instructions I give each agent at runtime — they're written into the system design before I go to sleep. The agents don't decide whether to push to git; they simply cannot, because the rule is in the briefing. The agents don't decide when to stop; they stop automatically when they hit a decision point they can't resolve.
The key constraints I wrote in:
**No git push.** The mod is a public repo. Nothing goes remote without my eyes on it. Agents can write to local files, but publishing is my job.
**No git commit either.** I test first, then commit. The agent produces a diff; I decide if it ships.
**Mandatory code review.** Agent C runs on every implementation task. Not optional, not skippable. If C finds a critical issue, the run stops and I wake up to a problem report, not broken code in my repo.
**Decision points stop the run.** If an agent hits something ambiguous — a design question, missing context, something that requires judgment — it writes it to the "Waiting for Peter" section of `OVERNIGHT_STATE.md` and stops. It doesn't guess.
**Usage monitoring.** I check Claude API spend before each agent starts via `ccusage`. If a 5-hour block exceeds $12, pause. If the daily total exceeds $25, stop entirely.
**Scope is explicit.** Each agent's briefing includes specific file paths and line numbers. "Implement the 720 mode" is not a valid briefing. "Add to `special_training_mode` table at line 548, add struct at line 576, add if block after line 3543" is.
---
## Last Night's Task: 720 Input Trainer
The specific task was building a training mode that detects whether a player has completed a 720° joystick rotation — the input for Hugo's Gigas Breaker super art in SF3, one of the hardest inputs in fighting games.
I won't go deep into the implementation here, but the highlight was Agent C catching a critical bug before I ever touched the code: Agent B had used Lua 5.3 bitwise operators (`|`, `&`, `<<`) in several places. FightCade embeds Lua 5.1, which doesn't support those operators. The entire script would have failed to load.
Agent C documented the exact line numbers and the corrected code. I wake up, apply a 5-minute fix, and I'm ready to test — instead of starting my morning debugging a cryptic syntax error.
That's the code review step earning its place.
---
## What I Woke Up To
A complete `OVERNIGHT_STATE.md` with:
- Agent A's architecture analysis and implementation plan
- Agent B's implementation notes (what changed, where, any surprises)
- Agent C's code review (one critical bug, exact fix provided, everything else passed)
- A clear checklist of what I need to do to finish and ship
Total time I need to spend this morning: apply the Lua syntax fix (5 minutes), test in FightCade (10 minutes), commit if it works.
That's a full feature research-and-implement cycle done overnight. I review, test, and ship.
---
## What's Next
The system works, but the human-in-the-loop pieces are still mostly manual. I set up the task list before I sleep, I manually kick off the first agent, and I review everything in the morning. That's fine — that's the point.
Eventually I want to add a monitoring layer through 9527 (my other AI assistant) so it can check `OVERNIGHT_STATE.md` on a schedule and notify me via WhatsApp if the run completes or gets stuck. But I'll build that incrementally, the same way I built this — dry run first, escalate slowly.
---
Reference:
1. GitHub:
https://github.com/Juilin77/3rd_training_lua_peter
First Published (YYYY-MM-DD) / Last Modified (YYYY-MM-DD)
2026-06-19 / 2026-06-19


0 comments:
Post a Comment