David Chung
ai agents project-management systems

The AI Project Manager That Actually Saved Money

What I learned after rebuilding my PM agent into a multi-agent factory instead of another chat prompt.

I updated Pete again.

Pete is my technical PM agent. Not a mascot. Not a cute name for a prompt. Pete is the system I use when a software project is too big to hold in one clean pass.

The old version worked. It decomposed work into user stories, kept files scoped, and gave agents clear assignments.

But after a few real builds, the leak became obvious.

Not one giant failure. Small waste everywhere.

Subagents were returning 300 tokens of self-congratulation around a 50-token result. Status updates were rewriting Markdown files over and over. Verification commands were dumping hundreds of lines back into the context window. The main model was doing work a cheaper model could do. The PM record lived in chat until someone remembered to save it.

Each leak felt small.

Together, they were the difference between a clean build and a compaction spiral.

What Pete v2 Is Now

Pete v2 is a multi-agent factory.

It turns a technical build into atomized US-XXX stories, assigns each story a model tier, locks the files each agent is allowed to touch, dispatches safe work in parallel, verifies the outputs before commit, and preserves the durable PM record on disk.
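A minimal sketch of the wave-planning step, assuming each story declares the files it owns and the stories it depends on. The `Story` fields, the tier labels, and the greedy packing rule are illustrative assumptions, not Pete's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Story:
    story_id: str                      # e.g. "US-001"
    tier: str                          # hypothetical label: "haiku", "sonnet", "opus"
    files: frozenset                   # files this story's agent is allowed to touch
    depends_on: frozenset = frozenset()  # story ids that must finish first


def plan_waves(stories):
    """Greedily pack stories into waves.

    Within a wave, no two stories share a file, and every story's
    dependencies were completed in an earlier wave.
    """
    remaining = list(stories)
    done = set()
    waves = []
    while remaining:
        wave = []
        locked = set()  # files already claimed in this wave
        for story in remaining:
            ready = story.depends_on <= done
            free = locked.isdisjoint(story.files)
            if ready and free:
                wave.append(story)
                locked |= story.files
        if not wave:
            raise ValueError("dependency cycle or unknown dependency")
        done |= {s.story_id for s in wave}
        remaining = [s for s in remaining if s.story_id not in done]
        waves.append(wave)
    return waves
```

Two stories that touch the same file never land in the same wave, so the parallel dispatch stays safe by construction rather than by agent discipline.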

The current skill stack has three parts:

| Piece | Job | Why It Matters |
| --- | --- | --- |
| pete-pm-tech-v2 | The PM system: plan, scope, waves, verification, issues | Keeps the build from drifting |
| subagent-brief-v2 | The dispatch template | Sends agents only what they need |
| subagent-shorthand-v2 | The return contract | Forces short, parseable status reports |
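To make "short, parseable status reports" concrete, here is a sketch of what enforcing a return contract could look like. The pipe-delimited format, the state names, and the 200-character cap are all assumptions for illustration; the real subagent-shorthand-v2 contract is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical set of allowed states; the real contract may differ.
VALID_STATES = {"DONE", "BLOCKED", "FAIL"}


@dataclass
class Report:
    story_id: str
    state: str
    note: str


def parse_report(line, max_chars=200):
    """Parse 'US-007 | DONE | short note' and reject anything else."""
    if len(line) > max_chars:
        raise ValueError("report exceeds the length cap")
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        raise ValueError("expected 'story | state | note'")
    story_id, state, note = (p.strip() for p in parts)
    if not story_id.startswith("US-"):
        raise ValueError(f"bad story id: {story_id!r}")
    if state not in VALID_STATES:
        raise ValueError(f"bad state: {state!r}")
    return Report(story_id, state, note)
```

A report that fails to parse is itself a signal: the agent ignored the contract, and the PM can ask for a resend instead of reading a long narrative.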

That sounds procedural. It is.

That’s the point.

The magic isn’t that an AI agent writes code. The magic is that the work gets routed to the right context at the right cost with the right guardrails.

The Money Was In The Routing

The biggest savings didn’t come from shorter status reports.

The biggest savings came from not using the most expensive model for everything.

During the first local_wispr build, Pete split the project into 13 stories and dispatched 12 of them across five waves. Most of the work happened in cheap subagent context: Haiku for mechanical tasks, Sonnet for judgment-heavy code, and Opus in the main context for strategy and PM decisions.

The estimate from that build:

| Approach | Estimated Direct Token Cost | Hidden Cost |
| --- | --- | --- |
| Write everything in main Opus context | $5-8 | Main context fills with file contents and forces compaction |
| Pete v2 multi-agent dispatch | $2.50-4 | Coordination overhead, but context stays clean |

On that one small Mac utility, the direct bill was roughly cut in half.
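The "roughly cut in half" figure can be checked against the midpoints of the two estimated ranges. This is only arithmetic on the estimates above, not measured billing data.

```python
# Estimated ranges from the build log, in dollars.
main_low, main_high = 5.00, 8.00   # everything in main Opus context
pete_low, pete_high = 2.50, 4.00   # Pete v2 multi-agent dispatch


def midpoint(low, high):
    return (low + high) / 2


ratio = midpoint(pete_low, pete_high) / midpoint(main_low, main_high)
print(f"{ratio:.2f}")  # prints 0.50
```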

But the real savings were bigger than the bill.

The important line from the log was this: delegation is the 60-80% optimization. Shorthand is the 5-10% optimization.

That’s the lesson I don’t want to lose.

Making reports shorter helps. Routing work to the right model changes the economics of the whole build.
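A sketch of the routing rule, assuming each story is tagged with a task kind when it is written. The kind names and the two sets below are hypothetical; only the tier split (Haiku for mechanical work, Sonnet for judgment-heavy code, Opus for strategy in the main context) comes from the build described above.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"    # mechanical tasks
    SONNET = "sonnet"  # judgment-heavy code
    OPUS = "opus"      # strategy and PM decisions, kept in main context


# Hypothetical task kinds; a real setup would define its own.
MECHANICAL = {"rename", "format", "boilerplate", "config"}
JUDGMENT = {"feature", "bugfix", "refactor"}


def pick_tier(kind):
    """Route a story to the cheapest tier that can plausibly handle it."""
    if kind in MECHANICAL:
        return Tier.HAIKU
    if kind in JUDGMENT:
        return Tier.SONNET
    # Unknown or strategic work is not delegated.
    return Tier.OPUS
```

Defaulting unknown kinds to the main context is a deliberate choice in this sketch: misrouting hard work to a cheap model costs more in rework than it saves in tokens.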

The Time Savings Showed Up Next

The next day, Pete v2 got a harder test.

local_wispr v2 wasn’t a tiny cleanup. It added UI parity with Wispr Flow: floating overlay, menu bar tray, settings dialog, streaming partials, cancel behavior, MLX model support, offline model rules, hotkey changes, and PortAudio resilience.

Pete dispatched 17 stories across eight waves.

The result: about two hours wall clock instead of an estimated six hours sequential.

Same work. Different architecture.

That matters because AI doesn’t just have a dollar cost. It has an attention cost. If a build takes six hours, the human operator degrades. Decisions get sloppier. Smoke testing gets rushed. The last 20% turns into “good enough” because everyone is tired.

Cutting wall clock by two-thirds isn’t just speed.

It protects judgment.

The Context Savings Were The Quiet Win

The Pete update that looks least exciting is probably the most useful at scale.

Pete v2.1 now says: don’t echo verification output.

That sounds petty until you watch a build eat itself.

A verification command can produce 200 lines. Five verification commands across one wave can push 1,000 lines of junk back into the next turn. Across a 50-story branch, that is 5,000 to 10,000 tokens of noise.

Not useful context. Noise.

So Pete now forces summary-style verification:

python3 -m py_compile file.py && echo OK || echo FAIL

or count-based checks:

rg "pattern" path | wc -l

The full output is still available when something fails. We just stop paying to reread it when everything passed.
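The same idea as a small Python wrapper, for cases where the check is driven from a script rather than a shell one-liner. The function name and the log path are assumptions; the point is that only one short line goes back to the model while the full output is written to disk on failure.

```python
import subprocess
from pathlib import Path


def verify(cmd, log_path):
    """Run a verification command and return a one-line summary.

    On success nothing is kept. On failure the full stdout and stderr
    are written to log_path so they can be read on demand.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return "OK"
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    log_path.write_text(result.stdout + result.stderr, encoding="utf-8")
    return f"FAIL (see {log_path})"
```

For example, `verify(["python3", "-m", "py_compile", "file.py"], "pm/logs/us-001.log")` returns a single line either way.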

That’s the difference between debugging and hoarding.

Live State And Durable Memory Are Not The Same Thing

The other big Pete update was status handling.

Before, every story flip lived in pm/status.md: ready, in progress, done, blocked. That file is useful. Future-me needs it. But editing it on every state change is expensive.

Each flip costs a read and an edit, and then the Markdown comes back into context. Multiply that across dozens of stories and you’re burning attention on bookkeeping.

Pete v2 now splits the job:

| State Type | Tool | Why |
| --- | --- | --- |
| Live wave state | Harness task list | Cheap, fast, visible during the run |
| Durable project record | pm/status.md | Saved at wave boundaries for future sessions |
| Blockers and risks | pm/ISSUES.md | Append-only, durable, not trapped in chat |

This is a small systems design lesson.

Ephemeral state should be cheap. Durable state should be reliable.

When you use one artifact for both, you pay the worst cost of each.
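A minimal sketch of that split. The in-memory dict stands in for the harness task list, which is an assumption; the class and method names are hypothetical. State flips touch no files, status.md is rewritten once per wave, and ISSUES.md is only ever appended to.

```python
from pathlib import Path


class WaveTracker:
    def __init__(self, pm_dir):
        self.pm_dir = Path(pm_dir)
        self.live = {}  # story id -> state; stand-in for the harness task list

    def flip(self, story_id, state):
        """Cheap: update live state in memory only."""
        self.live[story_id] = state

    def log_issue(self, text):
        """Durable: append a blocker or risk, never rewrite earlier entries."""
        self.pm_dir.mkdir(parents=True, exist_ok=True)
        with (self.pm_dir / "ISSUES.md").open("a", encoding="utf-8") as fh:
            fh.write(f"- {text}\n")

    def close_wave(self):
        """Durable: write the status snapshot once, at the wave boundary."""
        self.pm_dir.mkdir(parents=True, exist_ok=True)
        lines = [f"- {sid}: {state}" for sid, state in sorted(self.live.items())]
        (self.pm_dir / "status.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
```

However many times a story flips during a wave, the Markdown file is written once.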

The QA Gate Paid For Itself

The clearest proof came from R2-Coach Copilot.

That wasn’t a toy build. It was an abandoned Tauri + Rust + Swift coaching HUD that had never worked end-to-end. Pete decomposed the rebuild into 30+ stories, file-locked the work, and dispatched roughly 30 subagent tasks.

The rebuild had a midstream architecture pivot: replace the old Deepgram/cpal audio path with a Swift sidecar, ScreenCaptureKit, AVAudioEngine, and local WhisperKit.

That’s exactly the kind of project where chat-only PM falls apart.

Pete’s four-doc artifact set kept the rework scoped:

  1. implementation_plan.md held the constraint, bet, usage flow, technical approach, and story graph.
  2. status.md tracked story state across waves.
  3. ISSUES.md preserved blockers, deviations, and intentional gaps.
  4. SMOKE_TEST.md captured manual gates that code tests could not cover.

The QA gate caught 11 real bugs, including 3 P0s and 7 P1s, plus lower-severity issues.

Without that gate, smoke testing would have failed on permissions, audio-thread allocation, Gemini 404s, and race conditions.

That’s the part people miss when they talk about “AI saving time.”

Bad AI workflows save time by skipping checks.

Good AI workflows save time because the checks are designed into the system.

What I Would Tell Someone Building Their Own PM Agent

Do not start by writing a clever persona.

Start by deciding what the PM is allowed to protect.

Pete protects five things:

  1. Scope: every build starts with the real constraint, not the requested feature.
  2. Files: no two parallel agents touch the same file.
  3. Context: expensive main context stays reserved for strategy, architecture, and judgment.
  4. Verification: agents don’t get believed just because they sound confident.
  5. Memory: decisions and issues live on disk, not just in the conversation.

That’s why it saves money.

Not because it writes code faster in some vague way. Because it prevents the expensive failure modes: wrong model doing cheap work, repeated context bloat, hidden rework, status trapped in chat, and bugs discovered too late.

The Actual Savings

Here’s the clean version:

| Savings Type | Observed Result | Source |
| --- | --- | --- |
| Direct token cost | local_wispr v1 estimated $2.50-4 vs $5-8 in main context | 13-story build log |
| Delegation economics | Delegation is the 60-80% optimization; shorthand is 5-10% | Pete skill update log |
| Wall clock | local_wispr v2 cut estimated sequential work from ~6 hours to ~2 hours | 17-story v2 log |
| Context hygiene | Summarized verification saves 5-10k tokens of noise on a medium/large branch | Pete v2.1 skill note |
| Dispatch overhead | Inlining the output contract saves one file read plus ~400 tokens per dispatch | subagent-brief-v2 |
| QA rework | R2-Coach QA gate caught 11 real bugs before smoke | R2 rebuild log |

On a single build, the dollar delta was small: a few bucks. Across 50+ branches per quarter, the same pattern becomes hundreds of dollars saved, plus fewer forced compactions and cleaner decision-making.

Those aren’t all the same kind of savings.

Some are dollars. Some are hours. Some are context. Some are avoided rework.

But that is how real operational savings work. They rarely show up as one dramatic number. They show up as fewer expensive mistakes repeated across the whole system.

The Bigger Lesson

AI agents don’t become useful because they’re autonomous.

They become useful when the workflow around them makes autonomy safe.

Pete isn’t valuable because it makes agents go faster. Pete is valuable because it makes fast work inspectable.

That’s the difference.

Without the PM layer, multi-agent work is just chaos with a bigger token bill. With the PM layer, it becomes a factory: scoped inputs, constrained files, right-sized models, short reports, verification gates, and durable memory.

The next version will keep getting tighter.

But the pattern is already clear:

Send cheap work to cheap models. Keep strategic judgment in the main context. Stop paying to reread noise. Verify before you believe. Save the decisions where future-you can find them.

Simple systems beat expensive improvisation.
