May 7, 2026


The AI Project Manager That Actually Saved Money

What I learned after rebuilding my PM agent into a multi-agent factory instead of another chat prompt.

[Image: an assembly line where work enters one end, splits into parallel stations sized to different workers, passes a single inspection gate, and ships out the far end.]

I updated Pete again.

Pete is my technical PM agent. Not a mascot. Not a cute name for a prompt. Pete is the system I reach for when a software project is too big to hold in one clean pass.

Think of Pete as the floor manager of a factory.

The old version worked. It broke a project into small jobs, kept each worker on its own bench, and handed out clear assignments.

But after a few real builds, the leak became obvious.

Not one giant failure. Small waste everywhere.

Subagents, the helper AIs Pete hands a single job to, were returning 300 words of self-congratulation wrapped around a 50-word result. Status files were getting rewritten over and over. Verification commands, the checks that confirm code actually runs, were dumping hundreds of lines back into the conversation. The expensive model was doing work a cheap model could do. The project record lived in chat until someone remembered to save it.

Each leak felt small.

Together, they were the difference between a clean build and a compaction spiral, the failure where the AI’s working memory fills up and it starts throwing away context to keep going.

What Pete v2 Is Now

Pete v2 is a multi-agent factory.

It turns a technical build into a set of small numbered jobs, gives each job a worker sized to the task, locks the files each worker is allowed to touch, sends the safe jobs down the line in parallel, inspects the output before it ships, and files the durable record on disk.

The line itself has three parts:

Piece	Job	Why It Matters
`pete-pm-tech-v2`	The floor manager: plan, scope, schedule, inspect	Keeps the build from drifting
`subagent-brief-v2`	The work order handed to each station	Sends a worker only what it needs
`subagent-shorthand-v2`	The return slip every worker fills out	Forces short, readable status

That sounds procedural. It is.

That’s the point.

The magic isn’t that an AI writes code. The magic is that the work gets routed to the right bench, at the right cost, behind the right guardrails.
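To make the work order and the return slip concrete, here is a minimal Python sketch. The field names, the 60-word cap, and the `check_slip` helper are all assumptions made for illustration; the real `subagent-brief-v2` and `subagent-shorthand-v2` formats are not shown in this post.

```python
from dataclasses import dataclass, field

# Hypothetical word cap for a return slip; the real limit is not stated in the post.
MAX_SLIP_WORDS = 60


@dataclass
class WorkOrder:
    """What a station receives: one job, one tier, and the files it may touch."""
    job_id: str
    task: str
    tier: str                      # e.g. "haiku", "sonnet", "opus"
    allowed_files: list[str] = field(default_factory=list)


@dataclass
class ReturnSlip:
    """What a worker sends back: a status, the files changed, and a short note."""
    job_id: str
    status: str                    # "done" or "blocked"
    files_changed: list[str]
    note: str


def check_slip(order: WorkOrder, slip: ReturnSlip) -> list[str]:
    """Return a list of problems; an empty list means the slip is acceptable."""
    problems = []
    if slip.job_id != order.job_id:
        problems.append("slip is for a different job")
    if slip.status not in {"done", "blocked"}:
        problems.append(f"unknown status: {slip.status}")
    outside = [f for f in slip.files_changed if f not in order.allowed_files]
    if outside:
        problems.append(f"touched files outside the lock: {outside}")
    if len(slip.note.split()) > MAX_SLIP_WORDS:
        problems.append("note is longer than the word cap")
    return problems
```

The point of the sketch is that the floor manager can reject a slip mechanically, on length or on file scope, before spending any attention on reading it.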

[Figure: the factory pipeline. One build, split into right-sized stations, through one inspection gate, out the door.]

The Money Was In The Routing

The biggest savings didn’t come from shorter status slips.

The biggest savings came from not putting the most expensive worker on every job.

The AI models come in tiers: Haiku is the cheap fast worker, Sonnet is the mid-priced one with better judgment, Opus is the expensive specialist. During the first local_wispr build, a small Mac utility, Pete split the project into 13 jobs and sent 12 of them down the line across five waves. Most of the work ran on the cheap benches: Haiku for the mechanical jobs, Sonnet for the code that needed judgment, Opus held back for strategy and the hard calls.

The estimate from that build:

Approach	Estimated Direct Cost	Hidden Cost
Do everything on the expensive bench	`$5-8`	The main workspace fills with file contents and forces a memory dump
Pete v2 multi-agent dispatch	`$2.50-4`	Some coordination overhead, but the workspace stays clean

On that one small utility, the direct bill was roughly cut in half.

But the real savings were bigger than the bill.

The important line from the log was this: delegation is the 60 to 80 percent win. Shorter reports are the 5 to 10 percent win.

That’s the lesson I don’t want to lose.

Making reports shorter helps. Routing work to the right worker changes the economics of the whole build.

Send cheap work to cheap workers. Keep the expensive one for the hard calls.
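The routing rule can be sketched in a few lines of Python. The job categories and the fallback to the middle tier are assumptions for this example, not Pete's actual taxonomy; only the three tier names come from the post.

```python
# Hypothetical job categories mapped to the tiers named in the post.
# The category names are assumptions for this sketch.
TIER_BY_KIND = {
    "mechanical": "haiku",    # renames, boilerplate, simple file moves
    "judgment": "sonnet",     # code that needs design decisions
    "strategy": "opus",       # architecture and the hard calls
}


def pick_tier(kind: str) -> str:
    """Return the cheapest tier assumed able to handle this kind of job."""
    # Assumption: unknown kinds go to the middle tier, not the most expensive one.
    return TIER_BY_KIND.get(kind, "sonnet")


def plan_dispatch(jobs: dict[str, str]) -> dict[str, list[str]]:
    """Group job ids by the tier they would be sent to."""
    by_tier: dict[str, list[str]] = {"haiku": [], "sonnet": [], "opus": []}
    for job_id, kind in jobs.items():
        by_tier[pick_tier(kind)].append(job_id)
    return by_tier
```

Calling `plan_dispatch({"J1": "mechanical", "J2": "strategy"})` would put `J1` on the Haiku bench and `J2` on the Opus bench. The useful habit is deciding the tier per job before dispatch rather than defaulting everything to the expensive model.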

The Time Savings Showed Up Next

The next day, Pete v2 got a harder test.

local_wispr v2 wasn’t a tiny cleanup. It added a floating overlay, a menu bar tray, a settings dialog, live partial text, cancel behavior, support for MLX (Apple’s machine-learning framework for Apple silicon, which lets the model run on the Mac instead of in the cloud), offline rules, hotkey changes, and audio resilience.

Pete sent 17 jobs down the line across eight waves.

The result: about two hours on the clock instead of an estimated six hours done one job at a time.

Same work. Different floor plan.

That matters because AI doesn’t just have a dollar cost. It has an attention cost. If a build takes six hours, the human running it degrades. Decisions get sloppier. Testing gets rushed. The last 20 percent turns into “good enough” because everyone is tired.

Cutting the clock by two-thirds isn’t just speed.

It protects judgment.
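The wave idea behind that result can be sketched as a greedy planner: a job joins the current wave only when its dependencies are finished and no other job in the wave holds one of its files. This is an illustrative Python sketch under those assumptions, with a hypothetical job format; it is not Pete's actual scheduler.

```python
def plan_waves(jobs: dict[str, dict]) -> list[list[str]]:
    """Greedy wave planner.

    jobs maps a job id to {"deps": set of job ids, "files": set of paths}.
    A job joins the current wave only if all its deps finished in an earlier
    wave and none of its files are already claimed by another job in the wave.
    """
    done: set[str] = set()
    waves: list[list[str]] = []
    remaining = dict(jobs)
    while remaining:
        wave: list[str] = []
        locked: set[str] = set()
        for job_id, spec in remaining.items():
            if spec["deps"] <= done and not (spec["files"] & locked):
                wave.append(job_id)
                locked |= spec["files"]
        if not wave:
            # Nothing can run: a cycle, or a dependency that is not in the job list.
            raise ValueError("dependency cycle or missing dependency")
        for job_id in wave:
            del remaining[job_id]
        done |= set(wave)
        waves.append(wave)
    return waves
```

Two jobs that both edit the same file land in different waves even when neither depends on the other, which is the file-lock rule expressed as scheduling.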

The Context Savings Were The Quiet Win

The update that looks least exciting is probably the most useful at scale.

Pete v2.1 now says: don’t echo the inspection output.

That sounds petty until you watch a build eat itself.

One check can produce 200 lines. Five checks in one wave can push 1,000 lines of junk back into the next turn. Across a 50-job branch, that is 5,000 to 10,000 words of noise.

Not useful. Noise.

So Pete now forces a summary-style check. Instead of printing everything the check saw, it prints one word, pass or fail:

python3 -m py_compile file.py && echo OK || echo FAIL

or a simple count:

rg "pattern" path | wc -l

The full output is still there when something fails. We just stop paying to reread it when everything passed.

That’s the difference between debugging and hoarding.
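The same idea works as a small Python wrapper around any check command. This is a sketch, not Pete's implementation: the `quiet_check` name is hypothetical, and trimming failure output to its last lines via `tail_lines` is an addition of this example, not something the post describes.

```python
import subprocess


def quiet_check(cmd: list[str], tail_lines: int = 20) -> str:
    """Run a check and return a one-word verdict plus, on failure, a short tail."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        # Passing checks cost one word of context, whatever they printed.
        return "OK"
    # On failure, keep only the last few lines instead of the full dump.
    output = (result.stdout + result.stderr).strip().splitlines()
    tail = "\n".join(output[-tail_lines:])
    return f"FAIL\n{tail}" if tail else "FAIL"


# Example call (assumes a file named file.py exists in the working directory):
# print(quiet_check(["python3", "-m", "py_compile", "file.py"]))
```

A passing check returns exactly `OK`; a failing one returns `FAIL` followed by whatever the command printed last, so the detail is available only when it is needed.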

Live State And Durable Memory Are Not The Same Thing

The other big update was how Pete tracks status.

Before, every job change lived in `pm/status.md`: ready, in progress, done, blocked. That file is useful. Future-me needs it. But editing it on every change is expensive.

Each flip costs a read, an edit, and then the whole file comes back into the workspace. Multiply that across dozens of jobs and you’re burning attention on paperwork.

Pete v2 now splits the job:

State Type	Where It Lives	Why
Live wave state	The harness task list, a scratchpad for the current run	Cheap, fast, visible while it runs
Durable project record	`pm/status.md`	Saved at wave boundaries for future sessions
Blockers and risks	`pm/ISSUES.md`	Append-only, durable, not trapped in chat

This is a small systems lesson.

Scratch state should be cheap. Permanent state should be reliable.

When you use one drawer for both, you pay the worst cost of each.
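A minimal Python sketch of that split follows. The `WaveTracker` class and the on-disk line format are assumptions for illustration; only the file names `status.md` and `ISSUES.md` and the write-at-wave-boundary rule come from the post.

```python
from pathlib import Path


class WaveTracker:
    """Keep live job state in memory; write to disk only at wave boundaries."""

    def __init__(self, pm_dir: Path) -> None:
        self.pm_dir = pm_dir
        # job id -> "ready" | "in progress" | "done" | "blocked"
        self.live: dict[str, str] = {}

    def set_state(self, job_id: str, state: str) -> None:
        # Cheap: no file read or write on each flip.
        self.live[job_id] = state

    def log_issue(self, text: str) -> None:
        # Append-only: existing entries are never rewritten.
        self.pm_dir.mkdir(parents=True, exist_ok=True)
        with open(self.pm_dir / "ISSUES.md", "a", encoding="utf-8") as f:
            f.write(f"- {text}\n")

    def close_wave(self, wave_number: int) -> None:
        # One durable write per wave instead of one per state change.
        self.pm_dir.mkdir(parents=True, exist_ok=True)
        lines = [f"## Wave {wave_number}"]
        lines += [f"- {job_id}: {state}" for job_id, state in sorted(self.live.items())]
        with open(self.pm_dir / "status.md", "a", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
```

In this sketch a job can flip state any number of times during a wave without touching the disk; `status.md` only sees the final state when `close_wave` runs.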

The QA Gate Paid For Itself

The clearest proof came from R2-Coach Copilot.

That wasn’t a toy build. It was an abandoned coaching HUD, a desktop overlay app, that had never worked end to end. Pete broke the rebuild into 30-plus jobs, locked the files, and sent roughly 30 of them down the line.

The rebuild had a midstream pivot: rip out the old audio path and replace it with a new one built on Apple’s screen and audio capture tools and a local speech-to-text model.

That’s exactly the kind of project where chat-only PM falls apart.

Pete’s four-document set kept the rework scoped:

`implementation_plan.md` held the constraint, the bet, the user flow, the technical approach, and the job graph.
`status.md` tracked job state across waves.
`ISSUES.md` preserved blockers, deviations, and intentional gaps.
`SMOKE_TEST.md` captured the by-hand checks that code tests could not cover.

The QA gate, the inspection station every job passes before it ships, caught 11 real bugs, including 3 P0s and 7 P1s, the most severe tier and the next one down.

Without that gate, testing would have failed later on permissions, an audio crash, a broken model call, and timing bugs.

That’s the part people miss when they talk about “AI saving time.”

Bad AI workflows save time by skipping checks.

Good AI workflows save time because the checks are built into the line.

What I Would Tell Someone Building Their Own PM Agent

Do not start by writing a clever persona.

Start by deciding what the floor manager is allowed to protect.

Pete protects five things:

Scope: every build starts with the real constraint, not the requested feature.
Files: no two parallel workers touch the same file.
Context: the expensive bench stays reserved for strategy, architecture, and judgment.
Verification: a worker doesn’t get believed just because it sounds confident.
Memory: decisions and issues live on disk, not just in the conversation.

That’s why it saves money.

Not because it writes code faster in some vague way. Because it prevents the expensive failure modes: the expensive worker doing cheap work, repeated context bloat, hidden rework, status trapped in chat, and bugs found too late.

The Actual Savings

Here’s the clean version:

Savings Type	Observed Result	Source
Direct cost	`local_wispr` v1 estimated `$2.50-4` vs `$5-8` on the expensive bench	13-job build log
Delegation economics	Delegation is the 60 to 80 percent win; shorthand is 5 to 10 percent	Pete skill update log
Wall clock	`local_wispr` v2 cut estimated work from ~6 hours to ~2 hours	17-job v2 log
Context hygiene	Summarized checks save 5 to 10k words of noise on a medium or large branch	Pete v2.1 skill note
Dispatch overhead	Inlining the return slip saves one file read plus ~400 words per job	`subagent-brief-v2`
QA rework	R2-Coach gate caught 11 real bugs before testing	R2 rebuild log

On one build, the dollar delta was small: a few bucks. Across 50-plus branches per quarter, the same pattern becomes hundreds of dollars saved, plus fewer forced memory dumps and cleaner decisions.

Those aren’t all the same kind of savings.

Some are dollars. Some are hours. Some are context. Some are avoided rework.

But that is how real operational savings work. They rarely show up as one dramatic number. They show up as fewer expensive mistakes repeated across the whole system.
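The dollar line in the table above can be worked through with simple arithmetic. The sketch below uses the `local_wispr` v1 estimates; pairing the low ends and the high ends of the two ranges, and treating every branch as about the size of that one small utility, are assumptions of this example.

```python
# Figures from the local_wispr v1 estimate (dollars per build).
expensive_low, expensive_high = 5.00, 8.00
dispatch_low, dispatch_high = 2.50, 4.00

# Assumption: pair low with low and high with high to get a per-build saving range.
saving_low = expensive_low - dispatch_low      # 2.50
saving_high = expensive_high - dispatch_high   # 4.00

# Assumption: 50 branches per quarter, each about the size of this small utility.
branches_per_quarter = 50
quarter_low = saving_low * branches_per_quarter
quarter_high = saving_high * branches_per_quarter

print(f"per build: ${saving_low:.2f} to ${saving_high:.2f}")
print(f"per quarter: ${quarter_low:.0f} to ${quarter_high:.0f}")
```

Under those assumptions the range comes out at $125 to $200 per quarter. Larger branches would push it higher, and none of this counts the hours, context, or rework rows of the table.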

The Bigger Lesson

AI agents don’t become useful because they’re autonomous.

They become useful when the line around them makes autonomy safe.

Pete isn’t valuable because it makes agents go faster. Pete is valuable because it makes fast work inspectable.

That’s the difference.

Without the PM layer, multi-agent work is just chaos with a bigger bill. With the PM layer, it becomes a factory: scoped inputs, constrained files, right-sized workers, short reports, an inspection gate, and durable memory.

The next version will keep getting tighter.

But the pattern is already clear:

Send cheap work to cheap workers. Keep the hard calls on the expensive bench. Stop paying to reread noise. Verify before you believe. File the decisions where future-you can find them.

Simple systems beat expensive improvisation.