I Published a Post Saying Claude Code Burns 32% of My Plan Per Session. Then I Measured It.
I estimated 32% plan burn per session. Then I measured a 2-day production sprint: 2% per session, 20% weekly. The architecture sets the burn, not the prompt.
Why this matters
I estimated 32% plan burn per session. Then I measured a 2-day production sprint: 2% per session, 20% weekly. The architecture sets the burn, not the prompt.
Two days before this measurement, I published a post arguing that plan-quota, not API tokens, is the binding constraint for Claude Code under subscription pricing. I estimated roughly 32% of the weekly plan per 8-hour session.
Then I ran /ui-autopilot against my usage dashboard at the close of a 2-day sprint and read the actual numbers.
The published estimate was wrong. In my favor.
The sprint context
The 2-day window covered: shipping two phases of a content sequence, deploying an MCP server to GA, running a browser-automation skill through 3 zero-hard-wall production runs, retrofitting a drafting pipeline on 3 prior posts, and multiple Vercel and Convex deploy cycles with GSC and IndexNow re-submits.
That is a representative production workload. Not a toy session. Not a demo. The agent ran continuously across two days on real shipped work.
The actual numbers (measured 2026-05-19)
| Metric | Measured value | Reset window remaining |
|---|---|---|
| Current 5-hour session burn | 2% used | 4h 43min |
| Weekly all-models burn | 20% used | 16h 23min |
| Weekly Sonnet-only burn | 0% used | 16h 23min |
| Extra usage overage spend | $0.00 of $120 monthly cap | Jun 1 |
| Plan tier | Max (20x) | n/a |
The 5-hour session burned 2%. The weekly burn after a 2-day production sprint was 20%.
Why this refutes the 32% claim
The manifesto post arrived at ~32% per session through a different measurement window. That measurement was taken during a period when I was using Claude Code closer to a runtime: the conversation was the work, long threads, repeated context reconstruction, fewer subagents and hooks absorbing the tool calls.
The 2-day sprint that produced the 2% session burn used the orchestration architecture: subagents run in background tasks, hooks intercept tool calls before they reach the model, disk-resident state means the model doesn’t rebuild context from scratch each session, a retrieval layer answers specialty questions without the model doing open-ended search.
The difference between 32% and 2% per session is not a measurement error. It is the difference between two architectures.
What the orchestration pattern actually does to token consumption
When Claude Code runs as a runtime, each session requires the model to:
- Re-read project state from the conversation or from re-injected context
- Hold the task hierarchy in the thread
- Perform tool calls inline, where each call costs tokens in both input and output
- Regenerate context summaries when the window fills
When Claude Code runs as an orchestration layer, most of that work moves out of the model’s context:
- State lives in disk-resident files read by hooks or subagents, not re-injected into the model thread each turn
- The task hierarchy is in a session-state file the model reads once per turn and writes append-only
- Tool calls route through
PreToolUsehooks, so many validations fire in shell scripts before the model ever sees the result - Context windows don’t fill at the same rate because the model is steering, not executing
The burn rate difference is structural. It holds across sessions because the architecture holds.
What this means for the constraint model
The original framing still holds: plan-quota is the binding constraint, not API spend. The architecture change doesn’t make quota irrelevant. It changes the slope of the consumption curve.
At 32% per session, you hit quota walls in 3 sessions per week under sustained use. That forces work-shaping around quota resets, which is its own kind of constraint.
At 2% per session, you can run more sessions, more agents, and more automation cycles within the same plan without hitting the wall. The orchestration architecture doesn’t eliminate the constraint. It moves the wall far enough back that it stops being a daily decision variable.
The implication for anyone designing a Claude Code workflow: the architectural question isn’t just about capability or reliability. It directly sets your burn rate. Runtime-mode users will hit quota friction much earlier than orchestration-mode users running equivalent workloads.
The measurement methodology
A browser-automation skill read the usage dashboard autonomously, with zero operator keystrokes. This matters because manual screenshots introduce selection bias: operators check usage when they’re worried, which skews the sample toward high-burn moments.
The automated read at sprint close captures the burn after the most production-dense period in the measurement window. It is the worst-case snapshot, not a cherry-picked low-burn moment. The numbers being this low at the worst-case snapshot is the finding.
The limitation: this is n=1 on a specific workload. The architecture produces different burn curves for different task shapes. I/O-heavy sessions (lots of file reads and writes, lots of tool calls) burn differently than reasoning-heavy sessions (long drafts, complex analysis). The data here reflects an agent workflow with high I/O and moderate reasoning load. It may not generalize to a pure writing or analysis workflow.
The corrective I am not making
I am not going back to update the 32% figure in the original post to 2%. They measure different things: the 32% measured runtime-mode behavior; the 2% measures orchestration-mode behavior. Both are real measurements of real behavior under their respective architectures.
What I am adding is this post as the empirical companion. The manifesto argued the architectural choice matters. The sprint data quantifies how much.
If you built your Claude Code usage model around the 32% estimate and you are running orchestration-mode architecture, you have more headroom than the estimate suggests. If you are running runtime-mode, the 32% estimate is closer to accurate for sustained production use.
The architecture is the variable. The quota number follows from it.
Sources & Further Reading
Further Reading
- I Built a Private MCP Server to Give Claude Memory Across Sessions. Here Is What Broke. I shipped a private MCP server bridging my knowledge base into claude.ai via OAuth 2.1: the architecture, two bugs the smoke test missed, and the isolation pattern.
- 10 Patterns Behind a 32% Claude Code Plan-Quota Burn 10 specific patterns that explain why my Claude Code plan-quota burn runs at a fraction of what r/claudecode operators report on similar workloads. Each pattern lists the multiplier and the named alternative you can run today.
- Claude Code Has 8 Hook Events. None of Them Can See the Agent's Output. I built hooks into a 38,240-line production harness. The gap nobody documents: no hook fires on the agent's output text. The harness is the guardrail.
What do you think?
I post about this stuff on LinkedIn every day and the conversations there are great. If this post sparked a thought, I'd love to hear it.
Discuss on LinkedIn