I Published a Post Saying Claude Code Burns 32% of My Plan Per Session. Then I Measured It.

Two days before this measurement, I published a post arguing that plan-quota, not API tokens, is the binding constraint for Claude Code under subscription pricing. I estimated roughly 32% of the weekly plan per 8-hour session.

Then I ran /ui-autopilot against my usage dashboard at the close of a 2-day sprint and read the actual numbers.

The published estimate was wrong. In my favor.

The sprint context

The 2-day window covered: shipping two phases of a content sequence, deploying an MCP server to GA, running a browser-automation skill through 3 zero-hard-wall production runs, retrofitting a drafting pipeline on 3 prior posts, and multiple Vercel and Convex deploy cycles with GSC and IndexNow re-submits.

That is a representative production workload. Not a toy session. Not a demo. The agent ran continuously across two days on real shipped work.

The actual numbers (measured 2026-05-19)

Metric	Measured value	Reset window remaining
Current 5-hour session burn	2% used	4h 43min
Weekly all-models burn	20% used	16h 23min
Weekly Sonnet-only burn	0% used	16h 23min
Extra usage overage spend	$0.00 of $120 monthly cap	Jun 1
Plan tier	Max (20x)	n/a

The 5-hour session burned 2%. The weekly burn after a 2-day production sprint was 20%.

Why this refutes the 32% claim

The manifesto post arrived at ~32% per session through a different measurement window. That measurement was taken during a period when I was using Claude Code closer to a runtime: the conversation was the work, long threads, repeated context reconstruction, fewer subagents and hooks absorbing the tool calls.

The 2-day sprint that produced the 2% session burn used the orchestration architecture: subagents run in background tasks, hooks intercept tool calls before they reach the model, disk-resident state means the model doesn’t rebuild context from scratch each session, a retrieval layer answers specialty questions without the model doing open-ended search.

The difference between 32% and 2% per session is not a measurement error. It is the difference between two architectures.

What the orchestration pattern actually does to token consumption

When Claude Code runs as a runtime, each session requires the model to:

Re-read project state from the conversation or from re-injected context
Hold the task hierarchy in the thread
Perform tool calls inline, where each call costs tokens in both input and output
Regenerate context summaries when the window fills

When Claude Code runs as an orchestration layer, most of that work moves out of the model’s context:

State lives in disk-resident files read by hooks or subagents, not re-injected into the model thread each turn
The task hierarchy is in a session-state file the model reads once per turn and writes append-only
Tool calls route through PreToolUse hooks, so many validations fire in shell scripts before the model ever sees the result
Context windows don’t fill at the same rate because the model is steering, not executing

The burn rate difference is structural. It holds across sessions because the architecture holds.

What this means for the constraint model

The original framing still holds: plan-quota is the binding constraint, not API spend. The architecture change doesn’t make quota irrelevant. It changes the slope of the consumption curve.

At 32% per session, you hit quota walls in 3 sessions per week under sustained use. That forces work-shaping around quota resets, which is its own kind of constraint.

At 2% per session, you can run more sessions, more agents, and more automation cycles within the same plan without hitting the wall. The orchestration architecture doesn’t eliminate the constraint. It moves the wall far enough back that it stops being a daily decision variable.

The implication for anyone designing a Claude Code workflow: the architectural question isn’t just about capability or reliability. It directly sets your burn rate. Runtime-mode users will hit quota friction much earlier than orchestration-mode users running equivalent workloads.

The measurement methodology

A browser-automation skill read the usage dashboard autonomously, with zero operator keystrokes. This matters because manual screenshots introduce selection bias: operators check usage when they’re worried, which skews the sample toward high-burn moments.

The automated read at sprint close captures the burn after the most production-dense period in the measurement window. It is the worst-case snapshot, not a cherry-picked low-burn moment. The numbers being this low at the worst-case snapshot is the finding.

The limitation: this is n=1 on a specific workload. The architecture produces different burn curves for different task shapes. I/O-heavy sessions (lots of file reads and writes, lots of tool calls) burn differently than reasoning-heavy sessions (long drafts, complex analysis). The data here reflects an agent workflow with high I/O and moderate reasoning load. It may not generalize to a pure writing or analysis workflow.

The corrective I am not making

I am not going back to update the 32% figure in the original post to 2%. They measure different things: the 32% measured runtime-mode behavior; the 2% measures orchestration-mode behavior. Both are real measurements of real behavior under their respective architectures.

What I am adding is this post as the empirical companion. The manifesto argued the architectural choice matters. The sprint data quantifies how much.

If you built your Claude Code usage model around the 32% estimate and you are running orchestration-mode architecture, you have more headroom than the estimate suggests. If you are running runtime-mode, the 32% estimate is closer to accurate for sustained production use.

The architecture is the variable. The quota number follows from it.

I Published a Post Saying Claude Code Burns 32% of My Plan Per Session. Then I Measured It.

Why this matters

The sprint context

The actual numbers (measured 2026-05-19)

Why this refutes the 32% claim

What the orchestration pattern actually does to token consumption

What this means for the constraint model

The measurement methodology

The corrective I am not making

Sources & Further Reading

Further Reading

What do you think?

I Built a Private MCP Server to Give Claude Memory Across Sessions. Here Is What Broke.

10 Patterns Behind a 32% Claude Code Plan-Quota Burn

Claude Code Has 8 Hook Events. None of Them Can See the Agent's Output.