Claude Code Has 8 Hook Events. None of Them Can See the Agent's Output.

Chudi May 13, 2026 Updated Jun 9, 2026 6 min read

I built hooks into a 38,240-line production harness. The gap nobody documents: no hook fires on the agent's output text. The harness is the guardrail.

Why this matters

The official docs list eight hook events. Most Claude Code tutorials mention three of them. And none of the eight can inspect the text the agent is about to show you.

That gap is not a trivia point. It is where agent enforcement breaks down in production.

I found it while building the hook layer into a production workflow that runs against a 38,240-line Python codebase. You set up the hooks you know about. The harness feels solid. Then the agent does something you did not expect at a lifecycle point you had no gate on. The question you ask afterward is: was there a hook for that? For tool calls and turn boundaries, usually yes. For the agent’s own output text, no.

What the hook system actually covers

Claude Code’s hook system fires shell commands at specific points in the agent lifecycle. The events most builders work with are UserPromptSubmit, PreToolUse, PostToolUse, and Stop. The full documented set, as of June 2026, is eight events in three groups.

Session-level (fire around session lifecycle): SessionStart, SessionEnd, PreCompact. SessionStart is where you load context, inject environment state, or run a pre-flight check before the model does anything. PreCompact fires before context compaction, your last chance to persist state that would otherwise be summarized away. Most operators skip everything except SessionStart.

Turn-level (fire per turn): UserPromptSubmit fires before the model sees the prompt and can BLOCK it, so the hook can reject the turn outright or attach extra context before the model runs. Stop fires after the model has emitted its text. It can BLOCK stopping and force the agent to keep working, but it cannot scan or modify the text that was already emitted. SubagentStop is the same boundary for subagent completions.

Tool-level: PreToolUse can BLOCK the tool call, validate the input, or log it before it fires. PostToolUse sees the tool result before the agent processes it.

The enforcement model has one asymmetry that matters operationally: three hooks can BLOCK. UserPromptSubmit can stop a turn before it starts. PreToolUse can stop a tool call before it fires. Stop can refuse to stop and force the agent to retry. Nothing in this list is positioned to intercept what the model generates as output text before it reaches you.
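A minimal sketch of the PreToolUse gate described above, written in Python. It assumes the hook receives its payload as JSON on stdin with `tool_name` and `tool_input` fields, and that exit code 2 blocks the call and feeds stderr back to the agent; check those details against the current hooks reference before relying on them. The deny patterns and the `decide` and `run` names are hypothetical, not part of my harness.

```python
import io
import json
import re
from typing import TextIO, Tuple

# Hypothetical deny list for illustration only.
DENY_PATTERNS = [r"\brm\s+-rf\b", r"\bgit\s+push\s+--force\b"]


def decide(event: dict) -> Tuple[int, str]:
    """Return (exit_code, message); exit code 2 is assumed to mean 'block'."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern in DENY_PATTERNS:
        if re.search(pattern, command):
            return 2, f"Blocked by PreToolUse hook: command matches {pattern}"
    return 0, ""


def run(stdin: TextIO, stderr: TextIO) -> int:
    """Read the hook payload (assumed: JSON on stdin) and report a decision."""
    code, message = decide(json.load(stdin))
    if message:
        # Assumed: on a block, stderr is what the agent gets to see.
        stderr.write(message + "\n")
    return code


# Simulated invocation. A real hook script would instead end with:
#   sys.exit(run(sys.stdin, sys.stderr))
payload = {"tool_name": "Bash", "tool_input": {"command": "rm -rf build/"}}
err = io.StringIO()
exit_code = run(io.StringIO(json.dumps(payload)), err)
```

Note what the script can see: the tool name and the tool input. The agent's prose never passes through it, which is the asymmetry this section is about.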

The gap: there is no PreResponseEmit hook

This is an architectural constraint, not a configuration gap.

Once the model generates text, that text reaches you unintercepted. There is no PreResponseEmit hook. Every other enforcement gate sits at a turn boundary or a tool boundary. The output text boundary is open.

The failure mode this enables is the “looks fine” bug class. The agent generates a response. The response is internally coherent. No blocked tool was called. No hook fired. But the output contains an unverified claim, a formatting error, or a factual error that no gate was positioned to catch.

I hit this in early May 2026. The agent claimed the UI was “formatted properly” across multiple turns. No hook fired because none of the hooks are positioned to inspect response text. The markdown table was rendering as a wall of pipe-delimited prose. The agent was not lying in any meaningful sense. It was generating plausible text about a state it could not verify through any of its tool outputs.

The fix is not a single hook. It is a verification architecture: a PostToolUse hook that runs a separate check against tool outputs, combined with a Stop hook that forces a retry if verification conditions are unmet. The direct step, checking the output text itself, is unavailable, so you verify the preconditions instead.
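The Stop half of that pairing might look something like this. It is a sketch under assumptions: that a Stop hook can refuse to stop by printing a JSON object with `decision` set to `block` and a `reason`, and that the payload carries a `stop_hook_active` flag when the agent is already continuing because of an earlier block. The state file, its format, and the function names are hypothetical; in this sketch a PostToolUse check is assumed to have written the list of failures earlier in the turn.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def load_failures(state_file: Path) -> List[str]:
    """Read failures recorded earlier in the turn; a missing file means none."""
    if not state_file.exists():
        return []
    return json.loads(state_file.read_text(encoding="utf-8"))


def stop_decision(event: dict, failures: List[str]) -> Optional[dict]:
    """Return a blocking decision, or None to let the agent stop."""
    # Assumed flag: set when the agent is already retrying after a Stop block.
    # Letting it stop here avoids an endless retry loop.
    if event.get("stop_hook_active"):
        return None
    if not failures:
        return None
    reason = "Unresolved verification failures: " + "; ".join(failures)
    return {"decision": "block", "reason": reason}


# Simulated run: one recorded failure, first attempt to stop.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "verification.json"  # hypothetical state file
    state.write_text(
        json.dumps(["table.md: rendered output not checked"]), encoding="utf-8"
    )
    decision = stop_decision({"stop_hook_active": False}, load_failures(state))
    print(json.dumps(decision))
```

The decision depends only on recorded verification state, never on the emitted text, because the emitted text is exactly what the hook cannot gate.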

How this compares to other harnesses

The output gating gap is not unique to Claude Code. LangGraph has no native output hook. CrewAI has output guardrails, but at the crew level, not the turn level. The framework with the most comparable coverage is the Pi harness, which defines hooks at tool_call, tool_result, session_*, input, before_provider_request, and before_compact, putting its breadth in the same range as Claude Code’s documented set.

The one surface with no native Claude Code equivalent belongs to OpenAI Agents: its OutputGuardrail can block a final response before it reaches the operator. That is the enforcement position Claude Code’s hook architecture cannot currently reach.

The comparison matters not as a ranking but as a map: where is your enforcement layer actually positioned, and what does the agent do in the gaps between the positions you can defend?

How this changes how I write hooks

The practical implication: anything I need to enforce on the output side has to be enforced on the input side first, or on the tool-call boundary.

If I want the agent to only cite verified statistics, the enforcement point is PreToolUse on the tool that would generate the claim, not a post-hoc output scan. If I want the agent to avoid a specific framing, the enforcement point is UserPromptSubmit injecting a constraint before the prompt reaches the model.
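For the input-side case, a UserPromptSubmit hook could attach the constraint before the model runs. This sketch assumes the payload exposes the user's text in a `prompt` field and that anything the hook prints to stdout with exit code 0 is added to the model's context. The trigger words and the constraint wording are hypothetical.

```python
import io
import json
from typing import TextIO

# Hypothetical triggers and wording; adapt to the framing you want to enforce.
TRIGGER_WORDS = ("statistic", "percent", "benchmark")
CONSTRAINT = (
    "Constraint: cite only figures that appear in tool output from this "
    "session; otherwise say the figure is unverified."
)


def context_for(prompt: str) -> str:
    """Return extra context for prompts likely to produce numeric claims."""
    lowered = prompt.lower()
    if any(word in lowered for word in TRIGGER_WORDS):
        return CONSTRAINT
    return ""


def run(stdin: TextIO, stdout: TextIO) -> int:
    event = json.load(stdin)  # assumed: hook payload arrives as JSON on stdin
    extra = context_for(event.get("prompt", ""))
    if extra:
        stdout.write(extra + "\n")  # assumed: stdout on exit 0 joins the context
    return 0  # never blocks; this hook only adds context


# Simulated invocation. A real hook script would instead end with:
#   sys.exit(run(sys.stdin, sys.stdout))
out = io.StringIO()
code = run(io.StringIO(json.dumps({"prompt": "Summarize the benchmark results"})), out)
```

This steers the model; it does not verify anything. Whether the response honors the constraint is still invisible to the hook layer.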

This is the design tension the official docs do not make explicit: hooks are a powerful enforcement layer with a precise shape. That shape is: turn entry, tool boundary, turn exit. Output text lives inside the turn, after the model generates it and before the Stop hook runs. That space has no native gate.

Building a harness that takes this seriously means treating output verification as a separate architectural problem from hook configuration. The eight events are load-bearing. The gap between what they cover and what you might assume they cover is where production failures concentrate.

What PostToolUse is actually for

PostToolUse is underused relative to PreToolUse. Most builders use PreToolUse for access control, blocking specific tools from being called. PostToolUse is where you verify results before the model acts on them.

The agent reads a file. PostToolUse fires. A script checks the file content against an expected schema. If the check fails, the hook returns an error that is fed back to the agent along with the tool result. The agent then has accurate information about the state of the tool output before it generates its response.
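A rough sketch of that check. It assumes the PostToolUse payload includes `tool_name` and a `tool_input` with a `file_path` for the Read tool, and that exit code 2 sends stderr back to the agent. Rather than depend on the shape of the tool response, it re-reads the file from disk. The required keys stand in for a real schema and are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple

REQUIRED_KEYS = {"name", "version"}  # hypothetical stand-in for a real schema


def check_read(event: dict) -> Tuple[int, str]:
    """Return (exit_code, message) for a PostToolUse event on a JSON file read."""
    if event.get("tool_name") != "Read":
        return 0, ""
    file_path = event.get("tool_input", {}).get("file_path", "")
    if not file_path.endswith(".json"):
        return 0, ""
    try:
        data = json.loads(Path(file_path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        return 2, f"{file_path}: not readable as JSON ({exc})"
    if not isinstance(data, dict):
        return 2, f"{file_path}: expected a JSON object at the top level"
    missing = sorted(REQUIRED_KEYS - set(data))
    if missing:
        return 2, f"{file_path}: missing required keys {missing}"
    return 0, ""


# Simulated run against one valid and one invalid file.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.json"
    bad = Path(tmp) / "bad.json"
    good.write_text(json.dumps({"name": "demo", "version": "1.0"}), encoding="utf-8")
    bad.write_text(json.dumps({"name": "demo"}), encoding="utf-8")
    good_result = check_read({"tool_name": "Read", "tool_input": {"file_path": str(good)}})
    bad_result = check_read({"tool_name": "Read", "tool_input": {"file_path": str(bad)}})
```

In a real hook script the message would go to stderr and the first element of the tuple would become the process exit code.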

That pattern closes part of the output gating gap indirectly. The agent cannot make a false claim about a tool result it has already been corrected on. It does not close the gap completely.

The part that remains is sycophancy and internal-consistency errors: cases where the model generates plausible text about something it cannot verify through tool outputs at all. That is a model behavior problem sitting above the hook layer. Hooks are not the solution to that class of failure.

The harness is the guardrail. The hooks are how you build the harness. Knowing the eight events, which three can BLOCK, and where the output text boundary sits is the foundation.
