Coding agents didn't reduce your cognitive load — they relocated it. You're no longer writing boilerplate; you're evaluating it, forty times a day, and your judgment gets worse with every review.
The fix isn't "review more carefully." That scales linearly with agent output, and you'll always lose the race. The fix is a protocol: force the agent to surface its own trade-offs before you read a single line, then triage every output into one of three buckets before deep-reading anything. Here's the prompt and the table.
Why reviewing agent code is harder than writing it
Writing code is generative work. You hold a model in your head, you extend it, you get into a flow. Reviewing code is evaluation work — sustained, comparison-heavy, and cognitively heavier. One widely-upvoted HN comment put it cleanly: "meditative coding" has been replaced by constant high-intensity review. That's anecdotal, but every engineer using Cursor or Claude Code daily recognizes it.
Decision quality also degrades with decision frequency. Research on judges, nurses, and editors shows the same pattern: judgment on the twentieth decision in a row is worse than the first, regardless of experience. The transfer to code review is an inference, not a measured finding — but the mechanism is the same: sustained evaluation under volume.
Then there's the requirement-adherence problem. CMU research cited in Augment Code's analysis found 54% of participants said code generation tools often fail to meet specified requirements. Translation: you can't skim. Every review is a verification pass against intent — and the agent rarely tells you what its intent was. Architectural decisions get buried inside the diff. You're reverse-engineering on every pass, which is the worst possible condition for fatigued judgment.
"Review more carefully" doesn't scale. Agents produce faster than you can evaluate. You need to change the shape of the work, not the intensity.
Force the agent to do its own triage first
If the agent surfaces its trade-offs before you read the code, your job shifts from reconstruction to verification. Reconstruction is expensive — you're rebuilding the agent's reasoning from artifacts. Verification is cheap — you're checking claims against code.
Prepend this to any non-trivial coding task. It works as a task prefix in Cursor, in CLAUDE.md or inline for Claude Code, and as a system message in Codex.
Before writing any code, output a PREAMBLE block with these four sections.
The PREAMBLE must appear BEFORE any code block, not after.
PREAMBLE
--------
DECISIONS: The 2-4 concrete choices you made (data structure, library,
control flow, error model). One line each.
REJECTED: Alternatives you considered and discarded, with one-line reason.
TRADE-OFFS: What this implementation is worse at than the rejected paths.
ASSUMPTIONS: Inputs, environment, or requirements you assumed without
confirmation. Mark each with a "?" if it should be verified.
Then write the code. If the task is ambiguous, list the ambiguity in
ASSUMPTIONS rather than picking silently.Context files (CLAUDE.md, .cursorrules) are not a substitute. ETH research referenced in the Augment analysis showed they provide only ~4% improvement in task success, and LLM-generated context files can make things worse. They set defaults; they don't surface per-task reasoning.
The three-tier triage that front-loads your judgment budget
Read the preamble first. Assign a tier. Then decide whether to read the code.
| Tier | Trigger conditions | Action |
|---|---|---|
| Accept | Preamble is complete. Decisions match task intent. Assumptions match reality. Tests are present and pass. No "?" assumptions on critical paths. | Spot-check the diff. Approve. ~2 minutes. |
| Question | One specific gap: a load-bearing assumption you can't verify, a rejected alternative you'd have chosen, or a trade-off that conflicts with a non-functional requirement. | Ask one question. Wait. Decide with the answer. Do not read the code yet. |
| Reject | Preamble missing or compressed. Decisions diverge from requirements. Diff too large to scope (>~300 lines or >1 module). Multiple unverified critical assumptions. | Reject. Ask for smaller chunks or a corrected preamble. Move on. |
The triage pass happens on the preamble, not the code. That's the whole point — you're spending judgment on whether this output deserves a deep read, not on the read itself.
Two failure modes to design around. First, the Question tier becomes a stall loop if every review spawns a clarifying round-trip. Rule: one question per review block. Then decide. Second, triage breaks on large diffs — a 400-line agent output can't be scoped at the function level in one pass. The right answer is Reject and ask for smaller chunks, not "push harder through triage."
Structuring a session so fatigue doesn't accumulate
Most teams imported their pull-request review culture into agent workflows unchanged. That culture was designed for human-paced output — one PR per engineer per day or two. Agent output is one diff per fifteen minutes. The bottleneck moved; the process didn't.
A working session structure:
- Open a review block. Close Slack, close the second monitor. One agent output at a time.
- Read only the preamble. Assign Accept / Question / Reject from the table. Do not scroll to the code yet.
- Act on the tier. Accept → spot-check and approve. Question → write the one question, send, move on. Reject → reject with a one-line reason.
- Write the one-sentence architectural summary for anything you accept. If you can't, drop to Question.
- Hard stop after five outputs. Stand up. Get water. Do something non-evaluative for at least ten minutes.
- Resume or stop. If the next preamble feels like noise, you're done reviewing for the session. Switch to generative work or call it.
Five outputs per block is conservative. The point isn't the number — it's that you set the limit before you start, not when you notice you're scanning.
Waiting for AI-assisted review tooling to mature is a strategy for shipping undiscovered bugs today. The protocol works now, with the agents you already use.
Paste the preamble prompt into your Cursor or Claude Code setup today and run one review session against the triage table tomorrow morning. The goal isn't zero fatigue — it's spending your best judgment on Reject decisions, not reconstructing what the agent was trying to do.