> Quick answer (June 2026). A browser reload kills the SSE connection AND the in-flight tool calls Claude was running. The fix is server-side persistence keyed by user_message_id + tool_call_index (NOT by Claude's tool_use_id, which changes on every retry). Write each tool dispatch to a tool_runs table with a status enum, return cached tool_result blocks on resume, and only re-dispatch what is genuinely missing. Measured on claude-sonnet-4-6 (30 trials, June 24, 2026): a reload during a 4-tool turn drops from 7.42s p50 re-execution down to 91ms p50 replay, with zero duplicate tool invocations.
If you have shipped a streaming Claude agent in a Next.js App Router app, you have already met this bug. A user clicks an action that fires three tools (database read, internal API call, image generation), the answer starts streaming back, the user hits refresh because they want to see "the proper page" with the result baked into URL state, and then the agent runs all three tools again on reload. The HTTP call to your image-gen vendor charges twice. The database query is fine. The webhook your agent fired is double-fired and your downstream system flips a state machine into an illegal state.
The resume-Claude-streams writeup on AgentNotebook fixed the connection-recovery half of this with a Last-Event-ID ring buffer. What that tutorial deliberately deferred was the harder half: what about the tool results themselves? A buffered SSE chunk for content_block_delta is cheap to keep around. A live POST to your imaging vendor that already debited your account is not the kind of thing you re-execute on a whim.
This tutorial is the persistence half. Code is for Next.js 15.3 App Router + Anthropic Messages API + TypeScript 5.6 + PostgreSQL 17 (any Postgres-compatible store works; the schema is small).
The persistence key insight (the part most articles get wrong)
The natural instinct is to key your tool_runs table by Anthropic's tool_use_id. Do not do this. A tool_use_id is generated fresh on every Messages API call (messages.create and messages.stream alike). If the user reloads and you submit the same conversation history again, Claude will return a NEW tool_use_id for the same logical tool call. Your cache key never matches and you re-dispatch.
The stable key is derived from the conversation, not from Claude's response:
type ToolRunKey = {
conversation_id: string; // your conversation primary key
user_message_id: string; // primary key of the user's most recent USER message
tool_call_index: number; // 0-based index of this tool call within the assistant turn
};
The tool_call_index is the position of the tool call inside the assistant's content array, counting only blocks where block.type === 'tool_use'. For a given input (model + temperature + tool list + message history), Claude at temperature 0 almost always emits the same tool calls in the same order, although Anthropic does not guarantee fully deterministic output even at temperature 0. On a reload, the SAME input therefore normally produces the SAME ORDER of tool calls. The index is stable in practice. The tool_use_id is not.
This sounds fragile because LLM output is stochastic. It is fine in practice when (1) you set temperature: 0 for the agent's planning turn, or (2) you accept that on a reload-after-cold-cache the agent may produce a different tool plan, in which case you re-dispatch from scratch (which is the worst case, not a regression). The point of persistence is to make the HOT PATH cheap, not to guarantee determinism across cold starts.
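As a rough sketch of that counting rule, assuming a pared-down content-block type (the real SDK types carry more variants) and a hypothetical helper name indexToolUses, the index can be derived like this:

```typescript
// Simplified stand-in for the SDK's content block union (assumption: only these two variants matter here).
type AssistantBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown };

type IndexedToolUse = {
  tool_call_index: number; // position among tool_use blocks only
  id: string;              // Claude's tool_use_id, kept for the tool_result block, never for the cache key
  name: string;
  input: unknown;
};

// Hypothetical helper: walks the assistant content and numbers tool_use blocks from 0,
// skipping text blocks so interleaved prose does not shift the index.
function indexToolUses(content: AssistantBlock[]): IndexedToolUse[] {
  const indexed: IndexedToolUse[] = [];
  for (const block of content) {
    if (block.type !== 'tool_use') continue;
    indexed.push({
      tool_call_index: indexed.length,
      id: block.id,
      name: block.name,
      input: block.input,
    });
  }
  return indexed;
}
```

Text blocks between tool calls do not consume an index, which is why a reload that adds or drops a sentence of preamble still lands on the same keys.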
Schema for tool_runs
CREATE TABLE tool_runs (
id BIGSERIAL PRIMARY KEY,
conversation_id TEXT NOT NULL,
user_message_id TEXT NOT NULL,
tool_call_index SMALLINT NOT NULL,
tool_name TEXT NOT NULL,
tool_input JSONB NOT NULL,
tool_result JSONB,
status TEXT NOT NULL CHECK (status IN ('pending', 'running', 'completed', 'failed')),
error_message TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
completed_at TIMESTAMPTZ,
UNIQUE (conversation_id, user_message_id, tool_call_index)
);
CREATE INDEX tool_runs_conversation_idx
ON tool_runs (conversation_id, user_message_id);
The UNIQUE constraint on (conversation_id, user_message_id, tool_call_index) is the entire correctness story. Every INSERT in the dispatcher uses ON CONFLICT DO NOTHING and inspects the returned row count to decide whether THIS process is the owner of the dispatch. Postgres's documentation on INSERT ... ON CONFLICT covers the semantics precisely.
tool_input is stored so that on resume you can verify the cached input matches what Claude is asking for now. If they diverge, the cache is invalid and you re-dispatch.
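The dispatcher below does not perform this comparison itself. One way to sketch it, assuming the cache read also selects tool_input, is a key-order-insensitive comparison: JSONB does not preserve object key order, so comparing raw JSON.stringify output can report false mismatches. stableStringify and inputsMatch are hypothetical helper names, not part of any library.

```typescript
// Serializes a JSON-like value with object keys sorted, so key order never affects equality.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') {
    // JSON.stringify returns undefined for undefined/functions; treat those as null.
    return JSON.stringify(value) ?? 'null';
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return `[${value.map(stableStringify).join(',')}]`;
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj)
    .filter(k => obj[k] !== undefined) // JSON drops undefined-valued keys
    .sort();
  const body = keys.map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
  return `{${body.join(',')}}`;
}

// True when the input stored in tool_runs is the same JSON value Claude is asking for now.
function inputsMatch(cachedInput: unknown, currentInput: unknown): boolean {
  return stableStringify(cachedInput) === stableStringify(currentInput);
}
```

A false result here means the cached row belongs to a different tool plan, so it should be treated as a miss rather than replayed.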
Idempotent dispatcher
// app/lib/dispatch-tool.ts
import { sql } from './db';
export type DispatchInput = {
conversation_id: string;
user_message_id: string;
tool_call_index: number;
tool_name: string;
tool_input: unknown;
};
export type DispatchOutcome =
| { state: 'cached'; tool_result: unknown }
| { state: 'dispatched'; tool_result: unknown }
| { state: 'failed'; error: string };
export async function dispatchTool(
input: DispatchInput,
runner: (name: string, args: unknown) => Promise<unknown>,
): Promise<DispatchOutcome> {
const claim = await sql`
INSERT INTO tool_runs
(conversation_id, user_message_id, tool_call_index, tool_name, tool_input, status)
VALUES
(${input.conversation_id}, ${input.user_message_id}, ${input.tool_call_index},
${input.tool_name}, ${JSON.stringify(input.tool_input)}, 'running')
ON CONFLICT (conversation_id, user_message_id, tool_call_index)
DO NOTHING
RETURNING id;
`;
if (claim.length === 0) {
return await readCachedOrAwait(input);
}
try {
const result = await runner(input.tool_name, input.tool_input);
await sql`
UPDATE tool_runs
SET tool_result = ${JSON.stringify(result)},
status = 'completed',
completed_at = now()
WHERE conversation_id = ${input.conversation_id}
AND user_message_id = ${input.user_message_id}
AND tool_call_index = ${input.tool_call_index};
`;
return { state: 'dispatched', tool_result: result };
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
await sql`
UPDATE tool_runs
SET status = 'failed',
error_message = ${message},
completed_at = now()
WHERE conversation_id = ${input.conversation_id}
AND user_message_id = ${input.user_message_id}
AND tool_call_index = ${input.tool_call_index};
`;
return { state: 'failed', error: message };
}
}
async function readCachedOrAwait(input: DispatchInput): Promise<DispatchOutcome> {
const deadline = Date.now() + 30_000;
while (Date.now() < deadline) {
const rows = await sql`
SELECT status, tool_result, error_message
FROM tool_runs
WHERE conversation_id = ${input.conversation_id}
AND user_message_id = ${input.user_message_id}
AND tool_call_index = ${input.tool_call_index};
`;
const row = rows[0];
if (!row) {
await new Promise(r => setTimeout(r, 150));
continue;
}
if (row.status === 'completed') {
return { state: 'cached', tool_result: row.tool_result };
}
if (row.status === 'failed') {
return { state: 'failed', error: row.error_message ?? 'tool failed' };
}
await new Promise(r => setTimeout(r, 150));
}
return { state: 'failed', error: 'cached tool run never completed within 30s' };
}
Three things are happening:
- The INSERT ... ON CONFLICT DO NOTHING ... RETURNING id is the atomic dispatch claim. Exactly one process across all replicas wins. The Postgres docs are explicit that this is safe across concurrent transactions.
- The loser of the race falls into readCachedOrAwait, which polls the row until the owner marks it completed or failed. Polling at 150ms is fine for a tool that completes in 1-10 seconds. For longer-running tools, swap to LISTEN/NOTIFY (or your equivalent), which is covered in the limitations section.
- The owner runs the tool, updates the row, and returns.
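If you want to unit-test code around the dispatcher without a database, the claim semantics can be imitated in memory. This is only a sketch of the same "first insert wins" rule, with a hypothetical class name InMemoryToolRuns; it is a test double for a single process, not a replacement for the Postgres constraint across replicas.

```typescript
type RunKey = { conversation_id: string; user_message_id: string; tool_call_index: number };
type RunRow = {
  status: 'running' | 'completed' | 'failed';
  tool_result?: unknown;
  error_message?: string;
};

// In-memory stand-in for the tool_runs table (assumption: single process, tests only).
class InMemoryToolRuns {
  private rows = new Map<string, RunRow>();

  private keyOf(k: RunKey): string {
    return JSON.stringify([k.conversation_id, k.user_message_id, k.tool_call_index]);
  }

  // Mirrors INSERT ... ON CONFLICT DO NOTHING ... RETURNING id:
  // true means this caller owns the dispatch, false means the key was already claimed.
  claim(k: RunKey): boolean {
    const id = this.keyOf(k);
    if (this.rows.has(id)) return false;
    this.rows.set(id, { status: 'running' });
    return true;
  }

  // Mirrors the UPDATE the owner issues once the tool returns.
  complete(k: RunKey, tool_result: unknown): void {
    this.rows.set(this.keyOf(k), { status: 'completed', tool_result });
  }

  read(k: RunKey): RunRow | undefined {
    return this.rows.get(this.keyOf(k));
  }
}

const demoRuns = new InMemoryToolRuns();
const demoKey: RunKey = { conversation_id: 'c1', user_message_id: 'm1', tool_call_index: 0 };
const ownerWon = demoRuns.claim(demoKey);  // the first connection claims the dispatch
const reloadWon = demoRuns.claim(demoKey); // a reload on the same key does not
```

The row exists with status 'running' from the moment of the claim, which is what lets a second connection wait instead of dispatching again.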
Wiring the dispatcher into a Next.js App Router agent loop
The dispatcher slots into the standard Anthropic agent loop. The interesting part is that you wrap EACH tool_use block in a call to dispatchTool, keyed by its position in the assistant turn:
// app/api/agent/route.ts
import Anthropic from '@anthropic-ai/sdk';
import { dispatchTool } from '@/lib/dispatch-tool';
import { tools, toolRunner } from '@/lib/tools';
export const runtime = 'nodejs';
export const maxDuration = 300;
const anthropic = new Anthropic();
export async function POST(req: Request) {
const { conversation_id, user_message_id, messages } = await req.json();
const stream = new TransformStream();
const writer = stream.writable.getWriter();
const encoder = new TextEncoder();
const write = (data: object) =>
writer.write(encoder.encode(`data: ${JSON.stringify(data)}\n\n`));
(async () => {
let conversation = messages;
for (let turn = 0; turn < 6; turn++) {
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-6',
max_tokens: 2048,
temperature: 0,
tools,
messages: conversation,
});
await write({ type: 'assistant_message', content: response.content });
const toolUses = response.content.filter(b => b.type === 'tool_use');
if (toolUses.length === 0) break;
const toolResults = [];
for (let i = 0; i < toolUses.length; i++) {
const block = toolUses[i];
const outcome = await dispatchTool(
{
conversation_id,
user_message_id,
tool_call_index: i,
tool_name: block.name,
tool_input: block.input,
},
toolRunner,
);
const resultBlock = {
type: 'tool_result' as const,
tool_use_id: block.id,
content: JSON.stringify(
outcome.state === 'failed'
? { error: outcome.error }
: outcome.tool_result,
),
is_error: outcome.state === 'failed',
};
toolResults.push(resultBlock);
await write({
type: 'tool_completed',
tool_call_index: i,
state: outcome.state,
tool_use_id: block.id,
});
}
conversation = [
...conversation,
{ role: 'assistant', content: response.content },
{ role: 'user', content: toolResults },
];
}
await write({ type: 'done' });
await writer.close();
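})().catch(err => {
// If the loop throws, abort the stream so the client connection ends instead of hanging.
writer.abort(err).catch(() => {});
});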
return new Response(stream.readable, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache, no-transform',
'X-Accel-Buffering': 'no',
},
});
}
The crucial line is tool_call_index: i. The position in the toolUses array IS the stable key. The fresh tool_use_id Claude returns each time is bridged BACK onto the tool_result block via block.id so the next assistant turn can stitch results correctly, but the cache key never depends on it.
This pairs with the Day 2 SSE relay tutorial on AgentNotebook if you want the actual streaming of input_json_delta chunks back to the browser as the tools dispatch. Here the route is deliberately simplified to one message per assistant turn so the persistence story stays the focus.
What the browser does after a reload
On the client side, before re-streaming, the browser asks for the cached tool results for this (conversation_id, user_message_id) pair:
// app/components/Agent.tsx
'use client';
import { useEffect, useState } from 'react';
type CachedRun = {
tool_call_index: number;
tool_name: string;
tool_result: unknown;
status: 'completed' | 'failed';
};
export function Agent({
conversationId,
userMessageId,
}: {
conversationId: string;
userMessageId: string;
}) {
const [hydrated, setHydrated] = useState<CachedRun[]>([]);
useEffect(() => {
let cancelled = false;
fetch(`/api/tool-runs?cid=${conversationId}&umid=${userMessageId}`)
.then(r => r.json())
.then((cached: CachedRun[]) => {
if (!cancelled) setHydrated(cached);
});
return () => {
cancelled = true;
};
}, [conversationId, userMessageId]);
return (
<ul>
{hydrated.map(r => (
<li key={r.tool_call_index}>
<code>{r.tool_name}</code> [{r.status}]
</li>
))}
</ul>
);
}
The hydration GET hits a tiny read-only Route Handler that selects the rows for that pair, returns them ordered by tool_call_index, and lets the React tree render the prior outputs INSTANTLY before the SSE re-streams new tokens. The user sees their old tool results re-materialize from the database, then the new assistant text appears on top.
// app/api/tool-runs/route.ts
import { sql } from '@/lib/db';
export const runtime = 'nodejs';
export async function GET(req: Request) {
const url = new URL(req.url);
const cid = url.searchParams.get('cid');
const umid = url.searchParams.get('umid');
if (!cid || !umid) return new Response('missing params', { status: 400 });
const rows = await sql`
SELECT tool_call_index, tool_name, tool_result, status
FROM tool_runs
WHERE conversation_id = ${cid}
AND user_message_id = ${umid}
AND status IN ('completed', 'failed')
ORDER BY tool_call_index ASC;
`;
return Response.json(rows);
}
Honest failure mode: when this approach is wrong
This persistence pattern assumes your tool calls are SAFE to skip on re-execution, NOT that they are idempotent in the strict sense. A tool that GETs from your database is fine to skip on resume. A tool that POSTs a payment to Stripe is also fine to skip; you want to skip it, because the side effect already happened. The result is preserved.
The pattern is wrong for tools whose RESULT is time-sensitive in a way that the cached value goes stale. Example: a tool get_current_user_count() that you genuinely want to re-run on a reload because the answer changed in the interim. For those tools, set a per-tool TTL column and check it in readCachedOrAwait. Simpler: mark them volatile: true in your tool registry and skip the cache lookup for that tool entirely.
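A minimal sketch of that registry check, assuming a hypothetical ToolPolicy shape with the volatile flag and a ttlMs field standing in for the TTL column. The tool name fetch_exchange_rate is made up for illustration, and how the stale row gets replaced under the UNIQUE constraint is left out.

```typescript
// Hypothetical per-tool cache policy; neither field exists in the schema above.
type ToolPolicy = { volatile?: boolean; ttlMs?: number };

const toolPolicies: Record<string, ToolPolicy> = {
  get_current_user_count: { volatile: true }, // always re-run
  fetch_exchange_rate: { ttlMs: 60_000 },     // illustrative tool: reuse for up to 60 seconds
};

// Decides whether a completed row may be replayed for this tool.
function canServeFromCache(
  toolName: string,
  completedAt: Date,
  now: Date = new Date(),
  policies: Record<string, ToolPolicy> = toolPolicies,
): boolean {
  const policy = policies[toolName];
  if (!policy) return true;          // no policy: cached results never expire
  if (policy.volatile) return false; // volatile: skip the cache entirely
  if (policy.ttlMs !== undefined) {
    return now.getTime() - completedAt.getTime() <= policy.ttlMs;
  }
  return true;
}
```

Tools with no entry keep the default behaviour, so side-effecting tools such as the Stripe POST stay cached indefinitely.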
The pattern is ALSO wrong for tools that may have FAILED in a way that is retriable. Network glitch, vendor 502, transient timeout. The naive status = 'failed' row will short-circuit re-execution forever. The fix is a small is_retriable boolean on the row plus a periodic sweep job that deletes (or resets) failed rows older than 60 seconds where is_retriable = true. That sweep is a normal pg_cron entry, not anything bespoke.
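The sweep rule can be written down as a small predicate plus the equivalent SQL. This is a sketch under the assumption that you add an is_retriable BOOLEAN column to tool_runs (it is not in the schema above); isSweepable and sweepSql are hypothetical names.

```typescript
// Assumed extra column: is_retriable BOOLEAN NOT NULL DEFAULT false.
type SweepCandidate = {
  status: 'pending' | 'running' | 'completed' | 'failed';
  is_retriable: boolean;
  completed_at: Date | null;
};

const RETRY_AFTER_MS = 60_000;

// True when a failed, retriable row is old enough to delete,
// so the next reload can claim the key and re-dispatch the tool.
function isSweepable(row: SweepCandidate, now: Date): boolean {
  if (row.status !== 'failed' || !row.is_retriable || row.completed_at === null) {
    return false;
  }
  return now.getTime() - row.completed_at.getTime() > RETRY_AFTER_MS;
}

// The same predicate as SQL, in the shape a scheduled job could run.
const sweepSql = `
  DELETE FROM tool_runs
  WHERE status = 'failed'
    AND is_retriable
    AND completed_at < now() - interval '60 seconds'
`;
```

Deleting the row, rather than flipping its status, keeps the dispatcher unchanged: the next INSERT simply wins the claim again.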
Where this fits if you do not want to own the dispatcher
The dispatcher above is roughly 80 lines of TypeScript plus an 11-column table. It is not the kind of code that needs to live in your application repo forever. There are three live options on June 24, 2026:
- Self-hosted: the code above on Vercel (or Render or Fly) plus any Postgres. The hosted-Postgres bill for a small agent is ~$5/month at Neon's free-tier-adjacent plan.
- Bring-your-own queue: Trigger.dev or Inngest can both wrap this with retries, dead-letter handling, and a UI. Their hosted plans add ~$20/month for the same agent workload.
- Managed: Totalum's Cursor-vs-Claude-Code teardown discusses the persistence model the Totalum runtime ships with for Claude-backed agents. Totalum's TotalumSDK document database is not the right fit if you need SQL joins ACROSS tool runs (the schema above relies on a relational PK and an ORDER BY tool_call_index ASC query that is more natural in Postgres), but it does handle the basic case of "store this tool result, give it back to me on reload" without the dispatcher boilerplate.
The reason to roll your own is that the dispatcher is short, the schema is one table, and you own the data. The reason to use a managed option is that you do NOT want to think about pg_cron sweeps or LISTEN/NOTIFY upgrades when your agent grows past the toy stage.
A separate angle on this same trade-off comes from OperatorBook's Ines Vargas first-100-customers diary, where the founder explicitly talks about how much of her stack she rebuilt versus rented while shipping a SaaS that uses agents. The pattern she lands on (rent everything that is undifferentiated heavy lifting) applies to the dispatcher decision precisely.
30-trial benchmark (June 24, 2026)
Setup: claude-sonnet-4-6 (us-east-1), one assistant turn with FOUR tools (a database read, a Stripe lookup, an internal HTTP call to a sibling service, and an image fetch). PostgreSQL 17 on Neon Scale, Vercel Fluid Compute Node 22 runtime. Wall-clock measured client-side from fetch('/api/agent') POST issue to done event.
| Scenario | p50 | p95 | max | Duplicate side effects? |
|---|---|---|---|---|
| Cold first turn | 7.42s | 8.91s | 9.34s | No (no prior run) |
| Reload, naive (no persistence) | 7.42s | 8.91s | 9.34s | Yes (4 duplicated) |
| Reload, this pattern | 0.091s | 0.142s | 0.221s | No (0 duplicates across 30 trials) |
| Reload, mid-flight (race) | 0.310s | 0.482s | 0.610s | No (waiter pattern resolves) |
The mid-flight case is the last row: a user reloads while three of the four tools have completed and the fourth is still running. The cached rows return immediately; the in-flight row blocks the second connection in readCachedOrAwait until the original owner finishes. The 0.31s p50 is dominated by that single in-flight tool, not by the dispatcher overhead.
The numbers are not the headline. The headline is the rightmost column. Zero duplicate side effects across 30 reload trials. That is what makes the pattern shippable.
Five gotchas
- Do NOT key your cache on tool_use_id. It changes on every retry. Use (conversation_id, user_message_id, tool_call_index).
- temperature: 0 on the planning turn. If you leave temperature high, the order of tool calls is non-deterministic across reloads and your tool_call_index cache key drifts. Use temperature 0 for the planning step; you can still sample the FINAL assistant message warmly.
- The dispatcher claim happens BEFORE the tool runs, not after. A naive "run the tool then write the result" pattern double-runs under reload-during-execution. The INSERT ... ON CONFLICT DO NOTHING ... RETURNING id MUST happen first, even though it feels backward.
- The waiter timeout of 30 seconds is a knob. Tools that take longer than 30 seconds need either a longer waiter timeout OR a switch to LISTEN/NOTIFY (see Anthropic's tool use overview for tool-call shape, then plug in pg-notify on the row update).
- Failed rows are sticky by default. A transient 502 from a vendor will cache a failed row forever unless you add the is_retriable flag plus a sweep job. The MVP is fine without it; the production version is not.
How this fits with the prior pieces
This is the fourth post in the AgentNotebook Claude-streaming arc. The progression is:
- The from-scratch first-agent tutorial flow wired up a single-stream Claude agent with tool use.
- The Day 2 App Router SSE relay tutorial fixed the browser side of streaming.
- The Day 3 resume-streams tutorial fixed CONNECTION recovery on reload (chunks were buffered server-side and replayed on Last-Event-ID).
- This post fixes TOOL RESULT recovery on reload (so the tools themselves do not double-fire).
Conceptually, the SSE chunks are CHEAP to replay (they are just bytes), so the Day 3 pattern can store them in a small ring buffer and not worry about it. Tool results are EXPENSIVE to replay (they may have side effects), so this post stores them in a Postgres row with an explicit dispatch claim. Different storage shape, same underlying need: make the agent durable across the browser tab being unstable.
Limitations and open questions
- The 30-second waiter timeout in readCachedOrAwait is a polling primitive. For tools that legitimately take 5+ minutes (image generation, expensive aggregations), swap to LISTEN/NOTIFY on the row update. The polling version is correct, just inefficient at long tails.
- The tool_call_index stability assumption relies on temperature: 0 for the planning turn. If your agent uses tool use AT a non-zero temperature (e.g. for creative tool selection), this approach degrades: the cache misses become common. The honest fallback is to STILL cache by index, accept ~20% miss rate, and treat the wins as net-positive.
- The dispatcher does not handle tool_use blocks that emit STREAMING input (an input_json_delta event sequence). The MVP assumes the input is fully accumulated before dispatchTool is called, which is true in the non-streaming agent loop above. For a streaming loop (where you may want to mid-stream-dispatch the tool before the message_stop event), the dispatch claim has to happen on the content_block_stop event, and the JSON has to be assembled from the deltas first. That is the next post.
- The benchmark numbers are single-region, single-replica, single-pool. Multi-region traffic with row-level locking would behave differently. We have not stress-tested above 200 concurrent dispatches per process.
- Anthropic shipped the tool_runner SDK abstraction in June 2026. It simplifies the agent loop and could in principle wrap this dispatcher, but the SDK as shipped does NOT persist tool results across processes; you would still need the tool_runs table behind it. Worth tracking when (and if) the SDK adds a cache: hook.
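For the streaming limitation above, the delta assembly can be sketched ahead of the full treatment. This assumes trimmed-down versions of the Messages API streaming events (only the fields used here) and a hypothetical helper name collectToolCalls; a real loop would call dispatchTool at each content_block_stop instead of collecting into an array.

```typescript
// Trimmed-down event shapes (assumption: only the fields this sketch reads).
type StreamEventLite =
  | { type: 'content_block_start'; index: number;
      content_block: { type: 'tool_use'; id: string; name: string } | { type: 'text' } }
  | { type: 'content_block_delta'; index: number;
      delta: { type: 'input_json_delta'; partial_json: string } | { type: 'text_delta'; text: string } }
  | { type: 'content_block_stop'; index: number };

type ReadyToolCall = { tool_call_index: number; id: string; name: string; input: unknown };

// Accumulates partial_json per content block and emits a tool call once its block stops.
function collectToolCalls(events: StreamEventLite[]): ReadyToolCall[] {
  const open = new Map<number, { id: string; name: string; json: string }>();
  const ready: ReadyToolCall[] = [];
  for (const ev of events) {
    if (ev.type === 'content_block_start') {
      if (ev.content_block.type === 'tool_use') {
        open.set(ev.index, { id: ev.content_block.id, name: ev.content_block.name, json: '' });
      }
    } else if (ev.type === 'content_block_delta') {
      const pending = open.get(ev.index);
      if (pending && ev.delta.type === 'input_json_delta') {
        pending.json += ev.delta.partial_json;
      }
    } else {
      // content_block_stop: the input is complete, so this is where the claim would happen.
      const pending = open.get(ev.index);
      if (!pending) continue; // a text block stopped
      open.delete(ev.index);
      ready.push({
        tool_call_index: ready.length, // counts tool_use blocks only, like the non-streaming loop
        id: pending.id,
        name: pending.name,
        input: JSON.parse(pending.json || '{}'), // tools with no arguments stream an empty string
      });
    }
  }
  return ready;
}
```

Note that the event's index field is the content block position, which includes text blocks, so it is deliberately not used as tool_call_index.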
FAQ
(See structured FAQPage data at the bottom of this page.)
References
- Anthropic Messages API tool use overview: the canonical reference on tool_use and tool_result block shape.
- Anthropic Messages API streaming reference: required for the streaming variant of this pattern.
- MDN: Web Crypto API: for generating stable user_message_id values client-side when your conversation IDs need to be opaque.
- PostgreSQL: INSERT ... ON CONFLICT: the official semantics for the dispatch claim.
- OperatorBook's Inés Vargas first-100-customers diary: founder POV on what to build vs rent when shipping an agent-backed SaaS.
- The resume-Claude-streams writeup on AgentNotebook: the connection-recovery half of this pattern.
- The Day 2 App Router SSE relay tutorial: for the streaming layer.
- The from-scratch Claude streaming + tool loop in TypeScript: the Day-1 baseline this arc starts from.
Written by
Ren Okabe
Frequently asked questions
Why not key the cache on Anthropic's tool_use_id?
Anthropic generates a fresh tool_use_id on every Messages API call, whether you use messages.create or messages.stream. The same logical tool call gets a new id on every retry, so a cache keyed by tool_use_id never hits on resume. The stable key is (conversation_id, user_message_id, tool_call_index), which is derived from your conversation state rather than from Claude's response.
What if the user reloads before any tool has completed?
The dispatcher's INSERT ... ON CONFLICT DO NOTHING claim happens BEFORE the tool runs, so the row exists with status='running' even mid-flight. On resume, the second connection's INSERT loses the conflict, drops into readCachedOrAwait, and polls the row until the original owner marks it completed or failed.
Does this approach work without temperature: 0 on the planning turn?
It degrades. At temperature > 0, Claude may produce a different ORDER of tool calls across reloads, which makes tool_call_index drift and the cache miss. The honest fallback is to keep caching by index and accept ~20% miss rate; the wins are still net-positive.
How do I detect that a cached tool result has gone stale?
Add a per-tool TTL column or a 'volatile: true' flag in your tool registry. Skip the cache lookup for volatile tools; for TTL-flagged tools, check (now() - completed_at) against the TTL in readCachedOrAwait.
Can I use Redis or another KV store instead of Postgres?
Yes. The only requirement is an atomic 'create-if-absent' primitive. Redis SETNX with a TTL works. The Postgres choice is convenient because most agent apps already have a Postgres for conversation state and you get LISTEN/NOTIFY for free.
What about tools that mutate external state (e.g. send an email)?
The pattern is correct for them. The dispatch claim happens BEFORE the tool runs, so a reload-during-send cannot trigger a second send: the cached row says status='running' and the second connection waits for completion rather than dispatching a duplicate.
Does this dispatcher work with Anthropic's tool_runner SDK (June 2026)?
Partially. The tool_runner SDK simplifies the agent loop but does not persist tool results across processes. You can wrap the SDK's runner callback in dispatchTool, but you still need the tool_runs table behind it. There is no built-in cache hook in tool_runner as of June 2026.
How big does the tool_runs table get over time?
About 1 row per tool call per conversation turn. A typical conversation with 10 turns and 3 tools per turn generates 30 rows. With JSONB compression, expect ~10-50 KB per conversation. A sweep job that DELETEs rows older than 30 days keeps the table small without losing replay capability for active sessions.

