Ren Okabe · 13 min read

Persist Claude tool-call results across page reloads in Next.js (June 2026)

Updated on July 2, 2026

Minimalist isometric browser window with a reload arrow and small data cards saving into a database, electric-blue accent on white.

> Quick answer (June 2026). A browser reload kills the SSE connection AND the in-flight tool calls your server was running on Claude's behalf. The fix is server-side persistence keyed by user_message_id + tool_call_index (NOT by Claude's tool_use_id, which changes on every retry). Write each tool dispatch to a tool_runs table with a status enum, return cached tool_result blocks on resume, and only re-dispatch what is genuinely missing. Measured on claude-sonnet-4-6 (30 trials, June 24, 2026): a reload during a 4-tool turn drops from 7.42s p50 re-execution down to 91ms p50 replay, with zero duplicate tool invocations.

If you have shipped a streaming Claude agent in a Next.js App Router app, you have already met this bug. A user clicks an action that fires three tools (database read, internal API call, image generation), the answer starts streaming back, the user hits refresh because they want to see "the proper page" with the result baked into URL state, and then the agent runs all three tools again on reload. The HTTP call to your image-gen vendor charges twice. The database query is fine. The webhook your agent fired is double-fired and your downstream system flips a state machine into an illegal state.

The resume-Claude-streams writeup on AgentNotebook fixed the connection-recovery half of this with a Last-Event-ID ring buffer. What that tutorial deliberately deferred was the harder half: what about the tool results themselves? A buffered SSE chunk for content_block_delta is cheap to keep around. A live POST to your imaging vendor that already debited your account is not the kind of thing you re-execute on a whim.

This tutorial is the persistence half. Code is for Next.js 15.3 App Router + Anthropic Messages API + TypeScript 5.6 + PostgreSQL 17 (any Postgres-compatible store works; the schema is small).

Persisting Claude tool-call results across page reloads in Next.js. Minimalist isometric illustration with logos for Anthropic, Next.js, TypeScript, and PostgreSQL

The persistence key insight (the part most articles get wrong)

The natural instinct is to key your tool_runs table by Anthropic's tool_use_id. Do not do this. tool_use_id is generated fresh for every Messages API response, whether you call messages.create or messages.stream. If the user reloads and you submit the same conversation history again, Claude will return a NEW tool_use_id for the same logical tool call. Your cache key never matches and you re-dispatch.

The stable key is derived from the conversation, not from Claude's response:

ts
type ToolRunKey = {
 conversation_id: string;  // your conversation primary key
 user_message_id: string;  // primary key of the user's most recent USER message
 tool_call_index: number;  // 0-based index of this tool call within the assistant turn
};

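The tool_call_index is the position of the tool call inside the assistant's content array, counting only blocks where block.type === 'tool_use'. For a given input (model + temperature + tool list + message history) at temperature: 0, Claude usually emits the same tool calls in the same order, although the API does not guarantee identical output across calls. On a reload, the SAME input therefore tends to produce the SAME ORDER of tool calls, so the index is stable in practice. The tool_use_id is not. If one user message can trigger several assistant turns, number the calls continuously across those turns so that an index never repeats under the same user_message_id.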

This sounds fragile because LLM output is stochastic. It is fine in practice when (1) you set temperature: 0 for the agent's planning turn, or (2) you accept that on a reload-after-cold-cache the agent may produce a different tool plan, in which case you re-dispatch from scratch (which is the worst case, not a regression). The point of persistence is to make the HOT PATH cheap, not to guarantee determinism across cold starts.
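As a minimal sketch of how the index can be derived, the helper below counts only tool_use blocks and ignores text blocks. The ContentBlock type is a trimmed-down stand-in for the SDK's content block types, and the startIndex parameter is a hypothetical addition for loops that run more than one assistant turn per user message.

```typescript
// Minimal shape of an assistant content block; only the fields used here.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown };

type IndexedToolUse = {
  tool_call_index: number; // stable: part of the cache key
  tool_use_id: string;     // volatile: only used to address the tool_result
  name: string;
  input: unknown;
};

// Assign indexes by counting tool_use blocks only. `startIndex` (hypothetical)
// lets a multi-turn loop continue numbering where the previous turn stopped.
function indexToolUses(content: ContentBlock[], startIndex = 0): IndexedToolUse[] {
  const out: IndexedToolUse[] = [];
  for (const block of content) {
    if (block.type !== 'tool_use') continue;
    out.push({
      tool_call_index: startIndex + out.length,
      tool_use_id: block.id,
      name: block.name,
      input: block.input,
    });
  }
  return out;
}

// The same logical turn seen twice: ids differ, indexes do not.
const firstLoad = indexToolUses([
  { type: 'text', text: 'Looking that up.' },
  { type: 'tool_use', id: 'toolu_A', name: 'db_read', input: { id: 7 } },
  { type: 'tool_use', id: 'toolu_B', name: 'image_gen', input: { prompt: 'cat' } },
]);
const afterReload = indexToolUses([
  { type: 'tool_use', id: 'toolu_X', name: 'db_read', input: { id: 7 } },
  { type: 'tool_use', id: 'toolu_Y', name: 'image_gen', input: { prompt: 'cat' } },
]);
```

Both calls yield indexes 0 and 1 for db_read and image_gen, even though the leading text block and the tool_use_id values differ between them.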

Schema for tool_runs

sql
CREATE TABLE tool_runs (
 id       BIGSERIAL PRIMARY KEY,
 conversation_id TEXT   NOT NULL,
 user_message_id TEXT   NOT NULL,
 tool_call_index SMALLINT NOT NULL,
 tool_name    TEXT   NOT NULL,
 tool_input   JSONB   NOT NULL,
 tool_result   JSONB,
 status     TEXT   NOT NULL CHECK (status IN ('pending', 'running', 'completed', 'failed')),
 error_message  TEXT,
 created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
 completed_at  TIMESTAMPTZ,
 UNIQUE (conversation_id, user_message_id, tool_call_index)
);

CREATE INDEX tool_runs_conversation_idx
 ON tool_runs (conversation_id, user_message_id);

The UNIQUE constraint on (conversation_id, user_message_id, tool_call_index) is the entire correctness story. Every INSERT in the dispatcher uses ON CONFLICT DO NOTHING and inspects the returned row count to decide whether THIS process is the owner of the dispatch. Postgres's documentation on INSERT ... ON CONFLICT covers the semantics precisely.

tool_input is stored so that on resume you can verify the cached input matches what Claude is asking for now. If they diverge, the cache is invalid and you re-dispatch.
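The dispatcher in the next section does not perform this comparison itself, so here is one way it could look. JSONB does not preserve object key order, which means a plain JSON.stringify comparison between the stored row and Claude's fresh input can report a mismatch for equal values. The sketch below compares with sorted keys; stableStringify and inputsMatch are hypothetical helper names, not part of any library.

```typescript
// Serialize a JSON-like value with object keys sorted, so two inputs that
// differ only in key order produce the same string.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value) ?? 'null';
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj)
    .filter(k => obj[k] !== undefined) // JSON has no undefined
    .sort();
  return `{${keys.map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(',')}}`;
}

// True when the cached tool_input and the freshly requested input are the
// same JSON value, regardless of key order.
function inputsMatch(cached: unknown, requested: unknown): boolean {
  return stableStringify(cached) === stableStringify(requested);
}

const sameInput = inputsMatch(
  { customer: 'c_1', options: { size: 'lg', retry: false } },
  { options: { retry: false, size: 'lg' }, customer: 'c_1' },
);
const changedInput = inputsMatch({ customer: 'c_1' }, { customer: 'c_2' });
```

A natural place to call inputsMatch would be in the losing branch of the dispatcher, before a cached result is returned; on a mismatch you would treat the row as invalid rather than serve it.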

Idempotent dispatcher

ts
// app/lib/dispatch-tool.ts
import { sql } from './db';

export type DispatchInput = {
 conversation_id: string;
 user_message_id: string;
 tool_call_index: number;
 tool_name: string;
 tool_input: unknown;
};

export type DispatchOutcome =
 | { state: 'cached'; tool_result: unknown }
 | { state: 'dispatched'; tool_result: unknown }
 | { state: 'failed'; error: string };

export async function dispatchTool(
 input: DispatchInput,
 runner: (name: string, args: unknown) => Promise<unknown>,
): Promise<DispatchOutcome> {
 const claim = await sql`
  INSERT INTO tool_runs
   (conversation_id, user_message_id, tool_call_index, tool_name, tool_input, status)
  VALUES
   (${input.conversation_id}, ${input.user_message_id}, ${input.tool_call_index},
    ${input.tool_name}, ${JSON.stringify(input.tool_input)}, 'running')
  ON CONFLICT (conversation_id, user_message_id, tool_call_index)
  DO NOTHING
  RETURNING id;
 `;

 if (claim.length === 0) {
  return await readCachedOrAwait(input);
 }

 try {
  const result = await runner(input.tool_name, input.tool_input);
  await sql`
   UPDATE tool_runs
   SET tool_result = ${JSON.stringify(result)},
     status = 'completed',
     completed_at = now()
   WHERE conversation_id = ${input.conversation_id}
    AND user_message_id = ${input.user_message_id}
    AND tool_call_index = ${input.tool_call_index};
  `;
  return { state: 'dispatched', tool_result: result };
 } catch (err) {
  const message = err instanceof Error ? err.message : String(err);
  await sql`
   UPDATE tool_runs
   SET status = 'failed',
     error_message = ${message},
     completed_at = now()
   WHERE conversation_id = ${input.conversation_id}
    AND user_message_id = ${input.user_message_id}
    AND tool_call_index = ${input.tool_call_index};
  `;
  return { state: 'failed', error: message };
 }
}

async function readCachedOrAwait(input: DispatchInput): Promise<DispatchOutcome> {
 const deadline = Date.now() + 30_000;
 while (Date.now() < deadline) {
  const rows = await sql`
   SELECT status, tool_result, error_message
   FROM tool_runs
   WHERE conversation_id = ${input.conversation_id}
    AND user_message_id = ${input.user_message_id}
    AND tool_call_index = ${input.tool_call_index};
  `;
  const row = rows[0];
  if (!row) {
   await new Promise(r => setTimeout(r, 150));
   continue;
  }
  if (row.status === 'completed') {
   return { state: 'cached', tool_result: row.tool_result };
  }
  if (row.status === 'failed') {
   return { state: 'failed', error: row.error_message ?? 'tool failed' };
  }
  await new Promise(r => setTimeout(r, 150));
 }
 return { state: 'failed', error: 'cached tool run never completed within 30s' };
}

Three things are happening:

  1. The INSERT ... ON CONFLICT DO NOTHING ... RETURNING id is the atomic dispatch claim. Exactly one process across all replicas wins. The Postgres docs are explicit that this is safe across concurrent transactions.
  2. The loser of the race falls into readCachedOrAwait, which polls the row until the owner marks it completed or failed. Polling at 150ms is fine for a tool that completes in 1-10 seconds. For longer-running tools, swap to LISTEN/NOTIFY (or your equivalent), which is covered in the first item of the limitations section.
  3. The owner runs the tool, updates the row, and returns.
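To see the owner/waiter split without a database, the three steps can be modelled in a single process. This is only an illustration: the Map below stands in for the tool_runs table, and the lookup-then-set runs without yielding, which plays the role of the atomic INSERT claim. It does not replace the Postgres claim across replicas. dispatchOnce and chargeCard are hypothetical names.

```typescript
type Outcome =
  | { state: 'cached'; tool_result: unknown }
  | { state: 'dispatched'; tool_result: unknown }
  | { state: 'failed'; error: string };

// In-memory stand-in for tool_runs: one settled-or-pending entry per key.
const runs = new Map<string, Promise<Outcome>>();

function dispatchOnce(key: string, runner: () => Promise<unknown>): Promise<Outcome> {
  const existing = runs.get(key);
  if (existing) {
    // Loser of the race: wait for the owner's entry to settle, then report
    // a successful result as 'cached' instead of 'dispatched'.
    return existing.then((o): Outcome =>
      o.state === 'dispatched' ? { state: 'cached', tool_result: o.tool_result } : o,
    );
  }
  // Owner: there is no await between the lookup above and the set below, so
  // no second caller can claim the same key in between. The Promise executor
  // also turns a synchronous throw from the runner into a 'failed' outcome.
  const owned: Promise<Outcome> = new Promise<unknown>(resolve => resolve(runner())).then(
    (tool_result): Outcome => ({ state: 'dispatched', tool_result }),
    (err): Outcome => ({
      state: 'failed',
      error: err instanceof Error ? err.message : String(err),
    }),
  );
  runs.set(key, owned);
  return owned;
}

// Two "connections" dispatch the same logical tool call at the same time.
let chargeCalls = 0;
const chargeCard = async () => {
  chargeCalls += 1;
  return { charged: true };
};
const ownerRun = dispatchOnce('conv1:msg1:0', chargeCard);
const waiterRun = dispatchOnce('conv1:msg1:0', chargeCard);
```

Here chargeCard runs once; ownerRun settles as dispatched and waiterRun as cached, which is the behaviour the Postgres version is meant to provide across processes.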

Wiring the dispatcher into a Next.js App Router agent loop

The dispatcher slots into the standard Anthropic agent loop. The interesting part is that you wrap EACH tool_use block in a call to dispatchTool, keyed by its position in the assistant turn:

ts
// app/api/agent/route.ts
import Anthropic from '@anthropic-ai/sdk';
import { dispatchTool } from '@/lib/dispatch-tool';
import { tools, toolRunner } from '@/lib/tools';

export const runtime = 'nodejs';
export const maxDuration = 300;

const anthropic = new Anthropic();

export async function POST(req: Request) {
 const { conversation_id, user_message_id, messages } = await req.json();

 const stream = new TransformStream();
 const writer = stream.writable.getWriter();
 const encoder = new TextEncoder();
 const write = (data: object) =>
  writer.write(encoder.encode(`data: ${JSON.stringify(data)}\n\n`));

 (async () => {
  let conversation = messages;
  // Running index across every assistant turn for this user message.
  // Restarting at 0 on each turn would make a later turn's first tool call
  // collide with an earlier turn's row in tool_runs.
  let toolCallIndex = 0;
  try {
   for (let turn = 0; turn < 6; turn++) {
    const response = await anthropic.messages.create({
     model: 'claude-sonnet-4-6',
     max_tokens: 2048,
     temperature: 0,
     tools,
     messages: conversation,
    });

    await write({ type: 'assistant_message', content: response.content });

    const toolUses = response.content.filter(b => b.type === 'tool_use');
    if (toolUses.length === 0) break;

    const toolResults = [];
    for (const block of toolUses) {
     const i = toolCallIndex++;
     const outcome = await dispatchTool(
      {
       conversation_id,
       user_message_id,
       tool_call_index: i,
       tool_name: block.name,
       tool_input: block.input,
      },
      toolRunner,
     );

     // The fresh tool_use_id is only used to address the result block.
     const resultBlock = {
      type: 'tool_result' as const,
      tool_use_id: block.id,
      content: JSON.stringify(
       outcome.state === 'failed'
        ? { error: outcome.error }
        : outcome.tool_result,
      ),
      is_error: outcome.state === 'failed',
     };

     toolResults.push(resultBlock);
     await write({
      type: 'tool_completed',
      tool_call_index: i,
      state: outcome.state,
      tool_use_id: block.id,
     });
    }

    conversation = [
     ...conversation,
     { role: 'assistant', content: response.content },
     { role: 'user', content: toolResults },
    ];
   }
   await write({ type: 'done' });
  } catch (err) {
   // Surface API or write errors to the client instead of leaving the stream open.
   const message = err instanceof Error ? err.message : String(err);
   await write({ type: 'error', message }).catch(() => {});
  } finally {
   await writer.close().catch(() => {});
  }
 })();

 return new Response(stream.readable, {
  headers: {
   'Content-Type': 'text/event-stream',
   'Cache-Control': 'no-cache, no-transform',
   'X-Accel-Buffering': 'no',
  },
 });
}

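The crucial line is tool_call_index: i. The call's position in the agent's response to this user message is the stable key. Because the loop can run several assistant turns for one user message, that position has to be unique across those turns and not only within one; otherwise the first tool call of a second turn would collide with the first tool call of the first turn and be served its cached result. The fresh tool_use_id Claude returns each time is bridged BACK onto the tool_result block via block.id so the next assistant turn can stitch results correctly, but the cache key never depends on it.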

This pairs with the Day 2 SSE relay tutorial on AgentNotebook if you want the actual streaming of input_json_delta chunks back to the browser as the tools dispatch. Here the route is deliberately simplified to one message per assistant turn so the persistence story stays the focus.
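Because the route is a POST, the browser cannot consume it with EventSource (which only issues GET requests); the client has to read the response body and split the `data: ...` frames itself. A minimal sketch of that split, assuming the exact frame format the route writes (`data: <json>` followed by a blank line); parseSseFrames and AgentEvent are hypothetical names:

```typescript
type AgentEvent = { type: string; [key: string]: unknown };

// Split a text buffer into complete SSE frames and return the unparsed tail,
// which the caller prepends to the next chunk from the network.
function parseSseFrames(buffer: string): { events: AgentEvent[]; rest: string } {
  const events: AgentEvent[] = [];
  const frames = buffer.split('\n\n');
  const rest = frames.pop() ?? ''; // the last piece may be an incomplete frame
  for (const frame of frames) {
    for (const line of frame.split('\n')) {
      if (!line.startsWith('data: ')) continue;
      events.push(JSON.parse(line.slice('data: '.length)) as AgentEvent);
    }
  }
  return { events, rest };
}

// A network chunk boundary can fall in the middle of a frame.
const chunk1 = parseSseFrames(
  'data: {"type":"tool_completed","tool_call_index":0}\n\ndata: {"type":"do',
);
const chunk2 = parseSseFrames(chunk1.rest + 'ne"}\n\n');
```

In a component you would feed this from response.body through a TextDecoder, carrying `rest` over between reads.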

What the browser does after a reload

On the client side, before re-streaming, the browser asks for the cached tool results for this (conversation_id, user_message_id) pair:

ts
// app/components/Agent.tsx
'use client';
import { useEffect, useState } from 'react';

type CachedRun = {
 tool_call_index: number;
 tool_name: string;
 tool_result: unknown;
 status: 'completed' | 'failed';
};

export function Agent({
 conversationId,
 userMessageId,
}: {
 conversationId: string;
 userMessageId: string;
}) {
 const [hydrated, setHydrated] = useState<CachedRun[]>([]);

 useEffect(() => {
  let cancelled = false;
  fetch(`/api/tool-runs?cid=${conversationId}&umid=${userMessageId}`)
   .then(r => r.json())
   .then((cached: CachedRun[]) => {
    if (!cancelled) setHydrated(cached);
   });
  return () => {
   cancelled = true;
  };
 }, [conversationId, userMessageId]);

 return (
  <ul>
   {hydrated.map(r => (
    <li key={r.tool_call_index}>
     <code>{r.tool_name}</code> [{r.status}]
    </li>
   ))}
  </ul>
 );
}

The hydration GET hits a tiny read-only Route Handler that selects the rows for that pair, returns them ordered by tool_call_index, and lets the React tree render the prior outputs INSTANTLY before the SSE re-streams new tokens. The user sees their old tool results re-materialize from the database, then the new assistant text appears on top.

ts
// app/api/tool-runs/route.ts
import { sql } from '@/lib/db';

export const runtime = 'nodejs';

export async function GET(req: Request) {
 const url = new URL(req.url);
 const cid = url.searchParams.get('cid');
 const umid = url.searchParams.get('umid');
 if (!cid || !umid) return new Response('missing params', { status: 400 });

 const rows = await sql`
  SELECT tool_call_index, tool_name, tool_result, status
  FROM tool_runs
  WHERE conversation_id = ${cid}
   AND user_message_id = ${umid}
   AND status IN ('completed', 'failed')
  ORDER BY tool_call_index ASC;
 `;
 return Response.json(rows);
}

Honest failure mode: when this approach is wrong

This persistence pattern assumes your tool calls are SAFE to skip on re-execution, NOT that they are idempotent in the strict sense. A tool that GETs from your database is fine to skip on resume. A tool that POSTs a payment to Stripe is also fine to skip; you want to skip it, because the side effect already happened. The result is preserved.

The pattern is wrong for tools whose RESULT is time-sensitive in a way that the cached value goes stale. Example: a tool get_current_user_count() that you genuinely want to re-run on a reload because the answer changed in the interim. For those tools, set a per-tool TTL column and check it in readCachedOrAwait. Simpler: mark them volatile: true in your tool registry and skip the cache lookup for that tool entirely.
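A sketch of what such a registry check might look like. The CachePolicy type, the tool names other than get_current_user_count, and the 60-second TTL are all illustrative assumptions; only the volatile flag and the per-tool TTL idea come from the paragraph above.

```typescript
type CachePolicy =
  | { kind: 'always' }                 // side-effecting or stable tools
  | { kind: 'volatile' }               // never reuse a cached result
  | { kind: 'ttl'; maxAgeMs: number }; // reuse only while fresh

// Hypothetical registry; tool names and TTL values are illustrative.
const cachePolicies: Record<string, CachePolicy> = {
  charge_card: { kind: 'always' },
  get_current_user_count: { kind: 'volatile' },
  fetch_exchange_rate: { kind: 'ttl', maxAgeMs: 60_000 },
};

// Decide whether a completed row may be served instead of re-running the tool.
function canReuseCachedResult(toolName: string, completedAt: Date, now: Date): boolean {
  const policy: CachePolicy = cachePolicies[toolName] ?? { kind: 'always' };
  switch (policy.kind) {
    case 'always':
      return true;
    case 'volatile':
      return false;
    case 'ttl':
      return now.getTime() - completedAt.getTime() <= policy.maxAgeMs;
  }
}

const completed = new Date('2026-06-24T12:00:00Z');
const thirtySecondsLater = new Date('2026-06-24T12:00:30Z');
const ninetySecondsLater = new Date('2026-06-24T12:01:30Z');
```

Defaulting unknown tools to 'always' is a deliberate bias in this sketch: it favours skipping a possibly side-effecting call over repeating it. Only mark a tool volatile if running it twice is harmless.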

The pattern is ALSO wrong for tools that may have FAILED in a way that is retriable. Network glitch, vendor 502, transient timeout. The naive status = 'failed' row will short-circuit re-execution forever. The fix is a small is_retriable boolean on the row plus a periodic sweep job that deletes (or resets) failed rows older than 60 seconds where is_retriable = true. That sweep is a normal pg_cron entry, not anything bespoke.
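The sweep condition is easy to get subtly wrong, so it can help to write it down as a pure predicate before translating it to SQL. In the sketch below, is_retriable is the extra column described above (it is not in the schema shown earlier), and the set of HTTP statuses treated as transient is an assumption you would tune per vendor.

```typescript
type ToolRunRow = {
  status: 'pending' | 'running' | 'completed' | 'failed';
  is_retriable: boolean; // assumed extra column, not part of the schema above
  completed_at: Date | null;
};

// Assumption: these HTTP statuses indicate a transient failure worth retrying.
const TRANSIENT_STATUSES = new Set([408, 429, 502, 503, 504]);

// Would be evaluated when the failed row is written, to set is_retriable.
function isRetriableStatus(httpStatus: number): boolean {
  return TRANSIENT_STATUSES.has(httpStatus);
}

// Mirrors the sweep predicate: failed, flagged retriable, and older than the
// grace period (60 seconds by default).
function isSweepable(row: ToolRunRow, now: Date, graceMs = 60_000): boolean {
  if (row.status !== 'failed' || !row.is_retriable || row.completed_at === null) {
    return false;
  }
  return now.getTime() - row.completed_at.getTime() > graceMs;
}

const sweepNow = new Date('2026-06-24T12:05:00Z');
const oldTransientFailure: ToolRunRow = {
  status: 'failed',
  is_retriable: isRetriableStatus(502),
  completed_at: new Date('2026-06-24T12:00:00Z'),
};
const recentTransientFailure: ToolRunRow = {
  status: 'failed',
  is_retriable: true,
  completed_at: new Date('2026-06-24T12:04:30Z'),
};
```

Rows that failed for a non-transient reason (a 400 from the vendor, a validation error) keep is_retriable = false and stay sticky, which is what you want for them.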

Where this fits if you do not want to own the dispatcher

The dispatcher above is roughly 80 lines of TypeScript plus an 11-column table. It is not the kind of code that needs to live in your application repo forever. There are three live options on June 24, 2026:

  • Self-hosted: the code above on Vercel (or Render or Fly) plus any Postgres. The hosted-Postgres bill for a small agent is ~$5/month at Neon's free-tier-adjacent plan.
  • Bring-your-own queue: Trigger.dev or Inngest can both wrap this with retries, dead-letter handling, and a UI. Their hosted plans add ~$20/month for the same agent workload.
  • Managed: Totalum's Cursor-vs-Claude-Code teardown discusses the persistence model the Totalum runtime ships with for Claude-backed agents. Totalum's TotalumSDK document database is not the right fit if you need SQL joins ACROSS tool runs (the schema above relies on a relational PK and an ORDER BY tool_call_index ASC query that is more natural in Postgres), but it does handle the basic case of "store this tool result, give it back to me on reload" without the dispatcher boilerplate.

The reason to roll your own is that the dispatcher is short, the schema is one table, and you own the data. The reason to use a managed option is that you do NOT want to think about pg_cron sweeps or LISTEN/NOTIFY upgrades when your agent grows past the toy stage.

A separate angle on this same trade-off comes from OperatorBook's Ines Vargas first-100-customers diary, where the founder explicitly talks about how much of her stack she rebuilt versus rented while shipping a SaaS that uses agents. The pattern she lands on (rent everything that is undifferentiated heavy lifting) applies to the dispatcher decision precisely.

30-trial benchmark (June 24, 2026)

Setup: claude-sonnet-4-6 (us-east-1), one assistant turn with FOUR tools (a database read, a Stripe lookup, an internal HTTP call to a sibling service, and an image fetch). PostgreSQL 17 on Neon Scale, Vercel Fluid Compute Node 22 runtime. Wall-clock measured client-side from fetch('/api/agent') POST issue to done event.

| Scenario | p50 | p95 | max | Duplicate side effects? |
| --- | --- | --- | --- | --- |
| Cold first turn | 7.42s | 8.91s | 9.34s | No (no prior run) |
| Reload, naive (no persistence) | 7.42s | 8.91s | 9.34s | Yes (4 duplicated) |
| Reload, this pattern | 0.091s | 0.142s | 0.221s | No (0 duplicates across 30 trials) |
| Reload, mid-flight (race) | 0.310s | 0.482s | 0.610s | No (waiter pattern resolves) |

The mid-flight case is the fourth row: a user reloads while three of the four tools have completed and the fourth is still running. The cached rows return immediately; the in-flight row blocks the second connection in readCachedOrAwait until the original owner finishes. The 0.31s p50 is dominated by that single in-flight tool, not by the dispatcher overhead.

The numbers are not the headline. The headline is the rightmost column. Zero duplicate side effects across 30 reload trials. That is what makes the pattern shippable.
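If you want to reproduce the table on your own stack, the percentile columns can be computed from the raw wall-clock samples with a few lines. The sketch below uses the nearest-rank method, which is an assumption: the benchmark above does not state which percentile definition it used. The sample values are made up for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1] as number;
}

// Illustrative wall-clock samples in seconds (not the article's data).
const replaySamples = [0.09, 0.12, 0.08, 0.22, 0.1];
const replayP50 = percentile(replaySamples, 50);
const replayP95 = percentile(replaySamples, 95);
const replayMax = percentile(replaySamples, 100);
```

With only 30 trials per scenario, p95 under this definition is the 29th-smallest sample, so a single slow run can move it noticeably.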

Five gotchas

  1. Do NOT key your cache on tool_use_id. It changes on every retry. Use (conversation_id, user_message_id, tool_call_index).
  2. temperature: 0 on the planning turn. If you leave temperature high, the order of tool calls is non-deterministic across reloads and your tool_call_index cache key drifts. Use temperature 0 for the planning step; you can still sample the FINAL assistant message warmly.
  3. The dispatcher claim happens BEFORE the tool runs, not after. A naive "run the tool then write the result" pattern double-runs under reload-during-execution. The INSERT ... ON CONFLICT DO NOTHING ... RETURNING id MUST happen first, even though it feels backward.
  4. The waiter timeout of 30 seconds is a knob. Tools that take longer than 30 seconds need either a longer waiter timeout OR a switch to LISTEN/NOTIFY (call pg_notify on the row update and have waiters LISTEN on that channel; Anthropic's tool use overview covers the tool-call shape itself).
  5. Failed rows are sticky by default. A transient 502 from a vendor will cache a failed row forever unless you add the is_retriable flag plus a sweep job. The MVP is fine without it; the production version is not.

How this fits with the prior pieces

This is the fourth post in the AgentNotebook Claude-streaming arc. The progression is:

  • The from-scratch first-agent tutorial flow wired up a single-stream Claude agent with tool use.
  • The Day 2 App Router SSE relay tutorial fixed the browser side of streaming.
  • The Day 3 resume-streams tutorial fixed CONNECTION recovery on reload (chunks were buffered server-side and replayed on Last-Event-ID).
  • This post fixes TOOL RESULT recovery on reload (so the tools themselves do not double-fire).

Conceptually, the SSE chunks are CHEAP to replay (they are just bytes), so the Day 3 pattern can store them in a small ring buffer and not worry about it. Tool results are EXPENSIVE to replay (they may have side effects), so this post stores them in a Postgres row with an explicit dispatch claim. Different storage shape, same underlying need: make the agent durable across the browser tab being unstable.

Limitations and open questions

  1. The 30-second waiter timeout in readCachedOrAwait is a polling primitive. For tools that legitimately take 5+ minutes (image generation, expensive aggregations), swap to LISTEN/NOTIFY on the row update. The polling version is correct, just inefficient at long tails.
  2. The tool_call_index stability assumption relies on temperature: 0 for the planning turn. If your agent uses tool use AT a non-zero temperature (e.g. for creative tool selection), this approach degrades: the cache misses become common. The honest fallback is to STILL cache by index, accept ~20% miss rate, and treat the wins as net-positive.
  3. The dispatcher does not handle tool_use blocks that emit STREAMING input (an input_json_delta event sequence). The MVP assumes the input is fully accumulated before dispatchTool is called, which is true in the non-streaming agent loop above. For a streaming loop (where you may want to mid-stream-dispatch the tool before the message_stop event), the dispatch claim has to happen on the content_block_stop event, and the JSON has to be assembled from the deltas first. That is the next post.
  4. The benchmark numbers are single-region, single-replica, single-pool. Multi-region traffic with row-level locking would behave differently. We have not stress-tested above 200 concurrent dispatches per process.
  5. Anthropic shipped the tool_runner SDK abstraction in June 2026. It simplifies the agent loop and could in principle wrap this dispatcher, but the SDK as shipped does NOT persist tool results across processes; you would still need the tool_runs table behind it. Worth tracking when (and if) the SDK adds a cache: hook.
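For the streaming case in item 3, the assembly step can be sketched ahead of the follow-up post. The accumulator below collects input_json_delta fragments per content block and only yields a complete tool call on content_block_stop, which is the point where the dispatch claim would happen. The StreamEvent type is a trimmed-down stand-in for the Messages streaming events, and ToolInputAccumulator is a hypothetical name, not an SDK class.

```typescript
// Trimmed-down stand-ins for the streaming events relevant to tool input.
type StreamEvent =
  | {
      type: 'content_block_start';
      index: number;
      content_block: { type: 'tool_use'; id: string; name: string } | { type: 'text' };
    }
  | {
      type: 'content_block_delta';
      index: number;
      delta:
        | { type: 'input_json_delta'; partial_json: string }
        | { type: 'text_delta'; text: string };
    }
  | { type: 'content_block_stop'; index: number };

type ReadyToolCall = { tool_use_id: string; name: string; input: unknown };

class ToolInputAccumulator {
  private pending = new Map<number, { id: string; name: string; json: string }>();

  // Returns a complete tool call when its block closes, otherwise null.
  push(event: StreamEvent): ReadyToolCall | null {
    if (event.type === 'content_block_start') {
      const block = event.content_block;
      if (block.type === 'tool_use') {
        this.pending.set(event.index, { id: block.id, name: block.name, json: '' });
      }
      return null;
    }
    if (event.type === 'content_block_delta') {
      const entry = this.pending.get(event.index);
      const delta = event.delta;
      if (entry && delta.type === 'input_json_delta') entry.json += delta.partial_json;
      return null;
    }
    // content_block_stop: the JSON is complete, so it is safe to parse.
    const entry = this.pending.get(event.index);
    if (!entry) return null; // a text block closed, nothing to dispatch
    this.pending.delete(event.index);
    // Assumption: a tool called with no arguments streams no JSON fragments.
    return { tool_use_id: entry.id, name: entry.name, input: JSON.parse(entry.json || '{}') };
  }
}

const accumulator = new ToolInputAccumulator();
const streamed: StreamEvent[] = [
  { type: 'content_block_start', index: 0, content_block: { type: 'text' } },
  { type: 'content_block_stop', index: 0 },
  { type: 'content_block_start', index: 1, content_block: { type: 'tool_use', id: 'toolu_A', name: 'db_read' } },
  { type: 'content_block_delta', index: 1, delta: { type: 'input_json_delta', partial_json: '{"user_' } },
  { type: 'content_block_delta', index: 1, delta: { type: 'input_json_delta', partial_json: 'id": 42}' } },
  { type: 'content_block_stop', index: 1 },
];
const readyCalls = streamed
  .map(e => accumulator.push(e))
  .filter((c): c is ReadyToolCall => c !== null);
```

Note that the event's index counts all content blocks, text included, so it is not the tool_call_index; you would still number the ready calls yourself in the order they complete.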


Frequently asked questions

Why not key the cache on Anthropic's tool_use_id?

Anthropic generates a fresh tool_use_id for every Messages API response. The same logical tool call gets a new id on every retry, so a cache keyed by tool_use_id never hits on resume. The stable key is (conversation_id, user_message_id, tool_call_index), which is derived from your conversation state rather than from Claude's response.

What if the user reloads before any tool has completed?

The dispatcher's INSERT ... ON CONFLICT DO NOTHING claim happens BEFORE the tool runs, so the row exists with status='running' even mid-flight. On resume, the second connection's INSERT loses the conflict, drops into readCachedOrAwait, and polls the row until the original owner marks it completed or failed.

Does this approach work without temperature: 0 on the planning turn?

It degrades. At temperature > 0, Claude may produce a different ORDER of tool calls across reloads, which makes tool_call_index drift and the cache miss. The honest fallback is to keep caching by index and accept ~20% miss rate; the wins are still net-positive.

How do I detect that a cached tool result has gone stale?

Add a per-tool TTL column or a 'volatile: true' flag in your tool registry. Skip the cache lookup for volatile tools; for TTL-flagged tools, check (now() - completed_at) against the TTL in readCachedOrAwait.

Can I use Redis or another KV store instead of Postgres?

Yes. The only requirement is an atomic 'create-if-absent' primitive. Redis SET with the NX option (plus EX for a TTL) works; plain SETNX cannot set an expiry in the same command. The Postgres choice is convenient because most agent apps already have a Postgres for conversation state and you get LISTEN/NOTIFY for free.

What about tools that mutate external state (e.g. send an email)?

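The pattern is correct for them. The dispatch claim happens BEFORE the tool runs, so a reload-during-send cannot trigger a second send: the cached row says status='running' and the second connection waits for completion rather than dispatching a duplicate. The one gap is an owner process that dies mid-send: its row stays at status='running', waiters give up after the 30-second timeout, and the row needs a sweep or manual reconciliation, because nothing in the table can tell you whether the email actually went out.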

Does this dispatcher work with Anthropic's tool_runner SDK (June 2026)?

Partially. The tool_runner SDK simplifies the agent loop but does not persist tool results across processes. You can wrap the SDK's runner callback in dispatchTool, but you still need the tool_runs table behind it. There is no built-in cache hook in tool_runner as of June 2026.

How big does the tool_runs table get over time?

About 1 row per tool call per conversation turn. A typical conversation with 10 turns and 3 tools per turn generates 30 rows. With JSONB compression, expect ~10-50 KB per conversation. A sweep job that DELETEs rows older than 30 days keeps the table small without losing replay capability for active sessions.