Ren Okabe, 13 min read

Resume Claude streams across browser reload with Last-Event-ID (June 2026)

Make a Claude turn survive a browser reload: server-side ring buffer keyed by message.id, the WHATWG Last-Event-ID header, and a Next.js App Router Route Handler that replays missed SSE frames in 180 ms. Measured June 2026 on claude-sonnet-4-6.

Updated on June 22, 2026


Quick answer (June 2026)

You can make a Claude stream survive a browser reload by combining three pieces: a server that emits id: lines on every SSE event using messageId-blockIdx-chunkSeq, a bounded ring buffer on the server keyed by Anthropic's message.id, and a client that either lets the platform's EventSource send the Last-Event-ID header on reconnect or, for POST + Bearer flows, tracks the last id manually and includes it on a fresh fetch. The server reads Last-Event-ID, replays missed events from the buffer, then switches to live tokens. Tested on claude-sonnet-4-6 (June 2026 list pricing $3/1M input + $15/1M output): reload re-attach took 180 ms on average over 50 trials, buffer cost peaked at 41 KB for an 800-token response.


The reload problem nobody documented

In the Day-2 App Router SSE relay tutorial the limitations section ended with a quiet item: if the user reloads the browser at second 3 of an 8-second turn, the generated tokens that already flowed across the wire are gone. The client's messages array is in component state. The server's upstream Anthropic stream was tied to the dropped request via req.signal and aborted on disconnect. Nothing has the partial answer.

Most teams notice this when an internal user complains: "it was almost done writing the SQL, I refreshed, it started over." On claude-sonnet-4-6 at 187 output tokens per turn that is a re-run at roughly $0.0028 per refresh and a 6+ second wait the user already paid for once.

The browser already knows how to ask for the missing bytes. Per the WHATWG HTML spec, section 9.2.3 on the event source interface: "If the EventSource object's last event ID string is not the empty string: Let lastEventIDValue be the EventSource object's last event ID string, encoded as UTF-8. Set (Last-Event-ID, lastEventIDValue) in request's header list." (Source: WHATWG HTML Living Standard, server-sent events, accessed June 21, 2026.) The browser handles this automatically when you use the platform's EventSource constructor. What it does NOT handle is your server, and almost no server does anything useful with the header by default.

Ably published a vendor-neutral overview of the same idea in March 2026 (Resume tokens and Last-Event-IDs for LLM streaming). They describe the four-part shape (sequential ids, client state, reconnection protocol, catchup delivery) but do not show concrete server code, a measured buffer footprint, or a Next.js App Router implementation. That gap is what this post fills. We will keep the rest of the stack identical to the Day-2 relay so the two posts stack: same Route Handler, same runtime = 'nodejs', same dynamic = 'force-dynamic', same X-Accel-Buffering: no header.

Why naive reconnect does not work for LLM streams

Three reasons people skip Last-Event-ID and end up with "it just restarts":

  1. EventSource only issues GET requests, so the browser never attaches Last-Event-ID to a POST for you. Most Claude apps use POST because you carry an Authorization: Bearer header and a request body bigger than a query string. If you use new EventSource(url), you get the header for free but you lose POST. You have to choose.
  2. Each Anthropic SSE event is a fragment, not a message. A single content_block_delta carries 1 to 30 characters of text_delta. Your server emits 50 to 800 events per turn. Whatever id scheme you choose has to make every fragment individually addressable so a mid-response resume can pick the right offset.
  3. The upstream Anthropic stream is gone once the inbound request aborts. When the browser disconnects, req.signal fires, we abort the upstream call, Anthropic stops billing us. If we want to replay, the server has to be the one holding the bytes. Reconnect cannot mean "ask Anthropic again from the middle"; Anthropic does not support that.

So the protocol is: server is the source of truth for what was emitted, client tells the server what it last saw, server replays the gap from a server-side buffer, then switches to live.

Encoding Last-Event-IDs for Anthropic streams

Anthropic's streaming event sequence (verified against the Messages API streaming reference, accessed June 21, 2026) is:

text
message_start
  content_block_start (index 0)
    content_block_delta (index 0, text_delta or input_json_delta)
    content_block_delta (index 0, ...)
    ...
  content_block_stop (index 0)
  content_block_start (index 1)
    ...
  content_block_stop (index 1)
message_delta
message_stop

message_start includes a message object with id, e.g. msg_018Kr.... This is our stream identity. Inside each content block, deltas arrive in order. So a unique, monotonically increasing id per emitted SSE event is:

text
{messageId}-{blockIndex}-{chunkSeq}

blockIndex is the index field on content_block_start. chunkSeq is a counter the server maintains, incrementing on every event emitted under the current block. Format:

text
id: msg_018Kr8xK-0-37\n
event: content_block_delta\n
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" answer"}}\n
\n

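The id: field is what the platform's EventSource parses into MessageEvent.lastEventId, and it is what gets sent back as the Last-Event-ID request header on reconnect. The server can parse the trio to validate it and then locate the frame in the buffer. The implementation below does that lookup with a linear findIndex, which is cheap at a few hundred frames per turn; a true O(1) seek would need an id-to-offset map on top.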

Server: bounded ring buffer keyed by message.id

The buffer holds every emitted SSE frame for a stream, keyed by messageId, with a TTL. Bounded size protects you from a runaway request. TTL protects you from clients that never reconnect.

For the in-memory variant (single Node process; fine for low-volume apps, see Limitations for the multi-instance story):

ts
type Frame = { id: string; event: string; data: string };
type Slot = { frames: Frame[]; createdAt: number; done: boolean };

class StreamBuffer {
  private map = new Map<string, Slot>();
  private readonly maxFrames = 4096;
  private readonly ttlMs = 5 * 60 * 1000;

  // Open a slot as soon as message_start gives us the message id.
  start(messageId: string) {
    this.map.set(messageId, { frames: [], createdAt: Date.now(), done: false });
  }

  // Append a frame; once the bound is exceeded, drop the oldest one.
  push(messageId: string, frame: Frame) {
    const slot = this.map.get(messageId);
    if (!slot) return;
    slot.frames.push(frame);
    if (slot.frames.length > this.maxFrames) slot.frames.shift();
  }

  finish(messageId: string) {
    const slot = this.map.get(messageId);
    if (slot) slot.done = true;
  }

  // Returns the frames after lastId, or the whole buffer when lastId is
  // missing or no longer in the buffer. Returns null for an unknown stream.
  replayFrom(messageId: string, lastId: string | null): Frame[] | null {
    const slot = this.map.get(messageId);
    if (!slot) return null;
    if (!lastId) return slot.frames;
    const idx = slot.frames.findIndex(f => f.id === lastId);
    if (idx < 0) return slot.frames;
    return slot.frames.slice(idx + 1);
  }

  isDone(messageId: string) {
    return this.map.get(messageId)?.done ?? false;
  }

  sweep() {
    const now = Date.now();
    for (const [k, v] of this.map.entries()) {
      // Expire on age alone. A stream whose upstream was aborted never reaches
      // finish(), so gating on `done` would keep its slot in memory forever.
      if (now - v.createdAt > this.ttlMs) this.map.delete(k);
    }
  }
}

export const streamBuffer = new StreamBuffer();
setInterval(() => streamBuffer.sweep(), 30_000).unref();

Frame cost: a text_delta SSE event averages 80 bytes wire size (id + event + data + two newlines). The longest turn we measured held 487 frames at 41 KB. Bound at 4096 to cap at ~330 KB per active stream, which is your soft cap on adversarial requests.
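The cap arithmetic can be spelled out as a quick sketch. The 80-byte average and the 4096-frame bound are the figures from the paragraph above; the function name is ours and purely illustrative.

```typescript
// Worst-case buffer size for one stream, given an average frame size.
// Both inputs are this article's figures; swap in your own measurements.
function bufferCapBytes(maxFrames: number, avgFrameBytes: number): number {
  return maxFrames * avgFrameBytes;
}

// 4096 frames * 80 bytes = 327,680 bytes, which rounds to the ~330 KB cap.
const capBytes = bufferCapBytes(4096, 80);

// 487 frames * 80 bytes = 38,960 bytes, in line with the measured 41 KB.
const longestTurnBytes = bufferCapBytes(487, 80);
```

If your frames carry tool input or long deltas, measure the real average before trusting the cap.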

Server: Route Handler that detects Last-Event-ID and replays

The Day-2 Route Handler took the prompt on POST and emitted a fresh stream. Day-3 adds two responsibilities: assign and persist a messageId so the client can come back to it, and serve a GET handler that the platform's EventSource can hit on reconnect with the auto-attached Last-Event-ID header.

ts
// app/api/agent/route.ts
import Anthropic from '@anthropic-ai/sdk';
import { streamBuffer } from '@/lib/stream-buffer';

export const runtime = 'nodejs';
export const dynamic = 'force-dynamic';

const enc = new TextEncoder();
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

function sseFrame(id: string, event: string, data: object) {
  const payload = `id: ${id}\nevent: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
  return { wire: enc.encode(payload), raw: { id, event, data: JSON.stringify(data) } };
}

export async function POST(req: Request) {
  const { messages } = await req.json();
  const abort = new AbortController();
  req.signal.addEventListener('abort', () => abort.abort());

  const stream = new ReadableStream({
    async start(controller) {
      try {
        const upstream = await client.messages.stream(
          { model: 'claude-sonnet-4-6', max_tokens: 1024, messages },
          { signal: abort.signal }
        );

        let messageId = '';
        let blockIdx = 0;
        let chunkSeq = 0;

        for await (const ev of upstream) {
          if (ev.type === 'message_start') {
            messageId = ev.message.id;
            streamBuffer.start(messageId);
            const f = sseFrame(`${messageId}-meta-0`, 'message_id', { messageId });
            streamBuffer.push(messageId, f.raw);
            controller.enqueue(f.wire);
            continue;
          }
          if (ev.type === 'content_block_start') {
            blockIdx = ev.index;
            chunkSeq = 0;
          }
          const id = `${messageId}-${blockIdx}-${chunkSeq++}`;
          const f = sseFrame(id, ev.type, ev as object);
          streamBuffer.push(messageId, f.raw);
          controller.enqueue(f.wire);
          if (ev.type === 'message_stop') {
            streamBuffer.finish(messageId);
          }
        }
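      } catch (err) {
        // If the client already disconnected, the stream is cancelled and
        // enqueue() throws. Swallow that so the handler does not crash.
        try {
          const f = sseFrame(`error-${Date.now()}`, 'error', { message: String(err) });
          controller.enqueue(f.wire);
        } catch {}
      } finally {
        // close() also throws on an already-cancelled stream.
        try { controller.close(); } catch {}
      }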
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream; charset=utf-8',
      'Cache-Control': 'no-cache, no-transform',
      'Connection': 'keep-alive',
      'X-Accel-Buffering': 'no',
    },
  });
}

export async function GET(req: Request) {
  const url = new URL(req.url);
  const messageId = url.searchParams.get('messageId');
  const lastId = req.headers.get('last-event-id');
  if (!messageId) return new Response('messageId required', { status: 400 });

  const replay = streamBuffer.replayFrom(messageId, lastId);
  if (!replay) return new Response('expired or unknown messageId', { status: 410 });

  const done = streamBuffer.isDone(messageId);

  const stream = new ReadableStream({
    start(controller) {
      for (const f of replay) {
        const wire = `id: ${f.id}\nevent: ${f.event}\ndata: ${f.data}\n\n`;
        controller.enqueue(enc.encode(wire));
      }
      if (done) controller.close();
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream; charset=utf-8',
      'Cache-Control': 'no-cache, no-transform',
      'Connection': 'keep-alive',
      'X-Accel-Buffering': 'no',
    },
  });
}

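Two notes on the GET path. First, it returns 410 Gone when the messageId is unknown or past TTL, which is what you want clients to interpret as "give up and start a new turn." Second, if the stream is already finished when the GET arrives, the handler flushes the buffer and closes; the client sees a complete answer instantly with no further upstream call. One caveat: as written, the handler replays only the frames buffered at the moment the GET arrives. It does not subscribe to frames pushed afterwards, and the POST handler still aborts the upstream call when its own request disconnects, so an unfinished turn stays open without receiving anything new. Delivering the live tail requires letting the upstream run to completion after the POST disconnects and notifying open GET streams on every push.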
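The GET handler above replays what is already buffered; switching to live tokens afterwards needs some way to be told about new frames. A minimal sketch of one approach follows, assuming a per-stream buffer that keeps a listener set. TailableBuffer, Listener, and the null end-of-stream signal are hypothetical names for illustration, not part of the code above.

```typescript
type Frame = { id: string; event: string; data: string };
// A listener receives each frame, then null once the stream has finished.
type Listener = (frame: Frame | null) => void;

class TailableBuffer {
  private frames: Frame[] = [];
  private listeners = new Set<Listener>();
  private done = false;

  // Store the frame and fan it out to every attached listener.
  push(frame: Frame) {
    this.frames.push(frame);
    for (const l of this.listeners) l(frame);
  }

  // Mark the stream complete and tell listeners to close.
  finish() {
    this.done = true;
    for (const l of this.listeners) l(null);
    this.listeners.clear();
  }

  // Replays everything after lastId synchronously, then keeps delivering
  // frames as they are pushed. An unknown lastId replays the whole buffer.
  // Returns an unsubscribe function.
  subscribe(lastId: string | null, listener: Listener): () => void {
    const idx = lastId ? this.frames.findIndex(f => f.id === lastId) : -1;
    for (const f of this.frames.slice(idx + 1)) listener(f);
    if (this.done) {
      listener(null);
      return () => {};
    }
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }
}
```

In a Route Handler, the listener would enqueue each frame on the ReadableStream controller and close it on null, and the stream's cancel() callback would call the unsubscribe function. This only helps if the upstream call keeps running after the original POST disconnects, which is a design change from the handler shown above.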

Client: two transports, one user experience

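The platform's EventSource is the right choice when the turn is GET-able. After the initial POST that opens the stream and returns a messageId, the client closes the POST stream and opens an EventSource on the GET route. From that point the browser re-sends Last-Event-ID on every automatic reconnect. A full page reload is slightly different: a new EventSource starts with an empty last event ID, so the server replays from frame 0, which is exactly what a freshly mounted component with empty state needs. Either way the messageId has to outlive the reload, for example in the URL or in sessionStorage.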

ts
// app/components/use-resumable-stream.ts
'use client';
import { useEffect, useRef, useState } from 'react';

type Frame = { id: string; event: string; data: string };

export function useResumableStream(messageId: string | null) {
  const [frames, setFrames] = useState<Frame[]>([]);
  const seen = useRef<Set<string>>(new Set());

  useEffect(() => {
    if (!messageId) return;
    const url = `/api/agent?messageId=${encodeURIComponent(messageId)}`;
    const es = new EventSource(url);
    es.addEventListener('content_block_delta', e => {
      const id = (e as MessageEvent).lastEventId;
      if (seen.current.has(id)) return;
      seen.current.add(id);
      setFrames(prev => [...prev, { id, event: 'content_block_delta', data: (e as MessageEvent).data }]);
    });
    es.addEventListener('message_stop', () => es.close());
    es.addEventListener('error', () => {
      // platform EventSource will auto-reconnect with Last-Event-ID
    });
    return () => es.close();
  }, [messageId]);

  return frames;
}

The dedup guard via seen matters because replay is at-least-once: the server resends everything after the id it was given, and when that id is missing or no longer in the buffer it resends the whole buffer. The client must not double-render a frame it already has. We keyed the dedup on lastEventId, which the browser populates from the id: line on every event.

For the POST + Bearer flow (when you need a request body or a token-bearing header, neither of which EventSource can send), the platform's EventSource is out. You resume with fetch and track the last id manually:

ts
async function fetchResume(messageId: string, lastSeenId: string | null) {
  const res = await fetch(`/api/agent?messageId=${encodeURIComponent(messageId)}`, {
    headers: { ...(lastSeenId ? { 'Last-Event-ID': lastSeenId } : {}) },
  });
  if (!res.body) throw new Error('no body');
  const reader = res.body.getReader();
  const dec = new TextDecoder();
  let buf = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buf += dec.decode(value, { stream: true });
    const events = buf.split('\n\n');
    buf = events.pop() ?? '';
    for (const block of events) {
      const idLine = block.split('\n').find(l => l.startsWith('id: '));
      if (idLine) localStorage.setItem(`stream:${messageId}:lastId`, idLine.slice(4));
      // ... parse event + data lines, dispatch to UI
    }
  }
}

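localStorage persists across reload. On boot, read stream:${messageId}:lastId, pass it as Last-Event-ID on a refetch. The server replays the gap and continues live. Note that replaying only the gap restores the full answer only if the text rendered before the reload was persisted too; if it lived in component state, omit Last-Event-ID and let the server replay from frame 0.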
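The fetch loop above leaves event and data parsing as a comment. A minimal sketch of that step is below, assuming LF-only line endings as emitted by the Route Handler in this post. parseSseBlock and ParsedEvent are illustrative names, and the field rules follow the SSE format: one leading space after the colon is stripped, lines starting with a colon are comments, and multiple data lines join with a newline.

```typescript
type ParsedEvent = { id: string | null; event: string; data: string };

// Parses one SSE block (the text between two blank lines) into its fields.
function parseSseBlock(block: string): ParsedEvent {
  let id: string | null = null;
  let event = 'message'; // default event type when no event: line is present
  const data: string[] = [];
  for (const line of block.split('\n')) {
    if (line === '' || line.startsWith(':')) continue; // blank or comment line
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
    if (field === 'id') id = value;
    else if (field === 'event') event = value;
    else if (field === 'data') data.push(value);
  }
  return { id, event, data: data.join('\n') };
}
```

Inside the loop you would call parseSseBlock(block), store the id, and dispatch JSON.parse(parsed.data) to the UI when parsed.event is content_block_delta.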

Measured: re-attach latency and buffer cost (June 2026)

Test rig: Node 22.5 on a Vercel Hobby project, region iad1, model claude-sonnet-4-6. 50 trials, each: open POST stream, wait 1.5 to 4.5 seconds (uniform random), trigger reload, then time the GET-with-Last-Event-ID round trip.


metric | p50 | p95 | max
re-attach latency (GET issued -> first replayed frame) | 142 ms | 230 ms | 410 ms
missing-frame count (gap size at reload moment) | 38 | 117 | 184
buffer footprint at reload moment | 12 KB | 28 KB | 41 KB
cost of NOT resuming (re-run of full turn at list pricing) | $0.0028 | $0.0034 | $0.0041

The buffer footprint is small because each text_delta frame is ~80 bytes and a typical turn is under 500 frames. Re-attach latency is dominated by the cold cycle of mounting the React component and instantiating EventSource, not by server work; the buffer lookup is a findIndex against an array we control.

What this means in plain terms: when a user reloads mid-turn, resuming saves a fraction of a cent (about $0.003 at p50) and shows the rest of the answer in under a second after the page mounts. A re-run spends that fraction of a cent AND makes the user sit through a new 6-second wait, so resuming is the obvious move for any tool with retention-sensitive users.

Where this fits when you do not want to own the buffer

The in-memory buffer above assumes a single Node process. If you scale horizontally on a serverless host that auto-spawns instances, a POST and its follow-up GET can land on different machines; the GET will return 410 because the buffer that holds the frames is in the other instance's heap.

Production options, in increasing order of work:

  1. Pin to a single region with sticky routing. Vercel does not give you sticky routing out of the box; you can do it on a self-hosted Node service behind a single instance. Cheapest, least robust.
  2. Move the buffer to Redis. Use a per-messageId list with TTL. The Route Handler reads/writes through Redis. Adds one round trip per emitted frame, so put the Redis instance in the same region. This is what most production setups settle on.
  3. Run on a managed Next.js platform. Hosts that pin a session to one Node process for the lifetime of a stream sidestep the issue. Vercel's Fluid Compute, Render's persistent services, and Totalum's managed Next.js builder all do this. Vercel is the better default if you only need a fetch handler and stream relay with no platform lock-in.
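For option 2, the buffer operations can be sketched against a small list-store interface. ListStore, pushFrame, replayFrom, and the key layout below are assumptions for illustration. The three methods are modelled on the Redis RPUSH, EXPIRE, and LRANGE commands; check your client library for the exact method names and return types before wiring it in.

```typescript
type Frame = { id: string; event: string; data: string };

// Minimal subset of list commands the buffer needs (hypothetical interface).
interface ListStore {
  rpush(key: string, value: string): Promise<number>;
  expire(key: string, seconds: number): Promise<number>;
  lrange(key: string, start: number, stop: number): Promise<string[]>;
}

const TTL_SECONDS = 5 * 60;
const keyFor = (messageId: string) => `stream:${messageId}:frames`;

// Append one frame and refresh the TTL. Unlike the in-memory version,
// the TTL here counts from the last write, not from stream creation.
async function pushFrame(store: ListStore, messageId: string, frame: Frame): Promise<void> {
  const key = keyFor(messageId);
  await store.rpush(key, JSON.stringify(frame));
  await store.expire(key, TTL_SECONDS);
}

// Returns the frames after lastId, the whole list when lastId is missing
// or unknown, or null when the key is empty or expired.
async function replayFrom(
  store: ListStore,
  messageId: string,
  lastId: string | null,
): Promise<Frame[] | null> {
  const raw = await store.lrange(keyFor(messageId), 0, -1);
  if (raw.length === 0) return null;
  const frames = raw.map(s => JSON.parse(s) as Frame);
  const idx = lastId ? frames.findIndex(f => f.id === lastId) : -1;
  return frames.slice(idx + 1);
}
```

Two round trips per frame (append plus TTL refresh) is the simplest form; pipelining them or refreshing the TTL less often would cut the per-frame cost mentioned above.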

We also looked at how this maps onto adjacent reliability work in the network.

BuilderProof's first-build-stability axis proposal is a different question (does a builder produce a clean first build twice in a row), but the same mental model applies: a stream that does not survive a reload is, from the user's seat, an unstable build of a different kind.

Dedup and the not-quite-edge cases

Two things bite on resume:

  • Last-Event-ID can be empty. The first connect has no last id. Treat null as "send everything from frame 0."
  • The id format must be opaque to the spec but parseable to you. Spec says id is a string and the server does whatever it wants. Our msg_018Kr-0-37 triplet survives URL-encoding and is easy to validate. Reject ids that do not match the regex; do not crash.
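The "reject ids that do not match the regex" step could look like the sketch below. The pattern is our assumption about what the ids in this post look like: a msg_ prefix followed by alphanumerics, then either a block index or the meta marker used for the message_id frame, then the chunk sequence. parseEventId, EventId, and the 128-character length cap are illustrative choices.

```typescript
type EventId = { messageId: string; block: number | 'meta'; chunkSeq: number };

// Assumed shape: msg_<alphanumerics>-<blockIndex | meta>-<chunkSeq>
const EVENT_ID_RE = /^(msg_[A-Za-z0-9]+)-(meta|\d+)-(\d+)$/;

// Returns the parsed triplet, or null for anything that does not match.
// Never throws, so a malformed header cannot crash the handler.
function parseEventId(raw: string | null): EventId | null {
  if (!raw || raw.length > 128) return null;
  const m = EVENT_ID_RE.exec(raw);
  if (!m) return null;
  return {
    messageId: m[1],
    block: m[2] === 'meta' ? 'meta' : Number(m[2]),
    chunkSeq: Number(m[3]),
  };
}
```

A null result can be treated the same way as a missing header (replay from frame 0) or answered with a 400, depending on how strict you want the route to be.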

For UI, idempotent updates beat any clever diff. We accumulate frames keyed by id in a Set on the client (the seen ref above). Replays land cleanly; the React tree only mutates on the first occurrence of a frame.
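As a sketch of what "idempotent updates" means here, the accumulation can be written as a pure function that ignores any frame id it has already applied. applyFrame and TurnState are hypothetical names; the payload shape is the content_block_delta format shown earlier.

```typescript
type Frame = { id: string; event: string; data: string };
type TurnState = { seen: Set<string>; text: string };

// Applies one frame to the accumulated turn. Returns the same state object
// when the frame id was already seen, so a replayed frame is a no-op.
function applyFrame(state: TurnState, frame: Frame): TurnState {
  if (state.seen.has(frame.id)) return state;
  // Copy the set so the previous state stays untouched (fine for a sketch;
  // a long turn might prefer a mutable ref like the hook above).
  const seen = new Set(state.seen).add(frame.id);
  let text = state.text;
  if (frame.event === 'content_block_delta') {
    const payload = JSON.parse(frame.data);
    if (payload.delta?.type === 'text_delta') text += payload.delta.text;
  }
  return { seen, text };
}
```

Because an already-seen frame returns the identical state object, React can skip the re-render when this is used as a reducer.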

What this does NOT fix

  • Mid-token resume. If Anthropic emits "the answer is 4" as the deltas " the", " answer", " is", " 4", and the client disconnects mid-write of " is", the smallest replayable unit is the whole " is" frame, not a partial. We have not seen UI artifacts from this; the worst case is the same character appearing for ~1 ms.
  • Retroactive tool dispatch. Tool calls in the Day-1 streaming agent loop (the from-scratch first-agent tutorial flow) execute as tool_use blocks complete. If the user reloads after a tool was dispatched but before the agent loop continued, you have to decide: re-execute the tool (cost + side effects), or fail. We default to failing the resume in that case and starting a fresh turn.
  • Multi-instance without Redis. Already covered, but worth restating: the in-memory buffer does not survive a horizontal scale event.

Limitations and open questions

  1. TTL tuning. 5 minutes is generous for chat and probably wasteful. We will instrument actual reload-to-reconnect time in production and shorten it where data supports.
  2. Authorization on the GET path. The messageId is opaque (msg_018Kr...) but should not be the only auth. In our deployment we sign messageId with a session-scoped HMAC, verified on GET. Untrusted GET access lets a stranger steal another user's stream.
  3. Mid-stream model swap. What happens if a paid-tier user starts on claude-sonnet-4-6 and resume happens after a model deprecation? Today the buffer is keyed on the upstream stream, so resume serves whatever Anthropic emitted. We have not tested cross-model resume.
  4. Backpressure during replay. A client that resumes from frame 0 of a 487-frame turn receives a burst. We currently flush in one tick; if frames grow we might need a paced replay.
  5. Persisting across server restarts. Redis solves multi-instance, not multi-deploy. Long-tail clients that come back after a redeploy see 410. This is correct, but a few users will be sad.
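For item 2, a session-scoped HMAC over the messageId could be sketched as follows using Node's built-in crypto module. The token layout (messageId, a dot, then the hex MAC), the function names, and binding the MAC to a sessionId string are assumptions for illustration, not a description of our production code.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Produces "<messageId>.<hex mac>", where the MAC covers session and message.
function signMessageId(messageId: string, sessionId: string, secret: string): string {
  const mac = createHmac('sha256', secret).update(`${sessionId}:${messageId}`).digest('hex');
  return `${messageId}.${mac}`;
}

// Returns the messageId when the token is valid for this session, else null.
function verifyMessageId(token: string, sessionId: string, secret: string): string | null {
  const dot = token.lastIndexOf('.');
  if (dot < 0) return null;
  const messageId = token.slice(0, dot);
  const expected = Buffer.from(signMessageId(messageId, sessionId, secret));
  const actual = Buffer.from(token);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (expected.length !== actual.length) return null;
  return timingSafeEqual(expected, actual) ? messageId : null;
}
```

The POST handler would hand the signed token to the client in place of the raw messageId, and the GET handler would call verifyMessageId with the caller's session before touching the buffer.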

FAQ

What is Last-Event-ID and why does the browser send it?

Last-Event-ID is an HTTP request header defined by the WHATWG HTML spec for Server-Sent Events. When an EventSource connection drops and the browser automatically reconnects, it includes the last id: it received from the server so the server can resume the stream from the right place.

Can I use Last-Event-ID with POST requests?

No. The platform's EventSource is GET only. For POST + Bearer flows, send Last-Event-ID manually on a fetch reconnect, tracking the last id you saw in localStorage or component state.

How big does the server buffer get for a typical Claude turn?

In our June 2026 measurements on claude-sonnet-4-6, an 800-token response holds about 41 KB across roughly 487 SSE frames. Cap the per-stream buffer at 4096 frames (~330 KB) to protect against runaway requests.

Does the Anthropic Messages API let me resume from the middle of an upstream stream?

No. Anthropic does not expose a resume point. Once your inbound request aborts and you cancel the upstream call, the upstream is gone. To replay, the server must be the source of truth, which is what the buffer here is for.

How do I scale the buffer across multiple Next.js instances?

Move the buffer to Redis keyed by messageId, with TTL. Each Route Handler instance reads and writes through Redis. Keep the Redis instance in the same region to avoid round-trip cost on every frame.

Does the id: field's format matter to the spec?

The spec says the id is an opaque string. You decide the encoding. Our triplet messageId-blockIndex-chunkSeq makes every Anthropic SSE event individually addressable, survives URL encoding, and is easy to validate before seeking into the buffer.

Will users see duplicate tokens on resume?

Only if the client does not dedup. Track the lastEventId of every received frame in a Set; ignore replays whose id you have seen. The cost is one Set insertion per frame.

What happens if the user reloads after a tool was dispatched?

Tool calls have side effects. We default to failing the resume in that case (return 410, force a fresh turn). Other apps may prefer to re-run the tool; we do not recommend this without an idempotency key on the tool itself.

Written by Ren Okabe

Principal Engineer, AgentNotebook. Writes about the unglamorous edges of agent engineering: streaming, recovery, observability, and what actually breaks in production.
