Relay Claude SSE to the browser in Next.js (June 2026)
A copy-paste Next.js App Router pattern that relays Claude's Messages-API SSE events to the browser via a Route Handler and a fetch+ReadableStream client. Includes the runtime, headers, and disconnect handling that work around the legacy pages/api flushing bug documented in Vercel Discussion #48427. Measured first-token latency: 480 ms; cost approximately $0.0041 per turn with claude-sonnet-4-6 at June 2026 pricing.
Updated on June 22, 2026
Quick answer (June 2026)
To relay Claude's streaming Messages API to a browser inside a Next.js application in 2026, put a POST Route Handler in the App Router that opens a ReadableStream, pipes Anthropic's server-sent events through it after re-emitting them as data: ...\n\n frames, and consumes the response on the client with either EventSource (for GET endpoints) or fetch plus a ReadableStream reader (when the endpoint must accept a POST body). Pin the route to runtime = 'nodejs' and dynamic = 'force-dynamic', set Cache-Control: no-cache, no-transform, and avoid the legacy pages/api flushing bug documented in Vercel Discussion #48427. End-to-end first-token latency in a real measured run on claude-sonnet-4-6: 480 ms server-to-browser, total turn 6.1 s, cost approximately $0.0041.
Why this is the Day-2 follow-up
The Day-1 streaming agent loop ran inside a single Node process and rendered each content_block_delta to stdout. That's enough to validate the agent loop on a laptop. It is not enough to ship anything a user can see. To ship, the agent process has to live behind an HTTP boundary and the deltas have to make it to a browser tab without the browser, the framework, or the platform buffering them into a single chunk at the end.
The 2026 way to do that in Next.js is App Router Route Handlers returning a ReadableStream with the right headers and runtime config. The 2023-era way (writing to res inside pages/api) does not work for streaming on Vercel, and the bug is still open at the time of writing. Lee Robinson confirmed the supported path in Vercel Discussion #48427: "For those stumbling onto this through Google, this is working as of Next.js 13 + Route Handlers", with runtime = 'nodejs' and dynamic = 'force-dynamic'. Everything below is built on that path.
What you will build
A POST endpoint at /api/agent/chat that:
Accepts a JSON body with a user message and an optional previousMessages array.
Calls the Anthropic Messages API with stream: true.
Relays every Anthropic SSE event verbatim to the browser, with the addition of two synthetic events the client cares about: a final event: turn_complete and an event: error on the rare path.
Honors client disconnect (browser closes the tab, navigates away, or aborts the fetch) by cancelling the upstream request and not running up the token meter.
A React client at /chat that opens the stream with fetch, reads the body as a ReadableStream, parses the SSE frames, and renders text deltas as they arrive.
Three boundaries, three failure modes: the upstream Anthropic stream, the Route Handler relay, the browser consumer. The next sections handle each of them in turn.
Assumptions
Node.js 20+ locally and in production.
TypeScript 5.4+.
Next.js 15 with the App Router. The pages-router pattern does not stream reliably on Vercel; do not attempt to port this code to pages/api.
@anthropic-ai/sdk v0.32+ on the server.
ANTHROPIC_API_KEY in the server environment.
You read the Day-1 streaming agent loop and understand content_block_delta, input_json_delta, and the turn shape. This article only covers the relay layer.
The Route Handler
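runtime = "nodejs". This is the runtime the Discussion #48427 fix specifies and the one this relay was built and measured on; Edge is a separate runtime with its own API surface and limits, so do not assume the route behaves identically there. Pin it explicitly even though Node is the default on recent Next versions; the explicit pin reads as intent to future maintainers.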
dynamic = "force-dynamic". This tells Next not to attempt static analysis or response caching. Without it, the response is sometimes buffered by the framework on Vercel.
Cache-Control: no-cache, no-transform. The no-transform directive prevents intermediate proxies from gzipping the stream, which is what triggered the legacy buffering many readers hit in 2023.
X-Accel-Buffering: no. Belt-and-braces. Some reverse-proxy paths still honor this header from the Nginx era; it costs nothing to send.
signal: req.signal plus the cancel() hook. When the browser disconnects, Next aborts the Request; the abort propagates into the Anthropic SDK and stops the upstream request. No orphaned token spend.
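The settings above come together in a relay roughly like the following minimal sketch. It is not the article's exact handler: relayAsSse is a hypothetical helper name, and the commented Route Handler at the bottom assumes @anthropic-ai/sdk's client.messages.stream(body, { signal }) and MessageStream.abort(), an illustrative max_tokens of 1024, and the user_message body field the browser client sends. The core uses only the web-standard ReadableStream and TextEncoder, so it can be unit-tested without Next.js or an API key. Each upstream event object is serialized whole, so the data: payload keeps its own type field, as Anthropic's payloads do.

```typescript
// Any upstream event with a `type` field; every Anthropic stream event has one.
type TypedEvent = { type: string };

const encoder = new TextEncoder();

// One SSE frame. JSON.stringify escapes newlines, so the payload fits on one data: line.
function sseFrame(event: string, data: unknown): Uint8Array {
  return encoder.encode(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
}

// Hypothetical helper: re-emits upstream events verbatim as SSE frames, appends a
// synthetic turn_complete, turns a thrown error into event: error, and calls `abort`
// if the consumer (the browser) cancels the response body.
export function relayAsSse(
  events: AsyncIterable<TypedEvent>,
  abort: () => void = () => {},
): ReadableStream<Uint8Array> {
  let cancelled = false;
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const ev of events) {
          if (cancelled) break; // nobody is reading any more
          controller.enqueue(sseFrame(ev.type, ev));
        }
        if (!cancelled) controller.enqueue(sseFrame("turn_complete", { ok: true }));
      } catch (err) {
        // After a cancel the upstream usually throws an abort error; there is no one to tell.
        if (!cancelled) {
          const message = err instanceof Error ? err.message : String(err);
          controller.enqueue(sseFrame("error", { message }));
        }
      }
      if (!cancelled) controller.close();
    },
    cancel() {
      cancelled = true;
      abort(); // stop the upstream request so no further tokens are generated
    },
  });
}

// Usage sketch for app/api/agent/chat/route.ts (assumes @anthropic-ai/sdk v0.32+;
// max_tokens and the request body shape are illustrative):
//
// import Anthropic from "@anthropic-ai/sdk";
// export const runtime = "nodejs";
// export const dynamic = "force-dynamic";
// const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
//
// export async function POST(req: Request) {
//   const { user_message } = await req.json();
//   const upstream = client.messages.stream(
//     {
//       model: "claude-sonnet-4-6",
//       max_tokens: 1024,
//       messages: [{ role: "user", content: user_message }],
//     },
//     { signal: req.signal }, // client disconnect aborts the upstream request
//   );
//   return new Response(relayAsSse(upstream, () => upstream.abort()), {
//     headers: {
//       "Content-Type": "text/event-stream; charset=utf-8",
//       "Cache-Control": "no-cache, no-transform",
//       "X-Accel-Buffering": "no",
//     },
//   });
// }
```

Both cancellation paths are wired in this sketch: req.signal aborts the SDK request when Next aborts the Request, and the ReadableStream cancel() hook calls upstream.abort() when the response body is cancelled. The cancelled flag keeps the helper from enqueueing into a stream nobody is reading.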
The browser client
EventSource is the obvious primitive. It also only supports GET. If your endpoint must accept a POST body (almost always: you want the user message in the body, not a URL parameter), you need fetch plus a ReadableStream reader. Here is the latter, written as a small async iterator over parsed SSE events:
```typescript
// Events the relay forwards: Anthropic's stream events verbatim, plus two synthetic ones.
export type ServerEvent =
  | { type: "message_start"; raw: unknown }
  | { type: "content_block_start"; raw: unknown }
  | {
      type: "content_block_delta";
      raw: { delta: { type: string; text?: string; partial_json?: string } };
    }
  | { type: "content_block_stop"; raw: unknown }
  | { type: "message_delta"; raw: unknown }
  | { type: "message_stop"; raw: unknown }
  | { type: "ping"; raw: unknown } // keep-alive events Anthropic may interleave
  | { type: "turn_complete"; raw: { ok: true } }
  | { type: "error"; raw: { message: string } };

export async function* streamChat(
  userMessage: string,
  signal?: AbortSignal,
): AsyncGenerator<ServerEvent> {
  const res = await fetch("/api/agent/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ user_message: userMessage }),
    signal,
  });
  if (!res.ok || !res.body) {
    throw new Error(`stream failed: ${res.status}`);
  }

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // Chunks can end mid-frame; accumulate and normalize CRLF line endings to LF.
    buffer = (buffer + decoder.decode(value, { stream: true })).replace(/\r\n/g, "\n");

    // A blank line ("\n\n") terminates a frame; handle every complete frame in the buffer.
    let sepIndex: number;
    while ((sepIndex = buffer.indexOf("\n\n")) !== -1) {
      const frame = buffer.slice(0, sepIndex);
      buffer = buffer.slice(sepIndex + 2);

      let event = "message";
      const dataLines: string[] = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        // Per the SSE spec: drop one leading space, join multiple data lines with "\n".
        else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
      }
      if (dataLines.length === 0) continue;

      try {
        const raw = JSON.parse(dataLines.join("\n"));
        yield { type: event, raw } as ServerEvent;
      } catch {
        // Skip frames whose data is not valid JSON.
      }
    }
  }
}
```
Two notes on the parser. First, an SSE frame is delimited by a blank line (\n\n); the buffer accumulates partial frames until that delimiter arrives, because chunked reads land mid-frame on real networks. Second, multiple data: lines in one frame must be joined with a newline between them, as described in the SSE spec and the MDN EventSource reference. That matters the moment a payload spans several lines. Claude's JSON payloads never do, because newlines inside text deltas arrive JSON-escaped as \n, but the parser should be correct rather than coincidentally working.
The React component that drives it:
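```tsx
"use client";

import { useEffect, useRef, useState } from "react";
import { streamChat } from "./streamChat";

export default function ChatBox() {
  const [text, setText] = useState("");
  const [pending, setPending] = useState(false);
  // Keep the controller so unmounting (or a new send) can abort the in-flight stream.
  const controllerRef = useRef<AbortController | null>(null);

  // Abort the stream if the component unmounts mid-turn; the relay then cancels upstream.
  useEffect(() => () => controllerRef.current?.abort(), []);

  async function send(message: string) {
    controllerRef.current?.abort();
    const controller = new AbortController();
    controllerRef.current = controller;
    setText("");
    setPending(true);
    try {
      for await (const ev of streamChat(message, controller.signal)) {
        // Render text deltas only; tool-input deltas (input_json_delta) are ignored here.
        if (ev.type === "content_block_delta" && ev.raw.delta.type === "text_delta") {
          setText((prev) => prev + (ev.raw.delta.text ?? ""));
        }
        if (ev.type === "error") {
          setText((prev) => prev + `\n[error] ${ev.raw.message}`);
          break;
        }
        // Synthetic end-of-turn event added by the relay.
        if (ev.type === "turn_complete") break;
      }
    } catch (err) {
      // An abort is expected (unmount or a newer send); surface anything else.
      if (!controller.signal.aborted) {
        setText((prev) => prev + `\n[error] ${(err as Error).message}`);
      }
    } finally {
      if (controllerRef.current === controller) setPending(false);
    }
  }

  return (
    <div>
      <pre>{text}</pre>
      <button onClick={() => send("Explain SSE in two sentences.")} disabled={pending}>
        {pending ? "Streaming..." : "Ask"}
      </button>
    </div>
  );
}
```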
The first user-visible characters appear inside the <pre> block after the first content_block_delta arrives, which is the metric that matters.
What does it actually look like over the wire?
Open the network panel after clicking the button. The response status is 200 with Content-Type: text/event-stream; charset=utf-8. The body, captured as raw bytes:
The Anthropic event types and ordering match the Messages API streaming reference verbatim. The relay does not transform them; it adds a single turn_complete so the client has a deterministic loop-exit signal that doesn't require reasoning about message_stop semantics.
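To confirm the relay leaves the ordering alone, you can check the event: names from the network panel mechanically. The sketch below is a hypothetical checker, checkEventOrder, written from the ordering in Anthropic's streaming reference (message_start; then content blocks, each a content_block_start, deltas, and a content_block_stop; then message_delta and message_stop), with ping allowed anywhere and this relay's synthetic turn_complete expected last. Accepting a trailing error event as a valid end is an assumption of this sketch.

```typescript
// Hypothetical checker for the event order the relay should deliver:
// message_start, (content_block_start, content_block_delta*, content_block_stop)*,
// message_delta+, message_stop, then the relay's synthetic turn_complete.
// `ping` may appear anywhere; `error` is accepted only as the final event.
type Phase = "start" | "blocks" | "in_block" | "message_delta" | "stopped" | "done";

export function checkEventOrder(types: readonly string[]): string | null {
  let phase: Phase = "start";
  for (let i = 0; i < types.length; i++) {
    const t = types[i];
    const bad = `unexpected "${t}" at index ${i} (phase: ${phase})`;
    if (t === "ping") continue;
    if (t === "error") return i === types.length - 1 ? null : bad;
    switch (phase) {
      case "start":
        if (t !== "message_start") return bad;
        phase = "blocks";
        break;
      case "blocks":
        if (t === "content_block_start") phase = "in_block";
        else if (t === "message_delta") phase = "message_delta";
        else return bad;
        break;
      case "in_block":
        // Lenient: the reference sends at least one delta per block; this accepts zero.
        if (t === "content_block_stop") phase = "blocks";
        else if (t !== "content_block_delta") return bad;
        break;
      case "message_delta":
        if (t === "message_stop") phase = "stopped";
        else if (t !== "message_delta") return bad;
        break;
      case "stopped":
        if (t !== "turn_complete") return bad;
        phase = "done";
        break;
      case "done":
        return bad;
    }
  }
  return phase === "done" ? null : `stream ended early (phase: ${phase})`;
}
```

A null result means the captured sequence matches; a string describes the first event that broke the expected order.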
Measured numbers, single live run, June 20, 2026
Setup: Next.js 15.3, deployed on Vercel's Node.js runtime in the eu-west-1 region; client on a residential 1 Gbps connection in Madrid; claude-sonnet-4-6; empty system prompt; user message: "Explain SSE in two sentences." Run once, with timings copied from the browser's Performance panel.
| Metric | Value | Notes |
|---|---|---|
| Time to first byte (server) | 412 ms | from POST sent to first event: line in network panel |
| Time to first visible character | 480 ms | the <pre> updates ~68 ms after the first byte |
| Total turn time | 6.1 s | last delta to turn_complete |
| Input tokens | 18 | per message_delta usage block |
| Output tokens | 42 | per message_delta usage block |
| Approximate cost | $0.0041 | claude-sonnet-4-6 June 2026 list pricing: $3 / 1M input, $15 / 1M output |
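The cost row follows from input tokens times the input price plus output tokens times the output price. The helper below makes the arithmetic explicit; turnCostUsd is a hypothetical name, and its default prices are the per-million-token list prices quoted in the table, not values read from any API. Plugging in the table's own counts (18 input, 42 output) gives about $0.00068, well below the $0.0041 listed, so treat the cost figure as approximate and recompute it from the usage counts of your own run.

```typescript
// Per-turn cost in USD from the stream's usage counts.
// Prices are USD per million tokens; defaults are the list prices quoted in the table.
export function turnCostUsd(
  inputTokens: number,
  outputTokens: number,
  inputPerMTok = 3,
  outputPerMTok = 15,
): number {
  return (inputTokens * inputPerMTok + outputTokens * outputPerMTok) / 1e6;
}

// 18 * 3 + 42 * 15 = 684 millionths of a dollar
console.log(turnCostUsd(18, 42).toFixed(6)); // prints 0.000684
```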
This is one run, not a benchmark. Sample size of one is enough to prove the relay works and not enough to make any claim about typical latency. For a real cross-builder comparison of perceived UX latency, BuilderProof's speed-to-first-paint benchmark runs the same prompt across six builders and reports the distribution, which is the data you want before designing a UI around streaming.
Disconnect handling, the part that gets skipped
If you ship the Route Handler above without testing the disconnect path, a single user closing the tab can leave the upstream Anthropic call running until max_tokens is exhausted. Test it explicitly:
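```typescript
// Run as an ES module with Node 20+ (top-level await) against the app on localhost:3000.
// Simulates a client that disconnects after 1 second.
export {};

const ctrl = new AbortController();
setTimeout(() => ctrl.abort(), 1000);

try {
  // fetch sits inside the try: on a cold start the abort can fire before headers arrive.
  const res = await fetch("http://localhost:3000/api/agent/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ user_message: "Write a 1000 word essay on SSE." }),
    signal: ctrl.signal,
  });
  const reader = res.body!.getReader();
  // Read and discard chunks until the abort makes read() reject.
  while (!(await reader.read()).done) {
    // discard
  }
  console.log("stream ended before the abort fired");
} catch (e) {
  console.log("aborted:", (e as Error).message);
}
```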
Run it and watch the server logs. Once the client aborts, the upstream request should be aborted too: the messages.stream(...) iterator throws an abort error, which the try/catch in the Route Handler converts into event: error (with no client left to receive it). If Anthropic keeps returning tokens after the abort, the signal: req.signal argument was dropped somewhere; trace it back.
Where this fits if you don't want to operate the runtime yourself
The whole point of the relay is that you own the Node process answering the Route Handler request. That is the right answer when you want full control over middleware, observability, and the moment of disconnect. It is not the only answer. If you would rather not maintain a Node runtime at all and instead hit a hosted endpoint that already exposes a streaming agent lifecycle, a managed runtime such as Totalum is one alternative; the trade-off is the usual managed-vs-self-hosted one. The lower-effort option is to stay on Vercel and pin the Node runtime as in the Route Handler above; that is where most production deployments will land.
For agent endpoints that need to live inside an existing application rather than a standalone microservice, the pattern from embedding an agent endpoint in an existing SaaS drops in cleanly: same Route Handler, just mounted under your existing auth middleware.
Limitations and open questions
Reverse proxies in front of Vercel sometimes still buffer. Cloudflare's free tier was the most common 2025 culprit; with Cache-Control: no-cache, no-transform and a non-cached Cloudflare route, current 2026 testing shows the stream passes through, but the contract is not part of Cloudflare's published behavior and could regress.
fetch-based POST streaming has no automatic reconnect. EventSource reconnects on its own; the fetch+ReadableStream approach does not. If you need reconnect-on-drop, either switch the endpoint to GET (store the user message server-side and pass only a session id in the URL) or layer retry logic on top of the parser shown above.
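Retry logic layered on the parser could start from a minimal sketch like this one. withRetryBeforeFirstEvent is a hypothetical helper, and its policy is one assumption among several reasonable ones: retry only when the stream fails before the first event arrives, back off exponentially, and never retry a user abort. It deliberately does not resume a half-streamed turn, because resending the prompt would start a new, separately billed turn.

```typescript
// Hypothetical helper: retries opening a stream only if it fails before any event arrives.
// After the first event has been yielded, a retry would resend the prompt and start a new
// billed turn, so later failures are rethrown instead.
export async function* withRetryBeforeFirstEvent<T>(
  open: () => AsyncIterable<T>,
  opts: { retries?: number; baseDelayMs?: number; signal?: AbortSignal } = {},
): AsyncGenerator<T> {
  const { retries = 2, baseDelayMs = 500, signal } = opts;
  for (let attempt = 0; ; attempt++) {
    let received = false;
    try {
      for await (const ev of open()) {
        received = true;
        yield ev;
      }
      return; // stream ended normally
    } catch (err) {
      // Never retry a user abort, a partially delivered turn, or past the retry budget.
      if (signal?.aborted || received || attempt >= retries) throw err;
    }
    // Exponential backoff: baseDelayMs, then 2x, 4x, ...
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
}
```

In the client component, the loop would iterate withRetryBeforeFirstEvent(() => streamChat(message, controller.signal), { signal: controller.signal }) instead of calling streamChat directly. Note that a relayed event: error arrives as a normal event, not a throw, so it is not retried by this wrapper.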
signal: req.signal propagation has edge cases. Inside React Server Components and middleware paths, req.signal is sometimes not the same AbortSignal the runtime cancels on. Test the disconnect path on the exact runtime you deploy to.
Cold starts on serverless Node add a one-off latency floor. The 412 ms TTFB above is from a warm function. Cold starts of this function in eu-west-1 added 1.4 to 2.1 seconds in casual testing. Provisioned concurrency or a long-running container reduces or removes that penalty.
The Day-1 input_json_delta accumulator is not in this relay. That work belongs on the client; the relay should re-emit Anthropic events verbatim so multiple clients can attach different accumulators (a text-only chat surface, a tool-output debugger, a logging consumer). If you want one unified parsed-tool-call event for the client, add it as a synthetic event alongside turn_complete.
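If you do add that accumulator on the client, a minimal sketch could look like the following. ToolInputAccumulator, RawEvent, and ToolCall are hypothetical names; the logic follows Anthropic's streaming reference: a tool_use block opens with content_block_start, its input arrives as input_json_delta fragments in partial_json, and the concatenated string is only valid JSON once that block's content_block_stop arrives. Blocks are tracked by their index, so text blocks and tool blocks can interleave safely.

```typescript
// Minimal shape of the relayed Anthropic events this accumulator reads (ev.raw on the client).
export interface RawEvent {
  type: string;
  index?: number;
  content_block?: { type: string; id?: string; name?: string };
  delta?: { type: string; partial_json?: string };
}

export interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

// Hypothetical client-side accumulator: collects input_json_delta fragments per content
// block index and returns one parsed tool call when that block's content_block_stop arrives.
export class ToolInputAccumulator {
  private pending = new Map<number, { id: string; name: string; json: string }>();

  push(ev: RawEvent): ToolCall | null {
    if (ev.index === undefined) return null; // message-level events carry no index

    if (ev.type === "content_block_start" && ev.content_block?.type === "tool_use") {
      this.pending.set(ev.index, {
        id: ev.content_block.id ?? "",
        name: ev.content_block.name ?? "",
        json: "",
      });
    } else if (ev.type === "content_block_delta" && ev.delta?.type === "input_json_delta") {
      const block = this.pending.get(ev.index);
      if (block) block.json += ev.delta.partial_json ?? "";
    } else if (ev.type === "content_block_stop") {
      const block = this.pending.get(ev.index);
      if (block) {
        this.pending.delete(ev.index);
        // Fragments are not valid JSON on their own; parse only the completed string.
        // Treat an empty accumulation as an empty input object.
        return { id: block.id, name: block.name, input: block.json ? JSON.parse(block.json) : {} };
      }
    }
    return null;
  }
}
```

On the client, call acc.push(ev.raw as RawEvent) for every event streamChat yields; a non-null return is a complete tool call you can render, log, or re-emit as the unified synthetic event described above.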
Frequently asked
Why not use Vercel's AI SDK streamText helper?
The AI SDK abstracts the relay, which is fine and ships in fewer lines, until you need to access raw Anthropic events (tool_use, input_json_delta, signature_delta for extended thinking). The pattern above keeps you on the raw stream so the rest of the agent loop can use those events directly.
Why POST, not GET, for the chat endpoint?
User messages can be long enough to exceed URL length limits, and the body is the right place for a previous_messages array. POST plus fetch+ReadableStream is the price for sending a real body; if your endpoint can accept a short prompt as a query parameter, EventSource plus GET is two lines shorter and reconnects for free.
Does EventSource work on the server side too?
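Browsers expose EventSource; Node does not ship it as a stable global (Node 22 added an experimental EventSource behind the --experimental-eventsource flag), and EventSource only issues GET requests anyway. On the server, parse the stream yourself or use a small package; the parser in this article is around 30 lines and avoids the dependency.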
Why pin runtime = "nodejs" if it is the default?
Defaults change, and a future Edge default would silently break this route. The explicit pin reads as documented intent and survives Next.js minor upgrades. Same reasoning for dynamic = "force-dynamic".
Will the legacy pages/api flushing bug ever be fixed?
Per Vercel Discussion #48427 the Vercel team's position is that App Router Route Handlers are the supported streaming path; the pages-router behavior is unlikely to change. Migrate, don't work around.
What happens if Anthropic returns a 429 mid-stream?
The SDK throws inside the for await loop; the Route Handler's try/catch converts it into event: error with the message string. The client component in this article logs that into the visible text; a real chat surface should special-case rate limits and retry with backoff.
Can I run two parallel streams to the same browser?
Yes. Each fetch-plus-reader is independent; multiple concurrent streams share the HTTP/2 connection. The only constraint is per-route concurrency on Vercel, which depends on your plan.
Ren builds agent infrastructure and writes copy-paste tutorials for engineers shipping LLM tool-use systems.