Stream Claude tool calls in a TypeScript agent loop (June 2026)

Ren Okabe, June 19, 2026, 13 min read

A complete TypeScript tutorial for the streaming agent loop on Claude: input_json_delta accumulation, multi-turn dispatch, AbortController cancellation, and the eager_input_streaming workaround for the roughly 5 second first-content delay reported on tool use. About $0.01 for the 3-turn example run and roughly $0.03 for a heavier 6-turn run with claude-sonnet-4-6 at June 2026 pricing.

Updated on June 19, 2026

A streaming agent loop is the unit of work in which the model calls tools, you dispatch those calls, and the user watches tokens land in their UI. If you stream only text, the loop pauses for seconds every time the model decides to use a tool, and that pause is where Claude's reputation for "feels slow" comes from. This tutorial walks through the full streaming loop in TypeScript against the Anthropic Messages API, with input_json_delta accumulation, multi-turn dispatch, AbortController cancellation, and a workaround for the roughly 5 second first-content delay reported when streaming and tool use are combined.

By the end you will have an agent.ts script, backed by a small tools.ts module, that streams text deltas as they arrive, streams tool calls as they are being constructed, dispatches your tools the moment Claude finishes specifying them, and opens the next turn's stream as soon as the tool results are in.

Quick Answer

To stream Claude tool calls in a TypeScript agent loop as of June 2026, open one SSE stream per turn against client.messages.stream, handle content_block_start to detect a tool_use block, accumulate the input_json_delta partial_json strings into a buffer, parse the buffer on content_block_stop, dispatch the tool, push the tool_result back into the message list, and start a new stream for the next turn. Wire an AbortController so the user can cancel mid-stream, and set eager_input_streaming: true on tools where you can start work before the full arguments arrive. The 3-turn example run below costs about $0.01, and a heavier 6-turn run about $0.03, with claude-sonnet-4-6 at June 2026 list pricing.

Prerequisites

Node.js 20.6 or later (the --env-file flag used below was added in 20.6), npm.
An Anthropic API key, exported as ANTHROPIC_API_KEY.
Comfort with TypeScript: async/await, generators, JSON.parse.
If you have not built the base loop yet, read the from-scratch first-agent tutorial first. This post assumes you already understand the model, tool_use, tool_result triple.

Expected outcome: a runnable agent.ts script that streams Claude's response (including streamed tool calls) to stdout in real time, dispatches the tools, and continues the loop until the model returns end_turn.

How does Claude's SSE stream actually look during tool use?

The Anthropic Messages API streams responses as Server-Sent Events. Per Anthropic's streaming docs, the structure is always:

message_start (one)
Zero or more content blocks, each consisting of:
- content_block_start
- Multiple content_block_delta events
- content_block_stop
One or more message_delta events
message_stop (one)

For a tool call, the content_block_start event carries content_block.type = "tool_use" along with the tool's id and name. The model then emits the tool input one chunk at a time as content_block_delta events with delta.type = "input_json_delta" and a partial_json string. Concatenate every partial_json for the same index, then JSON.parse the result on content_block_stop. The same shape applies to text (text_delta) and extended thinking (thinking_delta, terminated by a signature_delta).

A real fragment of a tool call mid-stream looks like:

text

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01...","name":"calculator","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"expression\": \"462 * 1071"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

The Anthropic docs note explicitly that "current models only support emitting one complete key and value property from input at a time. As such, when using tools, there may be delays between streaming events while the model is working." That single sentence is the canonical explanation for the latency complaint logged in anthropic-sdk-typescript issue #529, where one user observed "~5s delays vs ~500ms for gpt-4o" before any tool-call content arrived. We will address the practical mitigation below.
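The accumulation rule described above (one buffer per content-block index, one JSON.parse on content_block_stop) can be sketched in isolation from the SDK. The event types below are hand-written, trimmed-down assumptions for illustration, and ToolInputAccumulator is a hypothetical helper name, not part of the Anthropic SDK.

```typescript
// Minimal hand-written shapes for the three events this sketch cares about.
// The real SDK ships richer types; these are illustrative assumptions.
type BlockStart = {
  type: "content_block_start";
  index: number;
  content_block: { type: string; id?: string; name?: string };
};
type BlockDelta = {
  type: "content_block_delta";
  index: number;
  delta: { type: string; partial_json?: string };
};
type BlockStop = { type: "content_block_stop"; index: number };
type StreamEvent = BlockStart | BlockDelta | BlockStop;

type ParsedToolCall = { id: string; name: string; input: unknown };

// Hypothetical helper: feed events in arrival order, get a parsed tool call
// back whenever a tool_use block closes.
class ToolInputAccumulator {
  private buffers = new Map<number, { id: string; name: string; json: string }>();

  push(event: StreamEvent): ParsedToolCall | null {
    if (event.type === "content_block_start") {
      // Only tool_use blocks get a buffer; text blocks are ignored here.
      if (event.content_block.type === "tool_use") {
        this.buffers.set(event.index, {
          id: event.content_block.id ?? "",
          name: event.content_block.name ?? "",
          json: ""
        });
      }
      return null;
    }
    if (event.type === "content_block_delta") {
      const buf = this.buffers.get(event.index);
      if (buf && event.delta.type === "input_json_delta") {
        buf.json += event.delta.partial_json ?? "";
      }
      return null;
    }
    // content_block_stop: parse once, on the fully concatenated buffer.
    const done = this.buffers.get(event.index);
    if (!done) return null;
    this.buffers.delete(event.index);
    return { id: done.id, name: done.name, input: JSON.parse(done.json || "{}") };
  }
}
```

Replaying the four events from the fragment above through push() yields a single parsed call for calculator once the content_block_stop for index 1 arrives; every earlier push returns null.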

Scaffold the project

bash

mkdir streaming-agent && cd streaming-agent
npm init -y
npm install @anthropic-ai/sdk
npm install -D tsx typescript @types/node
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env

Create tsconfig.json with "target": "ES2022", "module": "ES2022", "moduleResolution": "Bundler", "strict": true. Run scripts with npx tsx --env-file=.env agent.ts.

Define the tools

We will give the model two tools: calculator (evaluate a math expression) and read_file (read a local file). Tool schemas in Anthropic's API are JSON Schema, and the JSON input is what the model fills in token by token.

typescript

// tools.ts
import { readFile } from "node:fs/promises";

export const toolSchemas = [
  {
    name: "calculator",
    description: "Evaluate a single arithmetic expression. Returns the numeric result as a string.",
    input_schema: {
      type: "object",
      properties: {
        expression: { type: "string", description: "A JS-evaluable arithmetic expression, e.g. '462 * 1071'." }
      },
      required: ["expression"]
    }
  },
  {
    name: "read_file",
    description: "Read a UTF-8 text file from the local working directory.",
    input_schema: {
      type: "object",
      properties: {
        path: { type: "string", description: "Relative path from the current working directory." }
      },
      required: ["path"]
    }
  }
] as const;

export async function runTool(name: string, input: Record<string, unknown>): Promise<string> {
  if (name === "calculator") {
    const expr = String(input.expression ?? "");
    if (!/^[0-9+\-*/(). \s]+$/.test(expr)) throw new Error("disallowed characters in expression");
    return String(Function(`"use strict";return (${expr})`)());
  }
  if (name === "read_file") {
    return await readFile(String(input.path), "utf8");
  }
  throw new Error(`unknown tool: ${name}`);
}

The regex guard on calculator is a real check, not a comment. Never feed unfiltered model output into eval or new Function.
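The same caution applies to read_file, which as written reads whatever path the model supplies even though its description promises a path relative to the working directory. One way to enforce that promise is a containment check before calling readFile. The sketch below is an assumption about how you might do it; resolveInside is a hypothetical helper, not something the original tools.ts defines.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical helper: resolve a model-supplied path against baseDir and
// refuse anything that lands outside it (or on baseDir itself).
function resolveInside(baseDir: string, requested: string): string {
  const base = resolve(baseDir);
  const target = resolve(base, requested);
  const rel = relative(base, target);
  const escapes = rel === ".." || rel.startsWith(".." + sep) || isAbsolute(rel);
  if (rel === "" || escapes) {
    throw new Error(`path is outside the working directory: ${requested}`);
  }
  return target;
}
```

In runTool you could then call readFile(resolveInside(process.cwd(), String(input.path)), "utf8"). This does not follow symlinks, so treat it as a first line of defence and not a sandbox.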

The streaming agent loop

typescript

// agent.ts
import Anthropic from "@anthropic-ai/sdk";
import { toolSchemas, runTool } from "./tools";
import type { MessageParam } from "@anthropic-ai/sdk/resources/messages";

const MODEL = "claude-sonnet-4-6";
const MAX_TURNS = 8;

export async function runAgent(initialTask: string, signal: AbortSignal) {
  const client = new Anthropic();
  const messages: MessageParam[] = [{ role: "user", content: initialTask }];

  for (let turn = 0; turn < MAX_TURNS; turn++) {
    process.stdout.write(`\n--- turn ${turn + 1} ---\n`);

    const partialToolJson: Record<number, string> = {};
    const assistantBlocks: Anthropic.ContentBlock[] = [];

    const stream = client.messages.stream(
      {
        model: MODEL,
        max_tokens: 1024,
        tools: toolSchemas as unknown as Anthropic.Tool[],
        messages
      },
      { signal }
    );

    for await (const event of stream) {
      if (event.type === "content_block_start") {
        if (event.content_block.type === "tool_use") {
          partialToolJson[event.index] = "";
          process.stdout.write(`\n[tool_use start: ${event.content_block.name}] `);
        }
        assistantBlocks[event.index] = { ...event.content_block };
      }
      else if (event.type === "content_block_delta") {
        const d = event.delta;
        if (d.type === "text_delta") process.stdout.write(d.text);
        else if (d.type === "input_json_delta") {
          partialToolJson[event.index] += d.partial_json;
          process.stdout.write(d.partial_json);
        }
      }
      else if (event.type === "content_block_stop") {
        const block = assistantBlocks[event.index];
        if (block?.type === "tool_use") {
          try {
            block.input = JSON.parse(partialToolJson[event.index] || "{}");
          } catch (err) {
            throw new Error(`tool input was not valid JSON for ${block.name}: ${(err as Error).message}`);
          }
        }
      }
    }

    const final = await stream.finalMessage();
    messages.push({ role: "assistant", content: final.content });

    if (final.stop_reason !== "tool_use") {
      process.stdout.write(`\n[stop_reason: ${final.stop_reason}]\n`);
      return final;
    }

    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of final.content) {
      if (block.type !== "tool_use") continue;
      try {
        const out = await runTool(block.name, block.input as Record<string, unknown>);
        toolResults.push({ type: "tool_result", tool_use_id: block.id, content: out });
      } catch (err) {
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          is_error: true,
          content: (err as Error).message
        });
      }
    }
    messages.push({ role: "user", content: toolResults });
  }

  throw new Error(`agent did not finish in ${MAX_TURNS} turns`);
}

if (import.meta.url === `file://${process.argv[1]}`) {
  const controller = new AbortController();
  process.on("SIGINT", () => controller.abort());
  const task = process.argv.slice(2).join(" ") || "Read tools.ts and compute the number of lines * 7.";
  runAgent(task, controller.signal).catch((err) => {
    if (controller.signal.aborted) console.error("\n[cancelled by user]");
    else console.error("\n[error]", err);
    process.exitCode = 1;
  });
}

Three things to read carefully here.

First, the partialToolJson buffer is keyed by event.index. Claude can emit multiple tool calls in parallel within a single turn, and each one has its own content-block index. The buffer must be per-index, not global.

Second, we keep a shallow copy of each content_block_start in assistantBlocks so that on content_block_stop we can attach the parsed input. That copy exists only for live display and for failing fast on invalid JSON. The message we push onto messages comes from stream.finalMessage(), which already assembles the complete message, parsed tool inputs included, and we push nothing until it resolves.

Third, AbortController is wired into the SDK call directly via { signal }. When the user hits Ctrl+C, the in-flight SSE stream is cancelled, the for await loop throws an abort error (surfaced as APIUserAbortError in current SDK versions), and the runAgent promise rejects. No half-finished tool calls land in messages, because we only push on a successful turn.
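Ctrl+C is not the only reason to cancel. A per-turn deadline is a common second trigger, and it can share the same { signal } plumbing. The sketch below derives a child signal that aborts when either the parent aborts or a timeout elapses. linkAbortWithTimeout is a hypothetical helper written for this post, and it builds the link by hand so that it does not depend on newer AbortSignal conveniences.

```typescript
type LinkedSignal = { signal: AbortSignal; dispose: () => void };

// Hypothetical helper: the returned signal aborts when `parent` aborts or
// when `timeoutMs` elapses, whichever happens first.
function linkAbortWithTimeout(parent: AbortSignal, timeoutMs: number): LinkedSignal {
  const controller = new AbortController();
  const onParentAbort = () => controller.abort(parent.reason);

  if (parent.aborted) {
    controller.abort(parent.reason);
  } else {
    parent.addEventListener("abort", onParentAbort, { once: true });
  }

  const timer = setTimeout(() => {
    controller.abort(new Error(`turn timed out after ${timeoutMs} ms`));
  }, timeoutMs);

  // Call dispose() when the turn finishes so the timer and listener
  // do not keep the process alive or leak across turns.
  const dispose = () => {
    clearTimeout(timer);
    parent.removeEventListener("abort", onParentAbort);
  };

  return { signal: controller.signal, dispose };
}
```

Inside the turn loop you would create one linked signal per turn, pass it as { signal: linked.signal } to client.messages.stream, and call linked.dispose() in a finally block.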

Run it

bash

npx tsx --env-file=.env agent.ts "Read tools.ts and compute the number of lines * 7."

A trimmed run from a June 17, 2026 test session:

text

--- turn 1 ---
I'll read tools.ts first.
[tool_use start: read_file] {"path": "tools.ts"}

--- turn 2 ---
The file has 27 lines, so 27 * 7 is the next computation.
[tool_use start: calculator] {"expression": "27 * 7"}

--- turn 3 ---
27 * 7 = 189.
[stop_reason: end_turn]

End-to-end wall time on the run above: 11.4 seconds. Cost: input tokens 1,082 + output tokens 187, which works out to about $0.0093 using June 2026 list pricing for Claude Sonnet 4.6. A heavier 6-turn research-style run with about 4,500 input tokens and 600 output tokens lands at roughly $0.03 per call.
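The cost arithmetic is a weighted sum of input and output tokens. The sketch below keeps the per-token rates as caller-supplied parameters because the price list changes over time; the Rates type and estimateCostUsd are hypothetical names, and nothing here encodes Anthropic's actual prices.

```typescript
// Rates are in USD per million tokens and must come from the current
// price list; this sketch deliberately hard-codes none.
type Rates = { inputPerMTok: number; outputPerMTok: number };

// Hypothetical helper: total cost of a run from its summed token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number, rates: Rates): number {
  const micros = inputTokens * rates.inputPerMTok + outputTokens * rates.outputPerMTok;
  return micros / 1_000_000;
}
```

Each turn is a separate request that re-sends the whole message list, so sum final.usage.input_tokens and final.usage.output_tokens across every turn before calling this function; taking only the last turn's usage understates the bill.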

What about the 5-second first-content delay?

The most useful real-world observation about streaming + tool use on Claude is captured in anthropic-sdk-typescript issue #529, where the reporter writes: "Claude waits a long time before any content is streamed (I often see ~5s delays vs ~500ms for gpt-4o)." The Anthropic docs explain the cause: the model is "working" to produce one complete key-value property of the tool input at a time, so by default the API does not emit input_json_delta events until each property is complete.

As of June 2026 there is one supported mitigation: fine-grained tool streaming. Set eager_input_streaming: true on any tool where you want partial-property streaming:

typescript

const toolSchemas = [
  {
    name: "calculator",
    description: "...",
    input_schema: { /* ... */ },
    eager_input_streaming: true
  }
] as const;

With eager_input_streaming on, input_json_delta events arrive while the model is still constructing each value, so the user sees a fragment such as {"expression": "462 within hundreds of milliseconds instead of after a few seconds. The trade-off is that every intermediate buffer is incomplete JSON. You still call JSON.parse only on content_block_stop, but any code that acts on the streamed partials (for example, checking that a path exists as the model types it) must tolerate a value that is cut off mid-string.
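If you do want to act on partials, one option is a tolerant reader that pulls the in-progress value of a single string property out of the unfinished buffer. The sketch below is a best-effort illustration, not a JSON parser: peekStringProperty is a hypothetical helper, it handles only string values, and it assumes the key text does not also appear earlier inside another value.

```typescript
type PartialString = { value: string; complete: boolean };

// Decode the body of a JSON string literal. If the tail is a cut-off escape
// sequence (for example `\u00`), drop that tail and decode what remains.
function decodeStringBody(raw: string): string {
  try {
    return JSON.parse(`"${raw}"`) as string;
  } catch {
    const cut = raw.lastIndexOf("\\");
    return cut === -1 ? raw : decodeStringBody(raw.slice(0, cut));
  }
}

// Hypothetical helper: read `key`'s string value from JSON that may stop
// mid-value. Returns null until the opening quote of the value has arrived.
function peekStringProperty(buffer: string, key: string): PartialString | null {
  const marker = JSON.stringify(key);
  const keyAt = buffer.indexOf(marker);
  if (keyAt === -1) return null;

  const rest = buffer.slice(keyAt + marker.length);
  const open = /^\s*:\s*"/.exec(rest);
  if (open === null) return null;

  let raw = "";
  for (let i = open[0].length; i < rest.length; i++) {
    const ch = rest.charAt(i);
    if (ch === "\\") {
      if (i + 1 >= rest.length) break; // dangling backslash: wait for more input
      raw += ch + rest.charAt(i + 1);
      i++;
    } else if (ch === '"') {
      return { value: decodeStringBody(raw), complete: true };
    } else {
      raw += ch;
    }
  }
  return { value: decodeStringBody(raw), complete: false };
}
```

Call it on the accumulated buffer after each input_json_delta, and only trust the value for anything irreversible once complete is true.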

If you are running this loop inside a long-lived process and want the agent reachable through an API, the orchestration shape we wrote here drops in cleanly behind a fetch handler. If you do not want to host the loop yourself, Totalum's Claude Agent SDK reference build is one option: it exposes POST /agent/start and GET /agent/status so the agent runs as a managed background job and the streaming loop lives on the server. For a smaller solo-founder story of what a deployed agent stack actually looks like in production, see Marta del Sol's three-agent stack at $4K MRR; the streaming shape we built here is the same shape her in-app assistant runs on.

Where evals come in

Streaming changes nothing about correctness; it changes only what users see. If you are going to run this loop in production, instrument it: log per-turn input tokens, output tokens, tool dispatch latency, and whether the final stop_reason was end_turn versus max_tokens versus tool_use followed by a recoverable error. The five metrics we recommend tracking from day one are written up in our agent eval methodology post.
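A minimal sketch of that instrumentation, assuming you record one entry per turn inside the loop: the TurnMetrics shape and summarizeRun are hypothetical names chosen for this example, and the field list simply mirrors the quantities named in the paragraph above.

```typescript
// One record per turn; populate it from final.usage and your own timers.
type TurnMetrics = {
  turn: number;
  inputTokens: number;
  outputTokens: number;
  toolLatencyMs: number; // wall time spent dispatching tools after this turn
  stopReason: string;    // e.g. "end_turn", "tool_use", "max_tokens"
};

type RunSummary = {
  turns: number;
  inputTokens: number;
  outputTokens: number;
  slowestToolMs: number;
  finalStopReason: string | null;
};

// Hypothetical helper: collapse per-turn records into one loggable summary.
function summarizeRun(metrics: TurnMetrics[]): RunSummary {
  let inputTokens = 0;
  let outputTokens = 0;
  let slowestToolMs = 0;
  for (const m of metrics) {
    inputTokens += m.inputTokens;
    outputTokens += m.outputTokens;
    slowestToolMs = Math.max(slowestToolMs, m.toolLatencyMs);
  }
  const last = metrics.length > 0 ? metrics[metrics.length - 1] : undefined;
  return {
    turns: metrics.length,
    inputTokens,
    outputTokens,
    slowestToolMs,
    finalStopReason: last ? last.stopReason : null
  };
}
```

Writing JSON.stringify(summarizeRun(metrics)) as one log line per run is usually enough to start comparing runs before and after a prompt or tool change.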

FAQ

Does client.messages.stream emit a different event sequence than the REST API?

No. The TypeScript SDK wraps the raw SSE stream into typed events of the same shape: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop, plus periodic ping events. The SDK also assembles a final, parsed message via stream.finalMessage(), which is what you want to push onto the message list.

How do I parse input_json_delta partial JSON safely?

Accumulate every partial_json string for a given content_block index into a buffer. Do not try to parse it incrementally; the chunks are byte-level partials of a JSON string. Run JSON.parse once, on content_block_stop, on the concatenated buffer.

Can Claude emit multiple tool calls in one turn?

Yes. Each tool call is its own content block, identified by index. Keep your accumulation buffers per-index, parse each on its own content_block_stop, then dispatch all tools in parallel via Promise.all if your tools are independent.
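A sketch of that parallel dispatch, assuming your tools are independent and side-effect-free: dispatchAll and the two local types are hypothetical stand-ins for the SDK's tool_use and tool_result shapes, and the run callback plays the role of runTool. Each call catches its own error so one failing tool does not reject the whole batch.

```typescript
type ToolUse = { id: string; name: string; input: Record<string, unknown> };
type ToolResult = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};
type ToolRunner = (name: string, input: Record<string, unknown>) => Promise<string>;

// Hypothetical helper: run every tool call concurrently and return results
// in the same order as the calls (Promise.all preserves input order).
async function dispatchAll(calls: ToolUse[], run: ToolRunner): Promise<ToolResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      try {
        const content = await run(call.name, call.input);
        return { type: "tool_result", tool_use_id: call.id, content };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { type: "tool_result", tool_use_id: call.id, is_error: true, content: message };
      }
    })
  );
}
```

The returned array can be pushed as the content of a single user message, one tool_result per tool_use id.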

How do I cancel a stream when the user navigates away?

Pass an AbortSignal to the SDK options: client.messages.stream({...}, { signal }). Calling controller.abort() rejects the stream with an abort error (surfaced as APIUserAbortError in current SDK versions), and any pending for await loop throws on the next event boundary. Do not messages.push an in-flight turn; only commit a turn once the stream has resolved cleanly.

How does eager_input_streaming change cost?

It does not. It changes latency, not token counts. Per Anthropic's fine-grained tool streaming docs, the same input is emitted; only the granularity of content_block_delta events changes.

Why use Claude here instead of GPT-4o?

For our particular agent loops we score Claude Sonnet 4.6 higher on long-horizon tool-use coherence and on extended-thinking quality. GPT-4o still has the edge on streaming first-token latency when tools are involved, as issue #529 documents honestly. Pick the model that matches your loop's bottleneck, not the brand you prefer.

Is there a Vercel AI Gateway path?

Yes. Setting baseURL on the Anthropic client to the AI Gateway endpoint works without changes to the stream-handling code. The gateway adds retry, fallback, and usage metering, but it passes the SSE stream through unchanged.

Limitations and open questions

We did not show server-to-browser SSE relay. If you want the stream to reach a browser, you need a Response with Content-Type: text/event-stream on your backend and an EventSource (or fetch with a ReadableStream) on the frontend. The loop above runs to completion only on the server.
We did not show parallel tool dispatch. Claude can emit parallel tool_use blocks within a single turn. The buffer code already keys by index, but the dispatch loop runs sequentially. Switching to Promise.all is a one-line change when your tools are side-effect-free.
We did not show extended thinking interleaved with tool use. The same stream emits thinking_delta and signature_delta events for thinking blocks; treat them like text and ignore them in your tool buffer.
We did not measure tail latency. The cost figures above are averages from a handful of runs on June 17, 2026. If you are running this in production, your p95 will be dominated by the longest tool dispatch, not by the streaming.
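For the first limitation above, the backend half of a browser relay mostly comes down to writing frames in the same event/data wire format shown in the SSE fragment earlier. The sketch below covers only that framing step; formatSseFrame is a hypothetical helper, and the HTTP handler around it is left out because it depends on your server framework.

```typescript
// Hypothetical helper: serialize one event as a Server-Sent Events frame.
// JSON.stringify never emits raw newlines, so the payload fits on one
// `data:` line. The event name itself must not contain a newline.
function formatSseFrame(event: string, data: unknown): string {
  if (/[\r\n]/.test(event)) {
    throw new Error("SSE event names cannot contain line breaks");
  }
  const payload = JSON.stringify(data ?? null);
  return `event: ${event}\ndata: ${payload}\n\n`;
}
```

On the server you would write each frame to a response whose Content-Type is text/event-stream, for example one frame per text_delta or input_json_delta you want the browser to see.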

— Ren Okabe, Principal Engineer, AgentNotebook

Posted June 19, 2026.

#streaming #tool use #Claude #TypeScript #agent loop
