Streaming response patterns for chat interfaces

Overview

Streaming response patterns for chat interfaces deliver model output token by token as it is generated, so the user sees text within a few hundred milliseconds instead of waiting for the full completion. The two common transports are server-sent events (SSE) and a streamed fetch that reads the response body as a ReadableStream. The frontend’s job is to render increments, parse partial output safely, and let the user cancel or recover. This page covers the transport; the broader UI rules are in ai-first-applications and llm-product-ux.

Pick SSE or streamed fetch deliberately

For one-way server-to-client token streams, both work; choose by need.

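SSE gives a simple event protocol and, through the browser's EventSource, automatic reconnection; it is text-only and one-directional, and EventSource is limited to GET requests without custom headers. A fetch-based SSE reader lifts those limits but has to handle reconnection itself.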
Streamed fetch over a ReadableStream lets you POST a large request body, set headers, and abort with an AbortController; you implement reconnection yourself.
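Whichever transport is chosen, a fetch-based reader receives text in arbitrary chunks, so an SSE frame can be split across reads. The following is a minimal sketch of an incremental parser for the SSE wire format, not a full implementation of the specification. The names SseEvent, SseParser and parseSseBlock are illustrative, and the sketch assumes chunks have already been decoded to text and that lines end in \n or \r\n.

```typescript
interface SseEvent {
  event: string; // "message" when the frame has no event field
  data: string;
  id?: string;
}

class SseParser {
  private buffer = "";

  // Feed one decoded text chunk; returns every event completed by this chunk.
  feed(chunk: string): SseEvent[] {
    // Normalise after concatenation so a \r\n split across chunks is handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: SseEvent[] = [];
    let sep = this.buffer.indexOf("\n\n");
    while (sep !== -1) {
      const parsed = parseSseBlock(this.buffer.slice(0, sep));
      if (parsed) events.push(parsed);
      this.buffer = this.buffer.slice(sep + 2);
      sep = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}

// Parse one blank-line-delimited block into an event, or null if it has no data.
function parseSseBlock(block: string): SseEvent | null {
  let event = "message";
  let id: string | undefined;
  const data: string[] = [];
  for (const line of block.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // blank line or comment
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "data") data.push(value);
    else if (field === "event") event = value;
    else if (field === "id") id = value;
  }
  if (data.length === 0) return null;
  const result: SseEvent = { event, data: data.join("\n") };
  if (id !== undefined) result.id = id;
  return result;
}
```

In a browser, the chunks would typically come from response.body read through a TextDecoder with the stream option set, so multi-byte characters split across reads decode correctly.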

Serve the stream from a route handler that returns a streaming response; see nextjs-route-handlers. For agent tool-streaming over HTTP, see mcp-streamable-http.

Render increments without re-rendering the world

Append tokens to the in-progress message and re-render only that node. Buffer very high-frequency chunks to one paint per animation frame so the UI does not thrash. Keep the streaming message in local state and commit it to the message list on completion; see react-state-management.
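One way to keep the in-progress message separate from the committed list is a small reducer. This is a sketch under assumed names (ChatMessage, ChatState, ChatAction and chatReducer are all illustrative, not taken from any library). It shows that appending text replaces only the streaming field and leaves the messages array untouched, so memoized list items have no reason to re-render.

```typescript
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

interface ChatState {
  messages: ChatMessage[];       // committed history
  streaming: ChatMessage | null; // message currently being generated
}

type ChatAction =
  | { type: "start"; id: string }
  | { type: "append"; text: string }
  | { type: "commit" };

function chatReducer(state: ChatState, action: ChatAction): ChatState {
  switch (action.type) {
    case "start":
      return { ...state, streaming: { id: action.id, role: "assistant", text: "" } };
    case "append":
      if (!state.streaming) return state;
      // Only the streaming message changes; state.messages keeps its identity.
      return {
        ...state,
        streaming: { ...state.streaming, text: state.streaming.text + action.text },
      };
    case "commit":
      if (!state.streaming) return state;
      return { messages: [...state.messages, state.streaming], streaming: null };
    default:
      return state;
  }
}
```

In React this reducer could be passed to useReducer, with "append" dispatched at most once per animation frame rather than once per token.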

Parse partial structured output defensively

When the stream carries JSON for the UI to render, it arrives incomplete. Do not JSON.parse mid-stream. Accumulate the buffer, attempt a tolerant parse for preview, and validate against a schema only when the stream closes. See structured-output.
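A tolerant preview parse can be approximated by closing whatever strings, arrays and objects are still open and then attempting an ordinary JSON.parse inside a try/catch. The sketch below takes that approach; it returns undefined when the buffer cannot be repaired this way (for example a half-written key or literal), in which case the caller keeps its previous preview. The Card shape and the function names are hypothetical, and the hand-written guard stands in for whatever schema validator the project uses.

```typescript
// Close any open string, array or object so the buffer might parse.
function closePartialJson(buffer: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  let out = buffer;
  if (inString) {
    if (escaped) out = out.slice(0, -1); // drop a dangling backslash
    out += '"';
  } else {
    out = out.replace(/[,\s]+$/, ""); // drop a trailing comma and whitespace
  }
  return out + closers.reverse().join("");
}

// Best-effort preview; undefined means "keep showing the last good preview".
function previewParse(buffer: string): unknown {
  try {
    return JSON.parse(closePartialJson(buffer));
  } catch {
    return undefined;
  }
}

// Hypothetical final shape, checked only once the stream has closed.
interface Card {
  title: string;
  items: string[];
}

function isCard(value: unknown): value is Card {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { title?: unknown; items?: unknown };
  return (
    typeof v.title === "string" &&
    Array.isArray(v.items) &&
    v.items.every((item) => typeof item === "string")
  );
}
```

The preview result is only for display while the stream is open. The strict JSON.parse and the schema check run once, on the full buffer, after the stream ends.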

Make streams cancellable

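Wire a stop button to an AbortController that cancels the fetch, and make sure the server notices the disconnect and halts generation; aborting on the client alone does not stop the model. Halting generation can also avoid paying for tokens the user never sees, depending on how the provider bills cancelled requests. Cancel the in-flight stream automatically when the user sends a new message. Cancellation is a core affordance of llm-product-ux.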
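The client side of cancellation can be sketched as a small holder that owns the current AbortController, so that starting a new stream always cancels the previous one. ActiveStream and isAbortError are illustrative names. The sketch covers only the client; the server still has to observe the closed connection and stop generating.

```typescript
class ActiveStream {
  private current: AbortController | null = null;

  // Begin a new stream, cancelling any stream that is still in flight.
  // Pass the returned signal to fetch via its `signal` option.
  begin(): AbortSignal {
    this.stop();
    this.current = new AbortController();
    return this.current.signal;
  }

  // Stop-button handler. Returns true if a stream was actually aborted.
  stop(): boolean {
    if (!this.current) return false;
    this.current.abort();
    this.current = null;
    return true;
  }
}

// An aborted fetch rejects with an error named "AbortError"; treat that as
// a user cancellation rather than a failure.
function isAbortError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "AbortError"
  );
}
```

A send handler would call begin() before each request and a stop button would call stop(). Errors caught around the read loop can be passed through isAbortError to decide whether to show a "stopped" state or an error state.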

Handle reconnection and incomplete streams

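Networks drop mid-stream. Detect a truncated response, mark the message incomplete, and offer resume or retry rather than showing a half-sentence as final. With SSE, EventSource sends the last event ID it received in a Last-Event-ID header when it reconnects, so a server that supports it can resume from that point; with fetch, retry from a checkpoint. Stream failures surface in asynchronous code, so catch them where the stream is read and put them into state; wrap the message view in an error boundary to contain any resulting render failure; see react-error-boundaries.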
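Telling a finished stream from a dropped one requires some explicit end-of-stream signal, because a closed connection looks the same in both cases. The sketch below assumes the server sends a terminal marker (for example a final "done" event), which is an assumption of this example rather than something the transport provides. StreamProgress, classifyStreamEnd and resumeHeaders are illustrative names; Last-Event-ID is the standard SSE request header.

```typescript
type StreamOutcome = "complete" | "cancelled" | "incomplete";

interface StreamProgress {
  sawDone: boolean;       // the assumed end-of-stream marker was received
  userCancelled: boolean; // stop button or a new message aborted the stream
  lastEventId?: string;   // most recent SSE id, if the server sends ids
}

// Decide how to label the message once the connection has ended.
function classifyStreamEnd(progress: StreamProgress): StreamOutcome {
  if (progress.sawDone) return "complete";
  if (progress.userCancelled) return "cancelled";
  return "incomplete"; // dropped connection: offer resume or retry
}

// Headers for a resume request made by a fetch-based SSE reader.
// The server must understand Last-Event-ID for this to have any effect.
function resumeHeaders(progress: StreamProgress): Record<string, string> {
  return progress.lastEventId ? { "Last-Event-ID": progress.lastEventId } : {};
}
```

An "incomplete" outcome is the cue to render the partial text with a resume or retry control instead of committing it as a finished message.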

Apply backpressure on fast streams

If tokens arrive faster than the UI can paint, batch them. Throttle state updates to the frame rate and coalesce chunks, or a long response will jank the main thread. Pair with Suspense boundaries for the initial load; see react-suspense.
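Coalescing chunks to one update per frame can be sketched as a small buffer with an injected scheduler. ChunkCoalescer and Schedule are illustrative names. In a browser the scheduler would normally wrap requestAnimationFrame; it is injected here so the class stays independent of the DOM and easy to test.

```typescript
type Schedule = (run: () => void) => void;

class ChunkCoalescer {
  private pending = "";
  private scheduled = false;
  private readonly onFlush: (text: string) => void;
  private readonly schedule: Schedule;

  constructor(onFlush: (text: string) => void, schedule: Schedule) {
    this.onFlush = onFlush;
    this.schedule = schedule;
  }

  // Buffer a chunk and make sure exactly one flush is queued.
  push(chunk: string): void {
    this.pending += chunk;
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => this.flush());
  }

  // Deliver everything buffered so far as a single update.
  // Also safe to call directly when the stream ends.
  flush(): void {
    this.scheduled = false;
    if (this.pending === "") return;
    const text = this.pending;
    this.pending = "";
    this.onFlush(text);
  }
}

// Browser usage (appendToMessage is a hypothetical state updater):
//   const coalescer = new ChunkCoalescer(appendToMessage, (run) => {
//     requestAnimationFrame(run);
//   });
```

However many chunks arrive within a frame, the UI receives one concatenated string, which keeps state updates bounded by the frame rate rather than the token rate.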

Pitfalls

Calling JSON.parse on a partial chunk and crashing the render.
A stop button that hides the UI but does not abort the request or stop server generation.
One React state update per token, thrashing the main thread.
Treating a dropped connection as a completed message.
