Overview

Streaming response patterns for chat interfaces deliver model output token-by-token as it is generated, so the user sees text within a few hundred milliseconds instead of waiting for the full completion. The two common transports are server-sent events (SSE) and a streamed fetch whose response body is read as a ReadableStream. The frontend’s job is to render increments, parse partial output safely, and let the user cancel or recover. This page covers the transport; the broader UI rules are in ai-first-applications and llm-product-ux.

Pick SSE or streamed fetch deliberately

For one-way server-to-client token streams, both work; choose by need.

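  • SSE through the browser's EventSource gives automatic reconnection and a simple event protocol; it is text-only, one-directional, and limited to GET requests without custom headers. Parsing the SSE format from a fetch-based reader lifts those limits, but you then handle reconnection yourself.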
  • Streamed fetch over a ReadableStream lets you POST a large request body, set headers, and abort with an AbortController; you implement reconnection yourself.

Serve the stream from a route handler that returns a streaming response; see nextjs-route-handlers. For agent tool-streaming over HTTP, see mcp-streamable-http.
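As a rough illustration of the fetch-based option, the sketch below parses the SSE wire format from a streamed POST response. The names createSseParser and streamChat, the request shape, and the URL passed in are hypothetical, and the parser is simplified: it handles "\n" and "\r\n" line endings but not bare "\r", and it ignores the "retry" field.

```typescript
// One parsed server-sent event.
interface SseEvent {
  event: string; // event type; "message" when the server sends no "event:" field
  data: string;  // all "data:" lines of the event, joined with "\n"
  id?: string;   // most recent "id:" field seen so far, if any
}

// Incremental parser: feed it decoded text chunks, get back complete events.
// An unfinished trailing line stays buffered until the next chunk arrives.
function createSseParser(): (chunk: string) => SseEvent[] {
  let buffer = "";
  let dataLines: string[] = [];
  let eventType = "";
  let lastId: string | undefined;

  return function feed(chunk: string): SseEvent[] {
    buffer += chunk;
    const events: SseEvent[] = [];
    let newline = buffer.indexOf("\n");
    while (newline !== -1) {
      let line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      if (line.endsWith("\r")) line = line.slice(0, -1);

      if (line === "") {
        // A blank line ends the event; events without data are dropped.
        if (dataLines.length > 0) {
          events.push({ event: eventType || "message", data: dataLines.join("\n"), id: lastId });
        }
        dataLines = [];
        eventType = "";
      } else if (!line.startsWith(":")) { // ":" lines are comments / keep-alives
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1);
        if (field === "data") dataLines.push(value);
        else if (field === "event") eventType = value;
        else if (field === "id") lastId = value;
      }
      newline = buffer.indexOf("\n");
    }
    return events;
  };
}

// Hypothetical caller: POST a body and hand each parsed event to onEvent.
async function streamChat(
  url: string,
  body: unknown,
  onEvent: (event: SseEvent) => void,
  signal?: AbortSignal,
): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify(body),
    signal,
  });
  if (!res.ok || !res.body) throw new Error(`Stream failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const feed = createSseParser();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } keeps multi-byte characters intact across chunk boundaries.
    for (const event of feed(decoder.decode(value, { stream: true }))) onEvent(event);
  }
}
```

Keeping the parser separate from the network call means it can be exercised with plain strings, and the same reader loop works for a non-SSE stream by dropping the parser.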

Render increments without re-rendering the world

Append tokens to the in-progress message and re-render only that node. Buffer very high-frequency chunks to one paint per animation frame so the UI does not thrash. Keep the streaming message in local state and commit it to the message list on completion; see react-state-management.
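One possible way to coalesce chunks into a single update per frame is sketched below. createFrameBatcher is a hypothetical helper, not part of any library; the scheduler is injectable so the same code runs outside a browser, and it defaults to requestAnimationFrame.

```typescript
// A scheduler runs the given callback "later"; in the browser that is the next frame.
type Schedule = (flush: () => void) => void;

// Collects tokens and delivers them to onFlush at most once per scheduled tick.
function createFrameBatcher(
  onFlush: (text: string) => void,
  schedule: Schedule = (flush) => {
    requestAnimationFrame(() => flush());
  },
) {
  let pending = "";
  let scheduled = false;

  function flush(): void {
    const text = pending;
    pending = "";
    if (text !== "") onFlush(text);
  }

  return {
    // Called for every chunk; schedules one flush no matter how many chunks arrive.
    push(token: string): void {
      pending += token;
      if (scheduled) return;
      scheduled = true;
      schedule(() => {
        scheduled = false;
        flush();
      });
    },
    // Called when the stream ends, so the tail is not left waiting for a frame.
    flushNow(): void {
      flush();
    },
  };
}
```

In a React component, onFlush would typically append to the local state that holds the in-progress message, so a burst of tokens produces one state update rather than one per token.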

Parse partial structured output defensively

When the stream carries JSON for the UI to render, it arrives incomplete, and calling JSON.parse on the raw buffer mid-stream throws. Accumulate the buffer, use a tolerant parser that repairs or rejects the unfinished tail for preview, and validate against a schema only when the stream closes. See structured-output.
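A minimal sketch of a tolerant preview parse follows. It closes any open string, array, or object and returns undefined when the tail still cannot be parsed (for example, a key with no value yet), so the UI can keep showing the last good preview. parsePartialJson, the Card shape, and the hand-written isCard guard are all hypothetical; a schema library could stand in for the guard.

```typescript
// Best-effort parse of an incomplete JSON buffer, intended for preview only.
// Returns undefined when the unfinished tail cannot be repaired.
function parsePartialJson(buffer: string): unknown {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;

  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let candidate = buffer;
  if (inString) {
    // Drop a dangling backslash, then close the string.
    if (escaped) candidate = candidate.slice(0, -1);
    candidate += '"';
  }
  // Close containers innermost first.
  candidate += closers.reverse().join("");

  try {
    return JSON.parse(candidate);
  } catch {
    return undefined;
  }
}

// Hypothetical shape the UI expects once the stream has closed.
interface Card {
  title: string;
  items: string[];
}

// Strict check applied to the final buffer only, never to previews.
function isCard(value: unknown): value is Card {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const items = record["items"];
  return (
    typeof record["title"] === "string" &&
    Array.isArray(items) &&
    items.every((item) => typeof item === "string")
  );
}
```

Preview values may contain truncated strings or missing fields, which is why the strict guard runs only on the completed buffer.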

Make streams cancellable

Wire a stop button to an AbortController that cancels the fetch. Aborting closes the connection on the client; the server must detect the disconnect and halt generation itself, which, depending on the provider, can also stop billing for tokens that were never generated. Cancel the in-flight stream automatically when the user sends a new message. Cancellation is a core affordance of llm-product-ux.
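The client half of cancellation might look like the sketch below. createStreamController and sendMessage are hypothetical names, and the request body shape is an assumption. The server half is not shown: the route handler still has to notice the closed connection (for instance through the request's abort signal) and stop the upstream model call.

```typescript
// Tracks the single in-flight stream so it can be stopped or superseded.
function createStreamController() {
  let current: AbortController | null = null;
  return {
    // Start a new stream; a stream still in flight is aborted first.
    begin(): AbortSignal {
      current?.abort();
      current = new AbortController();
      return current.signal;
    },
    // Wire this to the stop button.
    stop(): void {
      current?.abort();
      current = null;
    },
  };
}

// Hypothetical send function: streams decoded text to onText until the
// response ends or the user cancels.
async function sendMessage(
  url: string,
  prompt: string,
  streams: ReturnType<typeof createStreamController>,
  onText: (text: string) => void,
): Promise<"ended" | "cancelled"> {
  const signal = streams.begin();
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt }),
      signal,
    });
    if (!res.ok || !res.body) throw new Error(`Stream failed: ${res.status}`);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) return "ended";
      onText(decoder.decode(value, { stream: true }));
    }
  } catch (err) {
    // An abort rejects the pending fetch or read; treat it as a normal outcome.
    if (signal.aborted) return "cancelled";
    throw err;
  }
}
```

Because begin() aborts whatever is in flight, sending a new message cancels the previous stream without extra bookkeeping in the component.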

Handle reconnection and incomplete streams

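Networks drop mid-stream. Detect a truncated response (for example, the stream closed without a terminal event), mark the message incomplete, and offer resume or retry rather than showing a half-sentence as final. With SSE, EventSource sends the last event ID when it reconnects, so a server that supports replay can resume from it; with fetch, retry from a checkpoint. React error boundaries catch only errors thrown during rendering, so catch stream errors in the async reader and surface them through state, keeping an error boundary around the message view as a backstop; see react-error-boundaries.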
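One way to keep a dropped connection from looking like a finished message is to require an explicit terminal marker, as in the sketch below. It assumes the server sends a "done" signal at the end of a successful stream and attaches IDs to tokens; both are assumptions about the backend, and all names here are hypothetical. Sending Last-Event-ID on a manual fetch retry only helps if the server is built to replay from it.

```typescript
type Status = "streaming" | "complete" | "incomplete";

interface StreamedMessage {
  text: string;
  status: Status;
  lastEventId?: string; // checkpoint for a later resume
}

// Everything the transport can report about one stream.
type StreamSignal =
  | { kind: "token"; text: string; id?: string }
  | { kind: "done" }   // assumed terminal marker sent by the server
  | { kind: "closed" } // the connection ended, cleanly or not
  | { kind: "error" };

// Pure reducer: a stream that closes or errors before "done" is incomplete.
function reduceStream(msg: StreamedMessage, signal: StreamSignal): StreamedMessage {
  switch (signal.kind) {
    case "token":
      return {
        ...msg,
        text: msg.text + signal.text,
        lastEventId: signal.id ?? msg.lastEventId,
      };
    case "done":
      return { ...msg, status: "complete" };
    case "closed":
    case "error":
      return msg.status === "complete" ? msg : { ...msg, status: "incomplete" };
  }
}

// Headers for a manual retry; meaningful only if the server supports replay.
function resumeHeaders(msg: StreamedMessage): Record<string, string> {
  return msg.lastEventId ? { "Last-Event-ID": msg.lastEventId } : {};
}
```

The UI can then render a resume or retry action whenever status is "incomplete", and commit the message to the list only when it is "complete".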

Apply backpressure on fast streams

If tokens arrive faster than the UI can paint, batch them. Throttle state updates to the frame rate and coalesce chunks, or a long response will jank the main thread. Pair with Suspense boundaries for the initial load; see react-suspense.

Pitfalls

  • Calling JSON.parse on a partial chunk and crashing the render.
  • A stop button that hides the UI but does not abort the request or stop server generation.
  • One React state update per token, thrashing the main thread.
  • Treating a dropped connection as a completed message.