Overview
Streaming response patterns for chat interfaces deliver model output token-by-token as it is generated, so the user sees text within a few hundred milliseconds instead of waiting for the full completion. The two transports are server-sent events (SSE) and a streamed fetch reading a ReadableStream. The frontend’s job is to render increments, parse partial output safely, and let the user cancel or recover. This page covers the transport; the broader UI rules are in ai-first-applications and llm-product-ux.
Pick SSE or streamed fetch deliberately
For one-way server-to-client token streams, both work; choose by need.
- SSE (EventSource or a fetch-based reader) gives a simple event protocol, and the native EventSource reconnects automatically; it is text-only and one-directional.
- Streamed fetch over a ReadableStream lets you POST a large request body, set headers, and abort with an AbortController; you implement reconnection yourself.
Serve the stream from a route handler that returns a streaming response; see nextjs-route-handlers. For agent tool-streaming over HTTP, see mcp-streamable-http.
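To make the fetch-based option concrete, here is a minimal sketch of reading a streamed response body and splitting it into SSE events. The names readTextStream and parseSse are illustrative, not from any library, and the parser is simplified: it handles only the data, event, and id fields and assumes LF or CRLF line endings.

```typescript
interface SseEvent {
  event: string;
  data: string;
  id: string | undefined;
}

// Decode a byte stream into text chunks. `stream: true` keeps multi-byte
// characters intact when they are split across two network chunks.
async function* readTextStream(
  body: ReadableStream<Uint8Array>,
): AsyncGenerator<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      yield decoder.decode(value, { stream: true });
    }
    const tail = decoder.decode(); // flush any buffered bytes
    if (tail) yield tail;
  } finally {
    reader.releaseLock();
  }
}

// Split a text buffer into complete SSE events. Anything after the last
// blank line is returned as `rest` and must be prepended to the next chunk.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? "";
  const events: SseEvent[] = [];
  for (const block of blocks) {
    let event = "message"; // default event type when none is given
    let id: string | undefined;
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // skip comments
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "data") data.push(value);
      else if (field === "event") event = value;
      else if (field === "id") id = value;
    }
    if (data.length > 0) events.push({ event, data: data.join("\n"), id });
  }
  return { events, rest };
}
```

A caller would keep a running buffer: for each chunk from readTextStream, call parseSse(rest + chunk), handle the returned events, and carry the new rest forward.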
Render increments without re-rendering the world
Append tokens to the in-progress message and re-render only that node. Buffer very high-frequency chunks to one paint per animation frame so the UI does not thrash. Keep the streaming message in local state and commit it to the message list on completion; see react-state-management.
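One way to keep the in-progress message separate from the committed list is a small reducer. This is a sketch under assumed names (ChatState, ChatAction, and chatReducer are hypothetical); it is framework-agnostic but has the shape that a React useReducer hook expects.

```typescript
interface Message {
  id: string;
  role: "user" | "assistant";
  text: string;
}

interface ChatState {
  messages: Message[]; // committed messages; untouched while streaming
  streaming: Message | null; // the in-progress assistant message
}

type ChatAction =
  | { type: "start"; id: string }
  | { type: "append"; text: string }
  | { type: "commit" };

function chatReducer(state: ChatState, action: ChatAction): ChatState {
  switch (action.type) {
    case "start":
      return {
        ...state,
        streaming: { id: action.id, role: "assistant", text: "" },
      };
    case "append":
      if (!state.streaming) return state;
      // Only `streaming` changes; `messages` keeps the same array identity,
      // so a memoized message list has no reason to re-render.
      return {
        ...state,
        streaming: {
          ...state.streaming,
          text: state.streaming.text + action.text,
        },
      };
    case "commit":
      if (!state.streaming) return state;
      return {
        messages: [...state.messages, state.streaming],
        streaming: null,
      };
  }
}
```

Because append never replaces the messages array, only the component bound to streaming needs to update per chunk; the list changes once, on commit.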
Parse partial structured output defensively
When the stream carries JSON for the UI to render, it arrives incomplete, and a strict JSON.parse on the partial buffer throws. Accumulate the buffer, attempt a tolerant parse for preview, and validate against a schema only when the stream closes. See structured-output.
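The tolerant parse can be sketched as follows: close any open string, array, or object, try to parse, and return undefined when the buffer cannot yet be repaired. The Card shape and the names previewPartialJson, isCard, and finalizeCard are hypothetical, and the hand-written type guard stands in for whatever schema validator the project uses. Preview values may be truncated (a half-received string or number), so treat them as display-only.

```typescript
// Best-effort preview of an incomplete JSON buffer. Returns undefined when
// the buffer cannot be repaired yet (for example, it ends after a colon).
function previewPartialJson(buffer: string): unknown {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of buffer) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  // Close the open string first, then the open containers, innermost first.
  const candidate =
    buffer + (inString ? '"' : "") + closers.reverse().join("");
  try {
    return JSON.parse(candidate);
  } catch {
    return undefined; // caller keeps showing the last good preview
  }
}

// Hypothetical payload shape, used only for illustration.
interface Card {
  title: string;
  items: string[];
}

function isCard(value: unknown): value is Card {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    Array.isArray(v.items) &&
    v.items.every((item) => typeof item === "string")
  );
}

// Strict parse plus validation, run once after the stream has closed.
function finalizeCard(buffer: string): Card {
  const value: unknown = JSON.parse(buffer);
  if (!isCard(value)) throw new Error("Streamed payload failed validation");
  return value;
}
```

During streaming the UI calls previewPartialJson on each update and keeps the last defined result; finalizeCard runs once at the end and is the only result treated as trusted data.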
Make streams cancellable
Wire a stop button to an AbortController that cancels the fetch, and make sure the server halts generation when the request is aborted, so that tokens nobody will read are neither generated nor paid for. Cancel the in-flight stream automatically when the user sends a new message. Cancellation is a core affordance of llm-product-ux.
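A minimal client-side sketch of this wiring, assuming a hypothetical /api/chat endpoint and request payload. StreamSession and streamReply are illustrative names. The server still has to notice the closed connection and stop the upstream model call; that part is not shown.

```typescript
// Owns the AbortController for the single in-flight stream.
class StreamSession {
  private current: AbortController | null = null;

  // Start a new stream, cancelling any stream that is still running.
  begin(): AbortSignal {
    this.current?.abort();
    this.current = new AbortController();
    return this.current.signal;
  }

  // Called by the stop button.
  stop(): void {
    this.current?.abort();
    this.current = null;
  }
}

// Hypothetical endpoint and body shape; adjust to the real API.
async function streamReply(
  session: StreamSession,
  prompt: string,
  onText: (text: string) => void,
): Promise<"done" | "cancelled"> {
  const signal = session.begin();
  try {
    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt }),
      signal,
    });
    if (!res.ok || !res.body) throw new Error(`Stream failed: ${res.status}`);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) return "done";
      onText(decoder.decode(value, { stream: true }));
    }
  } catch (err) {
    // An abort rejects the pending fetch or read; report it as a
    // cancellation rather than an error.
    if (signal.aborted) return "cancelled";
    throw err;
  }
}
```

Because begin() aborts the previous controller, sending a new message cancels the old stream without extra bookkeeping in the UI.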
Handle reconnection and incomplete streams
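Networks drop mid-stream. Detect a truncated response, mark the message incomplete, and offer resume or retry rather than showing a half-sentence as final. With SSE, the browser's EventSource sends the last event ID in the Last-Event-ID header when it reconnects, so the server can resume from that point; with fetch, retry from a checkpoint you track yourself. Wrap the streaming view in an error boundary so that a render failure on malformed output stays contained; see react-error-boundaries.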
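Truncation detection can be sketched as a small state machine. It assumes a convention that is not part of SSE itself: the server emits a final event named done, so a stream that closes without one is treated as incomplete. All names here are hypothetical, and sending Last-Event-ID from a manual fetch retry only helps if the server is built to honor it.

```typescript
type StreamStatus = "streaming" | "complete" | "incomplete";

interface StreamedMessage {
  text: string;
  status: StreamStatus;
  lastEventId: string | undefined;
}

interface StreamEvent {
  event: string;
  data: string;
  id: string | undefined;
}

// Fold one event into the message, remembering the newest event ID.
function applyEvent(msg: StreamedMessage, ev: StreamEvent): StreamedMessage {
  const lastEventId = ev.id ?? msg.lastEventId;
  // Assumed convention: the server sends a terminal "done" event.
  if (ev.event === "done") return { ...msg, status: "complete", lastEventId };
  return { ...msg, text: msg.text + ev.data, lastEventId };
}

// Call when the connection closes for any reason.
function closeStream(msg: StreamedMessage): StreamedMessage {
  return msg.status === "complete" ? msg : { ...msg, status: "incomplete" };
}

// Headers for a manual resume request over fetch.
function resumeHeaders(msg: StreamedMessage): Record<string, string> {
  return msg.lastEventId === undefined
    ? {}
    : { "Last-Event-ID": msg.lastEventId };
}
```

The UI can then render an incomplete message with a resume or retry control instead of presenting the partial text as final.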
Apply backpressure on fast streams
If tokens arrive faster than the UI can paint, batch them. Throttle state updates to the frame rate and coalesce chunks, or a long response will jank the main thread. Pair with Suspense boundaries for the initial load; see react-suspense.
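The coalescing step might look like the sketch below. createChunkBatcher is a hypothetical helper; the scheduler is passed in so the same code works with requestAnimationFrame in the browser and with a manual queue in tests.

```typescript
type Schedule = (callback: () => void) => void;

// Collects chunks and flushes them at most once per scheduled tick.
function createChunkBatcher(
  flush: (text: string) => void,
  schedule: Schedule,
): { push: (chunk: string) => void } {
  let pending = "";
  let scheduled = false;
  return {
    push(chunk: string): void {
      pending += chunk;
      if (scheduled) return; // a flush is already queued for this tick
      scheduled = true;
      schedule(() => {
        scheduled = false;
        const text = pending;
        pending = "";
        if (text) flush(text);
      });
    },
  };
}

// Browser usage (assumes an `appendToMessage` function of your own):
//   const batcher = createChunkBatcher(
//     appendToMessage,
//     (cb) => requestAnimationFrame(cb),
//   );
//   for each incoming chunk: batcher.push(chunk);
```

However many chunks arrive within one frame, flush runs once with their concatenation, so the state update rate is bounded by the frame rate rather than the token rate.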
Pitfalls
- Calling JSON.parse on a partial chunk and crashing the render.
- A stop button that hides the UI but does not abort the request or stop server generation.
- One React state update per token, thrashing the main thread.
- Treating a dropped connection as a completed message.