Frontend best practices for AI-first applications

Overview

Frontend best practices for AI-first applications start from a different premise than classic web UIs: responses are slow, streamed, nondeterministic, and sometimes wrong. The frontend’s job is to make that tolerable, by streaming output as it arrives, showing what the model is doing, and keeping the human able to correct it. This is the pillar for the AI-frontend cluster; for interaction design see llm-product-ux and for the streaming transport see streaming-chat-interfaces.

Stream the response, never block on it

A multi-second blank wait reads as broken. Stream tokens as they arrive so the user sees progress within a few hundred milliseconds. Render a typing indicator before the first token and incremental text after. The transport is server-sent events or a streamed fetch from a route handler; see streaming-chat-interfaces and nextjs-route-handlers.
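As a minimal sketch of the streamed-fetch path, the helper below reads a response body chunk by chunk and passes decoded text to a callback. The names `createChunkDecoder`, `streamText`, and `onText` are hypothetical; only `TextDecoder` and the `ReadableStream` reader are standard platform APIs.

```typescript
// Hypothetical helper: decodes byte chunks into text without breaking
// multi-byte characters that are split across chunk boundaries.
function createChunkDecoder(): (chunk: Uint8Array) => string {
  const decoder = new TextDecoder("utf-8");
  // stream: true makes the decoder hold back an incomplete trailing sequence.
  return (chunk) => decoder.decode(chunk, { stream: true });
}

// Reads a streamed body and hands each decoded piece to onText as it arrives.
async function streamText(
  body: ReadableStream<Uint8Array>,
  onText: (text: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decode = createChunkDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const text = decode(value);
      if (text) onText(text); // Append to the visible message immediately.
    }
  } finally {
    reader.releaseLock();
  }
}
```

A caller would typically pass `response.body` from a `fetch` call and append each piece to the message being rendered, keeping the typing indicator visible until the first callback fires.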

Show model state, not just a spinner

The user should always know which phase the system is in: thinking, calling a tool, retrieving, or writing. Surface tool calls and retrieved sources as they happen. Visible state turns latency into legible work and builds trust; an opaque spinner does the opposite. See llm-product-ux.
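One way to keep the phases explicit is a discriminated union that the UI maps to a status line. This is a sketch only: the `Phase` type, its field names, and the label strings are assumptions for illustration, not an API from any library.

```typescript
// Hypothetical phase model; the shape of each phase is illustrative.
type Phase =
  | { kind: "thinking" }
  | { kind: "tool"; name: string }
  | { kind: "retrieving"; sources: string[] }
  | { kind: "writing" };

// Maps the current phase to the text shown in place of a bare spinner.
function describePhase(phase: Phase): string {
  switch (phase.kind) {
    case "thinking":
      return "Thinking...";
    case "tool":
      return `Calling ${phase.name}...`;
    case "retrieving":
      return phase.sources.length === 0
        ? "Searching sources..."
        : `Reading ${phase.sources.length} source(s)...`;
    case "writing":
      return "Writing...";
  }
}
```

Because the union is exhaustive, adding a new phase later produces a compile error until its label is defined, which keeps the status line from silently falling back to a generic spinner.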

Treat every output as a draft the user can edit

LLM output is a starting point, not a final answer. Let users edit, regenerate, copy, and undo. Make the generated artifact a first-class editable object rather than read-only text. This keeps the human in control when the model is wrong.
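A simple way to make output editable and undoable is to treat every generation and every user edit as a revision in the same history. The sketch below assumes a hypothetical `Draft` type and helper names; it illustrates the idea and is not a prescribed data model.

```typescript
// Hypothetical draft model: model output and user edits share one history,
// so undo works the same way for both.
type Draft = { revisions: string[]; index: number };

function createDraft(initial: string): Draft {
  return { revisions: [initial], index: 0 };
}

// Records a new revision (an edit or a regeneration).
function commit(draft: Draft, text: string): Draft {
  // Committing after an undo discards the revisions that were undone.
  const kept = draft.revisions.slice(0, draft.index + 1);
  return { revisions: [...kept, text], index: kept.length };
}

// Steps back one revision, stopping at the first.
function undo(draft: Draft): Draft {
  return { ...draft, index: Math.max(0, draft.index - 1) };
}

function current(draft: Draft): string {
  return draft.revisions[draft.index] ?? "";
}
```

Regenerate then becomes `commit(draft, newOutput)`, so a bad regeneration is one undo away from the previous text.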

Parse streamed structure defensively

When the model emits structured output for the UI to render, the stream arrives as partial, possibly malformed JSON. Parse incrementally, tolerate incomplete objects mid-stream, and validate against a schema before committing to state. See structured-output.
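The simplest tolerant approach is sketched below: accumulate chunks, attempt a parse, ignore failures, and commit only values that pass validation. The `Card` shape, `isCard`, and `createCardAccumulator` are hypothetical; a real application might use a schema library for validation and a streaming JSON parser to surface partial fields earlier.

```typescript
// Hypothetical shape the UI expects to render.
type Card = { title: string; items: string[] };

// Hand-written validation; stands in for a schema check.
function isCard(value: unknown): value is Card {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const items = v["items"];
  return (
    typeof v["title"] === "string" &&
    Array.isArray(items) &&
    items.every((item) => typeof item === "string")
  );
}

// Accumulates chunks and returns the last value that parsed AND validated.
function createCardAccumulator(): (chunk: string) => Card | null {
  let buffer = "";
  let lastValid: Card | null = null;
  return (chunk) => {
    buffer += chunk;
    try {
      const parsed: unknown = JSON.parse(buffer);
      if (isCard(parsed)) lastValid = parsed;
    } catch {
      // Incomplete JSON mid-stream is expected; wait for more chunks.
    }
    return lastValid;
  };
}
```

Only the returned value is written to UI state, so a truncated or malformed stream leaves the interface showing the last good result rather than throwing.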

Handle errors and bad generations as the common case

Network drops, timeouts, refusals, and off-topic generations are normal traffic, not edge cases. Wrap generation UI in error boundaries, offer retry, and never let a failed stream wedge the interface. See react-error-boundaries.
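One way to keep a failed stream from wedging the interface is to model failure as an ordinary state with a path back to streaming. The reducer below is a sketch under that assumption; the state and event names are hypothetical.

```typescript
// Hypothetical generation state. A failure keeps the partial text so the
// user loses nothing and can retry from a usable interface.
type GenState =
  | { status: "idle" }
  | { status: "streaming"; text: string }
  | { status: "done"; text: string }
  | { status: "failed"; text: string; reason: string };

type GenEvent =
  | { type: "start" }
  | { type: "token"; text: string }
  | { type: "finish" }
  | { type: "error"; reason: string };

function reduce(state: GenState, event: GenEvent): GenState {
  switch (event.type) {
    case "start":
      // Start doubles as retry: allowed from any state except streaming.
      return state.status === "streaming" ? state : { status: "streaming", text: "" };
    case "token":
      // Late tokens after a failure or finish are ignored.
      return state.status === "streaming"
        ? { status: "streaming", text: state.text + event.text }
        : state;
    case "finish":
      return state.status === "streaming" ? { status: "done", text: state.text } : state;
    case "error":
      return state.status === "streaming"
        ? { status: "failed", text: state.text, reason: event.reason }
        : state;
  }
}
```

Every state accepts every event, so there is no sequence of network events that leaves the UI without a valid next action; an error boundary remains the backstop for rendering failures.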

Manage cost and rate limits in the UI

The frontend shapes spend. Debounce input, cancel in-flight requests when the user edits, cache idempotent results, and disable submit while a request is open. Surface rate-limit and quota errors clearly. See cost-control.
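Cancelling in-flight requests can be sketched with the standard `AbortController`. The `createLatestOnly` helper is hypothetical; it hands out one signal per request and aborts the previous one, so only the latest request keeps running.

```typescript
// Hypothetical helper: each call to next() aborts the previous request's
// signal, so a superseded generation stops consuming tokens.
function createLatestOnly() {
  let controller: AbortController | null = null;
  return {
    // Call when starting a new request; pass the result as fetch's `signal`.
    next(): AbortSignal {
      controller?.abort();
      controller = new AbortController();
      return controller.signal;
    },
    // Call when the user edits the input or presses stop.
    cancel(): void {
      controller?.abort();
      controller = null;
    },
  };
}
```

Passing the signal as the `signal` option of `fetch` aborts the request on the client; whether the server also stops generating depends on the backend handling the disconnect.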

Keep it accessible and interruptible

Streaming text must be announced to screen readers without spamming them; use a polite live region and let users stop generation at any time. Accessibility is not optional for AI UIs; see accessibility.
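To avoid announcing every token, one option is to buffer streamed text and release it to the live region a sentence at a time. The sketch below assumes a hypothetical `createAnnouncer` helper and a deliberately simple sentence rule (terminal punctuation followed by whitespace), which will not suit every language or content type.

```typescript
// Hypothetical buffer: collects streamed text and releases only complete
// sentences, so a polite live region updates per sentence, not per token.
function createAnnouncer() {
  let pending = "";
  return {
    // Adds streamed text; returns any sentences that are now complete.
    push(text: string): string[] {
      pending += text;
      const sentences: string[] = [];
      // Assumed rule: a sentence ends at ., !, or ? followed by whitespace.
      const boundary = /[.!?]\s+/g;
      let start = 0;
      let match: RegExpExecArray | null;
      while ((match = boundary.exec(pending)) !== null) {
        sentences.push(pending.slice(start, match.index + 1).trim());
        start = boundary.lastIndex;
      }
      pending = pending.slice(start);
      return sentences;
    },
    // Call when the stream ends or is stopped to release the remainder.
    flush(): string {
      const rest = pending.trim();
      pending = "";
      return rest;
    },
  };
}
```

Each returned sentence would be written into an element marked `aria-live="polite"`, while the visible transcript continues to update token by token.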

Pitfalls

Blocking the whole screen on a model call instead of streaming.
Rendering raw model markdown without sanitizing it.
Read-only outputs the user cannot fix when the model is wrong.
Assuming valid JSON mid-stream; partial output crashes the parser.

Related

Build on react-suspense and react-server-components for the rendering model.
