Overview
Frontend best practices for AI-first applications start from a different premise than classic web UIs: responses are slow, streamed, nondeterministic, and sometimes wrong. The frontend’s job is to make that tolerable by streaming output as it arrives, showing what the model is doing, and letting the user correct the result. This is the pillar for the AI-frontend cluster; for interaction design see llm-product-ux and for the streaming transport see streaming-chat-interfaces.
Stream the response, never block on it
A multi-second blank wait reads as broken. Stream tokens as they arrive so the user sees progress within a few hundred milliseconds. Render a typing indicator before the first token and incremental text after. The transport is server-sent events or a streamed fetch from a route handler; see streaming-chat-interfaces and nextjs-route-handlers.
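As a minimal sketch of the streamed-fetch variant, the helper below reads a response body chunk by chunk and reports the accumulated text after each chunk. The names `readTextStream` and `onText` are illustrative assumptions, not part of any library; the only platform APIs used are `ReadableStream`, its default reader, and `TextDecoder`.

```typescript
// Reads a streamed body and reports the accumulated text after every chunk.
// `onText` is a hypothetical callback; in a React component it would
// typically call a state setter so the UI re-renders incrementally.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onText: (textSoFar: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let text = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters intact across chunk boundaries.
      text += decoder.decode(value, { stream: true });
      onText(text);
    }
    // Flush anything the decoder was still buffering.
    const tail = decoder.decode();
    if (tail) {
      text += tail;
      onText(text);
    }
  } finally {
    reader.releaseLock();
  }
  return text;
}
```

A caller would show the typing indicator until `onText` fires for the first time, then swap it for the text. The body would usually come from `(await fetch(url, options)).body`, where the URL is whatever route handler the application exposes.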
Show model state, not just a spinner
The user should always know which phase the system is in: thinking, calling a tool, retrieving, or writing. Surface tool calls and retrieved sources as they happen. Visible state turns latency into legible work and builds trust; an opaque spinner does the opposite. See llm-product-ux.
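One way to keep the phase explicit is to model it as a discriminated union and derive it from stream events with a pure reducer. The sketch below assumes a hypothetical event vocabulary (`start`, `tool_call`, `source`, `token`, `end`); a real backend will emit its own event shapes, so treat the names and label strings as placeholders.

```typescript
// Phases the UI can display. All names here are illustrative assumptions.
type GenerationPhase =
  | { kind: "idle" }
  | { kind: "thinking" }
  | { kind: "tool"; name: string }
  | { kind: "retrieving"; sources: string[] }
  | { kind: "writing"; text: string }
  | { kind: "done"; text: string };

// Hypothetical events arriving over the stream.
type ModelEvent =
  | { type: "start" }
  | { type: "tool_call"; name: string }
  | { type: "source"; title: string }
  | { type: "token"; value: string }
  | { type: "end" };

// Pure transition function: the current phase plus one event gives the next phase.
function nextPhase(phase: GenerationPhase, event: ModelEvent): GenerationPhase {
  switch (event.type) {
    case "start":
      return { kind: "thinking" };
    case "tool_call":
      return { kind: "tool", name: event.name };
    case "source": {
      const previous = phase.kind === "retrieving" ? phase.sources : [];
      return { kind: "retrieving", sources: [...previous, event.title] };
    }
    case "token": {
      const previous = phase.kind === "writing" ? phase.text : "";
      return { kind: "writing", text: previous + event.value };
    }
    case "end":
      return { kind: "done", text: phase.kind === "writing" ? phase.text : "" };
  }
}

// Human-readable status line shown in place of an opaque spinner.
function phaseLabel(phase: GenerationPhase): string {
  switch (phase.kind) {
    case "idle":
    case "done":
      return "";
    case "thinking":
      return "Thinking";
    case "tool":
      return `Calling ${phase.name}`;
    case "retrieving":
      return `Reading ${phase.sources.length} source(s)`;
    case "writing":
      return "Writing";
  }
}
```

Because the reducer is pure, it can be dropped into `useReducer` or any store, and the status line is always derivable from state instead of being set ad hoc.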
Treat every output as a draft the user can edit
LLM output is a starting point, not a final answer. Let users edit, regenerate, copy, and undo. Make the generated artifact a first-class editable object rather than read-only text. This keeps the human in control when the model is wrong.
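A small version history is one way to make the generated artifact editable and undoable. The sketch below is an assumption about how such a draft object could look, not a prescribed design: the model's output is version zero, and user edits and regenerations are later versions on the same stack.

```typescript
// Immutable draft history. All names are illustrative.
interface DraftHistory {
  readonly versions: readonly string[];
  readonly index: number; // position of the version currently shown
}

// The model's first output becomes version 0.
function createDraft(generated: string): DraftHistory {
  return { versions: [generated], index: 0 };
}

function currentText(draft: DraftHistory): string {
  return draft.versions[draft.index] ?? "";
}

// Records a user edit or a regeneration. Any redo tail is discarded,
// matching the behavior of most editors.
function commitVersion(draft: DraftHistory, text: string): DraftHistory {
  if (text === currentText(draft)) return draft;
  const kept = draft.versions.slice(0, draft.index + 1);
  return { versions: [...kept, text], index: kept.length };
}

function undoDraft(draft: DraftHistory): DraftHistory {
  return draft.index > 0 ? { ...draft, index: draft.index - 1 } : draft;
}

function redoDraft(draft: DraftHistory): DraftHistory {
  return draft.index < draft.versions.length - 1
    ? { ...draft, index: draft.index + 1 }
    : draft;
}
```

Treating a regeneration as just another committed version means a disappointing regenerate is one undo away from the previous text.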
Parse streamed structure defensively
When the model emits structured output for the UI to render, the stream arrives as partial, possibly malformed JSON. Parse incrementally, tolerate incomplete objects mid-stream, and validate against a schema before committing to state. See structured-output.
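The sketch below shows one simple way to tolerate incomplete JSON: close any open string, array, or object, try to parse, and fall back to `undefined` when the buffer ends somewhere that cannot be repaired (for example after a dangling key or comma). It is a deliberately small heuristic rather than a full incremental parser, and `SummaryCard` with its hand-written guard stands in for whatever schema the application actually validates against.

```typescript
// Appends the quotes and brackets needed to close a truncated JSON document.
function closePartialJson(partial: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < partial.length; i++) {
    const ch = partial.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  // A trailing lone backslash would escape the quote we are about to add.
  const base = escaped ? partial.slice(0, -1) : partial;
  return base + (inString ? '"' : "") + closers.reverse().join("");
}

// Returns the parsed value, or undefined when the buffer is not yet parseable.
// Callers keep showing the last good value in that case.
function parsePartialJson(partial: string): unknown {
  try {
    return JSON.parse(closePartialJson(partial));
  } catch {
    return undefined;
  }
}

// Hypothetical schema for a UI card, checked before anything reaches state.
interface SummaryCard {
  title: string;
  bullets: string[];
}

function isSummaryCard(value: unknown): value is SummaryCard {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const bullets = record["bullets"];
  return (
    typeof record["title"] === "string" &&
    Array.isArray(bullets) &&
    bullets.every((b) => typeof b === "string")
  );
}
```

Values parsed mid-stream may still be truncated (a title cut off halfway, a number missing its last digits), so a reasonable split is to render validated partial values as a preview and commit to durable state only once the stream has ended and the final value passes the guard.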
Handle errors and bad generations as the common case
Network drops, timeouts, refusals, and off-topic generations are normal traffic, not edge cases. Wrap generation UI in error boundaries, offer retry, and never let a failed stream wedge the interface. See react-error-boundaries.
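Error boundaries catch render-time crashes; the generation state itself also needs a failure path that always leads somewhere recoverable. The reducer below is a sketch under assumed names: every failure lands in a `failed` state that keeps the partial text and can only move forward through `retry` or a fresh `submit`, so a dropped stream cannot leave the UI stuck in `streaming`. The failure kinds and messages are placeholders.

```typescript
// Failure kinds treated as ordinary outcomes. Names are illustrative.
type GenerationFailure = "network" | "timeout" | "refusal" | "off_topic";

type GenerationState =
  | { status: "idle" }
  | { status: "streaming"; text: string }
  | { status: "done"; text: string }
  | { status: "failed"; failure: GenerationFailure; partialText: string };

type GenerationAction =
  | { type: "submit" }
  | { type: "token"; value: string }
  | { type: "finish" }
  | { type: "fail"; failure: GenerationFailure }
  | { type: "retry" };

function generationReducer(state: GenerationState, action: GenerationAction): GenerationState {
  switch (action.type) {
    case "submit":
      // Ignore duplicate submits while a stream is open.
      return state.status === "streaming" ? state : { status: "streaming", text: "" };
    case "token":
      // Tokens that arrive after a failure or finish are dropped.
      return state.status === "streaming"
        ? { status: "streaming", text: state.text + action.value }
        : state;
    case "finish":
      return state.status === "streaming" ? { status: "done", text: state.text } : state;
    case "fail":
      // Keep whatever was streamed so the user does not lose it.
      return {
        status: "failed",
        failure: action.failure,
        partialText: state.status === "streaming" ? state.text : "",
      };
    case "retry":
      return state.status === "failed" ? { status: "streaming", text: "" } : state;
  }
}

// Placeholder copy; real wording belongs to the product.
function failureMessage(failure: GenerationFailure): string {
  switch (failure) {
    case "network":
      return "Connection lost. Retry?";
    case "timeout":
      return "This is taking too long. Retry?";
    case "refusal":
      return "The model declined this request. Try rephrasing.";
    case "off_topic":
      return "That answer missed the question. Regenerate?";
  }
}
```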
Manage cost and rate limits in the UI
The frontend shapes spend. Debounce input, cancel in-flight requests when the user edits, cache idempotent results, and disable submit while a request is open. Surface rate-limit and quota errors clearly. See cost-control.
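Two of these controls fit in a few lines. The sketch below shows a generic `debounce` and a "latest request only" helper built on the standard `AbortController`; `createLatestOnly` and its method names are hypothetical, and the signal it returns is meant to be passed to `fetch` via the `signal` option.

```typescript
// Delays `fn` until `waitMs` has passed without another call.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Tracks the single in-flight request. Starting a new one aborts the old one,
// so an edit mid-generation stops paying for output nobody will read.
function createLatestOnly() {
  let current: AbortController | undefined;
  return {
    // Call before each request and pass the result as fetch's `signal` option.
    begin(): AbortSignal {
      current?.abort();
      current = new AbortController();
      return current.signal;
    },
    // Wire this to a Stop button.
    cancel(): void {
      current?.abort();
      current = undefined;
    },
  };
}
```

Aborting on the client only saves money if the server notices the disconnect and stops generating, so the route handler needs to propagate cancellation to the model call as well.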
Keep it accessible and interruptible
Streaming text must be announced to screen readers without spamming them; use a polite live region and let users stop generation at any time. Accessibility is not optional for AI UIs; see accessibility.
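One way to avoid spamming a screen reader is to keep the visible streaming text out of the live region and write only completed sentences into a separate, visually hidden element marked `aria-live="polite"`. The helper below sketches the batching logic; the function names are assumptions, and the sentence detection is a crude punctuation heuristic that will, for instance, split at the dot in a decimal number.

```typescript
// Result of one batching step. Names are illustrative.
interface LiveAnnouncement {
  text: string; // what to write into the aria-live="polite" region
  announcedUpTo: number; // offset to pass back in on the next call
}

// Returns the complete sentences that arrived since the last announcement,
// or null if no new sentence has finished yet.
function nextAnnouncement(fullText: string, announcedUpTo: number): LiveAnnouncement | null {
  let end = -1;
  for (let i = fullText.length - 1; i >= announcedUpTo; i--) {
    if (".!?\n".includes(fullText.charAt(i))) {
      end = i + 1;
      break;
    }
  }
  if (end === -1) return null;
  const text = fullText.slice(announcedUpTo, end).trim();
  return text ? { text, announcedUpTo: end } : null;
}

// Call once when the stream ends (or the user stops it) to announce the remainder.
function flushAnnouncement(fullText: string, announcedUpTo: number): LiveAnnouncement | null {
  const text = fullText.slice(announcedUpTo).trim();
  return text ? { text, announcedUpTo: fullText.length } : null;
}
```

In a component, the returned `text` would be assigned to the hidden region's content each time it is non-null, and the Stop control would both abort the request and call `flushAnnouncement` so the user hears where the output ended.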
Pitfalls
- Blocking the whole screen on a model call instead of streaming.
- Rendering raw model markdown without sanitizing it.
- Read-only outputs the user cannot fix when the model is wrong.
- Assuming valid JSON mid-stream; partial output crashes the parser.
Build on react-suspense and react-server-components for the rendering model.