AI Agents & LLM Engineering

Jun 16, 2026 · 4 min read

Building with LLMs as collaborators and as services. Start at claude-code for day-to-day workflows; jump to rag or mcp-servers when shipping retrieval or tool-using systems.

Pages

claude-code: Claude Code workflow patterns.
claude-code-claude-md: CLAUDE.md as durable session memory; mission, voice rules, schema, and folder map.
claude-code-hooks: Pre-tool, post-tool, stop, and user-prompt-submit hooks in settings.json for automation.
claude-code-mcp: Connect MCP servers via settings.json, scope tools to repos, pass secrets through env.
claude-code-permissions: The five permission modes via permissions.defaultMode, allowlists in settings.json, per-project tool scope.
claude-code-pitfalls: The six most common Claude Code failure patterns and the brief-level guardrails that prevent them.
claude-code-skills: User-invocable skills in .claude/skills/ to package reusable prompts and hook sequences.
claude-code-subagents: Define subagents as .claude/agents/*.md frontmatter files; isolate file-writing workers in git worktrees; coordinate through branches.
Prompt Engineering: Best practices, templates, evals, injection defense, chains, caching, reasoning models. Includes the moved [[prompt-engineering/prompt-design]] and [[prompt-engineering/chain-of-thought]] pages.
system-prompts: What belongs in system vs user; structure; versioning.
role-framing: Set the role; prime expertise; defeat sycophancy.
few-shot: When examples help; three to five; diversity over volume.
structured-output: JSON Schema via tool use; strict mode; Pydantic plus instructor.
prompt-injection-defense: Untrusted text is data; delimiters; sandboxed tools (threat-model framing). Complements [[prompt-engineering/prompt-injection-defense]].
examples-vs-rules: When examples beat rules; combining both safely.
multi-agent: Orchestrator-worker and planner-executor patterns.
agent-architecture-patterns: The pillar page for agent structures (augmented LLM, workflows, orchestrator-worker, planner-executor, evaluator-optimizer) and when each fits.
agentic-workflow-patterns: The five workflow patterns when the path is known: chaining, routing, parallelization, orchestrator-worker, evaluator-optimizer.
reliable-agents-in-production: Scope narrowly, constrain tools, bound loops, evaluate, observe, and degrade gracefully to ship an agent to production.
tool-use-and-function-calling: Define, describe, and guard tools so the model calls them correctly: clear schemas, validation, actionable errors, minimal surface.
rag: Chunking, retrieval, reranking, evaluation.
rag-chunking: Semantic boundaries, 200 to 800 token sweet spot, overlap, chunk metadata.
rag-retrieval: Dense plus sparse hybrid, top-k tuning, metadata pre-filters, multi-query, HyDE.
rag-reranking: Cross-encoder rerankers, retrieve broad and rerank narrow, latency budgets.
rag-eval: Recall@k for retrieval, faithfulness for generation, golden sets, regression dashboards.
rag-citations: Cite every fact, chunk IDs, verifiable links, defending against hallucinated citations.
rag-vector-databases: Pinecone, Qdrant, Weaviate, pgvector, ChromaDB; HNSW tuning; metric choice.
evaluation: Golden sets, LLM-as-judge, eval-driven prompts.
cost-control: Caching, model routing, batch APIs, fallback ladders.
mcp-servers: Designing and shipping MCP servers.
embeddings: Model choice, dimensionality, similarity.
embeddings-cost-control: Batch APIs, content-addressed caching, deduplication, input truncation, and budget caps.
embeddings-dimensionality: Matryoshka truncation, storage vs recall trade-offs, and the halve-and-verify rule.
embeddings-eval: Golden sets, MRR, recall@k, nDCG, and A/B testing for retrieval systems.
embeddings-hybrid-search: Dense plus BM25 with Reciprocal Rank Fusion, query routing, when hybrid beats pure dense.
embeddings-normalization: L2-normalize for cosine equivalence, silent retrieval bugs, library defaults.
embeddings-semantic-cache: Cache LLM responses on input embeddings, cosine thresholds, invalidation, pgvector store.
embeddings-voyage-vs-openai: Voyage 4 vs OpenAI text-embedding-3-large: accuracy, cost, multilingual, dimensionality.
openai-sdk-vs-langchain: When to call the OpenAI/Anthropic SDK directly versus wrapping it with LangChain.
ollama: Running local LLMs with Ollama.
ollama-model-selection: How to pick the right Ollama model for your hardware and task: Llama 3.3, Qwen 2.5, and Mistral compared by VRAM, quality, and use case.
ollama-modelfile: How to write an Ollama Modelfile to pin a base model, system prompt, and generation parameters into a versioned local image.
ollama-quantization: How quantization levels affect memory footprint and output quality in Ollama, and which level to pick for each use case.
ollama-serving: How to use Ollama’s REST API, OpenAI-compatible endpoints, streaming responses, and concurrent request handling.
ollama-deployment: How to deploy Ollama for local development, shared GPU servers, and production: Docker, systemd, and reverse proxy with authentication.
mcp-protocol: How the Model Context Protocol is framed over JSON-RPC 2.0, what capabilities a server can advertise, and how client and server roles divide responsibility.
mcp-tool-design: Rules for naming, typing, and describing MCP tools so the model calls them correctly the first time.
mcp-transports: Choose stdio for local servers and Streamable HTTP for remote ones; understand reconnection, lifecycle, and multi-tenant patterns for each transport.
mcp-streamable-http: The current remote MCP transport: single endpoint, Mcp-Session-Id sessions, resumable streams via Last-Event-ID, migration off deprecated HTTP+SSE.
mcp-elicitation: The elicitation capability: servers request structured user input mid-session via a flat JSON Schema, with accept/decline/cancel responses and security implications.
mcp-resources: Design MCP resources as stable URI-addressed data sources for read-only context, separate from tools that perform actions.
mcp-security: Authenticate at the transport, redact secrets from tool outputs, rate-limit per session, and sandbox the filesystem surface to ship MCP servers safely.
mcp-logging: Emit structured logs per tool call with correlation IDs and latency; use MCP Inspector for interactive debugging; know the common failure modes.

Related MOCs

Coding
Backend
Tooling

51 items under this folder.

Jun 15, 2026 · AI agent architecture patterns
Jun 15, 2026 · Agentic workflow design patterns
Jun 15, 2026 · Claude Code: MCP Integration
Jun 15, 2026 · Claude Code: Permissions
Jun 15, 2026 · Claude Code: Pitfalls
Jun 15, 2026 · Claude Code: Subagents
Jun 15, 2026 · MCP: Elicitation
Jun 15, 2026 · MCP: Streamable HTTP Transport
Jun 15, 2026 · How to build reliable AI agents in production
Jun 15, 2026 · Tool use and function calling best practices
May 29, 2026 · Embeddings: Semantic Cache
May 29, 2026 · Embeddings: Voyage 4 vs OpenAI text-embedding-3-large
May 29, 2026 · RAG: Chunking
May 29, 2026 · RAG: Reranking
May 29, 2026 · RAG: Retrieval
May 29, 2026 · RAG Best Practices
May 29, 2026 · Cost Control
May 29, 2026 · Embeddings: Cost Control
May 29, 2026 · Embeddings: Dimensionality
May 29, 2026 · Embeddings: Evaluation
May 29, 2026 · Embeddings
May 29, 2026 · MCP: Protocol Fundamentals
May 29, 2026 · MCP Servers
May 29, 2026 · MCP: Transports
May 21, 2026 · Claude Code: CLAUDE.md
May 21, 2026 · Claude Code: Hooks
May 21, 2026 · Claude Code: Skills
May 21, 2026 · Claude Code: Workflow Patterns
May 21, 2026 · Agent Evaluation
May 21, 2026 · Examples vs Rules
May 21, 2026 · Few-Shot Examples
May 21, 2026 · MCP: Tool Design
May 21, 2026 · Multi-Agent Patterns
May 21, 2026 · Ollama: Model Selection
May 21, 2026 · Ollama: Modelfile
May 21, 2026 · Ollama: Quantization
May 21, 2026 · Ollama: REST API and Serving
May 21, 2026 · Ollama Best Practices
May 21, 2026 · Prompt Injection Defense
May 21, 2026 · Role Framing
May 21, 2026 · Structured Output
May 21, 2026 · System Prompts
May 14, 2026 · Ollama: Deployment
May 14, 2026 · MCP: Logging and Debugging
May 14, 2026 · MCP: Resources
May 14, 2026 · MCP: Security
May 14, 2026 · Embeddings: Hybrid Search
May 14, 2026 · Embeddings: Normalization
May 14, 2026 · RAG: Vector Databases
May 14, 2026 · RAG: Citations
May 14, 2026 · RAG: Evaluation
