Topic hub · 25 claims
RAG, retrieval, and verification — grounding LLM responses
Retrieval-augmented generation, signed-claim verification, vector databases, and the frameworks that wire them together. The grounding stack as of 2025.
Why retrieval — the parametric-memory ceiling
An LLM trained on Wikipedia knows only what Wikipedia contained at training time. It doesn't know about events after the cut-off, it can't cite specific sources, and it confidently hallucinates dates and parameter counts when its parametric memory is fuzzy. Retrieval-Augmented Generation (Lewis et al. 2020) was the first widely cited answer: combine a frozen pretrained model with a non-parametric memory you can control and update.
The grounding stack
Modern grounding pipelines have three layers. Retrieval — embed your corpus, index the vectors (commonly with FAISS, Pinecone, Weaviate, or Qdrant), and retrieve the top-K chunks at query time. Augmentation — splice the retrieved chunks into the prompt. Verification — check the model's output against a source of truth (this is where SourceScore VERITAS sits). Self-RAG (Asai et al. 2023) is the in-model variant of verification; signed-claim verification is the out-of-model variant.
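The three layers above can be sketched end to end in a few dozen lines. This is a toy, assuming nothing beyond the standard library: the bag-of-words `embed` stands in for a real embedding model, a plain list stands in for a vector database, and the substring `verify` stands in for real signed-claim checking. All function names here (`embed`, `retrieve_top_k`, `build_prompt`, `verify`) are illustrative, not any framework's API.

```python
import math
import re

CORPUS = [
    "RAG was introduced by Lewis et al. in 2020.",
    "FAISS is a library for similarity search over dense vectors.",
    "Self-RAG adds reflection tokens that decide when to retrieve.",
]

# Vocabulary built from the corpus; a stand-in for a learned embedding space.
VOCAB = sorted({w for doc in CORPUS for w in re.findall(r"[a-z0-9]+", doc.lower())})

def embed(text: str) -> list[float]:
    # Toy embedding: L2-normalized bag-of-words counts over VOCAB.
    words = re.findall(r"[a-z0-9]+", text.lower())
    v = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

INDEX = [(doc, embed(doc)) for doc in CORPUS]  # brute-force "vector store"

def retrieve_top_k(query: str, k: int = 2) -> list[str]:
    # Layer 1: retrieval. With unit-normalized vectors, cosine similarity
    # is just the dot product.
    q = embed(query)
    ranked = sorted(INDEX,
                    key=lambda p: sum(a * b for a, b in zip(q, p[1])),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    # Layer 2: augmentation — splice retrieved chunks into the prompt.
    context = "\n".join(f"- {c}" for c in retrieve_top_k(query))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

def verify(claim: str, sources: list[str]) -> bool:
    # Layer 3: verification — a naive substring check; a real verifier
    # (signed-claim or Self-RAG style) is far stricter than this.
    return any(claim.lower() in s.lower() for s in sources)

prompt = build_prompt("When was RAG introduced?")
print(prompt)
```

Swapping `embed` for a real model and `INDEX` for one of the vector databases listed below changes the scale, not the shape, of the loop.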
The framework ecosystem
LangChain (Harrison Chase, 2022-10) and LlamaIndex (Jerry Liu, 2022-11) emerged within two weeks of each other as the dominant Python orchestration layers. DSPy (Stanford, 2023) takes a programs-not-prompts approach. Pydantic AI (2024) adds type safety. Anthropic's Model Context Protocol (2024-11) is an emerging cross-vendor standard for connecting models to tools and data. Each framework has its own primitives, but each ultimately wires up the same retrieval + augmentation + verification loop.
Defined terms (4)
- RAG
- Retrieval-Augmented Generation — pulling relevant documents from a corpus at query time, augmenting the LLM prompt with them, then generating an answer.
- Vector database
- A database optimized for storing and similarity-searching dense vector embeddings. Foundational to RAG retrieval at scale.
- Embedding
- A dense numerical vector that represents a chunk of text (or image, etc.) such that semantically similar chunks produce numerically similar vectors.
- Self-RAG
- A variant of RAG where the model is fine-tuned to emit special reflection tokens deciding when to retrieve and when to self-critique.
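The "Embedding" definition above hinges on one property: semantically similar chunks map to numerically similar vectors, usually measured by cosine similarity. A minimal sketch of that measurement, using hand-picked toy vectors rather than outputs of any real embedding model:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical 3-dimensional embeddings (real ones have hundreds or
# thousands of dimensions).
vec_cat    = [0.9, 0.1, 0.0]  # "a small cat"
vec_kitten = [0.8, 0.2, 0.1]  # "a young kitten"
vec_gpu    = [0.0, 0.1, 0.9]  # "a datacenter GPU"

print(cosine(vec_cat, vec_kitten))  # high: related meanings
print(cosine(vec_cat, vec_gpu))     # low: unrelated meanings
```

A vector database is, at heart, a structure for answering "which stored vectors have the highest cosine similarity to this query vector?" at scale.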
All claims in this topic (25)
- Anthropic Tool Use (general availability)·publicly released on 2024-05-30 by Anthropic(1.00 · 2 sources)
- AutoGen·publicly released on 2023-09-25 by Microsoft Research(1.00 · 2 sources)
- Chroma vector database·publicly released on 2023-02-14 by Chroma Inc.(1.00 · 2 sources)
- DeepSpeed·publicly released on 2020-02-13 by Microsoft Research(1.00 · 2 sources)
- FAISS·introduced in Johnson, Douze, Jégou 2017 — Facebook AI Similarity Search(1.00 · 2 sources)
- Haystack·publicly released on 2020-04 by deepset GmbH(1.00 · 2 sources)
- Instructor library·introduced in Jason Liu 2023 — structured outputs from LLMs via Pydantic(1.00 · 2 sources)
- JAX·publicly released on 2018-12-10 by Google Research(1.00 · 2 sources)
- LangChain framework·publicly released on 2022-10-25 by Harrison Chase(1.00 · 2 sources)
- LlamaIndex framework·publicly released on 2022-11-09 by Jerry Liu (originally GPT Index)(1.00 · 2 sources)
- Microsoft Semantic Kernel·publicly released on 2023-03-17 by Microsoft(1.00 · 2 sources)
- Milvus vector database·publicly released on 2019-10-15 by Zilliz(1.00 · 2 sources)
- Model Context Protocol (MCP)·publicly released on 2024-11-25 by Anthropic(1.00 · 2 sources)
- OpenAI Function Calling·publicly released on 2023-06-13 by OpenAI(1.00 · 2 sources)
- pgvector·publicly released on 2021-04-20 by Andrew Kane — Postgres vector extension(1.00 · 2 sources)
- PyTorch·publicly released on 2017-01-18 by Facebook AI Research(1.00 · 2 sources)
- ReAct (Reasoning + Acting)·introduced in paper ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)(1.00 · 2 sources)
- Retrieval-Augmented Generation (RAG)·introduced in paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020)(1.00 · 2 sources)
- Self-RAG·introduced in Asai et al. 2023 — self-reflective retrieval-augmented generation(1.00 · 2 sources)
- TensorFlow·publicly released on 2015-11-09 by Google(1.00 · 2 sources)
- Toolformer·introduced in Schick et al. 2023 — self-supervised LLM tool-use(1.00 · 2 sources)
- CrewAI·publicly released on 2023-12 by João Moura — multi-agent orchestration framework(0.95 · 2 sources)
- Pinecone·founded in 2019(0.90 · 2 sources)
- Qdrant·founded in 2021(0.85 · 2 sources)
- Weaviate·founded in 2019(0.85 · 2 sources)