Topic hub · 25 claims
RAG, retrieval, and verification — grounding LLM responses
Retrieval-augmented generation, signed-claim verification, vector databases, and the frameworks that wire them together. The grounding stack as of 2025.
Why retrieval — the parametric-memory ceiling
An LLM trained on Wikipedia knows only what Wikipedia contained at training time. It doesn't know about events after the cut-off, it can't cite specific sources, and it confidently hallucinates dates and parameter counts when its parametric memory is fuzzy. Retrieval-Augmented Generation (Lewis et al. 2020) was the first widely cited answer: combine a frozen pretrained model with a non-parametric memory you can control and update.
The grounding stack
Modern grounding pipelines have three layers. Retrieval — embed your corpus, index the vectors (commonly with FAISS, Pinecone, Weaviate, or Qdrant), and retrieve the top-K chunks at query time. Augmentation — splice the retrieved chunks into the prompt. Verification — check the model's output against a source of truth (this is where SourceScore VERITAS sits). Self-RAG (Asai et al. 2023) is the in-model variant of verification; signed-claim verification is the out-of-model variant.
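The three layers above can be sketched end to end in a few dozen lines. This is a toy, assuming nothing beyond the standard library: the bag-of-words `embed` stands in for a real embedding model, a plain list stands in for a vector database, and the substring `verify` stands in for real signed-claim checking. All function names here (`embed`, `retrieve_top_k`, `build_prompt`, `verify`) are illustrative, not any framework's API.

```python
import math
import re

CORPUS = [
    "RAG was introduced by Lewis et al. in 2020.",
    "FAISS is a library for similarity search over dense vectors.",
    "Self-RAG adds reflection tokens that decide when to retrieve.",
]

# Vocabulary built from the corpus; a stand-in for a learned embedding space.
VOCAB = sorted({w for doc in CORPUS for w in re.findall(r"[a-z0-9]+", doc.lower())})

def embed(text: str) -> list[float]:
    # Toy embedding: L2-normalized bag-of-words counts over VOCAB.
    words = re.findall(r"[a-z0-9]+", text.lower())
    v = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

INDEX = [(doc, embed(doc)) for doc in CORPUS]  # brute-force "vector store"

def retrieve_top_k(query: str, k: int = 2) -> list[str]:
    # Layer 1: retrieval. With unit-normalized vectors, cosine similarity
    # is just the dot product.
    q = embed(query)
    ranked = sorted(INDEX,
                    key=lambda p: sum(a * b for a, b in zip(q, p[1])),
                    reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    # Layer 2: augmentation — splice retrieved chunks into the prompt.
    context = "\n".join(f"- {c}" for c in retrieve_top_k(query))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

def verify(claim: str, sources: list[str]) -> bool:
    # Layer 3: verification — a naive substring check; a real verifier
    # (signed-claim or Self-RAG style) is far stricter than this.
    return any(claim.lower() in s.lower() for s in sources)

prompt = build_prompt("When was RAG introduced?")
print(prompt)
```

Swapping `embed` for a real model and `INDEX` for one of the vector databases listed below changes the scale, not the shape, of the loop.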
The framework ecosystem
LangChain (Harrison Chase, 2022-10) and LlamaIndex (Jerry Liu, 2022-11) emerged within two weeks of each other as the dominant Python orchestration layers. DSPy (Stanford, 2023) takes a programs-not-prompts approach. Pydantic AI (2024) adds type safety. Anthropic's Model Context Protocol (2024-11) is an emerging cross-vendor standard for connecting models to tools and data. Each framework has its own primitives, but each ultimately wires up the same retrieval + augmentation + verification loop.
Defined terms (4)
- RAG
- Retrieval-Augmented Generation — pulling relevant documents from a corpus at query time, augmenting the LLM prompt with them, then generating an answer.
- Vector database
- A database optimized for storing and similarity-searching dense vector embeddings. Foundational to RAG retrieval at scale.
- Embedding
- A dense numerical vector that represents a chunk of text (or image, etc.) such that semantically similar chunks produce numerically similar vectors.
- Self-RAG
- A variant of RAG where the model is fine-tuned to emit special reflection tokens deciding when to retrieve and when to self-critique.
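The "Embedding" definition above hinges on one property: semantically similar chunks map to numerically similar vectors, usually measured by cosine similarity. A minimal sketch of that measurement, using hand-picked toy vectors rather than outputs of any real embedding model:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical 3-dimensional embeddings (real ones have hundreds or
# thousands of dimensions).
vec_cat    = [0.9, 0.1, 0.0]  # "a small cat"
vec_kitten = [0.8, 0.2, 0.1]  # "a young kitten"
vec_gpu    = [0.0, 0.1, 0.9]  # "a datacenter GPU"

print(cosine(vec_cat, vec_kitten))  # high: related meanings
print(cosine(vec_cat, vec_gpu))     # low: unrelated meanings
```

A vector database is, at heart, a structure for answering "which stored vectors have the highest cosine similarity to this query vector?" at scale.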
All claims in this topic (25)
- Anthropic Tool Use (general availability)·publicly released on 2024-05-30 by Anthropic(1.00 · 2 sources)
- AutoGen·publicly released on 2023-09-25 by Microsoft Research(1.00 · 2 sources)
- Chroma vector database·publicly released on 2023-02-14 by Chroma Inc.(1.00 · 2 sources)
- DeepSpeed·publicly released on 2020-02-13 by Microsoft Research(1.00 · 2 sources)
- FAISS·introduced in Johnson, Douze, Jégou 2017 — Facebook AI Similarity Search(1.00 · 2 sources)
- Haystack·publicly released on 2020-04 by deepset GmbH(1.00 · 2 sources)
- Instructor library·introduced in Jason Liu 2023 — structured outputs from LLMs via Pydantic(1.00 · 2 sources)
- JAX·publicly released on 2018-12-10 by Google Research(1.00 · 2 sources)
- LangChain framework·publicly released on 2022-10-25 by Harrison Chase(1.00 · 2 sources)
- LlamaIndex framework·publicly released on 2022-11-09 by Jerry Liu (originally GPT Index)(1.00 · 2 sources)
- Microsoft Semantic Kernel·publicly released on 2023-03-17 by Microsoft(1.00 · 2 sources)
- Milvus vector database·publicly released on 2019-10-15 by Zilliz(1.00 · 2 sources)
- Model Context Protocol (MCP)·publicly released on 2024-11-25 by Anthropic(1.00 · 2 sources)
- OpenAI Function Calling·publicly released on 2023-06-13 by OpenAI(1.00 · 2 sources)
- pgvector·publicly released on 2021-04-20 by Andrew Kane — Postgres vector extension(1.00 · 2 sources)
- PyTorch·publicly released on 2017-01-18 by Facebook AI Research(1.00 · 2 sources)
- ReAct (Reasoning + Acting)·introduced in paper ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)(1.00 · 2 sources)
- Retrieval-Augmented Generation (RAG)·introduced in paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020)(1.00 · 2 sources)
- Self-RAG·introduced in Asai et al. 2023 — self-reflective retrieval-augmented generation(1.00 · 2 sources)
- TensorFlow·publicly released on 2015-11-09 by Google(1.00 · 2 sources)
- Toolformer·introduced in Schick et al. 2023 — self-supervised LLM tool-use(1.00 · 2 sources)
- CrewAI·publicly released on 2023-12 by João Moura — multi-agent orchestration framework(0.95 · 2 sources)
- Pinecone·founded in 2019(0.90 · 2 sources)
- Qdrant·founded in 2021(0.85 · 2 sources)
- Weaviate·founded in 2019(0.85 · 2 sources)