SourceScore

Topic hub · 25 claims

RAG, retrieval, and verification — grounding LLM responses

Retrieval-augmented generation, signed-claim verification, vector databases, and the frameworks that wire them together. The grounding stack as of 2025.

Why retrieval — the parametric-memory ceiling

An LLM trained on Wikipedia knows what was in Wikipedia at training time. It doesn't know about events after the cut-off, it can't cite specific sources, and when its parametric memory is fuzzy it confidently hallucinates dates and parameter counts. Retrieval-Augmented Generation (Lewis et al. 2020) was the first widely cited answer: combine a frozen pretrained model with a non-parametric memory you control and update.

The grounding stack

Modern grounding pipelines have three layers. Retrieval — embed your corpus, index it in a vector store (FAISS, Pinecone, Weaviate, Qdrant), and fetch the top-K chunks at query time. Augmentation — splice the retrieved chunks into the prompt. Verification — check the model's output against a source of truth (this is where SourceScore VERITAS sits). Self-RAG (Asai et al. 2023) is the in-model variant of verification; signed claim verification is the out-of-model variant.
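A minimal sketch of the first two layers in Python, using FAISS as the vector index. The embed() function below is a toy hashing stand-in for a real embedding model (e.g. a sentence-transformers checkpoint), and the corpus, query, and prompt template are illustrative, not taken from any named framework:

```python
# Sketch of retrieval + augmentation, assuming numpy and faiss are installed.
import numpy as np
import faiss

DIM = 256

def embed(texts: list[str]) -> np.ndarray:
    """Toy hashing embedder -- a stand-in for a real embedding model.
    Deterministic within a single process, which is all this demo needs."""
    vecs = np.zeros((len(texts), DIM), dtype="float32")
    for i, text in enumerate(texts):
        for token in text.lower().split():
            vecs[i, hash(token) % DIM] += 1.0
    faiss.normalize_L2(vecs)  # unit length, so inner product == cosine
    return vecs

corpus = [
    "RAG was introduced by Lewis et al. in 2020.",
    "Self-RAG fine-tunes the model to emit reflection tokens.",
    "Vector databases store and similarity-search dense embeddings.",
]

# Retrieval layer: index the corpus once, search top-K at query time.
index = faiss.IndexFlatIP(DIM)
index.add(embed(corpus))

query = "Who introduced retrieval-augmented generation?"
scores, ids = index.search(embed([query]), k=2)

# Augmentation layer: splice the retrieved chunks into the prompt.
context = "\n".join(corpus[i] for i in ids[0])
prompt = f"Answer using only these sources:\n{context}\n\nQ: {query}\nA:"

# Generation and verification would follow: call the LLM with `prompt`,
# then check each claim in its output against the retrieved sources.
print(prompt)
```

Swapping the toy embedder for a real model and the in-memory index for a managed vector database changes the plumbing, not the shape of the loop.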

The framework ecosystem

LangChain (Harrison Chase, 2022-10) and LlamaIndex (Jerry Liu, 2022-11) emerged within two weeks of each other as the dominant Python orchestration layers. DSPy (Stanford, 2023) takes a programs-not-prompts approach. Pydantic AI (2024) adds type safety. Anthropic's Model Context Protocol (2024-11) is the cross-vendor standard for connecting models to tools and data. Each framework has its own primitives, but all of them ultimately wire up the same retrieval + augmentation + verification loop.

Defined terms (4)

RAG
Retrieval-Augmented Generation — pulling relevant documents from a corpus at query time, augmenting the LLM prompt with them, then generating an answer.
Vector database
A database optimized for storing and similarity-searching dense vector embeddings. Foundational to RAG retrieval at scale.
Embedding
A dense numerical vector representing a chunk of text (or an image, etc.) such that semantically similar chunks produce numerically similar vectors — typically measured by cosine similarity, as in the sketch after this list.
Self-RAG
A variant of RAG where the model is fine-tuned to emit special reflection tokens deciding when to retrieve and when to self-critique.
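To make "numerically similar vectors" concrete: similarity between embeddings is usually measured as cosine similarity. A minimal NumPy illustration, with made-up 4-dimensional vectors standing in for real embeddings (which typically have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings, invented for illustration.
cat     = np.array([0.9, 0.1, 0.3, 0.0])
feline  = np.array([0.8, 0.2, 0.4, 0.1])
invoice = np.array([0.0, 0.9, 0.0, 0.8])

print(cosine_similarity(cat, feline))   # ~0.98: semantically close
print(cosine_similarity(cat, invoice))  # ~0.08: semantically distant
```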


Related

Framework integrations