Concept · 2026-05-16
RAG vs signed-claim verification (VERITAS) — when to use each
RAG retrieves prose chunks. VERITAS retrieves typed atomic claims with signatures. They're different shapes, different use cases, different cost models. Most production systems use both.
TL;DR
RAG is the right tool when your knowledge lives in prose — documentation, articles, customer-support tickets, internal wikis. You vector-index it, retrieve semantically similar chunks at query time, paste them into the prompt as context. Works at any scale; tolerates messy unstructured input.
VERITAS (or any signed-claim system) is the right tool when you need atomic facts with verification — specific assertions like "GPT-4 was released on 2023-03-14" with sources you can cite and signatures you can re-verify. Bounded coverage; high precision on what it covers.
They're complementary. RAG covers breadth; VERITAS covers atoms. Most production LLM applications eventually run both.
The shape difference
The two systems retrieve different things:
| Aspect | RAG | VERITAS |
|---|---|---|
| Retrieves | Prose chunks (200-2000 tokens) | Atomic claims (subject + predicate + object) |
| Returns | Semantically similar text | Verified facts with confidence + signature |
| Trust model | Trust the corpus you indexed | Trust the curation methodology + HMAC signature |
| Scale | Millions of chunks easy | Limited by curation effort (today: 91, target Q3: ~150) |
| Hallucination on covered domain | ~10-15% | <1% |
| Coverage | Whatever you index | Whatever the catalog covers (AI/ML today; new verticals Y2) |
| Citation precision | Chunk-level (paragraph at best) | Fact-level (single assertion) |
| Auditability | Manual | Programmatic (signature) |
| Infra requirement | Vector DB + embedding model | HTTP fetch (zero infra) |
Why RAG alone leaks
RAG works well most of the time and fails in a specific class of cases:
- Semantically similar but factually wrong chunks. A chunk stating "OpenAI launched ChatGPT in 2022" retrieves on a query about GPT-4's release. Embeddings see the same topic; the dates differ. The model stitches the retrieved date into the wrong context.
- Multiple chunks contradict. Two retrieved chunks disagree on a fact. The model picks one — sometimes the wrong one — without surfacing the contradiction.
- Chunk boundaries split facts. The relevant fact straddles the boundary between two retrieved chunks. The model gets half of it and fabricates the other half.
- Indexed corpus is wrong. RAG retrieves faithfully from a corpus. If the corpus has wrong facts, RAG confidently surfaces them as authoritative.
- Citation drift. The model cites "chunk #4 says X" but actually emitted X from its parametric memory and pasted the chunk citation as plausible cover.
Signed-claim verification doesn't have these failure modes because the unit is the typed fact, not a prose chunk. Either the catalog has the claim (return it, confidence-stamped) or it doesn't (return null, force fallback path).
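That binary outcome can be sketched as a thin lookup wrapper. The claim schema, the in-memory catalog, and the `lookup` helper below are illustrative assumptions, not the real VERITAS API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    id: str
    statement: str
    confidence: float
    signature: str  # HMAC over the canonical claim payload

# Hypothetical in-memory stand-in for the signed-claim catalog.
CATALOG = {
    ("GPT-4", "released_on"): Claim(
        id="clm_001",
        statement="GPT-4 was released on 2023-03-14",
        confidence=0.99,
        signature="hmac:...",
    ),
}

def lookup(subject: str, predicate: str) -> Optional[Claim]:
    """Return the signed claim if the catalog covers it, else None to force a fallback."""
    return CATALOG.get((subject, predicate))
```

The point is the contract: a covered query returns a confidence-stamped, signed claim; an uncovered query returns `None` instead of a plausible-sounding guess.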
Why VERITAS alone is insufficient
The signed-claim approach has its own limits:
- Coverage gaps. If the user asks about something the catalog doesn't cover — say, a specific product feature or a niche academic result — VERITAS returns null and your code falls through to whatever else you have. Without RAG as the fallback, your application is silent.
- Curation latency. New facts (a model released yesterday) take time to verify and enter the catalog. RAG over a freshly-indexed news corpus is faster.
- Prose vs claim shape. Some queries genuinely want explanatory prose, not a typed fact. "How does attention work?" isn't an atomic claim — it's a paragraph. RAG over the right corpus answers; VERITAS doesn't.
- Subjective questions. "Is GPT-4 better than Claude 3 for code generation?" isn't a fact — it's an evaluation. VERITAS explicitly doesn't ship performance-comparison claims (see the methodology post). RAG over benchmark reports + community discussion fills this gap.
The hybrid pattern
Three layers, ordered by precision:
```python
# Pseudocode
async def answer(question):
    # Layer 1 — VERITAS for atomic facts (high precision)
    veritas_claims = await veritas_search(question, limit=3)

    # Layer 2 — RAG for explanatory context
    rag_chunks = await vector_search(question, limit=5)

    # Layer 3 — model prompt with both, ordered by trust
    context = ""
    if veritas_claims:
        context += "Verified atomic facts (cite [claim_id]):\n"
        context += "\n".join(f"- {c.statement} [{c.id}]" for c in veritas_claims)
    if rag_chunks:
        context += "\n\nDocumentation context (cite [chunk_id]):\n"
        context += "\n".join(f"- {c.text} [{c.id}]" for c in rag_chunks)

    return await llm.generate(
        f"Use the verified facts first. Cite every assertion.\n\n{context}\n\nQ: {question}"
    )
```

The model is instructed to prefer verified facts over RAG chunks when both cover the same assertion. Verification badges in the UI distinguish the two: clicking [claim_id] opens the canonical SourceScore page; clicking [chunk_id] opens your indexed source.
Cost comparison
For a typical production query (~1,000 queries/day):
- RAG: embedding API ~$0.0001/query + hosted vector DB ~$30/mo + storage. ~$0.001/query total.
- VERITAS Free tier: 1,000 calls/mo free, then ~€0.0004/call on the next tier (Indie €19 for 50,000 calls = ~€0.00038/call). Volume tiers (€99 / €499) drop per-call cost further.
- Hybrid: additive. ~$0.0018/query for both layers.
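As a sanity check on the tier arithmetic, using only the figures quoted in this post (check current pricing before relying on them):

```python
# Indie tier as quoted above: EUR 19 for 50,000 calls.
INDIE_PRICE_EUR = 19
INDIE_CALLS = 50_000

per_call = INDIE_PRICE_EUR / INDIE_CALLS
print(f"EUR {per_call:.5f}/call")  # EUR 0.00038/call
```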
The marginal cost of adding VERITAS to an existing RAG stack is small relative to the value of reducing hallucination on covered atoms. The ROI is dominated by your hallucination cost — if it's zero, neither matters; if it's high, both are cheap.
When to use RAG only (skip VERITAS)
- Your domain isn't AI/ML and isn't covered by any other signed-claim catalog.
- Your queries are explanatory, not factual (how does X work, why is Y, summarize Z).
- You have a high-quality indexed corpus you trust completely.
- Latency is dominant; you can't afford the verification step.
When to use VERITAS only (skip RAG)
- Your application is bounded to AI/ML facts.
- You don't have time to build a RAG pipeline.
- You need zero-infra grounding — VERITAS is one HTTP call, no vector DB.
- You need programmatic auditability of cited facts (signatures).
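The "one HTTP call" path can be sketched with nothing but the standard library. The base URL, endpoint path, and query parameters below are assumptions for illustration, not the documented VERITAS API:

```python
import json
import urllib.request
from urllib.parse import urlencode

def build_search_url(base: str, query: str, limit: int = 3) -> str:
    """Construct the single search request: no vector DB, no embedding model."""
    return f"{base}/claims/search?{urlencode({'q': query, 'limit': limit})}"

def fetch_claims(base: str, query: str, limit: int = 3):
    """Perform the one HTTP call and parse the JSON response."""
    with urllib.request.urlopen(build_search_url(base, query, limit)) as resp:
        return json.load(resp)
```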
Is RAG dead?
No. The "RAG is dead" takes you see on Twitter are marketing for the framing-of-the-month, not a methodological shift. RAG remains the default pattern for indexing your own corpus of prose, and that need isn't going away.
What's changed is the recognition that RAG isn't a complete grounding solution — it's one layer in a stack. Signed-claim verification, tool-use grounding, prompt-stuffed invariants, and structured-output schemas all sit alongside RAG in a production-grade pipeline.
Getting started
If you have RAG today: add VERITAS as a parallel retrieval step. It's a five-line code change with no infrastructure change. See the 5-line Python tutorial.
If you don't have RAG yet: start with VERITAS for atomic facts (covers ~40-60% of typical AI/ML domain queries). Add RAG once your application has shipped and you've seen which queries fall outside the VERITAS catalog. The data tells you which layer to invest in.
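Collecting that data is a few lines: try the catalog first and count the misses. The helper names here (`veritas_lookup`, `rag_fallback`) are illustrative stand-ins for whatever retrieval functions you actually wire in:

```python
from collections import Counter

miss_log = Counter()

def answer_with_fallback(question, veritas_lookup, rag_fallback):
    """Try the claim catalog first; count misses to see where RAG is needed."""
    claims = veritas_lookup(question)
    if claims:
        return claims
    miss_log[question] += 1  # every query landing here is a RAG candidate
    return rag_fallback(question)
```

Once `miss_log` has a week of traffic, its most common entries tell you which corpus to index first.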
Further reading
- LLM grounding — the broader concept
- LLM hallucination — what grounding fixes
- 5-line Python verification tutorial
- LangChain integration — including a retrieve-then-cite pattern that mirrors classic RAG flow
- Quickstart