SourceScore

RAG pipeline verification — close the right-doc-wrong-number gap

Your retriever pulls the right document. Your LLM still emits the wrong number on the page. RAG retrieves; it doesn't verify. Add a verify-then-respond layer to close the gap.

The problem

You built RAG. Embedded your corpus, picked a vector DB, tuned top-K, wrote the prompt template. Production users file tickets:

"It told me the model has 32k context. The source it cited literally says 128k."

You read the source. It says 128k. Your retriever found it. Your prompt included it. The model still hallucinated.

This isn't a retrieval failure. It's a verification failure. RAG = Retrieval-Augmented Generation: there is no built-in step that checks the model's output against the retrieved context, so the inconsistency ships to the user unnoticed.

The pattern: verify-then-respond

Add a third stage to your RAG pipeline:

  1. Retrieve. Pull top-K from your vector DB. Unchanged.
  2. Generate. Model produces a response. Unchanged.
  3. Verify. Extract atomic assertions from the response. Look each up via VERITAS. Annotate verified / unverified / refuted in the user-facing output.

Code (Python, ~30 lines)

import re
import httpx

def verify_assertions(llm_response: str) -> dict:
    # Naive extraction: sentences containing assertion verbs ("is", "has", "released", "introduced")
    sentences = re.split(r'(?<=[.!?])\s+', llm_response)
    candidates = [
        s for s in sentences
        if re.search(r'\b(is|has|released|introduced)\b', s, re.IGNORECASE)
    ]

    verified = []
    unverified = []
    for claim in candidates:
        r = httpx.post(
            'https://sourcescore.org/api/v1/verify',
            json={'claim': claim, 'minConfidence': 0.85},
            timeout=2.0,
        )
        r.raise_for_status()
        result = r.json()
        if result.get('bestMatch') and result['bestMatch']['confidence'] >= 0.85:
            verified.append({
                'claim': claim,
                'source_url': result['bestMatch']['detailUrl'],
                'signature': result['signature'],
            })
        else:
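            # No confident match. A fuller version would also separate out
            # refuted claims here, if the verify response flags them.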
            unverified.append(claim)

    return {'verified': verified, 'unverified': unverified}

# In your RAG flow:
response = rag_chain.invoke(query)
verification = verify_assertions(response)

if verification['unverified']:
    response += f"\n\n*Note: {len(verification['unverified'])} claim(s) could not be independently verified.*"
for v in verification['verified']:
    response += f"\n\n[Source]({v['source_url']})"

What this catches

In production deployments running this pattern alongside standard RAG, the verification layer catches roughly:

  • ~30% of fabricated-source hallucinations the retriever missed
  • ~50% of right-document-wrong-number cases
  • ~95% of date-attribution errors (model says "released July 2024" when source says "released July 2023")

The remaining gap comes from genuinely ambiguous claims (no consensus across sources) and out-of-catalog assertions. For ambiguous claims we recommend human review; for out-of-catalog assertions we recommend stricter system-prompt constraints rather than relaxing verification.
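
For the out-of-catalog case, a stricter constraint can be as simple as an extra instruction in the system prompt. A minimal sketch, assuming a generic documentation-assistant prompt; the exact wording is an illustration, not a canonical prompt:

# Illustrative system-prompt constraint for out-of-catalog assertions.
# Tune the wording against your own evaluation set.
constraint = (
    "Only state facts that appear in the retrieved context. "
    "If the context does not contain the answer, say you cannot verify it "
    "instead of guessing."
)
system_prompt = "You are a documentation assistant.\n" + constraint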

Performance

  • ~80ms p95 per verify call
  • Free tier: 1,000 verifies/month, no signup, no auth
  • Cached responses (claim → envelope) for repeated assertions
  • Parallel verification of all extracted assertions in a single async batch
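
Batching maps onto plain httpx and asyncio. A minimal sketch, assuming the same endpoint and request shape as the synchronous example above (no official async client is implied):

import asyncio
import httpx

async def verify_batch(claims: list[str], min_confidence: float = 0.85) -> list[dict]:
    # Fire all verify calls concurrently in one async batch.
    async with httpx.AsyncClient(timeout=2.0) as client:

        async def verify_one(claim: str) -> dict:
            r = await client.post(
                'https://sourcescore.org/api/v1/verify',
                json={'claim': claim, 'minConfidence': min_confidence},
            )
            return {'claim': claim, 'result': r.json()}

        return await asyncio.gather(*(verify_one(c) for c in claims))

# Usage: results = asyncio.run(verify_batch(candidates))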

Integration guides per framework

  • LangChain — retrieve-then-cite + generate-then-verify patterns (see the sketch after this list)
  • LlamaIndex — custom Retriever + NodePostprocessor
  • DSPy — verify-and-flag post-processor module
  • OpenAI tools — native function-calling pattern
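
As a taste of the LangChain pattern, the wiring below appends verification as a post-processing step. It reuses rag_chain and verify_assertions from the code above and assumes rag_chain returns a plain string; treat it as a sketch of the wiring, not the full guide:

from langchain_core.runnables import RunnableLambda

def annotate(answer: str) -> str:
    # Append verification notes and source links to the generated answer.
    verification = verify_assertions(answer)
    if verification['unverified']:
        answer += f"\n\n*Note: {len(verification['unverified'])} claim(s) could not be independently verified.*"
    for v in verification['verified']:
        answer += f"\n\n[Source]({v['source_url']})"
    return answer

verified_chain = rag_chain | RunnableLambda(annotate)
answer = verified_chain.invoke(query)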

When this fits

  • RAG over AI/ML knowledge bases (papers, model docs, technical content)
  • Documentation chatbots
  • Research-assistant pipelines
  • Any production RAG with hallucination tickets where the source data is correct

Related