Integration guide

DSPy + VERITAS

DSPy is Stanford's framework for building compound AI systems as programs rather than hand-written prompts. This guide shows two integration patterns: a custom dspy.Retrieve subclass backed by the VERITAS catalog, and a verify-and-flag post-processor module.

Why DSPy + VERITAS

DSPy programs declare what the system should do (signatures + modules) and leave the how (exact prompts) to the optimizer. That separation makes external retrieval modules first-class, so a VERITAS retriever fits naturally into the existing dspy.Retrieve interface.

Verified-claim retrieval then becomes an explicit step in the program, returning structured passages (statement, claim id, confidence) instead of text pasted into a prompt by hand. DSPy's optimizers can tune the prompts around that step.

Install

pip install dspy-ai requests

Pattern 1 — Custom dspy.Retrieve

Subclass dspy.Retrieve and translate VERITAS search hits into DSPy passages. Each passage carries the claim id, confidence, canonical claim URL, and tags as metadata so downstream modules can render citations.

import dspy
import requests
from typing import List

VERITAS = "https://sourcescore.org/api/v1"

class VeritasRetriever(dspy.Retrieve):
    def __init__(self, k: int = 5, min_confidence: float = 0.8):
        super().__init__(k=k)
        self.min_confidence = min_confidence

    def forward(self, query_or_queries, k=None) -> dspy.Prediction:
        # Accept a single query string or a list of queries.
        queries = [query_or_queries] if isinstance(query_or_queries, str) else query_or_queries
        results: List[dspy.Example] = []
        for q in queries:
            r = requests.get(
                f"{VERITAS}/search",
                params={"q": q, "limit": k or self.k},
                timeout=8,
            )
            r.raise_for_status()  # surface HTTP errors before parsing JSON
            for hit in r.json().get("matches", []):
                # Drop hits below the confidence floor.
                if hit.get("confidence", 0) < self.min_confidence:
                    continue
                results.append(
                    dspy.Example(
                        long_text=hit["statement"],
                        claim_id=hit["id"],
                        confidence=hit["confidence"],
                        canonical_url=f"https://sourcescore.org/claims/{hit['id']}/",
                        tags=hit.get("tags", []),
                    ).with_inputs("long_text")
                )
        # Return a Prediction so callers can read .passages, as the programs in this guide do.
        return dspy.Prediction(passages=results)
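
The filtering step inside forward can be checked without touching the network. The sketch below is a minimal stand-in for that loop using plain dicts. The sample payload is hypothetical: its ids and statements are made up, and it only mirrors the field names the retriever reads (matches, statement, id, confidence, tags). hits_to_passages is an illustrative helper, not part of DSPy or the VERITAS API.

```python
from typing import Any, Dict, List


def hits_to_passages(payload: Dict[str, Any], min_confidence: float = 0.8) -> List[Dict[str, Any]]:
    """Keep hits at or above min_confidence and reshape them like the retriever does."""
    passages = []
    for hit in payload.get("matches", []):
        # A hit with no confidence field counts as 0 and is dropped.
        if hit.get("confidence", 0) < min_confidence:
            continue
        passages.append({
            "long_text": hit["statement"],
            "claim_id": hit["id"],
            "confidence": hit["confidence"],
            "canonical_url": f"https://sourcescore.org/claims/{hit['id']}/",
            "tags": hit.get("tags", []),
        })
    return passages


# Hypothetical payload: ids and statements are invented for illustration.
sample = {
    "matches": [
        {"id": "claim-a", "statement": "First example statement.", "confidence": 0.95, "tags": ["demo"]},
        {"id": "claim-b", "statement": "Second example statement.", "confidence": 0.60},
        {"id": "claim-c", "statement": "Third example statement."},
    ]
}

passages = hits_to_passages(sample, min_confidence=0.8)
print([p["claim_id"] for p in passages])
```

Only claim-a survives here: claim-b falls below the floor, and claim-c has no confidence value at all.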

Wire into a DSPy program

import dspy

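# Set up the LM + retriever
# Note: dspy.OpenAI is the legacy client. On DSPy 2.5 and later, use
# dspy.LM("openai/gpt-4o-mini", temperature=0) instead.
lm = dspy.OpenAI(model="gpt-4o-mini", temperature=0)
rm = VeritasRetriever(k=5, min_confidence=0.85)
dspy.settings.configure(lm=lm, rm=rm)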

# Define the signature
class CitedAnswer(dspy.Signature):
    """Answer the question using only the verified claims. Cite [claim_id] inline."""
    question: str = dspy.InputField()
    context: list[str] = dspy.InputField(desc="Verified claims with [claim_id] tags")
    answer: str = dspy.OutputField(desc="Answer with [claim_id] citations after every fact")

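# Build the program
class CitedRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        # Call the VERITAS retriever directly: the generic dspy.Retrieve wrapper
        # may reduce passages to plain strings, which would drop claim_id and confidence.
        self.retrieve = VeritasRetriever(k=5, min_confidence=0.85)
        self.generate = dspy.ChainOfThought(CitedAnswer)

    def forward(self, question: str):
        passages = self.retrieve(question).passages
        # Tag each claim with its id so the model can cite it inline.
        context = [
            f"{p.long_text} [{p.claim_id}] (conf={p.confidence:.2f})"
            for p in passages
        ]
        return self.generate(question=question, context=context)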

program = CitedRAG()
result = program(question="When was the Transformer architecture introduced?")
print(result.answer)

The signature asks for a [claim_id] citation after every assertion; it is an instruction to the model, not a hard guarantee. DSPy's optimizer can later tune the exact prompt around this signature without changing the contract, and VERITAS continues to feed verified passages regardless of which prompt template the optimizer settles on.
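
Because the citation rule lives in the prompt, it can help to check the output after the fact. The following is a minimal sketch of such a check, assuming citations appear as bracketed ids before the sentence's closing punctuation, as in the context strings built above. check_citations and the sample ids are hypothetical, and the sentence splitter is deliberately naive (it will mis-split abbreviations such as "e.g.").

```python
import re
from typing import Dict, List, Set

# Assumption: a citation is any bracketed token such as [claim-a].
CITATION = re.compile(r"\[([^\[\]]+)\]")


def check_citations(answer: str, known_ids: Set[str]) -> Dict[str, List[str]]:
    """Report sentences without a citation and cited ids that were never retrieved."""
    # Naive split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    uncited = [s for s in sentences if not CITATION.search(s)]
    cited = {cid for s in sentences for cid in CITATION.findall(s)}
    return {"uncited": uncited, "unknown_ids": sorted(cited - known_ids)}


# Hypothetical answer and retrieved ids, for illustration only.
answer = "Statement one [claim-a]. Statement two has no citation. Statement three [claim-z]."
report = check_citations(answer, known_ids={"claim-a", "claim-b"})
print(report)
```

A non-empty uncited or unknown_ids list is a signal to retry generation or to flag the answer, depending on how strict the application needs to be.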

Pattern 2 — Verify post-processor module

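When you want free-form generation with a verification layer afterwards, wrap /api/v1/verify in a DSPy module that runs after answer generation. The module below treats each non-empty line of the answer as one assertion and sends it to the catalog separately.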

# Reuses dspy, requests, and VERITAS from Pattern 1.
class VeritasVerify(dspy.Module):
    """Post-process an answer: verify each non-empty line against the catalog."""

    def __init__(self, min_confidence: float = 0.85):
        super().__init__()
        self.min_confidence = min_confidence

    def forward(self, answer: str) -> dspy.Prediction:
        # Treat every non-empty line of the answer as one assertion.
        lines = [ln.strip() for ln in answer.split("\n") if ln.strip()]
        verified, unverified = [], []
        for line in lines:
            resp = requests.post(
                f"{VERITAS}/verify",
                json={"claim": line, "minConfidence": self.min_confidence},
                timeout=8,
            )
            resp.raise_for_status()  # surface HTTP errors before parsing JSON
            r = resp.json()
            if r.get("bestMatch"):
                verified.append({
                    "text": line,
                    "claim_id": r["bestMatch"]["id"],
                    "confidence": r["bestMatch"]["confidence"],
                    "url": f"https://sourcescore.org/claims/{r['bestMatch']['id']}/",
                })
            else:
                unverified.append(line)
        return dspy.Prediction(
            verified=verified,
            unverified=unverified,
            verification_rate=len(verified) / max(1, len(lines)),
        )

# Composing it into a larger program
class AnswerAndVerify(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")
        self.verify = VeritasVerify(min_confidence=0.85)

    def forward(self, question: str):
        a = self.generate(question=question)
        v = self.verify(a.answer)
        return dspy.Prediction(
            answer=a.answer,
            verified_claims=v.verified,
            unverified_claims=v.unverified,
            verification_rate=v.verification_rate,
        )
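
The "flag" half of verify-and-flag is left to the caller. One way to present the result is sketched below, assuming the verified entries have the text, claim_id, confidence, and url keys built by VeritasVerify. render_flagged, the [UNVERIFIED] marker, and the sample values are illustrative choices, not part of the VERITAS API.

```python
from typing import Any, Dict, List


def render_flagged(verified: List[Dict[str, Any]], unverified: List[str]) -> str:
    """Render verified lines with their claim reference and mark the rest as unverified."""
    out = []
    for item in verified:
        out.append(
            f"{item['text']} [{item['claim_id']}] (conf={item['confidence']:.2f}) {item['url']}"
        )
    for line in unverified:
        # The marker text is an arbitrary choice for this sketch.
        out.append(f"{line} [UNVERIFIED]")
    return "\n".join(out)


# Hypothetical verification result, shaped like the VeritasVerify output.
verified = [{
    "text": "Line one.",
    "claim_id": "claim-a",
    "confidence": 0.9,
    "url": "https://sourcescore.org/claims/claim-a/",
}]
unverified = ["Line two."]

rendered = render_flagged(verified, unverified)
print(rendered)
```

Note that this groups verified lines before unverified ones. Preserving the original line order would require VeritasVerify to record each line's position.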

DSPy optimizer compatibility

DSPy's optimizers (BootstrapFewShot, MIPRO, COPRO) can tune around the VERITAS retriever: they adjust the prompts that consume the passages, but they can't change what the passages contain. That's by design. The catalog is the trusted layer, and the optimizer improves how the model uses it.

A natural metric for optimization is verification_rate from AnswerAndVerify. Maximizing it pushes the program toward assertions the catalog can confirm, so the catalog acts as the ground-truth signal.
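
As a rough sketch, the metric can be written in the (example, pred, trace=None) shape that DSPy optimizers expect. The 0.8 threshold is an assumed value, and the SimpleNamespace object only stands in for the dspy.Prediction returned by AnswerAndVerify so the snippet runs without DSPy or network access.

```python
from types import SimpleNamespace


def verification_metric(example, pred, trace=None, threshold: float = 0.8):
    """DSPy-style metric: score a prediction by its verification_rate."""
    rate = float(getattr(pred, "verification_rate", 0.0))
    # When an optimizer is bootstrapping demonstrations it passes a trace;
    # returning a boolean then decides whether the example is kept.
    if trace is not None:
        return rate >= threshold
    return rate


# Stand-in for the Prediction that AnswerAndVerify returns.
pred = SimpleNamespace(verification_rate=0.75)

score = verification_metric(None, pred)                 # evaluation mode: numeric score
keep_demo = verification_metric(None, pred, trace=[])   # bootstrapping mode: pass/fail
print(score, keep_demo)

# With DSPy installed, the metric would plug into an optimizer roughly like this
# (trainset is a hypothetical list of dspy.Example objects):
#   from dspy.teleprompt import BootstrapFewShot
#   compiled = BootstrapFewShot(metric=verification_metric).compile(
#       AnswerAndVerify(), trainset=trainset
#   )
```

Each metric call costs one /verify request per answer line, so a small trainset is a sensible starting point.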

Compose with other DSPy modules

The VERITAS retriever composes with other DSPy patterns, such as ReAct, ProgramOfThought, or a hand-written multi-hop program. A multi-hop pattern with verification:

class MultiHopVerified(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = VeritasRetriever(k=3)
        self.hop1 = dspy.ChainOfThought("question -> sub_question")
        self.hop2 = dspy.ChainOfThought("question, sub_answer -> final_answer")
        self.verify = VeritasVerify()

    def forward(self, question: str):
        sub_q = self.hop1(question=question).sub_question
        passages = self.retrieve(sub_q).passages
        sub_a = "\n".join(p.long_text for p in passages)
        final = self.hop2(question=question, sub_answer=sub_a).final_answer
        return self.verify(final)

Next steps

• Full API reference
• LangChain guide — similar patterns in a different framework
• LlamaIndex guide — Retriever + NodePostprocessor
• OpenAI tool-calls — native function-calling
• Vercel AI SDK — TypeScript/Next.js
• Citation chains — local verification of signed envelopes
• Browse the catalog — 206 verified AI/ML claims