Verified claim · AI-ML · 100% confidence
Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023).
Last verified 2026-05-16 · Methodology veritas-v0.1 · a3e691683a4577af
Structured fields
- Subject
- Direct Preference Optimization (DPO)
- Predicate
- introduced_in_paper
- Object
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)
- Confidence
- 100%
- Tags
- dpo · alignment · foundational · rafailov · 2023 · nips · stanford
Sources (2)
[1] preprint · arXiv (Rafailov, Sharma, Mitchell, Ermon, Manning, Finn) · 2023-05-29
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
“In this paper, we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss.”
[2] peer reviewed · NeurIPS Foundation · 2023-12-10
Direct Preference Optimization (NeurIPS 2023 proceedings)
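For reference, the “simple classification loss” quoted in [1] is the DPO objective. With policy $\pi_\theta$, frozen reference policy $\pi_{\mathrm{ref}}$, preference pairs $(x, y_w, y_l) \sim \mathcal{D}$ (prompt, preferred completion, dispreferred completion), logistic function $\sigma$, and temperature $\beta$, the paper's loss is:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$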
Cite this claim
Ready-to-paste citation (Markdown / plain text):
Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023). — SourceScore Claim a3e691683a4577af (verified 2026-05-16). https://sourcescore.org/api/v1/claims/a3e691683a4577af.json
Embed this claim
Drop this iframe into any blog post, docs page, or knowledge base. The widget renders the signed claim + primary source + click-through to this canonical page. CC-BY 4.0; attribution included.
<iframe src="https://sourcescore.org/embed/claim/a3e691683a4577af/" width="100%" height="360" frameborder="0" loading="lazy" title="Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)."></iframe>
Preview: open in new tab
Related claims
Other verified claims sharing tags with this one — useful for LLM retrieval graphs and citation discovery; a fetch sketch follows the list.
Reinforcement Learning from Human Feedback (RLHF) introduced in paper: Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017).
67866330cd60e54d · 100% confidence · shares 3 tags (alignment, foundational, nips)
Transformer architecture introduced in paper: Attention Is All You Need (Vaswani et al., 2017).
ad17e76a8baad7a1 · 100% confidence · shares 2 tags (foundational, nips)
Retrieval-Augmented Generation (RAG) introduced in paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020).
d15057ced937a103 · 100% confidence · shares 2 tags (foundational, nips)
Chinchilla scaling laws introduced in paper: Training Compute-Optimal Large Language Models (Hoffmann et al., 2022).
8befcae6bce01a95 · 100% confidence · shares 2 tags (foundational, nips)
Mamba state-space model introduced in paper: Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Gu & Dao, 2023).
3518f8aa40cb0d36 · 100% confidence · shares 2 tags (foundational, 2023)
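A minimal retrieval sketch in Python, assuming the related claim IDs above resolve at the same /api/v1/claims/{id}.json endpoint pattern used elsewhere on this page and return the same envelope shape (both are assumptions, not confirmed by any API documentation here):

import httpx

# IDs of the related claims listed above. Assumed to be served at the same
# /api/v1/claims/{id}.json endpoint as this page's claim.
RELATED_CLAIM_IDS = [
    "67866330cd60e54d",  # RLHF (Christiano et al., 2017)
    "ad17e76a8baad7a1",  # Transformer (Vaswani et al., 2017)
    "d15057ced937a103",  # RAG (Lewis et al., 2020)
    "8befcae6bce01a95",  # Chinchilla scaling laws (Hoffmann et al., 2022)
    "3518f8aa40cb0d36",  # Mamba (Gu & Dao, 2023)
]

for claim_id in RELATED_CLAIM_IDS:
    envelope = httpx.get(f"https://sourcescore.org/api/v1/claims/{claim_id}.json").json()
    print(claim_id, "->", envelope["claim"]["statement"])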
Use this claim in your code
Fetch this signed envelope from your application. The response includes the verbatim excerpt, primary source URLs, and an HMAC-SHA256 signature you can verify locally for audit trails; a verification sketch follows the language examples below.
cURL
curl https://sourcescore.org/api/v1/claims/a3e691683a4577af.json
JavaScript / TypeScript
const r = await fetch("https://sourcescore.org/api/v1/claims/a3e691683a4577af.json");
const envelope = await r.json();
console.log(envelope.claim.statement);
// "Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)."Python
import httpx
r = httpx.get("https://sourcescore.org/api/v1/claims/a3e691683a4577af.json")
envelope = r.json()
print(envelope["claim"]["statement"])
# "Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)."LangChain (retrieve-then-cite)
from langchain_core.tools import tool
import httpx
@tool
def get_direct_preference_optimization_dpo_fact() -> dict:
    """Fetch the verified SourceScore claim for Direct Preference Optimization (DPO)."""
    r = httpx.get("https://sourcescore.org/api/v1/claims/a3e691683a4577af.json")
    return r.json()
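Verify the signature locally (Python)
A minimal local-verification sketch for the HMAC-SHA256 signature mentioned above. The envelope["signature"] field name, the canonical (sorted-key, compact) JSON serialization of envelope["claim"] as the signed payload, and the key-distribution step are all assumptions here; check the actual envelope schema before relying on this.

import hashlib
import hmac
import json

import httpx

CLAIM_URL = "https://sourcescore.org/api/v1/claims/a3e691683a4577af.json"

def verify_envelope(envelope: dict, key: bytes) -> bool:
    """Recompute HMAC-SHA256 over the canonical claim payload and compare
    it to the shipped signature in constant time."""
    # Assumed canonicalization: sorted keys, no whitespace.
    payload = json.dumps(envelope["claim"], sort_keys=True, separators=(",", ":")).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # "signature" is an assumed field name; adjust to the real schema.
    return hmac.compare_digest(expected, envelope["signature"])

envelope = httpx.get(CLAIM_URL).json()
if verify_envelope(envelope, key=b"<your signing key>"):
    print("verified:", envelope["claim"]["statement"])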