Verified claim · AI-ML · 100% confidence
Proximal Policy Optimization (PPO) introduced in paper: Proximal Policy Optimization Algorithms (Schulman et al., 2017).
Last verified 2026-05-16 · Methodology veritas-v0.1 · 00f224e1ccc158ef
Structured fields
- Subject
- Proximal Policy Optimization (PPO)
- Predicate
- introduced_in_paper
- Object
- Proximal Policy Optimization Algorithms (Schulman et al., 2017)
- Confidence
- 100%
- Tags
- ppo · reinforcement-learning · foundational · schulman · 2017 · openai · rlhf
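The structured fields form a subject/predicate/object triple. The headline statement at the top of the page appears to follow from them. The sketch below is a minimal illustration that assumes the statement is rendered by turning the predicate's underscores into spaces and adding a colon. The page does not document this rule, and `ClaimTriple` and `render_statement` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimTriple:
    """Hypothetical container mirroring the page's structured fields."""
    subject: str
    predicate: str
    obj: str  # named obj so it does not shadow the builtin `object`
    confidence: float = 1.0
    tags: tuple[str, ...] = ()


def render_statement(claim: ClaimTriple) -> str:
    # Assumed rendering rule: "introduced_in_paper" -> "introduced in paper:"
    verb = claim.predicate.replace("_", " ")
    return f"{claim.subject} {verb}: {claim.obj}."


ppo = ClaimTriple(
    subject="Proximal Policy Optimization (PPO)",
    predicate="introduced_in_paper",
    obj="Proximal Policy Optimization Algorithms (Schulman et al., 2017)",
    tags=("ppo", "reinforcement-learning", "foundational", "schulman", "2017", "openai", "rlhf"),
)
print(render_statement(ppo))
```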
Sources (2)
[1] preprint · arXiv (Schulman, Wolski, Dhariwal, Radford, Klimov) · 2017-07-20
Proximal Policy Optimization Algorithms
“We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.”
[2] official blog · OpenAI · 2017-07-20
Proximal Policy Optimization
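The quoted abstract mentions a "surrogate" objective. The paper's main variant is the clipped surrogate L^CLIP(θ) = Ê_t[min(r_t(θ)Â_t, clip(r_t(θ), 1 - ε, 1 + ε)Â_t)]. Here r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t) is the probability ratio between the new and old policies, and Â_t is an advantage estimate. The following is a minimal pure-Python sketch that evaluates this objective over a batch. It assumes the per-step log-probabilities and advantages have already been computed. The function name and the ε = 0.2 default are illustrative choices, not the paper's code.

```python
import math
from typing import Sequence


def clipped_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Batch mean of min(r*A, clip(r, 1-eps, 1+eps)*A); PPO maximizes this."""
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # r_t = pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Step 1: ratio 1.5 with A > 0, so its gain is capped at 1.2.
# Step 2: ratio 0.5 with A < 0, so the more pessimistic clipped term -0.8 is kept.
print(clipped_surrogate([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0]))  # ≈ 0.2
```

A real training loop evaluates the same expression inside an autodiff framework, so its gradient can drive the stochastic gradient ascent step the abstract describes. This sketch only computes the value.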
Cite this claim
Ready-to-paste citation (Markdown / plain text):
Proximal Policy Optimization (PPO) introduced in paper: Proximal Policy Optimization Algorithms (Schulman et al., 2017). — SourceScore Claim 00f224e1ccc158ef (verified 2026-05-16). https://sourcescore.org/api/v1/claims/00f224e1ccc158ef.json
Embed this claim
Drop this iframe into any blog post, docs page, or knowledge base. The widget renders the signed claim, its primary source, and a link back to this canonical page. It is licensed CC-BY 4.0, with attribution included.
<iframe src="https://sourcescore.org/embed/claim/00f224e1ccc158ef/" width="100%" height="360" frameborder="0" loading="lazy" title="Proximal Policy Optimization (PPO) introduced in paper: Proximal Policy Optimization Algorithms (Schulman et al., 2017)."></iframe>
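If you embed many claims, you can generate the snippet from a claim ID instead of copying it by hand. The sketch below is minimal and rests on an assumption: that every claim's widget lives at https://sourcescore.org/embed/claim/<id>/. That pattern is inferred from the single URL on this page and is not documented. `embed_snippet` is a hypothetical helper. The title is HTML-escaped because claim statements may contain quotes or ampersands.

```python
import html


def embed_snippet(claim_id: str, statement: str, height: int = 360) -> str:
    # URL pattern assumed from this page's single example
    src = f"https://sourcescore.org/embed/claim/{claim_id}/"
    title = html.escape(statement, quote=True)  # escape &, <, >, and quotes
    return (
        f'<iframe src="{src}" width="100%" height="{height}" '
        f'frameborder="0" loading="lazy" title="{title}"></iframe>'
    )


print(embed_snippet(
    "00f224e1ccc158ef",
    "Proximal Policy Optimization (PPO) introduced in paper: "
    "Proximal Policy Optimization Algorithms (Schulman et al., 2017).",
))
```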
Related claims
Other verified claims sharing tags with this one — useful for LLM retrieval graphs and citation discovery.
Reinforcement Learning from Human Feedback (RLHF) introduced in paper: Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017).
67866330cd60e54d · 100% confidence · shares 3 tags (rlhf, foundational, 2017)
InstructGPT introduced in: Ouyang et al. 2022 — RLHF-tuned GPT-3, direct ancestor of ChatGPT.
590b9de765b8126e · 100% confidence · shares 3 tags (openai, rlhf, foundational)
Transformer architecture introduced in paper: Attention Is All You Need (Vaswani et al., 2017).
ad17e76a8baad7a1 · 100% confidence · shares 2 tags (foundational, 2017)
InstructGPT methodology introduced in paper: Training language models to follow instructions with human feedback (Ouyang et al., 2022).
5da8f8dffc038b8e · 100% confidence · shares 2 tags (openai, rlhf)
GPT-2 introduced in paper: Language Models are Unsupervised Multitask Learners (Radford et al., 2019).
859551dc078c46f8 · 100% confidence · shares 2 tags (foundational, openai)
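The page does not say how related claims are selected or ordered. One plausible approach is to count the tags each candidate shares with the current claim, then sort by that count. The sketch below implements that idea and is purely an assumption. The candidates' tag sets are hypothetical: each is built from the shared tags listed above plus one invented tag. The tie-break by claim ID is arbitrary and does not match the order shown on this page.

```python
def related_claims(
    current_tags: set[str],
    candidates: dict[str, set[str]],
    min_shared: int = 2,
) -> list[tuple[str, list[str]]]:
    """Rank candidate claim IDs by how many tags they share with the current claim."""
    scored = []
    for claim_id, tags in candidates.items():
        shared = sorted(current_tags & tags)
        if len(shared) >= min_shared:
            scored.append((claim_id, shared))
    # Most shared tags first; ties broken by claim ID (an arbitrary, stable choice)
    scored.sort(key=lambda item: (-len(item[1]), item[0]))
    return scored


ppo_tags = {"ppo", "reinforcement-learning", "foundational", "schulman", "2017", "openai", "rlhf"}
candidates = {  # hypothetical tag sets; only the shared tags come from this page
    "67866330cd60e54d": {"rlhf", "foundational", "2017", "christiano"},
    "ad17e76a8baad7a1": {"foundational", "2017", "transformer"},
    "859551dc078c46f8": {"foundational", "openai", "gpt-2"},
}
for claim_id, shared in related_claims(ppo_tags, candidates):
    print(claim_id, len(shared), shared)
```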
Use this claim in your code
Fetch this signed envelope from your application. The response includes the verbatim excerpt, primary source URLs, and an HMAC-SHA256 signature you can verify locally for audit trails.
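Verifying an HMAC-SHA256 signature requires two things: the shared secret key, and the exact bytes that were signed. This page documents neither. The sketch below therefore assumes a hypothetical scheme. It treats the signed bytes as the claim object serialized to compact JSON with sorted keys, and the signature as a lowercase hex digest. The key and the "id" field name are illustrative only. Adjust `canonical_bytes` and the key handling to whatever the API actually specifies.

```python
import hashlib
import hmac
import json


def canonical_bytes(payload: dict) -> bytes:
    # Assumed canonical form: sorted keys, no extra whitespace, UTF-8
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sign(payload: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(payload), hashlib.sha256).hexdigest()


def verify(payload: dict, signature_hex: str, key: bytes) -> bool:
    # compare_digest runs in constant time, avoiding timing side channels
    return hmac.compare_digest(sign(payload, key), signature_hex)


key = b"example-shared-secret"  # hypothetical key, for illustration only
claim = {  # hypothetical shape; "id" is an assumed field name
    "id": "00f224e1ccc158ef",
    "statement": "Proximal Policy Optimization (PPO) introduced in paper: "
    "Proximal Policy Optimization Algorithms (Schulman et al., 2017).",
}
sig = sign(claim, key)
print(verify(claim, sig, key))  # True
print(verify({**claim, "statement": "PPO introduced in 2015."}, sig, key))  # False
```

HMAC is symmetric, so anyone who can verify a signature can also produce one. Letting third parties verify without sharing a secret would require a public-key signature scheme.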
cURL
curl https://sourcescore.org/api/v1/claims/00f224e1ccc158ef.json
JavaScript / TypeScript
const r = await fetch("https://sourcescore.org/api/v1/claims/00f224e1ccc158ef.json");
const envelope = await r.json();
console.log(envelope.claim.statement);
// "Proximal Policy Optimization (PPO) introduced in paper: Proximal Policy Optimization Algorithms (Schulman et al., 2017)."
Python
import httpx
r = httpx.get("https://sourcescore.org/api/v1/claims/00f224e1ccc158ef.json")
envelope = r.json()
print(envelope["claim"]["statement"])
# "Proximal Policy Optimization (PPO) introduced in paper: Proximal Policy Optimization Algorithms (Schulman et al., 2017)."
LangChain (retrieve-then-cite)
import httpx
from langchain_core.tools import tool


@tool
def get_proximal_policy_optimization_ppo_fact() -> dict:
    """Fetch the verified SourceScore claim for Proximal Policy Optimization (PPO)."""
    r = httpx.get("https://sourcescore.org/api/v1/claims/00f224e1ccc158ef.json")
    r.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return r.json()
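Below is a minimal sketch of the retrieve-then-cite step, using only the Python standard library. It fetches an envelope and builds the ready-to-paste citation line shown under "Cite this claim". The examples above confirm only `claim.statement`. For that reason, the claim ID and verification date are passed in explicitly, rather than read from envelope fields whose names this page does not document. The 16-hex-character ID check is an assumption based on the IDs shown here. `claim_url`, `fetch_envelope`, and `cite` are hypothetical helper names.

```python
import json
import re
import urllib.request

API_BASE = "https://sourcescore.org/api/v1/claims"
_CLAIM_ID = re.compile(r"[0-9a-f]{16}")  # assumed ID format, from the IDs on this page


def claim_url(claim_id: str) -> str:
    if not _CLAIM_ID.fullmatch(claim_id):
        raise ValueError(f"unexpected claim ID: {claim_id!r}")
    return f"{API_BASE}/{claim_id}.json"


def fetch_envelope(claim_id: str, timeout: float = 10.0) -> dict:
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(claim_url(claim_id), timeout=timeout) as resp:
        return json.load(resp)


def cite(envelope: dict, claim_id: str, verified: str) -> str:
    statement = envelope["claim"]["statement"]
    # Mirrors the page's citation format, including its dash separator (U+2014)
    return (
        f"{statement} \u2014 SourceScore Claim {claim_id} "
        f"(verified {verified}). {claim_url(claim_id)}"
    )
```

Usage: cite(fetch_envelope("00f224e1ccc158ef"), "00f224e1ccc158ef", "2026-05-16") should reproduce the citation line above, provided the live envelope's statement matches this page.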