Tag

reinforcement-learning

5 verified claims carrying this tag. Each has 2+ primary sources and an HMAC-SHA256 signature.

Proximal Policy Optimization (PPO) was introduced in the paper "Proximal Policy Optimization Algorithms" (Schulman et al., 2017).
00f224e1ccc158ef · 2 sources · 100% confidence
AlphaGo defeated Lee Sedol 4-1 in March 2016.
0318700337f0906d · 2 sources · 100% confidence
AlphaZero was published in the journal Science in December 2018.
b2dbbb7283a89f21 · 2 sources · 100% confidence
DeepSeek-R1 was released on 2025-01-20 with chain-of-thought reasoning capabilities.
c6660e2e910f2680 · 2 sources · 100% confidence
Group Relative Policy Optimization (GRPO) was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024).
f73e50d63643df21 · 3 sources · 92% confidence

Related tags

foundational (3) · deepmind (2) · reasoning (2) · rlhf (2) · deepseek (2) · released_on (1) · 2024 (1) · openai (1) · 2025 (1) · 2017 (1)