Tag: reinforcement-learning
Four verified claims carry this tag. Each is backed by at least two primary sources and signed with HMAC-SHA256.
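This page does not describe its signing scheme. The sketch below shows how an HMAC-SHA256 signature over a claim could be computed and checked using Python's standard `hmac` and `hashlib` modules. It assumes the signed message is the claim's UTF-8 text and uses a hypothetical key (`SECRET_KEY`); the real key handling and message format are unknown. It also does not assume that the 16-character hex IDs listed with each claim are derived from this signature.

```python
import hashlib
import hmac

# Hypothetical signing key: the page does not say how keys are chosen or stored.
SECRET_KEY = b"example-secret-key"


def sign_claim(claim_text: str, key: bytes = SECRET_KEY) -> str:
    """Return the hex HMAC-SHA256 of the claim text, assuming UTF-8 encoding."""
    return hmac.new(key, claim_text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_claim(claim_text: str, signature_hex: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_claim(claim_text, key)
    return hmac.compare_digest(expected, signature_hex)


claim = "AlphaGo defeated Lee Sedol 4-1 in March 2016."
sig = sign_claim(claim)
print(len(sig))                                        # 64 (hex digits of a SHA-256 digest)
print(verify_claim(claim, sig))                        # True
print(verify_claim(claim.replace("4-1", "5-0"), sig))  # False: any edit breaks the signature
```

Because only a holder of the key can produce a matching tag, a verifier with the same key can detect any change to a claim's text after signing.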
Proximal Policy Optimization (PPO) was introduced in the paper "Proximal Policy Optimization Algorithms" (Schulman et al., 2017).
00f224e1ccc158ef · 2 sources · 100% confidence
AlphaGo defeated Lee Sedol 4-1 in March 2016.
0318700337f0906d · 2 sources · 100% confidence
AlphaZero was published in the journal Science in December 2018.
b2dbbb7283a89f21 · 2 sources · 100% confidence
DeepSeek-R1, a reasoning model that generates explicit chain-of-thought, was released on 2025-01-20.
c6660e2e910f2680 · 2 sources · 100% confidence