Verified claim · AI-ML · 100% confidence
HumanEval benchmark introduced in paper: Evaluating Large Language Models Trained on Code (Chen et al., 2021).
Last verified 2026-05-16 · Methodology veritas-v0.1 · 71ec42731d2c9e0c
Structured fields
- Subject
- HumanEval benchmark
- Predicate
- introduced_in_paper
- Object
- Evaluating Large Language Models Trained on Code (Chen et al., 2021)
- Confidence
- 100%
- Tags
- humaneval · benchmark · codex · openai · chen · 2021 · code-generation
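These fields form a subject–predicate–object triple plus metadata. As a hypothetical in-memory shape (the field names here are illustrative, not necessarily the envelope schema returned by the API below):
claim_triple = {
    "subject": "HumanEval benchmark",
    "predicate": "introduced_in_paper",
    "object": "Evaluating Large Language Models Trained on Code (Chen et al., 2021)",
    "confidence": 1.0,  # the page reports 100%
    "tags": ["humaneval", "benchmark", "codex", "openai",
             "chen", "2021", "code-generation"],
}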
Sources (2)
[1] preprint · arXiv (Chen et al., OpenAI) · 2021-07-07
Evaluating Large Language Models Trained on Code
“We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.”
[2] github release · OpenAI · 2021-07-07
openai/human-eval repository
Cite this claim
Ready-to-paste citation (Markdown / plain text):
HumanEval benchmark introduced in paper: Evaluating Large Language Models Trained on Code (Chen et al., 2021). — SourceScore Claim 71ec42731d2c9e0c (verified 2026-05-16). https://sourcescore.org/api/v1/claims/71ec42731d2c9e0c.json
Embed this claim
Drop this iframe into any blog post, docs page, or knowledge base. The widget renders the signed claim + primary source + click-through to this canonical page. CC-BY 4.0; attribution included.
<iframe src="https://sourcescore.org/embed/claim/71ec42731d2c9e0c/" width="100%" height="360" frameborder="0" loading="lazy" title="HumanEval benchmark introduced in paper: Evaluating Large Language Models Trained on Code (Chen et al., 2021)."></iframe>
Related claims
Other verified claims sharing tags with this one, useful for LLM retrieval graphs and citation discovery; a fetch sketch follows the list.
Codex introduced in paper: Evaluating Large Language Models Trained on Code (Chen et al., 2021).
79be9b25cd64f250 · 100% confidence · shares 4 tags (codex, code-generation, openai…)
GitHub Copilot publicly released on: 2021-06-29 (technical preview).
1ddbde847e500ac5 · 100% confidence · shares 3 tags (openai, codex, 2021)
CLIP (Contrastive Language-Image Pretraining) introduced in paper: Learning Transferable Visual Models From Natural Language Supervision (Radford et al., 2021).
85a3ca745eaf4ee0 · 100% confidence · shares 2 tags (2021, openai)
CLIP introduced in paper: Learning Transferable Visual Models From Natural Language Supervision (Radford et al., 2021).
bcdef949cc6d3644 · 100% confidence · shares 2 tags (2021, openai)
Low-Rank Adaptation (LoRA) introduced in paper: LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021).
d7b97d1b93d8d8bc · 100% confidence · shares 1 tag (2021)
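Every claim resolves at the same URL pattern used in the API examples below, so the related IDs above can be walked to build a small citation graph. A minimal sketch in Python, assuming only that URL pattern and the envelope["claim"]["statement"] path shown in the Python example further down:
import httpx

# Related claim IDs listed above; the URL pattern matches the API section below.
RELATED_IDS = [
    "79be9b25cd64f250",  # Codex introduced in paper
    "1ddbde847e500ac5",  # GitHub Copilot technical preview
    "85a3ca745eaf4ee0",  # CLIP (Contrastive Language-Image Pretraining)
    "bcdef949cc6d3644",  # CLIP
    "d7b97d1b93d8d8bc",  # LoRA
]

def fetch_claim(claim_id: str) -> dict:
    r = httpx.get(f"https://sourcescore.org/api/v1/claims/{claim_id}.json")
    r.raise_for_status()
    return r.json()

for cid in RELATED_IDS:
    envelope = fetch_claim(cid)
    print(cid, envelope["claim"]["statement"])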
Use this claim in your code
Fetch this signed envelope from your application. The response includes the verbatim excerpt, primary source URLs, and an HMAC-SHA256 signature you can verify locally for audit trails.
cURL
curl https://sourcescore.org/api/v1/claims/71ec42731d2c9e0c.json
JavaScript / TypeScript
const r = await fetch("https://sourcescore.org/api/v1/claims/71ec42731d2c9e0c.json");
const envelope = await r.json();
console.log(envelope.claim.statement);
// "HumanEval benchmark introduced in paper: Evaluating Large Language Models Trained on Code (Chen et al., 2021)."Python
import httpx
r = httpx.get("https://sourcescore.org/api/v1/claims/71ec42731d2c9e0c.json")
envelope = r.json()
print(envelope["claim"]["statement"])
# "HumanEval benchmark introduced in paper: Evaluating Large Language Models Trained on Code (Chen et al., 2021)."LangChain (retrieve-then-cite)
from langchain_core.tools import tool
import httpx
@tool
def get_humaneval_benchmark_fact() -> dict:
"""Fetch the verified SourceScore claim for HumanEval benchmark."""
r = httpx.get("https://sourcescore.org/api/v1/claims/71ec42731d2c9e0c.json")
return r.json()
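Verifying the HMAC-SHA256 signature locally requires the envelope's signature field name, the canonical byte serialization of the signed payload, and the shared key; none of those is documented on this page, so everything in the sketch below is an assumption to adapt once you have the real schema:
import hashlib
import hmac
import json

def verify_envelope(envelope: dict, key: bytes) -> bool:
    # Assumed schema: envelope["signature"] is a hex HMAC over the claim
    # object serialized as sorted-key compact JSON. Swap in the documented
    # canonicalization before relying on this for audit trails.
    payload = json.dumps(envelope["claim"], sort_keys=True,
                         separators=(",", ":")).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])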