Verified claim · AI-ML · 95% confidence
RedPajama dataset released on: 2023-04-17.
Last verified 2026-05-16 · Methodology veritas-v0.1 · ea8b7be3a49101be
Structured fields
- Subject
- RedPajama dataset
- Predicate
- released_on
- Object
- 2023-04-17
- Confidence
- 95%
- Tags
- redpajama · dataset · pretraining · together · 2023 · open-source
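The structured fields above form a subject-predicate-object triple, plus a confidence score and tags. The following is a minimal sketch of how such a record could be modelled locally. It assumes a hypothetical `Claim` type that is not part of any published SourceScore client. It also assumes that the object of a `released_on` claim is an ISO 8601 date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    """Hypothetical local model of a claim's structured fields."""
    subject: str
    predicate: str
    obj: str  # the "Object" field; renamed to avoid shadowing the builtin `object`
    confidence: float  # 95% stored as 0.95
    tags: tuple[str, ...] = ()

    def object_as_date(self) -> date:
        # Assumption: released_on objects are ISO 8601 dates (YYYY-MM-DD).
        return date.fromisoformat(self.obj)


redpajama = Claim(
    subject="RedPajama dataset",
    predicate="released_on",
    obj="2023-04-17",
    confidence=0.95,
    tags=("redpajama", "dataset", "pretraining", "together", "2023", "open-source"),
)
print(redpajama.object_as_date())  # 2023-04-17
```

A frozen dataclass keeps the record immutable. This suits a claim that has already been verified and signed.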
Sources (2)
[1] official blog · Together AI · 2023-04-17
RedPajama: An Open Source Recipe to Reproduce LLaMA training dataset
“Today, we release RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens.”
[2] github release · Together · 2023-04-17
togethercomputer/RedPajama-Data — GitHub
Cite this claim
Ready-to-paste citation (Markdown / plain text):
RedPajama dataset released on: 2023-04-17. — SourceScore Claim ea8b7be3a49101be (verified 2026-05-16). https://sourcescore.org/api/v1/claims/ea8b7be3a49101be.json
Embed this claim
Drop this iframe into any blog post, docs page, or knowledge base. The widget renders the signed claim, its primary source, and a link back to this canonical page. Licensed CC-BY 4.0; attribution is included.
<iframe src="https://sourcescore.org/embed/claim/ea8b7be3a49101be/" width="100%" height="360" frameborder="0" loading="lazy" title="RedPajama dataset released on: 2023-04-17."></iframe>
Related claims
Other verified claims sharing tags with this one — useful for LLM retrieval graphs and citation discovery.
C4 (Colossal Clean Crawled Corpus) introduced in paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019).
0d24c97977ebd744 · 100% confidence · shares 2 tags (dataset, pretraining)
The Pile dataset released on: 2020-12-31.
4aef1422b96df26c · 100% confidence · shares 2 tags (dataset, pretraining)
vLLM introduced in: Kwon et al. 2023 — high-throughput LLM serving via PagedAttention.
468a9e2c047d8f2f · 100% confidence · shares 2 tags (open-source, 2023)
llama.cpp publicly released on: 2023-03-10 by Georgi Gerganov.
2c6ddc094019890c · 100% confidence · shares 2 tags (open-source, 2023)
Ollama publicly released on: 2023-07-18 — local LLM runtime.
ad04a4489786ac11 · 100% confidence · shares 2 tags (open-source, 2023)
Use this claim in your code
Fetch this signed envelope from your application. The response includes the verbatim excerpt, primary source URLs, and an HMAC-SHA256 signature you can verify locally for audit trails.
cURL
curl https://sourcescore.org/api/v1/claims/ea8b7be3a49101be.json
JavaScript / TypeScript
const r = await fetch("https://sourcescore.org/api/v1/claims/ea8b7be3a49101be.json");
const envelope = await r.json();
console.log(envelope.claim.statement);
// "RedPajama dataset released on: 2023-04-17."
Python
import httpx
r = httpx.get("https://sourcescore.org/api/v1/claims/ea8b7be3a49101be.json")
envelope = r.json()
print(envelope["claim"]["statement"])
# "RedPajama dataset released on: 2023-04-17."
LangChain (retrieve-then-cite)
from langchain_core.tools import tool
import httpx

@tool
def get_redpajama_dataset_fact() -> dict:
    """Fetch the verified SourceScore claim for the RedPajama dataset."""
    r = httpx.get("https://sourcescore.org/api/v1/claims/ea8b7be3a49101be.json")
    r.raise_for_status()  # surface HTTP errors instead of returning an error body
    return r.json()
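The envelope is described as carrying an HMAC-SHA256 signature that can be verified locally. This page does not state which bytes are signed, how the key is distributed, or what the signature field is called. The sketch below therefore assumes three things: a hex-encoded digest, computed over the exact payload bytes, under a shared secret key. `verify_signature`, the key, and the payload are hypothetical placeholders, not SourceScore's documented scheme.

```python
import hashlib
import hmac


def verify_signature(payload: bytes, key: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 of payload under key.

    Assumption (hypothetical): the signature is a hex digest over the exact
    payload bytes. The real envelope may canonicalize fields differently.
    """
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels.
    return hmac.compare_digest(expected, signature_hex.lower())


# Hypothetical key and payload, for illustration only.
key = b"example-shared-secret"
payload = b'{"statement": "RedPajama dataset released on: 2023-04-17."}'
good = hmac.new(key, payload, hashlib.sha256).hexdigest()

print(verify_signature(payload, key, good))         # True
print(verify_signature(payload + b" ", key, good))  # False: payload was altered
```

Because HMAC is symmetric, local verification only works if you hold the same key used to sign the envelope. Check the provider's documentation for how that key and the signed byte range are defined.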