Topic hub · 79 claims
Foundational AI/ML papers — the canonical reading list
The papers that everything builds on. Each is hand-verified against the primary source — author, date, venue, and a verbatim excerpt from the abstract.
Why a canonical reading list matters
Production AI engineers don't have time to chase publication dates through sometimes-wrong blog posts. "When was the Transformer paper published?" should be a 100ms lookup, not a 10-minute SERP triangulation. This hub catalogs the foundational papers with verified dates, authors, venues, and verbatim excerpts; nearly every claim is backed by at least two primary sources.
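The lookup itself is trivial once the metadata is verified. A minimal sketch, assuming nothing about this hub's actual storage or API (the `PAPERS` dict and `cite` helper are illustrative names, seeded with three entries from the list below):

```python
# Illustrative only: a hand-rolled catalog keyed by claim name, so
# "when was X published?" is a dict lookup instead of a web search.
PAPERS = {
    "Transformer": ("Attention Is All You Need", "Vaswani et al.", 2017),
    "BERT": ("BERT: Pre-training of Deep Bidirectional Transformers "
             "for Language Understanding", "Devlin et al.", 2018),
    "LoRA": ("LoRA: Low-Rank Adaptation of Large Language Models",
             "Hu et al.", 2021),
}

def cite(name: str) -> str:
    """Return a one-line citation, or flag the claim as uncataloged."""
    if name not in PAPERS:
        return f"{name}: not in catalog"
    title, authors, year = PAPERS[name]
    return f"{name}: {title} ({authors}, {year})"

print(cite("Transformer"))
# Transformer: Attention Is All You Need (Vaswani et al., 2017)
```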
Pre-Transformer era
The deep-learning revival ran on architectures and ideas that pre-date the Transformer. LSTM (Hochreiter & Schmidhuber 1997), Word2Vec (Mikolov et al. 2013), Dropout (Srivastava et al. 2014), and GloVe (Pennington, Socher, Manning 2014) form the recurrent + embedding foundation that the Transformer era (2017 onward) would surpass but not erase.
Transformer + pretraining era (2017-2020)
Attention Is All You Need (Vaswani et al. 2017) opened the door. BERT (Devlin et al. 2018) defined the encoder-only branch. GPT-2 (Radford et al. 2019) scaled the decoder-only recipe that would eventually power frontier models. T5 (Raffel et al. 2019), RoBERTa (Liu et al. 2019), DistilBERT (Sanh et al. 2019), and ELECTRA (Clark et al. 2020) each refined the pretraining recipe.
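The encoder-only / decoder-only split those papers established comes down to the attention mask: BERT-style encoders attend bidirectionally, GPT-style decoders mask out future positions. A minimal numpy sketch of the scaled dot-product attention from Attention Is All You Need, where the `causal` flag is the illustrative difference (a sketch, not any model's actual implementation):

```python
import numpy as np

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention (Vaswani et al. 2017):
    softmax(Q K^T / sqrt(d_k)) V, optionally with a causal mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # (seq, seq) similarities
    if causal:
        # Decoder-only (GPT-style): each position attends only to
        # itself and earlier positions; -inf above the diagonal.
        scores += np.triu(np.full(scores.shape, -np.inf), k=1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                     # 4 tokens, d_k = 8
print(attention(x, x, x, causal=False).shape)   # encoder-style (BERT)
print(attention(x, x, x, causal=True).shape)    # decoder-style (GPT-2)
```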
Frontier methods (2021-2025)
Once architectures stabilized, the innovation moved to alignment (RLHF, Constitutional AI, DPO), efficient fine-tuning and inference (LoRA, QLoRA, GPTQ, FlashAttention, vLLM), retrieval grounding (RAG, Self-RAG, ReAct), and tool use (Toolformer, MCP). Each claim here is a paper that downstream work builds on.
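To make the efficient fine-tuning thread concrete, here is a minimal numpy sketch of the LoRA parameterization from Hu et al. 2021: freeze the pretrained weight W and train only a low-rank product B·A added on top. The class name, initialization constants, and hyperparameter defaults are illustrative, not the paper's reference code:

```python
import numpy as np

class LoRALinear:
    """Minimal sketch of the LoRA update (Hu et al. 2021): keep the
    pretrained weight W frozen and train only a low-rank product B @ A,
    so the effective weight is W + (alpha / r) * (B @ A)."""

    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                 # frozen pretrained weight
        d_out, d_in = W.shape
        self.A = rng.normal(0.0, 0.01, (r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))              # trainable up-projection;
        self.scale = alpha / r                     # zero init => no-op at start

    def __call__(self, x):
        # Adapter adds r*(d_in + d_out) trainable params instead of d_in*d_out.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.random.default_rng(1).normal(size=(16, 32))  # toy 16x32 layer
layer = LoRALinear(W, r=4)
x = np.ones(32)
assert np.allclose(layer(x), W @ x)                 # B == 0: exact no-op
```

Because B starts at zero, the adapter is an exact no-op before training; QLoRA (Dettmers et al., 2023, below) keeps the same parameterization but stores the frozen base weights in 4-bit.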
Defined terms (3)
- Foundational paper
- A research paper that other AI/ML papers cite as the canonical reference for an architecture, method, or technique.
- Pretraining
- Training a model on a large general dataset before fine-tuning for a downstream task.
- RLHF
- Reinforcement learning from human feedback — the alignment technique that produced InstructGPT and ChatGPT.
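At the reward-modeling step, the RLHF defined above reduces to a pairwise preference loss (Christiano et al. 2017; Ouyang et al. 2022): score the human-preferred response above the rejected one. A minimal sketch with toy scalar rewards (the function name is illustrative):

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Pairwise Bradley-Terry preference loss used in RLHF reward
    modeling: -log sigmoid(r_chosen - r_rejected). Minimizing it pushes
    the reward model to rank preferred responses above rejected ones."""
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Toy scalar rewards for one (chosen, rejected) response pair:
print(preference_loss(2.0, 0.5))   # small loss: ranking already correct
print(preference_loss(0.5, 2.0))   # large loss: ranking inverted
```

DPO (Rafailov et al., 2023, below) folds this same objective directly into the policy's log-probabilities, skipping the separate reward model.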
All claims in this topic (79)
- Adam optimizer·introduced in paper Adam: A Method for Stochastic Optimization (Kingma, Ba, 2014)(1.00 · 2 sources)
- AdamW optimizer·introduced in paper Decoupled Weight Decay Regularization (Loshchilov & Hutter, 2017)(1.00 · 2 sources)
- AlexNet·introduced in paper ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, Sutskever, Hinton, 2012)(1.00 · 2 sources)
- AlpacaEval·introduced in Li et al. 2023 — LLM-as-judge evaluation benchmark(1.00 · 2 sources)
- AlphaFold 1·introduced in Senior et al. 2020 — DeepMind protein structure prediction(1.00 · 2 sources)
- AlphaGo·defeated Lee Sedol 4-1 in March 2016(1.00 · 2 sources)
- AlphaZero·published in Science journal December 2018(1.00 · 2 sources)
- Anthropic HH-RLHF assistant·introduced in paper Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (Bai et al., 2022)(1.00 · 2 sources)
- Backpropagation algorithm·popularized in Rumelhart, Hinton, Williams 1986 — Nature paper(1.00 · 2 sources)
- BART·introduced in Lewis et al. 2019 — denoising sequence-to-sequence pretraining(1.00 · 2 sources)
- Batch Normalization·introduced in paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe & Szegedy, 2015)(1.00 · 2 sources)
- BERT (Bidirectional Encoder Representations from Transformers)·introduced in paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018)(1.00 · 2 sources)
- BLEU score·introduced in paper BLEU: a Method for Automatic Evaluation of Machine Translation (Papineni et al., 2002)(1.00 · 2 sources)
- Byte-Pair Encoding (BPE) for Neural Machine Translation·introduced in paper Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015)(1.00 · 2 sources)
- C4 (Colossal Clean Crawled Corpus)·introduced in paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)(1.00 · 2 sources)
- Chain-of-Thought prompting·introduced in paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)(1.00 · 2 sources)
- Chatbot Arena·introduced in Zheng et al. 2023 — LMSYS open platform for evaluating LLMs by human preference(1.00 · 2 sources)
- Chinchilla scaling laws·introduced in paper Training Compute-Optimal Large Language Models (Hoffmann et al., 2022)(1.00 · 2 sources)
- CLIP (Contrastive Language-Image Pretraining)·introduced in paper Learning Transferable Visual Models From Natural Language Supervision (Radford et al., 2021)(1.00 · 2 sources)
- Codex·introduced in paper Evaluating Large Language Models Trained on Code (Chen et al., 2021)(1.00 · 2 sources)
- Constitutional AI (CAI)·introduced in paper Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022)(1.00 · 2 sources)
- Denoising Diffusion Probabilistic Models (DDPM)·introduced in paper Denoising Diffusion Probabilistic Models (Ho, Jain, Abbeel, 2020)(1.00 · 2 sources)
- Direct Preference Optimization (DPO)·introduced in paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023)(1.00 · 2 sources)
- DistilBERT·introduced in Sanh et al. 2019 — a smaller, faster, cheaper BERT via knowledge distillation(1.00 · 2 sources)
- Dropout·introduced in paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava et al., 2014)(1.00 · 2 sources)
- ELECTRA·introduced in paper ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (Clark et al., 2020)(1.00 · 2 sources)
- ELMo (Embeddings from Language Models)·introduced in paper Deep contextualized word representations (Peters et al., 2018)(1.00 · 2 sources)
- FAISS·introduced in Johnson, Douze, Jégou 2017 — Facebook AI Similarity Search(1.00 · 2 sources)
- Flamingo·introduced in Alayrac et al. 2022 — DeepMind few-shot vision-language model(1.00 · 2 sources)
- FlashAttention·introduced in paper FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (Dao et al., 2022)(1.00 · 2 sources)
- Generative Adversarial Networks (GANs)·introduced in paper Generative Adversarial Networks (Goodfellow et al., 2014)(1.00 · 2 sources)
- GloVe·introduced in Pennington, Socher, Manning 2014 — global vectors for word representation(1.00 · 2 sources)
- GLUE benchmark·introduced in paper GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (Wang et al., 2018)(1.00 · 2 sources)
- GPT-2·introduced in paper Language Models are Unsupervised Multitask Learners (Radford et al., 2019)(1.00 · 2 sources)
- GPT-3·introduced in paper Language Models are Few-Shot Learners (Brown et al., 2020)(1.00 · 2 sources)
- GPTQ·introduced in Frantar et al. 2022 — accurate post-training quantization for GPT models(1.00 · 2 sources)
- HumanEval benchmark·introduced in paper Evaluating Large Language Models Trained on Code (Chen et al., 2021)(1.00 · 2 sources)
- Imagen·introduced in paper Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Saharia et al., 2022)(1.00 · 2 sources)
- ImageNet dataset·introduced in paper ImageNet: A Large-Scale Hierarchical Image Database (Deng et al., 2009)(1.00 · 2 sources)
- InstructGPT·introduced in paper Training language models to follow instructions with human feedback (Ouyang et al., 2022) — RLHF-tuned GPT-3, direct ancestor of ChatGPT(1.00 · 2 sources)
- Instructor library·introduced in Jason Liu 2023 — structured outputs from LLMs via Pydantic(1.00 · 2 sources)
- Knowledge Distillation·popularized in Hinton, Vinyals, Dean 2015 — distilling the knowledge in a neural network(1.00 · 2 sources)
- Latent Diffusion Models (LDM)·introduced in paper High-Resolution Image Synthesis with Latent Diffusion Models (Rombach et al., 2021)(1.00 · 2 sources)
- Layer Normalization·introduced in paper Layer Normalization (Ba, Kiros, Hinton, 2016)(1.00 · 1 source)
- Long Short-Term Memory (LSTM)·introduced in 1997 by Hochreiter and Schmidhuber(1.00 · 2 sources)
- Longformer·introduced in paper Longformer: The Long-Document Transformer (Beltagy, Peters, Cohan, 2020)(1.00 · 2 sources)
- LoRA (Low-Rank Adaptation)·introduced in paper LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)(1.00 · 2 sources)
- Mamba state-space model·introduced in paper Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Gu, Dao, 2023)(1.00 · 2 sources)
- Mamba-2·introduced in Dao & Gu 2024 — structured state space duality(1.00 · 2 sources)
- MMLU benchmark·introduced in paper Measuring Massive Multitask Language Understanding (Hendrycks et al., 2020)(1.00 · 2 sources)
- MTEB benchmark·introduced in Muennighoff et al. 2022 — Massive Text Embedding Benchmark(1.00 · 2 sources)
- PaLM·introduced in paper PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022)(1.00 · 2 sources)
- Proximal Policy Optimization (PPO)·introduced in paper Proximal Policy Optimization Algorithms (Schulman et al., 2017)(1.00 · 2 sources)
- QLoRA·introduced in paper QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)(1.00 · 2 sources)
- ReAct (Reasoning + Acting)·introduced in paper ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)(1.00 · 2 sources)
- Reformer·introduced in paper Reformer: The Efficient Transformer (Kitaev, Kaiser, Levskaya, 2020)(1.00 · 2 sources)
- Reinforcement Learning from Human Feedback (RLHF)·introduced in paper Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017)(1.00 · 3 sources)
- ResNet (Residual Networks)·introduced in paper Deep Residual Learning for Image Recognition (He et al., 2015)(1.00 · 2 sources)
- Retrieval-Augmented Generation (RAG)·introduced in paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020)(1.00 · 2 sources)
- RoBERTa·introduced in Liu et al. 2019 — A Robustly Optimized BERT Pretraining Approach(1.00 · 2 sources)
- Rotary Position Embedding (RoPE)·introduced in paper RoFormer: Enhanced Transformer with Rotary Position Embedding (Su et al., 2021)(1.00 · 2 sources)
- ROUGE score·introduced in paper ROUGE: A Package for Automatic Evaluation of Summaries (Lin, 2004)(1.00 · 2 sources)
- Self-RAG·introduced in Asai et al. 2023 — self-reflective retrieval-augmented generation(1.00 · 2 sources)
- SentencePiece tokenizer·introduced in paper SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (Kudo & Richardson, 2018)(1.00 · 2 sources)
- Sequence-to-Sequence Learning (seq2seq)·introduced in paper Sequence to Sequence Learning with Neural Networks (Sutskever, Vinyals, Le, 2014)(1.00 · 2 sources)
- SGLang·introduced in Zheng et al. 2024 — efficient LLM serving with structured outputs(1.00 · 2 sources)
- Sparsely-Gated Mixture-of-Experts (MoE)·introduced in paper Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (Shazeer et al., 2017)(1.00 · 1 source)
- Speculative decoding·introduced in Leviathan, Kalman, Matias 2023 — Google Research(1.00 · 2 sources)
- SuperGLUE benchmark·introduced in paper SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (Wang et al., 2019)(1.00 · 2 sources)
- Switch Transformer·introduced in paper Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (Fedus et al., 2021)(1.00 · 2 sources)
- T5 (Text-to-Text Transfer Transformer)·introduced in paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)(1.00 · 2 sources)
- Toolformer·introduced in Schick et al. 2023 — self-supervised LLM tool-use(1.00 · 2 sources)
- Transformer architecture·introduced in paper Attention Is All You Need (Vaswani et al., 2017)(1.00 · 3 sources)
- Tree of Thoughts·introduced in Yao et al. 2023 — deliberate problem solving with LLMs(1.00 · 2 sources)
- U-Net·introduced in Ronneberger, Fischer, Brox 2015 — biomedical image segmentation(1.00 · 2 sources)
- Variational Autoencoder (VAE)·introduced in paper Auto-Encoding Variational Bayes (Kingma, Welling, 2013)(1.00 · 2 sources)
- Vision Transformer (ViT)·introduced in paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Dosovitskiy et al., 2020)(1.00 · 2 sources)
- vLLM·introduced in Kwon et al. 2023 — high-throughput LLM serving via PagedAttention(1.00 · 2 sources)
- Word2Vec·introduced in paper Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013)(1.00 · 2 sources)