Tag
foundational
68 verified claims carry this tag. Each carries an HMAC-SHA256 signature; all but two are backed by 2+ primary sources.
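The per-claim signatures below can be reproduced with the standard library. This is a minimal sketch, not the system's actual scheme: the signing key, the UTF-8 encoding of the claim text, and the truncation to 16 hex characters (64 bits, matching the tag length shown on each record) are all assumptions.

```python
import hashlib
import hmac


def sign_claim(claim: str, key: bytes) -> str:
    """Return a truncated HMAC-SHA256 tag (16 hex chars) for a claim string.

    Assumptions: UTF-8 encoding and 64-bit truncation; the real key and
    canonicalization rules are not specified in this listing.
    """
    digest = hmac.new(key, claim.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def verify_claim(claim: str, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_claim(claim, key), tag)
```

Any edit to the claim text (or use of a different key) changes the tag, so verification fails on tampered records.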
Transformer architecture introduced in paper: Attention Is All You Need (Vaswani et al., 2017).
ad17e76a8baad7a1 · 3 sources · 100% confidence
Reinforcement Learning from Human Feedback (RLHF) introduced in paper: Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017).
67866330cd60e54d · 3 sources · 100% confidence
Retrieval-Augmented Generation (RAG) introduced in paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020).
d15057ced937a103 · 2 sources · 100% confidence
Low-Rank Adaptation (LoRA) introduced in paper: LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021).
d7b97d1b93d8d8bc · 2 sources · 100% confidence
Direct Preference Optimization (DPO) introduced in paper: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023).
a3e691683a4577af · 2 sources · 100% confidence
BERT (Bidirectional Encoder Representations from Transformers) introduced in paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018).
4c1ee70007dc89c1 · 2 sources · 100% confidence
GPT-2 introduced in paper: Language Models are Unsupervised Multitask Learners (Radford et al., 2019).
859551dc078c46f8 · 2 sources · 100% confidence
ResNet (Residual Networks) introduced in paper: Deep Residual Learning for Image Recognition (He et al., 2015).
4f55f77c4bfb316e · 2 sources · 100% confidence
T5 (Text-to-Text Transfer Transformer) introduced in paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019).
ef28341c3b308737 · 2 sources · 100% confidence
Sparsely-Gated Mixture-of-Experts (MoE) introduced in paper: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (Shazeer et al., 2017).
2d6d7f61f1db6493 · 1 source · 100% confidence
Switch Transformer introduced in paper: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (Fedus et al., 2021).
3d9c14b9379038c9 · 2 sources · 100% confidence
Chinchilla scaling laws introduced in paper: Training Compute-Optimal Large Language Models (Hoffmann et al., 2022).
8befcae6bce01a95 · 2 sources · 100% confidence
Proximal Policy Optimization (PPO) introduced in paper: Proximal Policy Optimization Algorithms (Schulman et al., 2017).
00f224e1ccc158ef · 2 sources · 100% confidence
Mamba state-space model introduced in paper: Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Gu, Dao, 2023).
3518f8aa40cb0d36 · 2 sources · 100% confidence
Chain-of-Thought prompting introduced in paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022).
3af924da138ff84c · 2 sources · 100% confidence
Adam optimizer introduced in paper: Adam: A Method for Stochastic Optimization (Kingma, Ba, 2014).
dffbe905003cc581 · 2 sources · 100% confidence
AlexNet introduced in paper: ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, Sutskever, Hinton, 2012).
98b6e774be89d967 · 2 sources · 100% confidence
ImageNet dataset introduced in paper: ImageNet: A Large-Scale Hierarchical Image Database (Deng et al., 2009).
045e628def62181d · 2 sources · 100% confidence
Vision Transformer (ViT) introduced in paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Dosovitskiy et al., 2020).
d3681b0981e0b700 · 2 sources · 100% confidence
Generative Adversarial Networks (GANs) introduced in paper: Generative Adversarial Networks (Goodfellow et al., 2014).
5b0c0612bd9e55b0 · 2 sources · 100% confidence
Variational Autoencoder (VAE) introduced in paper: Auto-Encoding Variational Bayes (Kingma, Welling, 2013).
62789e45973ab631 · 2 sources · 100% confidence
Denoising Diffusion Probabilistic Models (DDPM) introduced in paper: Denoising Diffusion Probabilistic Models (Ho, Jain, Abbeel, 2020).
e700f81fff6f38c7 · 2 sources · 100% confidence
Word2Vec introduced in paper: Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013).
4978f76d228a3db1 · 2 sources · 100% confidence
Byte-Pair Encoding (BPE) for Neural Machine Translation introduced in paper: Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015).
e942c93d70a4dab2 · 2 sources · 100% confidence
ReAct (Reasoning + Acting) introduced in paper: ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022).
fceea64fa7d04d3a · 2 sources · 100% confidence
LoRA (Low-Rank Adaptation) introduced in paper: LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021).
f191b2876790dc6e · 2 sources · 100% confidence
QLoRA introduced in paper: QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023).
767cbe41c961be1a · 2 sources · 100% confidence
Rotary Position Embedding (RoPE) introduced in paper: RoFormer: Enhanced Transformer with Rotary Position Embedding (Su et al., 2021).
f8d64457ba9fd35b · 2 sources · 100% confidence
Byte-Pair Encoding (BPE) for NMT introduced in paper: Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015).
aede848e23c8de8e · 2 sources · 100% confidence
SentencePiece tokenizer introduced in paper: SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (Kudo & Richardson, 2018).
0d47bb8eb637a2e4 · 2 sources · 100% confidence
CLIP introduced in paper: Learning Transferable Visual Models From Natural Language Supervision (Radford et al., 2021).
bcdef949cc6d3644 · 2 sources · 100% confidence
ELMo (Embeddings from Language Models) introduced in paper: Deep contextualized word representations (Peters et al., 2018).
ee150c6e44364a3d · 2 sources · 100% confidence
Latent Diffusion Models (LDM) introduced in paper: High-Resolution Image Synthesis with Latent Diffusion Models (Rombach et al., 2021).
1aacbf0bf9248dc7 · 2 sources · 100% confidence
ELECTRA introduced in paper: ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (Clark et al., 2020).
2f9c79357e9d4da9 · 2 sources · 100% confidence
GPT-3 introduced in paper: Language Models are Few-Shot Learners (Brown et al., 2020).
7d3e6a39b1656571 · 2 sources · 100% confidence
Codex introduced in paper: Evaluating Large Language Models Trained on Code (Chen et al., 2021).
79be9b25cd64f250 · 2 sources · 100% confidence
SuperGLUE benchmark introduced in paper: SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (Wang et al., 2019).
1a1e87145608c91a · 2 sources · 100% confidence
GLUE benchmark introduced in paper: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (Wang et al., 2018).
aa113b5e61d5c214 · 2 sources · 100% confidence
Dropout introduced in paper: Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava et al., 2014).
18409e7f8a6d7aac · 2 sources · 100% confidence
Batch Normalization introduced in paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe & Szegedy, 2015).
56c451642ab41e68 · 2 sources · 100% confidence
Layer Normalization introduced in paper: Layer Normalization (Ba, Kiros, Hinton, 2016).
f72db86c784a1b32 · 1 source · 100% confidence
Sequence-to-Sequence Learning (seq2seq) introduced in paper: Sequence to Sequence Learning with Neural Networks (Sutskever, Vinyals, Le, 2014).
ff80a25ed7e83b45 · 2 sources · 100% confidence
Longformer introduced in paper: Longformer: The Long-Document Transformer (Beltagy, Peters, Cohan, 2020).
c3d2ec81d9faf837 · 2 sources · 100% confidence
Reformer introduced in paper: Reformer: The Efficient Transformer (Kitaev, Kaiser, Levskaya, 2020).
76f7f00e79bc18c8 · 2 sources · 100% confidence
BLEU score introduced in paper: BLEU: a Method for Automatic Evaluation of Machine Translation (Papineni et al., 2002).
bf5bdd9756278449 · 2 sources · 100% confidence
ROUGE score introduced in paper: ROUGE: A Package for Automatic Evaluation of Summaries (Lin, 2004).
b0eb5c8ac5b4b21e · 2 sources · 100% confidence
AdamW optimizer introduced in paper: Decoupled Weight Decay Regularization (Loshchilov & Hutter, 2017).
b6d51eba4fc7f918 · 2 sources · 100% confidence
PaLM introduced in paper: PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022).
d58d505fd9d705fe · 2 sources · 100% confidence
Imagen introduced in paper: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Saharia et al., 2022).
30fdfa95f8684ca5 · 2 sources · 100% confidence
Long Short-Term Memory (LSTM) introduced in: Hochreiter & Schmidhuber 1997 — Neural Computation.
97ec4d132871224b · 2 sources · 100% confidence
AlphaGo defeated: Lee Sedol 4-1 in March 2016.
0318700337f0906d · 2 sources · 100% confidence
AlphaZero published in: Science, December 2018.
b2dbbb7283a89f21 · 2 sources · 100% confidence
RoBERTa introduced in: Liu et al. 2019 — A Robustly Optimized BERT Pretraining Approach.
d4fecb26a4c9cdca · 2 sources · 100% confidence
DistilBERT introduced in: Sanh et al. 2019 — a smaller, faster, cheaper BERT via knowledge distillation.
245af747a3d21061 · 2 sources · 100% confidence
BART introduced in: Lewis et al. 2019 — denoising sequence-to-sequence pretraining.
f5b422e3255fd7c0 · 2 sources · 100% confidence
GloVe introduced in: Pennington, Socher, Manning 2014 — global vectors for word representation.
7f9254f3c0612ed0 · 2 sources · 100% confidence
Backpropagation algorithm popularized in: Rumelhart, Hinton, Williams 1986 — Learning representations by back-propagating errors (Nature).
e5471a750d13a672 · 2 sources · 100% confidence
U-Net introduced in: Ronneberger, Fischer, Brox 2015 — biomedical image segmentation.
4f19829aa2036770 · 2 sources · 100% confidence
AlphaFold 1 introduced in: Senior et al. 2020 — DeepMind protein structure prediction.
a77a8dd48941a53d · 2 sources · 100% confidence
VAE (Variational Autoencoder) introduced in: Kingma & Welling 2013 — auto-encoding variational Bayes.
f1e5afb457a428c6 · 2 sources · 100% confidence
Knowledge Distillation popularized in: Hinton, Vinyals, Dean 2015 — distilling the knowledge in a neural network.
f14acb906ba6c12f · 2 sources · 100% confidence
InstructGPT introduced in: Ouyang et al. 2022 — RLHF-tuned GPT-3, direct ancestor of ChatGPT.
590b9de765b8126e · 2 sources · 100% confidence
Mixture of Experts (MoE) revival popularized in: Shazeer et al. 2017 — outrageously large neural networks via sparse gating.
f068236101568ad7 · 2 sources · 100% confidence
Speculative decoding introduced in: Leviathan, Kalman, Matias 2023 — Fast Inference from Transformers via Speculative Decoding (Google Research).
6cdc7730bf41bb3d · 2 sources · 100% confidence
Anthropic Constitutional AI introduced in: Bai et al. 2022 — Constitutional AI: Harmlessness from AI Feedback.
6fa575eb9df5ac32 · 2 sources · 100% confidence
ColBERT introduced in: Khattab & Zaharia 2020 — late-interaction retrieval.
2335984b07f28cac · 2 sources · 100% confidence
ARC-AGI benchmark introduced in: Chollet 2019 — On the Measure of Intelligence (Abstraction and Reasoning Corpus).
cc5df3c14d35fa49 · 2 sources · 100% confidence
GraphRAG introduced in: Edge et al. 2024 — Microsoft Research knowledge-graph RAG.
58a9c41f05c73a22 · 2 sources · 100% confidence