vLLM introduced in: Kwon et al. 2023 — high-throughput LLM serving via PagedAttention.
Kwon et al. 2023 — high-throughput LLM serving via PagedAttention
Primary source · preprint · 2023-09-12
Efficient Memory Management for Large Language Model Serving with PagedAttention — arXiv:2309.06180 (Kwon, Li, Zhuang, Sheng, Zheng, Yu, Gonzalez, Zhang, Stoica / UC Berkeley)