GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. arXiv:2305.13245 (May 22, 2023)
Fast Transformer Decoding: One Write-Head is All You Need. arXiv:1911.02150 (Nov 6, 2019)
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding. arXiv:2406.09297 (Jun 13, 2024)
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention. arXiv:2405.12981 (May 21, 2024)
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI. arXiv:2410.18441 (Oct 24, 2024)