ahmed-ali's Collections

Inference-Optimization

updated 4 days ago

  • GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

    Paper • 2305.13245 • Published May 22, 2023 • 6

  • Weighted Grouped Query Attention in Transformers

    Paper • 2407.10855 • Published Jul 15, 2024

  • Fast Transformer Decoding: One Write-Head is All You Need

    Paper • 1911.02150 • Published Nov 6, 2019 • 9

  • MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding

    Paper • 2406.09297 • Published Jun 13, 2024 • 6

  • Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

    Paper • 2405.12981 • Published May 21, 2024 • 33

  • Effectively Compress KV Heads for LLM

    Paper • 2406.07056 • Published Jun 11, 2024 • 1

  • The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI

    Paper • 2410.18441 • Published Oct 24, 2024 • 7

  • TransMLA: Multi-head Latent Attention Is All You Need

    Paper • 2502.07864 • Published Feb 11, 2025 • 57