Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs Paper • 2310.01801 • Published Oct 3, 2023 • 3
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 259