- InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation. Paper 2509.24663, published Sep 29, 2025.
- MiniCPM4 Collection: MiniCPM4: Ultra-Efficient LLMs on End Devices. 29 items, updated Sep 8, 2025.
- APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs. Paper 2502.12085, published Feb 17, 2025.
- FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling. Paper 2502.14856, published Feb 20, 2025.