stereoplegic's Collections: Shared params
Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 45
SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks
Paper • 2309.00255 • Published • 1
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
Paper • 2309.08968 • Published • 24
Matryoshka Representation Learning
Paper • 2205.13147 • Published • 25
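Matryoshka Representation Learning trains one embedding so that every prefix of its dimensions is itself a usable representation, by supervising several truncation lengths at once. A minimal sketch of that multi-scale loss (the encoder, task head, nesting dims, and uniform weighting are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

# Sketch: supervise every prefix of the embedding so truncated vectors
# remain useful on their own. Dims and uniform weights are assumptions.
NESTED_DIMS = [64, 128, 256, 512]  # prefixes of the full 512-d embedding

encoder = torch.nn.Sequential(
    torch.nn.Linear(784, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)
# One classifier head per nested dimension (hypothetical 10-class task).
heads = torch.nn.ModuleList(torch.nn.Linear(d, 10) for d in NESTED_DIMS)

def matryoshka_loss(x, y):
    z = encoder(x)                    # full embedding, shape (batch, 512)
    losses = [
        F.cross_entropy(head(z[:, :d]), y)   # use only the first d dims
        for d, head in zip(NESTED_DIMS, heads)
    ]
    return sum(losses) / len(losses)  # uniform weighting over scales

loss = matryoshka_loss(torch.randn(32, 784), torch.randint(0, 10, (32,)))
loss.backward()
```

At inference the embedding can then be truncated to whichever prefix fits the deployment budget.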
Matryoshka: Stealing Functionality of Private ML Data by Hiding Models in Model
Paper • 2206.14371 • Published • 3
MatFormer: Nested Transformer for Elastic Inference
Paper • 2310.07707 • Published • 4
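MatFormer carries the same nesting into the Transformer itself: sub-models are carved out by slicing the hidden width of each feed-forward block, so one checkpoint serves many compute budgets. A rough sketch of the slicing (the module layout and width choices are assumptions, not the paper's code):

```python
import torch

class NestedFFN(torch.nn.Module):
    """Feed-forward block whose hidden width can be sliced at inference.

    MatFormer-style nesting sketch: smaller sub-models run on the first
    `width` hidden units of the same weight matrices.
    """
    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.up = torch.nn.Linear(d_model, d_hidden)
        self.down = torch.nn.Linear(d_hidden, d_model)

    def forward(self, x, width):
        h = torch.relu(x @ self.up.weight[:width].T + self.up.bias[:width])
        return h @ self.down.weight[:, :width].T + self.down.bias

ffn = NestedFFN()
x = torch.randn(4, 16, 512)
# Training would sample a width per step so every sub-model gets gradient.
for width in (256, 512, 1024, 2048):
    assert ffn(x, width).shape == x.shape  # same params, elastic compute
```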
Network In Network
Paper • 1312.4400 • Published • 1
Composable Sparse Fine-Tuning for Cross-Lingual Transfer
Paper • 2110.07560 • Published • 2
Visual Programming: Compositional visual reasoning without training
Paper • 2211.11559 • Published • 1
One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 34
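The "One Wide Feedforward" observation is that per-layer FFNs are highly redundant: a single FFN shared across all layers, widened to recover capacity, can replace them with little quality loss. A toy encoder showing the sharing pattern (layer count, widths, and post-norm wiring are assumptions):

```python
import torch

class SharedFFNEncoder(torch.nn.Module):
    """Toy encoder in which every layer reuses ONE wider feed-forward block."""
    def __init__(self, d_model=256, n_layers=6, d_hidden=4096):
        super().__init__()
        self.attns = torch.nn.ModuleList(
            torch.nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
            for _ in range(n_layers)
        )
        # One wide FFN shared by all layers, instead of n_layers separate FFNs.
        self.shared_ffn = torch.nn.Sequential(
            torch.nn.Linear(d_model, d_hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(d_hidden, d_model),
        )
        self.norm1 = torch.nn.ModuleList(torch.nn.LayerNorm(d_model) for _ in range(n_layers))
        self.norm2 = torch.nn.ModuleList(torch.nn.LayerNorm(d_model) for _ in range(n_layers))

    def forward(self, x):
        for attn, n1, n2 in zip(self.attns, self.norm1, self.norm2):
            a, _ = attn(x, x, x)
            x = n1(x + a)
            x = n2(x + self.shared_ffn(x))  # same FFN params at every depth
        return x

print(SharedFFNEncoder()(torch.randn(2, 10, 256)).shape)  # (2, 10, 256)
```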
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
Paper • 2306.04845 • Published • 4
Improving Differentiable Architecture Search via Self-Distillation
Paper • 2302.05629 • Published • 1
TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models
Paper • 2309.01947 • Published • 1
NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
Paper • 2310.19820 • Published • 1
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition
Paper • 2303.13072 • Published • 1
Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training
Paper • 2302.10798 • Published • 1
An EMO Joint Pruning with Multiple Sub-networks: Fast and Effect
Paper • 2303.16212 • Published • 1
Looped Transformers are Better at Learning Learning Algorithms
Paper • 2311.12424 • Published • 1
Looped Transformers as Programmable Computers
Paper • 2301.13196 • Published • 1
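Both looped-transformer papers build on one parameter-sharing move: rather than stacking L distinct layers, a single block is applied repeatedly, which makes depth a runtime knob. A minimal weight-tied loop (the loop count and the input re-injection are illustrative choices, not either paper's exact recipe):

```python
import torch

class LoopedBlock(torch.nn.Module):
    """One transformer encoder layer applied n_loops times (weight tying)."""
    def __init__(self, d_model=128, n_loops=12):
        super().__init__()
        self.block = torch.nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=512, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x):
        h = x
        for _ in range(self.n_loops):
            # Re-injecting the input each iteration is one common looped
            # variant; a plain `h = self.block(h)` is the simplest form.
            h = self.block(h + x)
        return h

print(LoopedBlock()(torch.randn(2, 16, 128)).shape)  # (2, 16, 128)
```

The Universal Transformer line (including Sparse Universal Transformer and MoEUT, listed below) uses the same weight-tied-depth structure, typically with halting or expert routing layered on top.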
Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling
Paper • 2310.06389 • Published • 1
Sliced Recursive Transformer
Paper • 2111.05297 • Published • 1
Transformer in Transformer
Paper • 2103.00112 • Published • 1
Go Wider Instead of Deeper
Paper • 2107.11817 • Published • 1
Sparse Universal Transformer
Paper • 2310.07096 • Published
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 34
MoEUT: Mixture-of-Experts Universal Transformers
Paper • 2405.16039 • Published • 3
Beyond KV Caching: Shared Attention for Efficient LLMs
Paper • 2407.12866 • Published • 1
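Finally, the Shared Attention paper pushes sharing into the attention maps themselves: its premise is that softmaxed attention weights are similar across adjacent layers, so later layers can reuse an earlier layer's weights instead of recomputing them, saving Q/K projections and KV-cache traffic for those layers. A rough single-head sketch of the reuse pattern (the group size and reuse rule here are assumptions, not the paper's exact scheme):

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    """Scaled dot-product attention that also returns its weights for reuse."""
    w = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return w @ v, w

d, n_layers = 64, 8
wq = [torch.randn(d, d) for _ in range(n_layers)]
wk = [torch.randn(d, d) for _ in range(n_layers)]
wv = [torch.randn(d, d) for _ in range(n_layers)]

x = torch.randn(16, d)  # (seq_len, dim), single head for brevity
shared_w = None
for layer in range(n_layers):
    v = x @ wv[layer]
    if layer % 4 == 0:
        # Compute fresh attention weights at the start of each group...
        out, shared_w = attention(x @ wq[layer], x @ wk[layer], v)
    else:
        # ...and reuse them in the following layers: no Q/K projections,
        # no softmax, and no K cache needed here.
        out = shared_w @ v
    x = x + out
```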