Ambrose Robinson
Ambroser53
AI & ML interests
None yet
Recent Activity
updated a collection 5 days ago: RL
new activity 20 days ago: mrfakename/Ministral-3-3B-Base-2512-Llamafied-TextOnly: Not all heroes wear capes
liked a model 20 days ago: mrfakename/Ministral-3-3B-Base-2512-Llamafied-TextOnly
Organizations
grpo
- Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
  Paper • 2509.22601 • Published • 29
- Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
  Paper • 2509.25849 • Published • 47
- GEM: A Gym for Agentic LLMs
  Paper • 2510.01051 • Published • 89
- Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
  Paper • 2510.04996 • Published • 15
Embed
Vision
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
  Paper • 2401.13313 • Published • 5
- BAAI/Bunny-v1_0-4B
  Text Generation • 4B • Updated • 65 • 10
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 103
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 37
Speech
- parler-tts/parler_tts_mini_v0.1
  Text-to-Speech • 0.6B • Updated • 2.9k • 358
- SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
  Paper • 2405.08317 • Published • 12
- Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
  Paper • 2405.18669 • Published • 12
- Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
  Paper • 2406.02430 • Published • 38
Alignment
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 18
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
  Paper • 2405.19332 • Published • 22
- Offline Regularised Reinforcement Learning for Large Language Models Alignment
  Paper • 2405.19107 • Published • 15
- Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
  Paper • 2406.00888 • Published • 33
SSM
RL
- Efficient World Models with Context-Aware Tokenization
  Paper • 2406.19320 • Published • 8
- LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
  Paper • 2510.19363 • Published • 61
- Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
  Paper • 2512.17008 • Published • 10
context
RAG
quantisation
LoRA
- LoRA ensembles for large language model fine-tuning
  Paper • 2310.00035 • Published • 2
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 27
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 45
- LoRA Learns Less and Forgets Less
  Paper • 2405.09673 • Published • 90
Commercial
active learning
- AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets
  Paper • 2404.05623 • Published • 3
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
  Paper • 2405.19332 • Published • 22
- BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM
  Paper • 2406.12168 • Published • 7
- Deep Bayesian Active Learning for Preference Modeling in Large Language Models
  Paper • 2406.10023 • Published • 2
Embodiment
pretraining
TTS
- Autoregressive Speech Synthesis without Vector Quantization
  Paper • 2407.08551 • Published • 17
- Stable Audio Open
  Paper • 2407.14358 • Published • 26
- Zyphra/Zonos-v0.1-transformer
  Text-to-Speech • Updated • 22.9k • 422
- Slamming: Training a Speech Language Model on One GPU in a Day
  Paper • 2502.15814 • Published • 69
eval