igormolybog's Collections: Datasets
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Paper
• 2311.06783
• Published
• 28
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Paper
• 2311.07574
• Published
• 16
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding
Paper
• 2401.04575
• Published
• 18
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Paper
• 2402.00159
• Published
• 65
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Paper
• 2402.06619
• Published
• 57
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts
Paper
• 2402.07625
• Published
• 16
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Paper
• 2402.10176
• Published
• 38
StarCoder 2 and The Stack v2: The Next Generation
Paper
• 2402.19173
• Published
• 152
WildChat: 1M ChatGPT Interaction Logs in the Wild
Paper
• 2405.01470
• Published
• 64
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Paper
• 2405.01481
• Published
• 30