CelesteChen's Collections
multimodal
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
Paper
•
2510.15110
•
Published
•
15
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper
•
2510.14528
•
Published
•
111
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Paper
•
2510.13795
•
Published
•
57
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Paper
•
2510.13515
•
Published
•
11
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
Paper
•
2510.12709
•
Published
•
12
HoneyBee: Data Recipes for Vision-Language Reasoners
Paper
•
2510.12225
•
Published
•
10
Paper
•
2511.05491
•
Published
•
51
DeepEyesV2: Toward Agentic Multimodal Model
Paper
•
2511.05271
•
Published
•
42
NVIDIA Nemotron Nano V2 VL
Paper
•
2511.03929
•
Published
•
27
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
Paper
•
2511.02280
•
Published
•
3
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Paper
•
2511.02779
•
Published
•
58
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
Paper
•
2510.21583
•
Published
•
30
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
Paper
•
2510.27571
•
Published
•
17