Yongming Rao

raoyongming

AI & ML interests

None yet

Recent Activity

authored a paper about 1 month ago

Unleashing Text-to-Image Diffusion Models for Visual Perception

authored a paper about 1 month ago

TCOVIS: Temporally Consistent Online Video Instance Segmentation

authored a paper about 1 month ago

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Organizations

None yet

authored 13 papers about 1 month ago

UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models

Paper • 2302.04867 • Published Feb 9, 2023

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

Paper • 2111.14819 • Published Nov 29, 2021

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

Paper • 2207.14284 • Published Jul 28, 2022

BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries

Paper • 2503.12446 • Published Mar 16 • 1

COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

Paper • 1903.02874 • Published Mar 7, 2019

R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation

Paper • 2505.02018 • Published May 4 • 3

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Paper • 2506.05344 • Published Jun 5 • 16

X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again

Paper • 2507.22058 • Published Jul 29 • 39

Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

Paper • 2510.13795 • Published Oct 15 • 57

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Paper • 2511.15705 • Published Nov 19 • 92

authored a paper 11 months ago

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published Feb 6 • 29

authored a paper about 1 year ago

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

authored 3 papers over 1 year ago

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Paper • 2409.12961 • Published Sep 19, 2024 • 25

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Paper • 2408.00754 • Published Aug 1, 2024 • 23

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Paper • 2407.18121 • Published Jul 25, 2024 • 17

authored 2 papers about 2 years ago

Generative Multimodal Models are In-Context Learners

Paper • 2312.13286 • Published Dec 20, 2023 • 36

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Paper • 2312.06655 • Published Dec 11, 2023 • 24
