MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding Paper • 2505.20298 • Published May 26 • 9
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated 18 days ago • 277k • 1.55k
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation Paper • 2410.17250 • Published Oct 22, 2024 • 14
TextDiffuser 2 📚 Generate images from text prompts with layout planning • 142
stabilityai/japanese-stable-clip-vit-l-16 Feature Extraction • 0.4B • Updated Jul 10, 2024 • 3.99k • 27