Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots Paper • 2504.03735 • Published Apr 1, 2025 • 1
Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness Paper • 2510.01670 • Published Oct 2, 2025 • 6
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! • Jun 6, 2025 • 55
Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis Paper • 2502.20383 • Published Feb 27, 2025 • 3
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published Mar 7, 2025 • 57
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 129
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated May 1, 2025 • 574
Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context • Jul 23, 2024 • 241