Evaluating Cultural and Social Awareness of LLM Web Agents Paper • 2410.23252 • Published Oct 30, 2024 • 1
New Job, New Gender? Measuring the Social Bias in Image Generation Models Paper • 2401.00763 • Published Jan 1, 2024
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding Paper • 2502.11492 • Published Feb 17, 2025 • 2
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models Paper • 2403.12027 • Published Mar 18, 2024
GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness Paper • 2510.00536 • Published Oct 1, 2025 • 6
MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion Paper • 2510.22768 • Published Oct 26, 2025 • 7
DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation Paper • 2510.14949 • Published Oct 16, 2025 • 5
Large Language Models Struggle to Learn Long-Tail Knowledge Paper • 2211.08411 • Published Nov 15, 2022 • 3
Decoupling Task-Solving and Output Formatting in LLM Generation Paper • 2510.03595 • Published Oct 4, 2025 • 1
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making Paper • 2410.07166 • Published Oct 9, 2024 • 3
Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence Paper • 2503.05037 • Published Mar 6, 2025 • 4
Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings Paper • 1904.10635 • Published Apr 24, 2019
The Woman Worked as a Babysitter: On Biases in Language Generation Paper • 1909.01326 • Published Sep 3, 2019
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems Paper • 2310.05280 • Published Oct 8, 2023 • 1
ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems Paper • 2305.07797 • Published May 12, 2023