Yiheng Xu
ranpox
LayoutLM and Document Intelligence
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Paper • 1912.13318 • Published • 5
microsoft/layoutlm-base-uncased
0.1B • Updated • 98k • 61
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Paper • 2012.14740 • Published • 3
microsoft/layoutlmv2-base-uncased
Updated • 449k • 66
AGUVIS: Unified Pure Vision GUI Agents
https://aguvis-project.github.io
Awesome Computer Use Agents
https://github.com/ranpox/awesome-computer-use
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71
Tree Search for Language Model Agents
Paper • 2407.01476 • Published • 1
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper • 2401.10935 • Published • 5
OmniParser for Pure Vision Based GUI Agent
Paper • 2408.00203 • Published • 24
AgentTrek: Browser-Use Agent Data Synthesis