11

Maxwell Yao

MaxwellJryao

Recent Activity

upvoted a paper about 8 hours ago

PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary

upvoted a paper 8 days ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

upvoted a paper 3 months ago

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

authored 2 papers 9 months ago

Rethinking Diverse Human Preference Learning through Principal Component Analysis

Paper • 2502.13131 • Published Feb 18, 2025 • 37

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published Apr 15, 2025 • 19