WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 11 days ago • 100
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published 28 days ago • 236
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills Paper • 2605.23899 • Published 28 days ago • 29
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark Paper • 2605.12501 • Published May 12 • 16
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 118
AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation Paper • 2604.08540 • Published Apr 9 • 5