RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space Paper • 2606.14700 • Published 4 days ago • 7
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation Paper • 2402.03216 • Published Feb 5, 2024 • 10
Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models Paper • 2605.11887 • Published May 12 • 15
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories Paper • 2606.13578 • Published 5 days ago • 53
Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding? Paper • 2606.08063 • Published 10 days ago • 75
InterleaveThinker: Reinforcing Agentic Interleaved Generation Paper • 2606.13679 • Published 5 days ago • 77
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution Paper • 2606.10917 • Published 6 days ago • 76
Redesign Mixture-of-Experts Routers with Manifold Power Iteration Paper • 2606.12397 • Published 6 days ago • 85
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 5 days ago • 94
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments Paper • 2606.13681 • Published 5 days ago • 131
Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking Paper • 2509.25787 • Published Jan 27 • 3
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21, 2025 • 65
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 50
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models Paper • 2401.13919 • Published Jan 25, 2024 • 33