| GTA1: GUI Test-time Scaling Agent | Jul 8, 2025 | Reinforcement Learning (RL), Task Planning | Code Available | 2 |
| MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification | Jun 26, 2025 | Image Segmentation, Large Language Model | Unverified | 0 |
| VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models | Jun 21, 2025 | Action Generation, Continual Learning | Unverified | 0 |
| Towards AI Search Paradigm | Jun 20, 2025 | Decision Making, Retrieval-augmented Generation | Unverified | 0 |
| Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning | Jun 20, 2025 | Computational Efficiency, Task Planning | Unverified | 0 |
| A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications | Jun 14, 2025 | Information Retrieval, Survey | Code Available | 3 |
| Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills | Jun 12, 2025 | Large Language Model, Task Planning | Unverified | 0 |
| VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning | Jun 10, 2025 | Task Planning, Visual Reasoning | Unverified | 0 |
| Language-Vision Planner and Executor for Text-to-Visual Reasoning | Jun 9, 2025 | In-Context Learning, MME | Unverified | 0 |
| Prime the search: Using large language models for guiding geometric task and motion planning by warm-starting tree search | Jun 8, 2025 | Common Sense Reasoning, Motion Planning | Code Available | 0 |