| Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models | May 27, 2025 | Concept Alignment, object-detection | Code Available | 2 |
| SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation | May 27, 2025 | Object Tracking, Segmentation | Code Available | 2 |
| Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment | May 27, 2025 | Adversarial Attack, Clustering | Code Available | 2 |
| TimePro: Efficient Multivariate Long-term Time Series Forecasting with Variable- and Time-Aware Hyper-state | May 27, 2025 | Mamba, Time Series | Code Available | 2 |
| R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | May 27, 2025 | Math | Code Available | 2 |
| Improved Representation Steering for Language Models | May 27, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution | May 27, 2025 | Reinforcement Learning (RL) | Code Available | 2 |
| Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? | May 27, 2025 | Multimodal Reasoning | Code Available | 2 |
| DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue | May 26, 2025 | Diagnostic, Question Answering | Code Available | 2 |
| WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference | May 26, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| One-shot Entropy Minimization | May 26, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 2 |
| Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration | May 26, 2025 | Domain Generalization, Hallucination | Code Available | 2 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | May 26, 2025 | Zero-shot Generalization | Code Available | 2 |
| WeatherEdit: Controllable Weather Editing with 4D Gaussian Field | May 26, 2025 | 3D Generation, 3DGS | Code Available | 2 |
| EmoSphere-SER: Enhancing Speech Emotion Recognition Through Spherical Representation with Auxiliary Classification | May 26, 2025 | Emotion Recognition, regression | Code Available | 2 |
| Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning | May 26, 2025 | Decision Making, Hierarchical Reinforcement Learning | Code Available | 2 |
| AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models | May 26, 2025 | | Code Available | 2 |
| The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project | May 26, 2025 | | Code Available | 2 |
| A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions | May 26, 2025 | Speech Enhancement | Code Available | 2 |
| SAEs Are Good for Steering -- If You Select the Right Features | May 26, 2025 | | Code Available | 2 |
| CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features | May 26, 2025 | | Code Available | 2 |
| Chain-of-Thought for Autonomous Driving: A Comprehensive Survey and Future Prospects | May 26, 2025 | Autonomous Driving, Logical Reasoning | Code Available | 2 |
| Training-Free Multi-Step Audio Source Separation | May 26, 2025 | Audio Source Separation, Denoising | Code Available | 2 |
| FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching | May 26, 2025 | Quantization, Speech Enhancement | Code Available | 2 |
| MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability | May 26, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 2 |
| DiSA: Diffusion Step Annealing in Autoregressive Image Generation | May 26, 2025 | Denoising, Image Generation | Code Available | 2 |
| Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities | May 26, 2025 | Knowledge Graphs, Natural Language Understanding | Code Available | 2 |
| MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | May 26, 2025 | Math, Problem Decomposition | Code Available | 2 |
| SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond | May 26, 2025 | Logical Reasoning, Reinforcement Learning (RL) | Code Available | 2 |
| Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment | May 26, 2025 | text-to-speech, Text to Speech | Code Available | 2 |
| The Missing Point in Vision Transformers for Universal Image Segmentation | May 26, 2025 | Image Segmentation, Instance Segmentation | Code Available | 2 |
| MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding | May 26, 2025 | Keyword Spotting | Code Available | 2 |
| Jodi: Unification of Visual Generation and Understanding via Joint Modeling | May 25, 2025 | | Code Available | 2 |
| MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems | May 25, 2025 | | Code Available | 2 |
| I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | May 25, 2025 | Mixture-of-Experts, multimodal interaction | Code Available | 2 |
| Benchmarking Laparoscopic Surgical Image Restoration and Beyond | May 25, 2025 | Benchmarking, Image Restoration | Code Available | 2 |
| VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | May 25, 2025 | Multimodal Reasoning, Question Answering | Code Available | 2 |
| VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes | May 25, 2025 | 3DGS | Code Available | 2 |
| Shifting AI Efficiency From Model-Centric to Data-Centric Compression | May 25, 2025 | Position | Code Available | 2 |
| Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility | May 24, 2025 | Denoising | Code Available | 2 |
| LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS | May 24, 2025 | | Code Available | 2 |
| Using Large Language Models to Tackle Fundamental Challenges in Graph Learning: A Comprehensive Survey | May 24, 2025 | Graph Learning | Code Available | 2 |
| CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions | May 24, 2025 | Benchmarking | Code Available | 2 |
| Spiking Transformers Need High Frequency Information | May 24, 2025 | Avg | Code Available | 2 |
| Geometry Aware Operator Transformer as an Efficient and Accurate Neural Surrogate for PDEs on Arbitrary Domains | May 24, 2025 | Computational Efficiency, Operator learning | Code Available | 2 |
| VeriThinker: Learning to Verify Makes Reasoning Model Efficient | May 23, 2025 | model | Code Available | 2 |
| Managing FAIR Knowledge Graphs as Polyglot Data End Points: A Benchmark based on the rdf2pg Framework and Plant Biology Data | May 23, 2025 | Knowledge Graphs, Management | Code Available | 2 |
| MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization | May 23, 2025 | Meta-Learning | Code Available | 2 |
| ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback | May 23, 2025 | | Code Available | 2 |
| DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding | May 23, 2025 | Language Modeling, Language Modelling | Code Available | 2 |