| Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning | May 18, 2025 | Reinforcement Learning (RL), Visual Grounding | Code Available | 3 |
| Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward | May 18, 2025 | GPU, Graph Matching | Code Available | 3 |
| dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching | May 17, 2025 | Denoising | Code Available | 3 |
| Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis | May 16, 2025 | Continual Learning, Representation Learning | Code Available | 3 |
| SongEval: A Benchmark Dataset for Song Aesthetics Evaluation | May 16, 2025 | | Code Available | 3 |
| Visual Planning: Let's Think Only with Images | May 16, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 3 |
| Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking | May 16, 2025 | Benchmarking, Management | Code Available | 3 |
| MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | May 15, 2025 | Image Animation, Video Generation | Code Available | 3 |
| Parallel Scaling Law for Language Models | May 15, 2025 | | Code Available | 3 |
| MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | May 15, 2025 | cross-modal alignment, Geometry Problem Solving | Code Available | 3 |
| OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | May 13, 2025 | Reinforcement Learning (RL), Visual Reasoning | Code Available | 3 |
| Generative AI for Autonomous Driving: Frontiers and Opportunities | May 13, 2025 | Autonomous Driving, Video Generation | Code Available | 3 |
| OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain | May 12, 2025 | Multivariate Time Series Forecasting, Representation Learning | Code Available | 3 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | May 12, 2025 | Code Generation | Code Available | 3 |
| CompSLAM: Complementary Hierarchical Multi-Modal Localization and Mapping for Robot Autonomy in Underground Environments | May 10, 2025 | Pose Estimation | Code Available | 3 |
| LLMs Get Lost In Multi-Turn Conversation | May 9, 2025 | | Code Available | 3 |
| The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization | May 9, 2025 | Benchmarking | Code Available | 3 |
| SOAP: Style-Omniscient Animatable Portraits | May 8, 2025 | Image to 3D | Code Available | 3 |
| TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation | May 8, 2025 | Quantization | Code Available | 3 |
| A Common Interface for Automatic Differentiation | May 8, 2025 | | Code Available | 3 |
| FastMap: Revisiting Dense and Scalable Structure from Motion | May 7, 2025 | GPU | Code Available | 3 |
| OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation | May 6, 2025 | Robot Manipulation, Vision-Language-Action | Code Available | 3 |
| LiftFeat: 3D Geometry-Aware Local Feature Matching | May 6, 2025 | 3D geometry, Depth Estimation | Code Available | 3 |
| Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models | May 5, 2025 | Policy Gradient Methods, RAG | Code Available | 3 |
| R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning | May 5, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | May 5, 2025 | Chatbot, Decoder | Code Available | 3 |
| Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition | Code Available | 3 |
| Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | May 4, 2025 | Mixture-of-Experts, NeRF | Code Available | 3 |
| Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions | May 1, 2025 | Survey | Code Available | 3 |
| Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models | May 1, 2025 | Large Language Model | Code Available | 3 |
| Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing | Apr 30, 2025 | Image Generation | Code Available | 3 |
| Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Apr 29, 2025 | Domain Generalization, Math | Code Available | 3 |
| PixelHacker: Image Inpainting with Structural and Semantic Consistency | Apr 29, 2025 | Denoising, Image Generation | Code Available | 3 |
| ReasonIR: Training Retrievers for Reasoning Tasks | Apr 29, 2025 | Information Retrieval, MMLU | Code Available | 3 |
| Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video | Apr 28, 2025 | | Code Available | 3 |
| Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs | Apr 28, 2025 | Synthetic Data Generation | Code Available | 3 |
| MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion | Apr 28, 2025 | | Code Available | 3 |
| TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos | Apr 24, 2025 | MME, Video MME | Code Available | 3 |
| An Empirical Study on Prompt Compression for Large Language Models | Apr 24, 2025 | Articles, Math | Code Available | 3 |
| Tina: Tiny Reasoning Models via LoRA | Apr 22, 2025 | Reinforcement Learning (RL) | Code Available | 3 |
| Grad: Guided Relation Diffusion Generation for Graph Augmentation in Graph Fraud Detection | Apr 22, 2025 | Contrastive Learning, Fraud Detection | Code Available | 3 |
| Learning to Reason under Off-Policy Guidance | Apr 21, 2025 | Math, Reinforcement Learning (RL) | Code Available | 3 |
| OmniAudio: Generating Spatial Audio from 360-Degree Video | Apr 21, 2025 | Audio Generation | Code Available | 3 |
| TAPIP3D: Tracking Any Point in Persistent 3D Geometry | Apr 20, 2025 | 3D geometry, Depth And Camera Motion | Code Available | 3 |
| Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D | Apr 19, 2025 | Decoder, Object Localization | Code Available | 3 |
| Generative AI Act II: Test Time Scaling Drives Cognition Engineering | Apr 18, 2025 | Prompt Engineering | Code Available | 3 |
| LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models | Apr 18, 2025 | Feature Upsampling | Code Available | 3 |
| Event-Enhanced Blurry Video Super-Resolution | Apr 17, 2025 | Deblurring, Motion Estimation | Code Available | 3 |
| IMAGGarment-1: Fine-Grained Garment Generation for Controllable Fashion Design | Apr 17, 2025 | | Code Available | 3 |
| Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts | Apr 17, 2025 | Denoising | Code Available | 3 |