| OR-LLM-Agent: Automating Modeling and Solving of Operations Research Optimization Problem with Reasoning Large Language Model | Mar 13, 2025 | AI Agent, Language Modeling | Code Available | 2 |
| DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding | Mar 13, 2025 | 4k, Autonomous Driving | Code Available | 2 |
| RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors | Mar 13, 2025 | 3DGS | Code Available | 2 |
| EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing | Mar 13, 2025 | | Code Available | 2 |
| OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer | Mar 13, 2025 | Decoder, multimodal interaction | Code Available | 2 |
| A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1 | Mar 13, 2025 | | Code Available | 2 |
| Unlocking Generalization Power in LiDAR Point Cloud Registration | Mar 13, 2025 | Autonomous Driving, Point Cloud Registration | Code Available | 2 |
| 3D Student Splatting and Scooping | Mar 13, 2025 | 3DGS, Neural Rendering | Code Available | 2 |
| Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection | Mar 13, 2025 | Anomaly Detection, zero-shot anomaly detection | Code Available | 2 |
| VMBench: A Benchmark for Perception-Aligned Video Motion Generation | Mar 13, 2025 | Motion Generation, Video Generation | Code Available | 2 |
| GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding | Mar 13, 2025 | Diversity, Language Modeling | Code Available | 2 |
| RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing | Mar 13, 2025 | Computational Efficiency, Mamba | Code Available | 2 |
| 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models | Mar 13, 2025 | Large Language Model, Object | Code Available | 2 |
| ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness | Mar 13, 2025 | 3D Human Pose Estimation, 3D Human Shape Estimation | Code Available | 2 |
| Multi-Modal Mamba Modeling for Survival Prediction (M4Survive): Adapting Joint Foundation Model Representations | Mar 13, 2025 | Computational Efficiency, Mamba | Code Available | 2 |
| SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video | Mar 12, 2025 | Video Inpainting | Code Available | 2 |
| KNighter: Transforming Static Analysis with LLM-Synthesized Checkers | Mar 12, 2025 | | Code Available | 2 |
| PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop | Mar 12, 2025 | Diagnostic, Video Generation | Code Available | 2 |
| Manify: A Python Library for Learning Non-Euclidean Representations | Mar 12, 2025 | Representation Learning | Code Available | 2 |
| Teaching LMMs for Image Quality Scoring and Interpreting | Mar 12, 2025 | Descriptive, Image Quality Assessment | Code Available | 2 |
| Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark | Mar 12, 2025 | Image Retrieval, Retrieval | Code Available | 2 |
| CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games | Mar 12, 2025 | Decision Making, Vision-Language-Action | Code Available | 2 |
| Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space | Mar 12, 2025 | Image-to-Image Translation, Video Editing | Code Available | 2 |
| Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Mar 12, 2025 | Zero-shot Generalization | Code Available | 2 |
| ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | Mar 12, 2025 | Multi-agent Reinforcement Learning, reinforcement-learning | Code Available | 2 |