| Improved 3D Point-Line Mapping Regression for Camera Relocalization | Feb 28, 2025 | Camera Relocalization, regression | Code Available | 3 |
| Attention Distillation: A Unified Approach to Visual Characteristics Transfer | Feb 27, 2025 | Denoising, Image Generation | Code Available | 3 |
| AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs | Feb 27, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| LongRoPE2: Near-Lossless LLM Context Window Scaling | Feb 27, 2025 | | Code Available | 3 |
| InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions | Feb 27, 2025 | Human-Object Interaction Detection, Object | Code Available | 3 |
| OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection | Feb 27, 2025 | Action Detection, Benchmarking | Code Available | 3 |
| LangProBe: a Language Programs Benchmark | Feb 27, 2025 | | Code Available | 3 |
| Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation | Feb 27, 2025 | Image Generation, token-classification | Code Available | 3 |
| The Mighty ToRR: A Benchmark for Table Reasoning and Robustness | Feb 26, 2025 | | Code Available | 3 |
| BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction | Feb 26, 2025 | Benchmarking, Time Series | Code Available | 3 |
| Self-rewarding correction for mathematical reasoning | Feb 26, 2025 | Mathematical Reasoning | Code Available | 3 |
| Harnessing Multiple Large Language Models: A Survey on LLM Ensemble | Feb 25, 2025 | Survey | Code Available | 3 |
| ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation | Feb 25, 2025 | Image Generation | Code Available | 3 |
| S-Graphs 2.0 -- A Hierarchical-Semantic Optimization and Loop Closure for SLAM | Feb 25, 2025 | global-optimization, Management | Code Available | 3 |
| Chain of Draft: Thinking Faster by Writing Less | Feb 25, 2025 | | Code Available | 3 |
| Verdict: A Library for Scaling Judge-Time Compute | Feb 25, 2025 | Fact Checking, Hallucination | Code Available | 3 |
| MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs | Feb 24, 2025 | Question Answering, Visual Question Answering | Code Available | 3 |
| AnyTop: Character Animation Diffusion with Any Topology | Feb 24, 2025 | Denoising | Code Available | 3 |
| DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks | Feb 24, 2025 | Conditional Image Generation, Image Generation | Code Available | 3 |
| Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction | Feb 24, 2025 | Language Modeling, Language Modelling | Code Available | 3 |
| Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs | Feb 24, 2025 | Computer Security | Code Available | 3 |
| AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay | Feb 24, 2025 | | Code Available | 3 |
| KV-Edit: Training-Free Image Editing for Precise Background Preservation | Feb 24, 2025 | Text-based Image Editing | Code Available | 3 |
| AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement | Feb 24, 2025 | | Code Available | 3 |
| SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition | Feb 23, 2025 | Deep Hashing, GPU | Code Available | 3 |
| Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents | Feb 22, 2025 | AI Agent | Code Available | 3 |
| Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs | Feb 20, 2025 | Quantization | Code Available | 3 |
| Prompt-to-Leaderboard | Feb 20, 2025 | Chatbot, Language Modeling | Code Available | 3 |
| Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation | Feb 20, 2025 | 3D Shape Generation, Texture Synthesis | Code Available | 3 |
| Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition | Feb 20, 2025 | | Code Available | 3 |
| PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data | Feb 20, 2025 | Style Transfer | Code Available | 3 |
| CrossOver: 3D Scene Cross-Modal Alignment | Feb 20, 2025 | cross-modal alignment, Object | Code Available | 3 |
| A Comprehensive Survey on Composed Image Retrieval | Feb 19, 2025 | Attribute, Image Retrieval | Code Available | 3 |
| Slamming: Training a Speech Language Model on One GPU in a Day | Feb 19, 2025 | GPU, Language Modeling | Code Available | 3 |
| SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Feb 18, 2025 | Object Rearrangement, Robot Manipulation | Code Available | 3 |
| SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation | Feb 18, 2025 | Voice Cloning | Code Available | 3 |
| Soundwave: Less is More for Speech-Text Alignment in LLMs | Feb 18, 2025 | | Code Available | 3 |
| Personalized Image Generation with Deep Generative Models: A Decade Survey | Feb 18, 2025 | Image Generation, Personalized Image Generation | Code Available | 3 |
| PathRAG: Pruning Graph-based Retrieval Augmented Generation with Relational Paths | Feb 18, 2025 | RAG, Retrieval | Code Available | 3 |
| Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks | Feb 18, 2025 | graph construction, Large Language Model | Code Available | 3 |
| MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction | Feb 17, 2025 | 2k, Autonomous Driving | Code Available | 3 |
| TokenSkip: Controllable Chain-of-Thought Compression in LLMs | Feb 17, 2025 | GSM8K | Code Available | 3 |
| Intuitive physics understanding emerges from self-supervised pretraining on natural videos | Feb 17, 2025 | Video Prediction | Code Available | 3 |
| Learning Getting-Up Policies for Real-World Humanoid Robots | Feb 17, 2025 | | Code Available | 3 |
| Stonefish: Supporting Machine Learning Research in Marine Robotics | Feb 17, 2025 | Optical Flow Estimation | Code Available | 3 |
| Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval | Feb 17, 2025 | Information Retrieval, Retrieval | Code Available | 3 |
| LIMR: Less is More for RL Scaling | Feb 17, 2025 | | Code Available | 3 |
| Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding | Feb 14, 2025 | 3D Object Detection, 3D visual grounding | Code Available | 3 |
| Strassen Multisystolic Array Hardware Architectures | Feb 14, 2025 | | Code Available | 3 |
| Automated Hypothesis Validation with Agentic Sequential Falsifications | Feb 14, 2025 | Decision Making, Hallucination | Code Available | 3 |