| Paper | Date | Tasks | Code | |
|---|---|---|---|---|
| TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models | Feb 10, 2025 | 3D Generation, 3D Reconstruction | Code available | 5 |
| Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions | Feb 10, 2025 | | Code available | 5 |
| High-Fidelity Simultaneous Speech-To-Speech Translation | Feb 5, 2025 | Decoder, Simultaneous Speech-to-Speech Translation | Code available | 5 |
| LIMO: Less is More for Reasoning | Feb 5, 2025 | Math, Mathematical Reasoning | Code available | 5 |
| Lanpaint: Training-Free Diffusion Inpainting with Exact and Fast Conditional Inference | Feb 5, 2025 | | Code available | 5 |
| MedRAX: Medical Reasoning Agent for Chest X-ray | Feb 4, 2025 | AI Agent, Management | Code available | 5 |
| ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills | Feb 3, 2025 | | Code available | 5 |
| Process Reinforcement through Implicit Rewards | Feb 3, 2025 | Math, Reinforcement Learning (RL) | Code available | 5 |
| FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Jan 24, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 5 |
| Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass | Jan 23, 2025 | 3D Reconstruction, Camera Pose Estimation | Code available | 5 |
| VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | Jan 22, 2025 | Philosophy, Video Question Answering | Code available | 5 |
| Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Jan 21, 2025 | Computational Efficiency, Depth Estimation | Code available | 5 |
| IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | Jan 19, 2025 | Navigate | Code available | 5 |
| PaSa: An LLM Agent for Comprehensive Academic Paper Search | Jan 17, 2025 | | Code available | 5 |
| SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation | Jan 16, 2025 | Benchmarking | Code available | 5 |
| OpenMLDB: A Real-Time Relational Data Feature Computation System for Online ML | Jan 15, 2025 | | Code available | 5 |
| Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG | Jan 15, 2025 | Natural Language Understanding, RAG | Code available | 5 |
| Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise | Jan 14, 2025 | Optical Flow Estimation | Code available | 5 |
| UnCommon Objects in 3D | Jan 13, 2025 | Object | Code available | 5 |
| Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens | Jan 13, 2025 | | Code available | 5 |
| MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation | Jan 12, 2025 | RAG, Retrieval | Code available | 5 |
| Search-o1: Agentic Search-Enhanced Large Reasoning Models | Jan 9, 2025 | Code Generation | Code available | 5 |
| Transformer-Squared: Self-adaptive LLMs | Jan 9, 2025 | | Code available | 5 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code available | 5 |
| NeuralSVG: An Implicit Representation for Text-to-Vector Generation | Jan 7, 2025 | Vector Graphics | Code available | 5 |
| Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives | Jan 7, 2025 | Autonomous Driving, General Knowledge | Code available | 5 |
| Cosmos World Foundation Model Platform for Physical AI | Jan 7, 2025 | model, Position | Code available | 5 |
| Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models | Jan 2, 2025 | Image Generation | Code available | 5 |
| Exploring GLU Expansion Ratios: A Study of Structured Pruning in LLaMA-3.2 Models | Dec 26, 2024 | Computational Efficiency, Network Pruning | Code available | 5 |
| HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | Dec 25, 2024 | Reinforcement Learning (RL) | Code available | 5 |
| Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search | Dec 24, 2024 | | Code available | 5 |
| Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks | Dec 20, 2024 | All, RAG | Code available | 5 |
| LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks | Dec 19, 2024 | 8k, In-Context Learning | Code available | 5 |
| Noisereduce: Domain General Noise Reduction for Time Series Signals | Dec 19, 2024 | Time Series | Code available | 5 |
| Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference | Dec 18, 2024 | Decoder, Retrieval | Code available | 5 |
| Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation | Dec 18, 2024 | 3D Reconstruction, 4k | Code available | 5 |
| DUET: Dual Clustering Enhanced Multivariate Time Series Forecasting | Dec 14, 2024 | Clustering, energy management | Code available | 5 |
| SCBench: A KV Cache-Centric Analysis of Long-Context Methods | Dec 13, 2024 | Mamba, Quantization | Code available | 5 |
| Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Dec 12, 2024 | GPU | Code available | 5 |
| SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos | Dec 12, 2024 | 3D Reconstruction | Code available | 5 |
| Arbitrary-steps Image Super-resolution via Diffusion Inversion | Dec 12, 2024 | Image Super-Resolution, Super-Resolution | Code available | 5 |
| Learning Flow Fields in Attention for Controllable Person Image Generation | Dec 11, 2024 | Attribute, Image Generation | Code available | 5 |
| OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Dec 10, 2024 | Attribute, Benchmarking | Code available | 5 |
| EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models | Dec 10, 2024 | | Code available | 5 |
| Training Large Language Models to Reason in a Continuous Latent Space | Dec 9, 2024 | Logical Reasoning | Code available | 5 |
| The BrowserGym Ecosystem for Web Agent Research | Dec 6, 2024 | Benchmarking | Code available | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Dec 5, 2024 | Data Augmentation, GPU | Code available | 5 |
| MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos | Dec 5, 2024 | Depth Estimation | Code available | 5 |
| MV-Adapter: Multi-view Consistent Image Generation Made Easy | Dec 4, 2024 | 3D Generation, Image Generation | Code available | 5 |
| Free Process Rewards without Process Labels | Dec 2, 2024 | Math | Code available | 5 |