| Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Mar 20, 2025 | Benchmarking, Reinforcement Learning (RL) | Code Available | 4 |
| Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | Mar 20, 2025 | Decision Making, Language Modeling | Code Available | 4 |
| UniK3D: Universal Camera Monocular 3D Estimation | Mar 20, 2025 | 3D Reconstruction, Disentanglement | Code Available | 4 |
| Sonata: Self-Supervised Learning of Reliable Point Representations | Mar 20, 2025 | 3D Semantic Segmentation, Self-Supervised Learning | Code Available | 4 |
| Cube: A Roblox View of 3D Intelligence | Mar 19, 2025 | Scene Generation, Text Generation | Code Available | 4 |
| DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework | Mar 19, 2025 | 8k, Action Recognition | Code Available | 4 |
| Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control | Mar 18, 2025 | | Code Available | 4 |
| Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning | Mar 18, 2025 | 3D Face Animation, Common Sense Reasoning | Code Available | 4 |
| Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey | Mar 16, 2025 | Autonomous Driving, multimodal generation | Code Available | 4 |
| Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | Mar 13, 2025 | Domain Generalization, Math | Code Available | 4 |
| R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Mar 13, 2025 | Multimodal Reasoning | Code Available | 4 |
| Retrieval-Augmented Generation with Hierarchical Knowledge | Mar 13, 2025 | Multi-hop Question Answering, Question Answering | Code Available | 4 |
| VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary | Mar 12, 2025 | EgoSchema, Retrieval | Code Available | 4 |
| Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models | Mar 12, 2025 | Denoising, Language Modeling | Code Available | 4 |
| LocAgent: Graph-Guided LLM Agents for Code Localization | Mar 12, 2025 | GitHub issue resolution, Navigate | Code Available | 4 |
| PharMolixFM: All-Atom Foundation Models for Molecular Modeling and Generation | Mar 12, 2025 | All, Denoising | Code Available | 4 |
| Towards All-in-One Medical Image Re-Identification | Mar 11, 2025 | All | Code Available | 4 |
| Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models | Mar 11, 2025 | Form, Information Retrieval | Code Available | 4 |
| LBM: Latent Bridge Matching for Fast Image-to-Image Translation | Mar 10, 2025 | Depth Estimation, Image Relighting | Code Available | 4 |
| MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | Mar 10, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 4 |
| WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation | Mar 10, 2025 | Common Sense Reasoning, Image Generation | Code Available | 4 |
| Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms | Mar 10, 2025 | | Code Available | 4 |
| Inductive Moment Matching | Mar 10, 2025 | | Code Available | 4 |
| LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL | Mar 10, 2025 | Logical Reasoning, Multimodal Reasoning | Code Available | 4 |
| PointVLA: Injecting the 3D World into Vision-Language-Action Models | Mar 10, 2025 | Imitation Learning, Spatial Reasoning | Code Available | 4 |
| Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Mar 9, 2025 | Domain Generalization, Object Detection | Code Available | 4 |
| VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control | Mar 7, 2025 | Image Inpainting, Optical Flow Estimation | Code Available | 4 |
| R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | Mar 7, 2025 | RAG, Reinforcement Learning (RL) | Code Available | 4 |
| R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model | Mar 7, 2025 | Multimodal Reasoning, reinforcement-learning | Code Available | 4 |
| Unified Reward Model for Multimodal Understanding and Generation | Mar 7, 2025 | Image Generation, model | Code Available | 4 |
| Factorio Learning Environment | Mar 6, 2025 | Program Synthesis, Spatial Reasoning | Code Available | 4 |
| ReasonGraph: Visualisation of Reasoning Paths | Mar 6, 2025 | | Code Available | 4 |
| DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning | Feb 28, 2025 | Information Retrieval, reinforcement-learning | Code Available | 4 |
| OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels | Feb 27, 2025 | Image Classification, Instance Segmentation | Code Available | 4 |
| UniTok: A Unified Tokenizer for Visual Generation and Understanding | Feb 27, 2025 | Quantization | Code Available | 4 |
| HVI: A New color space for Low-light Image Enhancement | Feb 27, 2025 | Image Enhancement, Low-Light Image Enhancement | Code Available | 4 |
| Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | Feb 26, 2025 | Depth Estimation, Diversity | Code Available | 4 |
| ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents | Feb 25, 2025 | Question Answering, RAG | Code Available | 4 |
| SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference | Feb 25, 2025 | model, Video Generation | Code Available | 4 |
| R1-Onevision: An Open-Source Multimodal Large Language Model Capable of Deep Reasoning | Feb 24, 2025 | Language Modeling | Code Available | 4 |
| LettuceDetect: A Hallucination Detection Framework for RAG Applications | Feb 24, 2025 | 8k, GPU | Code Available | 4 |
| TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control | Feb 24, 2025 | Reinforcement Learning | Code Available | 4 |
| Recent Advances in Large Langauge Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation | Feb 23, 2025 | Benchmarking | Code Available | 4 |
| REFINE: Inversion-Free Backdoor Defense via Model Reprogramming | Feb 22, 2025 | backdoor defense | Code Available | 4 |
| Natural Language Generation | Feb 20, 2025 | Text Generation | Code Available | 4 |
| SurveyX: Academic Survey Automation via Large Language Models | Feb 20, 2025 | Survey | Code Available | 4 |
| LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | Feb 20, 2025 | | Code Available | 4 |
| Building reliable sim driving agents by scaling self-play | Feb 20, 2025 | Autonomous Vehicles, Benchmarking | Code Available | 4 |
| Craw4LLM: Efficient Web Crawling for LLM Pretraining | Feb 19, 2025 | 10-shot image generation | Code Available | 4 |
| A deep learning framework for efficient pathology image analysis | Feb 18, 2025 | Benchmarking, Deep Learning | Code Available | 4 |