| Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Mar 12, 2025 | Question Answering, RAG | Code Available | 7 |
| VACE: All-in-One Video Creation and Editing | Mar 10, 2025 | All, Human-Domain Subject-to-Video | Code Available | 7 |
| HuixiangDou2: A Robustly Optimized GraphRAG Approach | Mar 9, 2025 | Retrieval, Retrieval-augmented Generation | Code Available | 7 |
| AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems | Mar 9, 2025 | | Code Available | 7 |
| EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test | Mar 3, 2025 | Prediction | Code Available | 7 |
| Visual-RFT: Visual Reinforcement Fine-Tuning | Mar 3, 2025 | Few-Shot Object Detection, Fine-Grained Image Classification | Code Available | 7 |
| DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion | Mar 3, 2025 | Music Generation | Code Available | 7 |
| LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Feb 28, 2025 | | Code Available | 7 |
| Muon is Scalable for LLM Training | Feb 24, 2025 | Computational Efficiency | Code Available | 7 |
| Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Feb 20, 2025 | Math, reinforcement-learning | Code Available | 7 |
| From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Feb 20, 2025 | Continual Learning, Knowledge Graphs | Code Available | 7 |
| S*: Test Time Scaling for Code Generation | Feb 20, 2025 | Code Generation, Math | Code Available | 7 |
| YOLOv12: Attention-Centric Real-Time Object Detectors | Feb 18, 2025 | GPU, Object | Code Available | 7 |
| MoBA: Mixture of Block Attention for Long-Context LLMs | Feb 18, 2025 | Mixture-of-Experts | Code Available | 7 |
| Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction | Feb 17, 2025 | Instruction Following, Voice Cloning | Code Available | 7 |
| pySLAM: An Open-Source, Modular, and Extensible Framework for SLAM | Feb 17, 2025 | Depth Estimation, Depth Prediction | Code Available | 7 |
| Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Feb 14, 2025 | Video Generation, Video Reconstruction | Code Available | 7 |
| Large Language Diffusion Models | Feb 14, 2025 | In-Context Learning, Instruction Following | Code Available | 7 |
| LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Feb 11, 2025 | Large Language Model, Math | Code Available | 7 |
| Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile | Feb 10, 2025 | Video Generation | Code Available | 7 |
| Goku: Flow Based Video Generative Foundation Models | Feb 7, 2025 | Image Generation, Text to Image Generation | Code Available | 7 |
| Fast Video Generation with Sliding Tile Attention | Feb 6, 2025 | Video Generation | Code Available | 7 |
| VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos | Feb 3, 2025 | Knowledge Graphs, RAG | Code Available | 7 |
| LLM-AutoDiff: Auto-Differentiate Any LLM Workflow | Jan 28, 2025 | Prompt Engineering, Question Answering | Code Available | 7 |
| Training AI to be Loyal | Jan 27, 2025 | | Code Available | 7 |
| EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Jan 25, 2025 | Benchmarking, Evolutionary Algorithms | Code Available | 7 |
| Rethinking the Sample Relations for Few-Shot Classification | Jan 23, 2025 | Classification, Contrastive Learning | Code Available | 7 |
| DoMINO: A Decomposable Multi-scale Iterative Neural Operator for Modeling Large Scale Engineering Simulations | Jan 23, 2025 | | Code Available | 7 |
| Kimi k1.5: Scaling Reinforcement Learning with LLMs | Jan 22, 2025 | Math, reinforcement-learning | Code Available | 7 |
| A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | Jan 21, 2025 | RAG, Retrieval | Code Available | 7 |
| EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Jan 21, 2025 | Feature Engineering, GPU | Code Available | 7 |
| PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation | Jan 20, 2025 | Language Modeling, Language Modelling | Code Available | 7 |
| Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond | Jan 19, 2025 | Deep Learning, Multi-Task Learning | Code Available | 7 |
| FoundationStereo: Zero-Shot Stereo Matching | Jan 17, 2025 | Depth Estimation, Diversity | Code Available | 7 |
| MiniMax-01: Scaling Foundation Models with Lightning Attention | Jan 14, 2025 | Mixture-of-Experts | Code Available | 7 |
| rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | Jan 8, 2025 | Math | Code Available | 7 |
| PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides | Jan 7, 2025 | | Code Available | 7 |
| VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction | Jan 3, 2025 | | Code Available | 7 |
| Simulating 500 million years of evolution with a language model | Dec 31, 2024 | Language Modeling, Language Modelling | Code Available | 7 |
| Revisiting PCA for time series reduction in temporal dimension | Dec 27, 2024 | Computational Efficiency, Dimensionality Reduction | Code Available | 7 |
| Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback | Dec 20, 2024 | All, Instruction Following | Code Available | 7 |
| Efficient MedSAMs: Segment Anything in Medical Images on Laptop | Dec 20, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 7 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Dec 19, 2024 | Audio Generation, Audio Synthesis | Code Available | 7 |
| 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting | Dec 17, 2024 | 3DGS, Novel View Synthesis | Code Available | 7 |
| MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors | Dec 16, 2024 | 3D Reconstruction, graph construction | Code Available | 7 |
| A Library for Learning Neural Operators | Dec 13, 2024 | Operator learning | Code Available | 7 |
| Byte Latent Transformer: Patches Scale Better Than Tokens | Dec 13, 2024 | | Code Available | 7 |
| AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era | Dec 13, 2024 | Image to Video Generation, Video Generation | Code Available | 7 |
| Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems | Dec 12, 2024 | | Code Available | 7 |
| Large Concept Models: Language Modeling in a Sentence Representation Space | Dec 11, 2024 | Language Modeling, Language Modelling | Code Available | 7 |