| R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models | Oct 23, 2024 | Diversity | Code Available | 5 |
| Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients | Jul 11, 2024 | Quantization | Code Available | 5 |
| GraphCast: Learning skillful medium-range global weather forecasting | Dec 24, 2022 | Decision Making, Weather Forecasting | Code Available | 5 |
| PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Dec 16, 2023 | CPU, GPU | Code Available | 5 |
| Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models | Jan 2, 2025 | Image Generation | Code Available | 5 |
| Automated Design of Agentic Systems | Aug 15, 2024 | | Code Available | 5 |
| EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models | Feb 5, 2024 | | Code Available | 5 |
| ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | Oct 23, 2024 | | Code Available | 5 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Jan 11, 2024 | Language Modelling, Large Language Model | Code Available | 5 |
| Off-Policy Primal-Dual Safe Reinforcement Learning | Jan 26, 2024 | reinforcement-learning, Reinforcement Learning | Code Available | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Feb 27, 2025 | Computational Efficiency, GPU | Code Available | 5 |
| XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech | May 31, 2023 | text-to-speech, Text to Speech | Code Available | 5 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Jun 1, 2024 | Audio Generation, Audio Synthesis | Code Available | 5 |
| When LLMs Meet Cybersecurity: A Systematic Literature Review | May 6, 2024 | Systematic Literature Review | Code Available | 5 |
| Phantom: Subject-consistent video generation via cross-modal alignment | Feb 16, 2025 | cross-modal alignment, Human-Domain Subject-to-Video | Code Available | 5 |
| SpeechAlign: Aligning Speech Generation to Human Preferences | Apr 8, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos | Dec 12, 2024 | 3D Reconstruction | Code Available | 5 |
| Search-o1: Agentic Search-Enhanced Large Reasoning Models | Jan 9, 2025 | Code Generation | Code Available | 5 |
| MuJoCo MPC for Humanoid Control: Evaluation on HumanoidBench | Aug 1, 2024 | Humanoid Control, MuJoCo | Code Available | 5 |
| GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | Mar 6, 2024 | | Code Available | 5 |
| Getting SMARTER for Motion Planning in Autonomous Driving Systems | Feb 20, 2025 | Autonomous Driving, Motion Planning | Code Available | 5 |
| UnCommon Objects in 3D | Jan 13, 2025 | Object | Code Available | 5 |
| Hybrid Transformers for Music Source Separation | Nov 15, 2022 | Music Source Separation, Speech Enhancement | Code Available | 5 |
| ImageBind: One Embedding Space To Bind Them All | May 9, 2023 | All, Cross-Modal Retrieval | Code Available | 5 |
| R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | Mar 7, 2025 | Emotion Recognition, Language Modeling | Code Available | 5 |
| rerankers: A Lightweight Python Library to Unify Ranking Methods | Aug 30, 2024 | Re-Ranking, Retrieval | Code Available | 5 |
| Xwin-LM: Strong and Scalable Alignment Practice for LLMs | May 30, 2024 | | Code Available | 5 |
| rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset | May 27, 2025 | | Code Available | 5 |
| Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation | Sep 25, 2024 | text-to-speech, Text to Speech | Code Available | 5 |
| IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation | Oct 9, 2024 | Attribute, Image Generation | Code Available | 5 |
| Underwater Camouflaged Object Tracking Meets Vision-Language SAM2 | Sep 25, 2024 | Object, Object Tracking | Code Available | 5 |
| UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition | Jan 1, 2024 | Time Series, Time Series Forecasting | Code Available | 5 |
| VideoMamba: State Space Model for Efficient Video Understanding | Mar 11, 2024 | Action Classification, Mamba | Code Available | 5 |
| Repetition Improves Language Model Embeddings | Feb 23, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| Lanpaint: Training-Free Diffusion Inpainting with Exact and Fast Conditional Inference | Feb 5, 2025 | | Code Available | 5 |
| KBLaM: Knowledge Base augmented Language Model | Oct 14, 2024 | 8k, GPU | Code Available | 5 |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Jun 25, 2023 | CPU, Decoder | Code Available | 5 |
| M-Prometheus: A Suite of Open Multilingual LLM Judges | Apr 7, 2025 | Machine Translation, Model Selection | Code Available | 5 |
| NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails | Oct 16, 2023 | Dialogue Management, Management | Code Available | 5 |
| UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler | Feb 27, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 5 |
| Reinforcement Learning from Human Feedback | Apr 16, 2025 | Math, Philosophy | Code Available | 5 |
| RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | Oct 10, 2024 | Zero-shot Generalization | Code Available | 5 |
| Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection | Feb 14, 2022 | Object, object-detection | Code Available | 5 |
| EBEN: Extreme bandwidth extension network applied to speech signals captured with noise-resilient body-conduction microphones | Oct 25, 2022 | Bandwidth Extension, Generative Adversarial Network | Code Available | 5 |
| Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting | Jan 2, 2024 | Autonomous Driving, NeRF | Code Available | 5 |
| Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security | Jan 10, 2024 | Task Planning | Code Available | 5 |
| Point-E: A System for Generating 3D Point Clouds from Complex Prompts | Dec 16, 2022 | Generating 3D Point Clouds, GPU | Code Available | 5 |
| Segment Anything | Apr 5, 2023 | Event-based Object Segmentation, Image Segmentation | Code Available | 5 |
| Nougat: Neural Optical Understanding for Academic Documents | Aug 25, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 5 |
| HDVIO2.0: Wind and Disturbance Estimation with Hybrid Dynamics VIO | Apr 1, 2025 | State Estimation | Code Available | 5 |