| dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching | May 17, 2025 | Denoising | Code Available | 3 | 5 |
| SkillMimic: Learning Basketball Interaction Skills from Demonstrations | Aug 12, 2024 | Diversity, Human-Object Interaction Detection | Code Available | 3 | 5 |
| DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation | Jan 28, 2025 | 3D Generation | Code Available | 3 | 5 |
| Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model | Aug 20, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| RMPE: Regional Multi-person Pose Estimation | Dec 1, 2016 | 2D Human Pose Estimation, Human Detection | Code Available | 3 | 5 |
| Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks | Jun 12, 2024 | Benchmarking, Chatbot | Code Available | 3 | 5 |
| NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Mar 5, 2024 | Quantization, Speech Synthesis | Code Available | 3 | 5 |
| PAL: Program-aided Language Models | Nov 18, 2022 | Arithmetic Reasoning, GSM8K | Code Available | 3 | 5 |
| HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving | Dec 2, 2024 | Autonomous Driving, Novel View Synthesis | Code Available | 3 | 5 |
| Learning and discovering multiple solutions using physics-informed neural networks with random initialization and deep ensemble | Mar 8, 2025 | Uncertainty Quantification | Code Available | 3 | 5 |
| 3D Facial Expressions through Analysis-by-Neural-Synthesis | Apr 5, 2024 | 3D Face Reconstruction, Face Reconstruction | Code Available | 3 | 5 |
| ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | Mar 13, 2024 | Instance Segmentation, Object Detection | Code Available | 3 | 5 |
| GLU Variants Improve Transformer | Feb 12, 2020 | | Code Available | 3 | 5 |
| OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research | May 16, 2023 | Philosophy, reinforcement-learning | Code Available | 3 | 5 |
| DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Mar 11, 2024 | Autonomous Driving, Language Modeling | Code Available | 3 | 5 |
| ADOPT: Modified Adam Can Converge with Any β_2 with the Optimal Rate | Nov 5, 2024 | Deep Reinforcement Learning, image-classification | Code Available | 3 | 5 |
| FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code Available | 3 | 5 |
| Momentum Contrast for Unsupervised Visual Representation Learning | Nov 13, 2019 | Contrastive Learning, Image Classification | Code Available | 3 | 5 |
| Characterization of Excess Risk for Locally Strongly Convex Population Risk | Dec 4, 2020 | | Code Available | 3 | 5 |
| wav2letter++: The Fastest Open-source Speech Recognition System | Dec 18, 2018 | Speech Recognition | Code Available | 3 | 5 |
| Identifying Audio Adversarial Examples via Anomalous Pattern Detection | Feb 13, 2020 | | Code Available | 3 | 5 |
| Towards VQA Models That Can Read | Apr 18, 2019 | TextVQA, Visual Question Answering (VQA) | Code Available | 3 | 5 |
| First Order Motion Model for Image Animation | Feb 29, 2020 | Image Animation, model | Code Available | 3 | 5 |
| Transformers in Medical Imaging: A Survey | Jan 24, 2022 | Image Classification, Image Segmentation | Code Available | 3 | 5 |
| Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models | Sep 3, 2023 | Hallucination, World Knowledge | Code Available | 3 | 5 |
| Pythia v0.1: the Winning Entry to the VQA Challenge 2018 | Jul 26, 2018 | Data Augmentation, Visual Question Answering (VQA) | Code Available | 3 | 5 |
| SQLFlow: A Bridge between SQL and Machine Learning | Jan 19, 2020 | BIG-bench Machine Learning | Code Available | 3 | 5 |
| Mesh R-CNN | Jun 6, 2019 | 3D Shape Modeling, Prediction | Code Available | 3 | 5 |
| PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting | Dec 16, 2024 | 3D Reconstruction, 4k | Code Available | 3 | 5 |
| MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio | Mar 7, 2025 | Video Generation | Code Available | 3 | 5 |
| Efficient and Robust Automated Machine Learning | Dec 1, 2015 | AutoML, Bayesian Optimization | Code Available | 3 | 5 |
| MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem | May 20, 2025 | Mathematical Reasoning, scientific discovery | Code Available | 3 | 5 |
| SynSin: End-to-end View Synthesis from a Single Image | Dec 18, 2019 | Novel View Synthesis | Code Available | 3 | 5 |
| An Extensible Framework for Open Heterogeneous Collaborative Perception | Jan 25, 2024 | | Code Available | 3 | 5 |
| Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Jun 7, 2024 | Benchmarking, Decoder | Code Available | 3 | 5 |
| Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence | Feb 12, 2020 | BIG-bench Machine Learning, GPU | Code Available | 3 | 5 |
| MMLSpark: Unifying Machine Learning Ecosystems at Massive Scales | Oct 20, 2018 | BIG-bench Machine Learning, Distributed Computing | Code Available | 3 | 5 |
| Simulating the Real World: A Unified Survey of Multimodal Generative Models | Mar 6, 2025 | 3D Generation, Survey | Code Available | 3 | 5 |
| AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment | Mar 30, 2021 | AutoML, Stock Prediction | Code Available | 3 | 5 |
| VideoRoPE: What Makes for Good Video Rotary Position Embedding? | Feb 7, 2025 | Hallucination, Position | Code Available | 3 | 5 |
| Green AI | Jul 22, 2019 | Deep Learning | Code Available | 3 | 5 |
| Bag of Freebies for Training Object Detection Neural Networks | Feb 11, 2019 | General Classification, image-classification | Code Available | 3 | 5 |
| Characterizing signal propagation to close the performance gap in unnormalized ResNets | Jan 21, 2021 | | Code Available | 3 | 5 |
| SnapKV: LLM Knows What You are Looking for Before Generation | Apr 22, 2024 | 16k, GPU | Code Available | 3 | 5 |
| Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond | Oct 10, 2024 | Large Language Model, Recommendation Systems | Code Available | 3 | 5 |
| Distributional Generalization: A New Kind of Generalization | Sep 17, 2020 | 2D Object Detection | Code Available | 3 | 5 |
| Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space | May 21, 2025 | | Code Available | 3 | 5 |
| ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning | Dec 4, 2024 | Attribute, Time Series | Code Available | 3 | 5 |
| Bilinear Attention Networks | May 21, 2018 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 3 | 5 |
| Caption Anything: Interactive Image Description with Diverse Multimodal Controls | May 4, 2023 | controllable image captioning, Image Captioning | Code Available | 3 | 5 |