| OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving | Dec 19, 2024 | Autonomous Driving | Code Available | 4 |
| MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge | Jun 17, 2022 | Atari Games, Minecraft | Code Available | 4 |
| GLIGEN: Open-Set Grounded Text-to-Image Generation | Jan 17, 2023 | Conditional Text-to-Image Synthesis, Image Generation | Code Available | 4 |
| Simulation-free Schrödinger bridges via score and flow matching | Jul 7, 2023 | | Code Available | 4 |
| Constitutional AI: Harmlessness from AI Feedback | Dec 15, 2022 | Decision Making | Code Available | 4 |
| Revisiting Self-Attentive Sequential Recommendation | Apr 13, 2025 | Decoder, Recommendation Systems | Code Available | 4 |
| Aria Everyday Activities Dataset | Feb 20, 2024 | | Code Available | 4 |
| QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning | May 23, 2025 | Question Answering, Reinforcement Learning (RL) | Code Available | 4 |
| Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs | Jul 17, 2024 | Autonomous Navigation, Collision Avoidance | Code Available | 4 |
| A-MEM: Agentic Memory for LLM Agents | Feb 17, 2025 | Large Language Model | Code Available | 4 |
| MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations | Jun 13, 2024 | 3D visual grounding, Attribute | Code Available | 4 |
| FILM: Frame Interpolation for Large Motion | Feb 10, 2022 | Optical Flow Estimation, Video Frame Interpolation | Code Available | 4 |
| WorldVLA: Towards Autoregressive Action World Model | Jun 26, 2025 | Action Generation, model | Code Available | 4 |
| SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering | Nov 21, 2023 | | Code Available | 4 |
| Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | May 5, 2025 | Image Generation, multimodal interaction | Code Available | 4 |
| AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling | Feb 19, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Open Problems in Applied Deep Learning | Jan 26, 2023 | AutoML, Deep Learning | Code Available | 4 |
| ReAct: Synergizing Reasoning and Acting in Language Models | Oct 6, 2022 | Decision Making, Fact Verification | Code Available | 4 |
| A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions | Jun 15, 2022 | Clustering, Deep Clustering | Code Available | 4 |
| Diffusion Models for Medical Image Analysis: A Comprehensive Survey | Nov 14, 2022 | Denoising, Medical Image Analysis | Code Available | 4 |
| LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | Jan 2, 2024 | | Code Available | 4 |
| Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies | Jul 1, 2024 | image-classification, Image Classification | Code Available | 4 |
| ChatGPT for Robotics: Design Principles and Model Abilities | Feb 20, 2023 | Mathematical Reasoning, Prompt Engineering | Code Available | 4 |
| An Entropy-based Text Watermarking Detection Method | Mar 20, 2024 | | Code Available | 4 |
| RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation | Feb 26, 2024 | Code Documentation Generation, Code Generation | Code Available | 4 |
| MINIMA: Modality Invariant Image Matching | Dec 27, 2024 | | Code Available | 4 |
| SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation | May 30, 2024 | Attribute, Autonomous Driving | Code Available | 4 |
| Tower: An Open Multilingual Large Language Model for Translation-Related Tasks | Feb 27, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| TrustLLM: Trustworthiness in Large Language Models | Jan 10, 2024 | Ethics, Fairness | Code Available | 4 |
| Null-text Inversion for Editing Real Images using Guided Diffusion Models | Nov 17, 2022 | Image Generation, Text-based Image Editing | Code Available | 4 |
| GriTS: Grid table similarity metric for table structure recognition | Mar 23, 2022 | | Code Available | 4 |
| 3D Scene Generation: A Survey | May 8, 2025 | Autonomous Driving, Diversity | Code Available | 4 |
| LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover | Jul 24, 2024 | Automated Theorem Proving, Math | Code Available | 4 |
| Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance | May 26, 2023 | | Code Available | 4 |
| AgentBench: Evaluating LLMs as Agents | Aug 7, 2023 | Decision Making, Instruction Following | Code Available | 4 |
| Semantic-SAM: Segment and Recognize Anything at Any Granularity | Jul 10, 2023 | Image Segmentation, Segmentation | Code Available | 4 |
| 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering | Oct 12, 2023 | Dynamic Reconstruction, GPU | Code Available | 4 |
| InstanceDiffusion: Instance-level Control for Image Generation | Feb 5, 2024 | Conditional Text-to-Image Synthesis, Image Generation | Code Available | 4 |
| Depth Any Video with Scalable Synthetic Data | Oct 14, 2024 | Depth Estimation | Code Available | 4 |
| TabularARGN: A Flexible and Efficient Auto-Regressive Framework for Generating High-Fidelity Synthetic Data | Jan 21, 2025 | Fairness, Imputation | Code Available | 4 |
| Quality-aware Masked Diffusion Transformer for Enhanced Music Generation | May 24, 2024 | Diversity, Music Generation | Code Available | 4 |
| LET-3D-AP: Longitudinal Error Tolerant 3D Average Precision for Camera-Only 3D Detection | Jun 15, 2022 | Depth Estimation, Object Detection | Code Available | 4 |
| Simple and Effective Masked Diffusion Language Models | Jun 11, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Sample-Efficient Alignment for LLMs | Nov 3, 2024 | Thompson Sampling | Code Available | 4 |
| PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Jun 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 4 |
| SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution | Nov 27, 2023 | Image Super-Resolution, Super-Resolution | Code Available | 4 |
| Sparse Tensor-based Point Cloud Attribute Compression | Apr 3, 2022 | Attribute | Code Available | 4 |
| WavCraft: Audio Editing and Generation with Large Language Models | Mar 14, 2024 | In-Context Learning | Code Available | 4 |
| Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound | Feb 7, 2025 | Benchmarking | Code Available | 4 |
| mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | Nov 7, 2023 | 1 Image, 2*2 Stitching, Decoder | Code Available | 4 |