| Wonder3D: Single Image to 3D using Cross-Domain Diffusion | Oct 23, 2023 | 3D geometry, Image to 3D | Code Available | 5 |
| MobileVLM V2: Faster and Stronger Baseline for Vision Language Model | Feb 6, 2024 | AutoML, Language Modeling | Code Available | 5 |
| MV-Adapter: Multi-view Consistent Image Generation Made Easy | Dec 4, 2024 | 3D Generation, Image Generation | Code Available | 5 |
| DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows | Feb 16, 2024 | Synthetic Data Generation | Code Available | 5 |
| DeepPhase: Periodic Autoencoders for Learning Motion Phase Manifolds | Jul 22, 2022 | Motion Synthesis | Code Available | 5 |
| WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling | Aug 29, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model | Jun 16, 2025 | Large Language Model, multimodal interaction | Code Available | 5 |
| Understanding R1-Zero-Like Training: A Critical Perspective | Mar 26, 2025 | Reinforcement Learning (RL) | Code Available | 5 |
| OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations | Dec 10, 2024 | Attribute, Benchmarking | Code Available | 5 |
| NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification | May 22, 2025 | 2D Semantic Segmentation, Activity Prediction | Code Available | 5 |
| CogAgent: A Visual Language Model for GUI Agents | Dec 14, 2023 | Language Modeling | Code Available | 5 |
| Transformer-Squared: Self-adaptive LLMs | Jan 9, 2025 | | Code Available | 5 |
| CogVLM: Visual Expert for Pretrained Language Models | Nov 6, 2023 | 1 Image, 2*2 Stitching, FS-MEVQA | Code Available | 5 |
| Aria: An Open Multimodal Native Mixture-of-Experts Model | Oct 8, 2024 | Instruction Following, Mixture-of-Experts | Code Available | 5 |
| Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning | Aug 26, 2024 | Denoising, reinforcement-learning | Code Available | 5 |
| τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains | Jun 17, 2024 | | Code Available | 5 |
| A Brief Overview of AI Governance for Responsible Machine Learning Systems | Nov 21, 2022 | | Code Available | 5 |
| Autoregressive Image Generation without Vector Quantization | Jun 17, 2024 | Image Generation, Quantization | Code Available | 5 |
| Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Dec 12, 2024 | GPU | Code Available | 5 |
| Scalable Diffusion Models with Transformers | Dec 19, 2022 | Image Generation | Code Available | 5 |
| Awesome Multi-modal Object Tracking | May 23, 2024 | Autonomous Driving, Knowledge Distillation | Code Available | 5 |
| Trajectory Prediction Meets Large Language Models: A Survey | Jun 3, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| PaperBench: Evaluating AI's Ability to Replicate AI Research | Apr 2, 2025 | | Code Available | 5 |
| 4th PVUW MeViS 3rd Place Report: Sa2VA | Apr 1, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| GAM(e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints | Apr 19, 2022 | Additive models, Explainable artificial intelligence | Code Available | 5 |
| Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects | Jan 7, 2024 | Language Modeling, Language Modelling | Code Available | 5 |
| EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design | Feb 1, 2023 | GPU, object-detection | Code Available | 5 |
| SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | Jan 16, 2024 | Image Generation | Code Available | 5 |
| Long-context LLMs Struggle with Long In-context Learning | Apr 2, 2024 | 2k, In-Context Learning | Code Available | 5 |
| Track Anything: Segment Anything Meets Videos | Apr 24, 2023 | Image Segmentation, Object Tracking | Code Available | 5 |
| PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling | Jun 4, 2024 | | Code Available | 5 |
| AppAgent: Multimodal Agents as Smartphone Users | Dec 21, 2023 | Navigate | Code Available | 5 |
| CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes | Nov 1, 2024 | 3DGS, Novel View Synthesis | Code Available | 5 |
| High-Fidelity Simultaneous Speech-To-Speech Translation | Feb 5, 2025 | Decoder, Simultaneous Speech-to-Speech Translation | Code Available | 5 |
| ReFT: Representation Finetuning for Language Models | Apr 4, 2024 | Arithmetic Reasoning | Code Available | 5 |
| LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Nov 8, 2023 | 8k, GPU | Code Available | 5 |
| Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding, Mathematical Reasoning | Code Available | 5 |
| WebThinker: Empowering Large Reasoning Models with Deep Research Capability | Apr 30, 2025 | Navigate | Code Available | 5 |
| Segment Anything for Videos: A Systematic Survey | Jul 31, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 5 |
| Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey | Aug 19, 2024 | Autonomous Driving, Decision Making | Code Available | 5 |
| Point Transformer V3: Simpler Faster Stronger | Jan 1, 2024 | Representation Learning | Code Available | 5 |
| DUET: Dual Clustering Enhanced Multivariate Time Series Forecasting | Dec 14, 2024 | Clustering, energy management | Code Available | 5 |
| TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods | Mar 29, 2024 | Benchmarking, Multivariate Time Series Forecasting | Code Available | 5 |
| From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline | Jun 17, 2024 | Chatbot | Code Available | 5 |
| Watermark Anything with Localized Messages | Nov 11, 2024 | | Code Available | 5 |
| PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | May 23, 2024 | Quantization | Code Available | 5 |
| R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration | May 30, 2025 | Mathematical Reasoning | Code Available | 5 |
| Differentiable Tree Search Network | Jan 22, 2024 | Decision Making, Inductive Bias | Code Available | 5 |
| A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going? | Aug 9, 2024 | Natural Language Queries, Text to SQL | Code Available | 5 |
| LeVo: High-Quality Song Generation with Multi-Preference Alignment | Jun 9, 2025 | Instruction Following, Music Generation | Code Available | 5 |