| DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | Jan 25, 2024 | Code Generation, Language Modeling | Code Available | 11 |
| InstantID: Zero-shot Identity-Preserving Generation in Seconds | Jan 15, 2024 | Diffusion Personalization, Diffusion Personalization Tuning Free | Code Available | 11 |
| TinyLlama: An Open-Source Small Language Model | Jan 4, 2024 | Computational Efficiency, Language Modeling | Code Available | 11 |
| PaperBanana: Automating Academic Illustration for AI Scientists | Jan 30, 2026 | | Unverified | 9 |
| Qwen3-TTS Technical Report | Jan 22, 2026 | | Unverified | 9 |
| Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | Jul 14, 2025 | Code Generation, Language Modeling | Code Available | 9 |
| MiniCPM4: Ultra-Efficient LLMs on End Devices | Jun 9, 2025 | Large Language Model | Code Available | 9 |
| MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Jun 5, 2025 | GPU, Relation | Code Available | 9 |
| SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers | Jun 1, 2025 | Denoising | Code Available | 9 |
| Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting | May 20, 2025 | | Code Available | 9 |
| Emerging Properties in Unified Multimodal Pretraining | May 20, 2025 | Image Editing | Code Available | 9 |
| UFO2: The Desktop AgentOS | Apr 20, 2025 | | Code Available | 9 |
| SkyReels-V2: Infinite-length Film Generative Model | Apr 17, 2025 | Large Language Model, model | Code Available | 9 |
| VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Apr 10, 2025 | Language Modeling, Language Modelling | Code Available | 9 |
| UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation | Mar 31, 2025 | RAG, Retrieval | Code Available | 9 |
| PP-FormulaNet: Bridging Accuracy and Efficiency in Advanced Formula Recognition | Mar 24, 2025 | | Code Available | 9 |
| AgentRxiv: Towards Collaborative Autonomous Research | Mar 23, 2025 | Math, scientific discovery | Code Available | 9 |
| PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Mar 21, 2025 | CPU, Document Layout Analysis | Code Available | 9 |
| RWKV-7 "Goose" with Expressive Dynamic State Evolution | Mar 18, 2025 | In-Context Learning, Language Modeling | Code Available | 9 |
| YuE: Scaling Open Foundation Models for Long-Form Music Generation | Mar 11, 2025 | Form, In-Context Learning | Code Available | 9 |
| A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Mar 10, 2025 | Continual Learning, Meta-Learning | Code Available | 9 |
| PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC | Feb 20, 2025 | Decision Making | Code Available | 9 |
| AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | Feb 9, 2025 | Large Language Model, RAG | Code Available | 9 |
| Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Feb 5, 2025 | Self-Supervised Learning, Speech Enhancement | Code Available | 9 |
| s1: Simple test-time scaling | Jan 31, 2025 | Language Modeling, Language Modelling | Code Available | 9 |
| SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | Jan 30, 2025 | Image Generation, Model Compression | Code Available | 9 |
| Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation | Jan 27, 2025 | | Code Available | 9 |
| Overview of the Amphion Toolkit (v0.2) | Jan 26, 2025 | text-to-speech, Text to Speech | Code Available | 9 |
| Agent Laboratory: Using LLM Agents as Research Assistants | Jan 8, 2025 | scientific discovery | Code Available | 9 |
| FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Jan 2, 2025 | GPU, Scheduling | Code Available | 9 |
| 2 OLMo 2 Furious | Dec 31, 2024 | | Code Available | 9 |
| Aviary: training language agents on challenging scientific tasks | Dec 30, 2024 | | Code Available | 9 |
| LTX-Video: Realtime Video Latent Diffusion | Dec 30, 2024 | Denoising, GPU | Code Available | 9 |
| Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models | Dec 23, 2024 | CPU | Code Available | 9 |
| FastVLM: Efficient Vision Encoding for Vision Language Models | Dec 17, 2024 | | Code Available | 9 |
| Large Action Models: From Inception to Implementation | Dec 13, 2024 | Action Generation | Code Available | 9 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Dec 13, 2024 | Chart Understanding, Mixture-of-Experts | Code Available | 9 |
| LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync | Dec 12, 2024 | Portrait Animation | Code Available | 9 |
| Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis | Dec 5, 2024 | Image Generation | Code Available | 9 |
| SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | Nov 18, 2024 | Object Tracking, Visual Object Tracking | Code Available | 9 |
| FinRobot: AI Agent for Equity Research and Valuation with Large Language Models | Nov 13, 2024 | AI Agent | Code Available | 9 |
| Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research | Nov 7, 2024 | AI Agent, Decision Making | Code Available | 9 |
| SkyServe: Serving AI Models across Regions and Clouds with Spot Instances | Nov 3, 2024 | | Code Available | 9 |
| SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile | Nov 1, 2024 | | Code Available | 9 |
| Soft Condorcet Optimization for Ranking of General Agents | Oct 31, 2024 | | Code Available | 9 |
| Moonshine: Speech Recognition for Live Transcription and Voice Commands | Oct 21, 2024 | Decoder, Position | Code Available | 9 |
| Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework | Oct 20, 2024 | Code Completion, RAG | Code Available | 9 |
| DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception | Oct 16, 2024 | Document Layout Analysis, document understanding | Code Available | 9 |
| HART: Efficient Visual Generation with Hybrid Autoregressive Transformer | Oct 14, 2024 | Image Generation, Image Reconstruction | Code Available | 9 |
| Liger Kernel: Efficient Triton Kernels for LLM Training | Oct 14, 2024 | Chunking, GPU | Code Available | 9 |