| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Dec 13, 2024 | Chart Understanding, Mixture-of-Experts | Code Available | 9 | 5 |
| LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync | Dec 12, 2024 | Portrait Animation | Code Available | 9 | 5 |
| FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models | May 23, 2024 | AI Agent, Decision Making | Code Available | 9 | 5 |
| MiniCPM4: Ultra-Efficient LLMs on End Devices | Jun 9, 2025 | Large Language Model | Code Available | 9 | 5 |
| Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding | Jul 14, 2025 | Code Generation, Language Modeling | Code Available | 9 | 5 |
| Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models | Dec 23, 2024 | CPU | Code Available | 9 | 5 |
| OLMo: Accelerating the Science of Language Models | Feb 1, 2024 | Language Modeling, Language Modelling | Code Available | 9 | 5 |
| MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | Apr 9, 2024 | Domain Adaptation | Code Available | 9 | 5 |
| UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation | Mar 31, 2025 | RAG, Retrieval | Code Available | 9 | 5 |
| Model Stock: All we need is just a few fine-tuned models | Mar 28, 2024 | All | Code Available | 9 | 5 |
| CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion | May 26, 2024 | Language Modeling, Language Modelling | Code Available | 9 | 5 |
| Large Action Models: From Inception to Implementation | Dec 13, 2024 | Action Generation | Code Available | 9 | 5 |
| A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Mar 10, 2025 | Continual Learning, Meta-Learning | Code Available | 9 | 5 |
| 2 OLMo 2 Furious | Dec 31, 2024 | | Code Available | 9 | 5 |
| LTX-Video: Realtime Video Latent Diffusion | Dec 30, 2024 | Denoising, GPU | Code Available | 9 | 5 |
| VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models | Jan 17, 2024 | Text-to-Video Generation, Video Generation | Code Available | 9 | 5 |
| s1: Simple test-time scaling | Jan 31, 2025 | Language Modeling, Language Modelling | Code Available | 9 | 5 |
| FastVLM: Efficient Vision Encoding for Vision Language Models | Dec 17, 2024 | | Code Available | 9 | 5 |
| Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | Jan 19, 2024 | Data Augmentation, Depth Estimation | Code Available | 9 | 5 |
| Arcee's MergeKit: A Toolkit for Merging Large Language Models | Mar 20, 2024 | Language Modeling, Language Modelling | Code Available | 9 | 5 |
| SkyServe: Serving AI Models across Regions and Clouds with Spot Instances | Nov 3, 2024 | | Code Available | 9 | 5 |
| PP-FormulaNet: Bridging Accuracy and Efficiency in Advanced Formula Recognition | Mar 24, 2025 | | Code Available | 9 | 5 |
| When Do We Not Need Larger Vision Models? | Mar 19, 2024 | Depth Estimation | Code Available | 9 | 5 |
| garak: A Framework for Security Probing Large Language Models | Jun 16, 2024 | Red Teaming | Code Available | 9 | 5 |
| LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Mar 19, 2024 | GSM8K, Language Modelling | Code Available | 9 | 5 |
| Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment | Oct 12, 2024 | Language Modelling, Philosophy | Code Available | 9 | 5 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jun 17, 2024 | 16k, Language Modeling | Code Available | 9 | 5 |
| SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory | Nov 18, 2024 | Object Tracking, Visual Object Tracking | Code Available | 9 | 5 |
| InternLM2 Technical Report | Mar 26, 2024 | 4k, Long-Context Understanding | Code Available | 9 | 5 |
| DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception | Oct 16, 2024 | Document Layout Analysis, document understanding | Code Available | 9 | 5 |
| PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Mar 21, 2025 | CPU, Document Layout Analysis | Code Available | 9 | 5 |
| VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | Apr 10, 2025 | Language Modeling, Language Modelling | Code Available | 9 | 5 |
| UFO: A UI-Focused Agent for Windows OS Interaction | Feb 8, 2024 | Navigate | Code Available | 9 | 5 |
| AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Mar 26, 2024 | Diversity, Face Reenactment | Code Available | 9 | 5 |
| RULER: What's the Real Context Size of Your Long-Context Language Models? | Apr 9, 2024 | Long-Context Understanding | Code Available | 9 | 5 |
| MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | Jul 29, 2024 | 2D Semantic Segmentation task 1 (8 classes), graph construction | Code Available | 9 | 5 |
| Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation | Jan 27, 2025 | | Code Available | 9 | 5 |
| Overview of the Amphion Toolkit (v0.2) | Jan 26, 2025 | text-to-speech, Text to Speech | Code Available | 9 | 5 |
| Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis | Dec 5, 2024 | Image Generation | Code Available | 9 | 5 |
| Agent Laboratory: Using LLM Agents as Research Assistants | Jan 8, 2025 | scientific discovery | Code Available | 9 | 5 |
| Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble | Mar 7, 2024 | Anomaly Detection, GPU | Code Available | 9 | 5 |
| OpenVLA: An Open-Source Vision-Language-Action Model | Jun 13, 2024 | Imitation Learning, Language Modelling | Code Available | 9 | 5 |
| Transformer Explainer: Interactive Learning of Text-Generative Models | Aug 8, 2024 | | Code Available | 9 | 5 |
| SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile | Nov 1, 2024 | | Code Available | 9 | 5 |
| Emerging Properties in Unified Multimodal Pretraining | May 20, 2025 | Image Editing | Code Available | 9 | 5 |
| Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Jun 10, 2024 | Mixture-of-Experts | Code Available | 9 | 5 |
| SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers | Jun 1, 2025 | Denoising | Code Available | 9 | 5 |
| Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Apr 11, 2024 | Language Modeling, Language Modelling | Code Available | 9 | 5 |
| AgentRxiv: Towards Collaborative Autonomous Research | Mar 23, 2025 | Math, scientific discovery | Code Available | 9 | 5 |
| Natural language guidance of high-fidelity text-to-speech with synthetic annotations | Feb 2, 2024 | In-Context Learning, Language Modeling | Code Available | 9 | 5 |