| YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | Feb 21, 2024 | object-detection, Object Detection | Code Available | 16 |
| MinerU: An Open-Source Solution for Precise Document Content Extraction | Sep 27, 2024 | Diversity, Optical Character Recognition (OCR) | Code Available | 16 |
| SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 15 |
| YOLOv11: An Overview of the Key Architectural Enhancements | Oct 23, 2024 | Computational Efficiency, Instance Segmentation | Code Available | 15 |
| DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Jan 22, 2025 | Mathematical Reasoning, Multi-task Language Understanding | Code Available | 15 |
| DeepSeek-V3 Technical Report | Dec 27, 2024 | GPU, Language Modeling | Code Available | 15 |
| Docling Technical Report | Aug 19, 2024 | | Code Available | 15 |
| AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | Aug 9, 2024 | | Code Available | 15 |
| OpenHands: An Open Platform for AI Software Developers as Generalist Agents | Jul 23, 2024 | | Code Available | 15 |
| Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory | Apr 28, 2025 | RAG, Retrieval-augmented Generation | Code Available | 15 |
| LightRAG: Simple and Fast Retrieval-Augmented Generation | Oct 8, 2024 | Information Retrieval, RAG | Code Available | 14 |
| Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 14 |
| Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | Feb 22, 2024 | Articles, Retrieval | Code Available | 14 |
| TradingAgents: Multi-Agents LLM Financial Trading Framework | Dec 28, 2024 | Management | Code Available | 14 |
| Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking | Mar 14, 2025 | All, Large Language Model | Code Available | 13 |
| ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Jun 18, 2024 | All, GSM8K | Code Available | 13 |
| UI-TARS: Pioneering Automated GUI Interaction with Native Agents | Jan 21, 2025 | | Code Available | 13 |
| Qwen2 Technical Report | Jul 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 13 |
| R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization | May 21, 2025 | Code Generation, Model Optimization | Code Available | 13 |
| 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs | Oct 21, 2024 | | Code Available | 13 |
| Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k | Mar 12, 2025 | Video Generation | Code Available | 13 |
| Open-Sora: Democratizing Efficient Video Production for All | Dec 29, 2024 | All, Image Generation | Code Available | 13 |
| Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | Feb 17, 2025 | | Code Available | 13 |
| FLUX that Plays Music | Sep 1, 2024 | Music Generation, Text-to-Music Generation | Code Available | 13 |
| Autonomous Agents for Collaborative Task under Information Asymmetry | Jun 21, 2024 | Language Modelling, Large Language Model | Code Available | 13 |
| Qwen3 Technical Report | May 14, 2025 | Code Generation, Mathematical Reasoning | Code Available | 13 |
| From Local to Global: A Graph RAG Approach to Query-Focused Summarization | Apr 24, 2024 | Query-focused Summarization, Question Answering | Code Available | 13 |
| Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | Mar 7, 2024 | Chatbot | Code Available | 13 |
| Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | Aug 27, 2024 | Sentiment Analysis | Code Available | 13 |
| Qwen2.5 Technical Report | Dec 19, 2024 | Common Sense Reasoning | Code Available | 13 |
| MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Aug 3, 2024 | Hallucination, Multiple-choice | Code Available | 12 |
| Zep: A Temporal Knowledge Graph Architecture for Agent Memory | Jan 20, 2025 | Large Language Model, RAG | Code Available | 12 |
| OmniParser for Pure Vision Based GUI Agent | Aug 1, 2024 | Natural Language Visual Grounding | Code Available | 12 |
| DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints | Jan 26, 2026 | | Unverified | 11 |
| Qwen3-Coder-Next Technical Report | Feb 28, 2026 | | Unverified | 11 |
| InstantID: Zero-shot Identity-Preserving Generation in Seconds | Jan 15, 2024 | Diffusion Personalization, Diffusion Personalization Tuning Free | Code Available | 11 |
| KAN 2.0: Kolmogorov-Arnold Networks Meet Science | Aug 19, 2024 | Kolmogorov-Arnold Networks, scientific discovery | Code Available | 11 |
| Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence | Apr 8, 2024 | | Code Available | 11 |
| USP: A Unified Sequence Parallelism Approach for Long Context Generative AI | May 13, 2024 | | Code Available | 11 |
| Mixtures of Experts Unlock Parameter Scaling for Deep RL | Feb 13, 2024 | reinforcement-learning, Reinforcement Learning | Code Available | 11 |
| CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | May 23, 2025 | Automatic Speech Recognition, Emotion Recognition | Code Available | 11 |
| BioMamba: Leveraging Spectro-Temporal Embedding in Bidirectional Mamba for Enhanced Biosignal Classification | Mar 14, 2025 | Mamba | Code Available | 11 |
| EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction | Jan 11, 2024 | | Code Available | 11 |
| BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | Mar 18, 2025 | CPU, Math | Code Available | 11 |
| Absolute Zero: Reinforced Self-play Reasoning with Zero Data | May 6, 2025 | Mathematical Reasoning | Code Available | 11 |
| Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models | Mar 15, 2025 | Code Generation, Language Modeling | Code Available | 11 |
| HunyuanVideo: A Systematic Framework For Large Video Generative Models | Dec 3, 2024 | Video Alignment, Video Generation | Code Available | 11 |
| Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Sep 18, 2024 | Natural Language Visual Grounding | Code Available | 11 |
| F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | Oct 9, 2024 | Denoising, text-to-speech | Code Available | 11 |
| On the Design and Analysis of LLM-Based Algorithms | Jul 20, 2024 | Prompt Engineering | Code Available | 11 |