| MinerU: An Open-Source Solution for Precise Document Content Extraction | Sep 27, 2024 | Diversity, Optical Character Recognition (OCR) | Code Available | 16 |
| YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information | Feb 21, 2024 | object-detection, Object Detection | Code Available | 16 |
| Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory | Apr 28, 2025 | RAG, Retrieval-augmented Generation | Code Available | 15 |
| SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 15 |
| DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Jan 22, 2025 | Mathematical Reasoning, Multi-task Language Understanding | Code Available | 15 |
| DeepSeek-V3 Technical Report | Dec 27, 2024 | GPU, Language Modeling | Code Available | 15 |
| YOLOv11: An Overview of the Key Architectural Enhancements | Oct 23, 2024 | Computational Efficiency, Instance Segmentation | Code Available | 15 |
| Docling Technical Report | Aug 19, 2024 | | Code Available | 15 |
| AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | Aug 9, 2024 | | Code Available | 15 |
| OpenHands: An Open Platform for AI Software Developers as Generalist Agents | Jul 23, 2024 | | Code Available | 15 |
| TradingAgents: Multi-Agents LLM Financial Trading Framework | Dec 28, 2024 | Management | Code Available | 14 |
| LightRAG: Simple and Fast Retrieval-Augmented Generation | Oct 8, 2024 | Information Retrieval, RAG | Code Available | 14 |
| Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs | Jun 17, 2024 | Language Modeling, Language Modelling | Code Available | 14 |
| Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | Feb 22, 2024 | Articles, Retrieval | Code Available | 14 |
| R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization | May 21, 2025 | Code Generation, Model Optimization | Code Available | 13 |
| Qwen3 Technical Report | May 14, 2025 | Code Generation, Mathematical Reasoning | Code Available | 13 |
| Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking | Mar 14, 2025 | All, Large Language Model | Code Available | 13 |
| Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k | Mar 12, 2025 | Video Generation | Code Available | 13 |
| Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | Feb 17, 2025 | | Code Available | 13 |
| UI-TARS: Pioneering Automated GUI Interaction with Native Agents | Jan 21, 2025 | | Code Available | 13 |
| Open-Sora: Democratizing Efficient Video Production for All | Dec 29, 2024 | All, Image Generation | Code Available | 13 |
| Qwen2.5 Technical Report | Dec 19, 2024 | Common Sense Reasoning | Code Available | 13 |
| 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs | Oct 21, 2024 | | Code Available | 13 |
| FLUX that Plays Music | Sep 1, 2024 | Music Generation, Text-to-Music Generation | Code Available | 13 |
| Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | Aug 27, 2024 | Sentiment Analysis | Code Available | 13 |
| Qwen2 Technical Report | Jul 15, 2024 | Arithmetic Reasoning, GSM8K | Code Available | 13 |
| Autonomous Agents for Collaborative Task under Information Asymmetry | Jun 21, 2024 | Language Modelling, Large Language Model | Code Available | 13 |
| ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Jun 18, 2024 | All, GSM8K | Code Available | 13 |
| From Local to Global: A Graph RAG Approach to Query-Focused Summarization | Apr 24, 2024 | Query-focused Summarization, Question Answering | Code Available | 13 |
| Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | Mar 7, 2024 | Chatbot | Code Available | 13 |
| Zep: A Temporal Knowledge Graph Architecture for Agent Memory | Jan 20, 2025 | Large Language Model, RAG | Code Available | 12 |
| MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Aug 3, 2024 | Hallucination, Multiple-choice | Code Available | 12 |
| OmniParser for Pure Vision Based GUI Agent | Aug 1, 2024 | Natural Language Visual Grounding | Code Available | 12 |
| Qwen3-Coder-Next Technical Report | Feb 28, 2026 | | Unverified | 11 |
| DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints | Jan 26, 2026 | | Unverified | 11 |
| WebSailor: Navigating Super-human Reasoning for Web Agent | Jul 3, 2025 | | Code Available | 11 |
| SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Jun 2, 2025 | Action Generation, GPU | Code Available | 11 |
| OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation | May 29, 2025 | Large Language Model | Code Available | 11 |
| WebDancer: Towards Autonomous Information Seeking Agency | May 28, 2025 | | Code Available | 11 |
| CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | May 23, 2025 | Automatic Speech Recognition, Emotion Recognition | Code Available | 11 |
| Absolute Zero: Reinforced Self-play Reasoning with Zero Data | May 6, 2025 | Mathematical Reasoning | Code Available | 11 |
| Packing Input Frame Context in Next-Frame Prediction Models for Video Generation | Apr 17, 2025 | | Code Available | 11 |
| Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | Apr 1, 2025 | AI Agent, Task Planning | Code Available | 11 |
| EAP4EMSIG -- Enhancing Event-Driven Microscopy for Microfluidic Single-Cell Analysis | Mar 30, 2025 | Deep Learning, Segmentation | Code Available | 11 |
| Wan: Open and Advanced Large-Scale Video Generative Models | Mar 26, 2025 | Video Editing, Video Generation | Code Available | 11 |
| BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | Mar 18, 2025 | CPU, Math | Code Available | 11 |
| Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models | Mar 15, 2025 | Code Generation, Language Modeling | Code Available | 11 |
| BioMamba: Leveraging Spectro-Temporal Embedding in Bidirectional Mamba for Enhanced Biosignal Classification | Mar 14, 2025 | Mamba | Code Available | 11 |
| VGGT: Visual Geometry Grounded Transformer | Mar 14, 2025 | Depth Estimation, Novel View Synthesis | Code Available | 11 |
| YOLOE: Real-Time Seeing Anything | Mar 10, 2025 | 10-shot image generation | Code Available | 11 |