| Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis | Dec 5, 2024 | Image Generation | Code Available | 9 |
| Agent Laboratory: Using LLM Agents as Research Assistants | Jan 8, 2025 | scientific discovery | Code Available | 9 |
| Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble | Mar 7, 2024 | Anomaly Detection, GPU | Code Available | 9 |
| OpenVLA: An Open-Source Vision-Language-Action Model | Jun 13, 2024 | Imitation Learning, Language Modelling | Code Available | 9 |
| Transformer Explainer: Interactive Learning of Text-Generative Models | Aug 8, 2024 | | Code Available | 9 |
| SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile | Nov 1, 2024 | | Code Available | 9 |
| Emerging Properties in Unified Multimodal Pretraining | May 20, 2025 | Image Editing | Code Available | 9 |
| Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Jun 10, 2024 | Mixture-of-Experts | Code Available | 9 |
| SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers | Jun 1, 2025 | Denoising | Code Available | 9 |
| Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Apr 11, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| AgentRxiv: Towards Collaborative Autonomous Research | Mar 23, 2025 | Math, scientific discovery | Code Available | 9 |
| Natural language guidance of high-fidelity text-to-speech with synthetic annotations | Feb 2, 2024 | In-Context Learning, Language Modeling | Code Available | 9 |
| Soft Condorcet Optimization for Ranking of General Agents | Oct 31, 2024 | | Code Available | 9 |
| General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model | Sep 3, 2024 | Decoder, Math | Code Available | 9 |
| Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Jun 13, 2024 | Diversity, Image Animation | Code Available | 9 |
| SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | Oct 14, 2024 | Decoder, GPU | Code Available | 9 |
| Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation | Jun 17, 2024 | | Code Available | 9 |
| PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Jun 10, 2024 | CPU, Language Modeling | Code Available | 9 |
| PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC | Feb 20, 2025 | Decision Making | Code Available | 9 |
| GPT4All: An Ecosystem of Open Source Compressed Language Models | Nov 6, 2023 | | Code Available | 8 |
| DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis | Jun 2, 2022 | Document Layout Analysis, Object Detection | Code Available | 8 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 18, 2023 | Arithmetic Reasoning | Code Available | 8 |
| DETRs Beat YOLOs on Real-time Object Detection | Apr 17, 2023 | 2D Object Detection, Decoder | Code Available | 8 |
| Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition | Jul 17, 2023 | Decoder, Language Modeling | Code Available | 8 |
| Perception Encoder: The best visual embeddings are not at the output of the network | Apr 17, 2025 | Depth Estimation, Language Modeling | Code Available | 8 |
| Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models | Oct 18, 2022 | Language Modelling, Sentence | Code Available | 8 |
| Robust Speech Recognition via Large-Scale Weak Supervision | Dec 6, 2022 | Robust Speech Recognition, speech-recognition | Code Available | 8 |
| WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning | Mar 12, 2026 | | Unverified | 7 |
| dLLM: Simple Diffusion Language Modeling | Feb 26, 2026 | | Unverified | 7 |
| GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning | Feb 26, 2026 | | Unverified | 7 |
| SAM 3D Body: Robust Full-Body Human Mesh Recovery | Feb 17, 2026 | | Unverified | 7 |
| Advancing Open-source World Models | Jan 28, 2026 | | Unverified | 7 |
| Attention Residuals | Mar 16, 2026 | | Unverified | 7 |
| Pretraining Large Language Models with NVFP4 | Mar 4, 2026 | | Unverified | 7 |
| Qwen3-ASR Technical Report | Jan 30, 2026 | | Unverified | 7 |
| Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem | Mar 12, 2026 | | Unverified | 7 |
| Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Mar 12, 2025 | Question Answering, RAG | Code Available | 7 |
| EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation | Nov 15, 2024 | Audio-Driven Body Animation, Human Animation | Code Available | 7 |
| HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | May 28, 2025 | Image Generation, Mixture-of-Experts | Code Available | 7 |
| LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Feb 28, 2025 | | Code Available | 7 |
| LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Feb 11, 2025 | Large Language Model, Math | Code Available | 7 |
| HuixiangDou2: A Robustly Optimized GraphRAG Approach | Mar 9, 2025 | Retrieval, Retrieval-augmented Generation | Code Available | 7 |
| Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Oct 10, 2024 | 4k, Image Animation | Code Available | 7 |
| Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | Mar 21, 2024 | Animated GIF Generation, Image Animation | Code Available | 7 |
| LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset | Sep 21, 2023 | Chatbot, Diversity | Code Available | 7 |
| GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | Jul 1, 2025 | document understanding, Multimodal Reasoning | Code Available | 7 |
| MaskSketch: Unpaired Structure-guided Masked Image Generation | Feb 10, 2023 | Conditional Image Generation, Diversity | Code Available | 7 |
| MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Jan 29, 2024 | Hallucination, Mixture-of-Experts | Code Available | 7 |
| Byte Latent Transformer: Patches Scale Better Than Tokens | Dec 13, 2024 | | Code Available | 7 |
| Gravity-aligned Rotation Averaging with Circular Regression | Oct 16, 2024 | Mixed Reality, regression | Code Available | 7 |