| SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers | Jun 1, 2025 | Denoising | Code Available | 9 |
| Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Apr 11, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| AgentRxiv: Towards Collaborative Autonomous Research | Mar 23, 2025 | Math, scientific discovery | Code Available | 9 |
| Natural language guidance of high-fidelity text-to-speech with synthetic annotations | Feb 2, 2024 | In-Context Learning, Language Modeling | Code Available | 9 |
| Soft Condorcet Optimization for Ranking of General Agents | Oct 31, 2024 | | Code Available | 9 |
| General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model | Sep 3, 2024 | Decoder, Math | Code Available | 9 |
| Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Jun 13, 2024 | Diversity, Image Animation | Code Available | 9 |
| SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | Oct 14, 2024 | Decoder, GPU | Code Available | 9 |
| Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation | Jun 17, 2024 | | Code Available | 9 |
| PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Jun 10, 2024 | CPU, Language Modeling | Code Available | 9 |
| PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC | Feb 20, 2025 | Decision Making | Code Available | 9 |
| Moonshine: Speech Recognition for Live Transcription and Voice Commands | Oct 21, 2024 | Decoder, Position | Code Available | 9 |
| TripoSR: Fast 3D Object Reconstruction from a Single Image | Mar 4, 2024 | 3D Generation, 3D Object Reconstruction | Code Available | 9 |
| MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Jun 5, 2025 | GPU, Relation | Code Available | 9 |
| AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | Feb 9, 2025 | Large Language Model, RAG | Code Available | 9 |
| Moshi: a speech-text foundation model for real-time dialogue | Sep 17, 2024 | Action Detection, Activity Detection | Code Available | 9 |
| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Oct 14, 2024 | Audio-Visual Synchronization, GPU | Code Available | 9 |
| RWKV-7 "Goose" with Expressive Dynamic State Evolution | Mar 18, 2025 | In-Context Learning, Language Modeling | Code Available | 9 |
| LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection | Jun 5, 2024 | Decoder, object-detection | Code Available | 9 |
| OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| Perception Encoder: The best visual embeddings are not at the output of the network | Apr 17, 2025 | Depth Estimation, Language Modeling | Code Available | 8 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 18, 2023 | Arithmetic Reasoning | Code Available | 8 |
| Robust Speech Recognition via Large-Scale Weak Supervision | Dec 6, 2022 | Robust Speech Recognition, speech-recognition | Code Available | 8 |
| Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition | Jul 17, 2023 | Decoder, Language Modeling | Code Available | 8 |
| GPT4All: An Ecosystem of Open Source Compressed Language Models | Nov 6, 2023 | | Code Available | 8 |
| Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models | Oct 18, 2022 | Language Modelling, Sentence | Code Available | 8 |
| DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis | Jun 2, 2022 | Document Layout Analysis, Object Detection | Code Available | 8 |
| DETRs Beat YOLOs on Real-time Object Detection | Apr 17, 2023 | 2D Object Detection, Decoder | Code Available | 8 |
| Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem | Mar 12, 2026 | | Unverified | 7 |
| Pretraining Large Language Models with NVFP4 | Mar 4, 2026 | | Unverified | 7 |
| Qwen3-ASR Technical Report | Jan 30, 2026 | | Unverified | 7 |
| SAM 3D Body: Robust Full-Body Human Mesh Recovery | Feb 17, 2026 | | Unverified | 7 |
| GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning | Feb 26, 2026 | | Unverified | 7 |
| Advancing Open-source World Models | Jan 28, 2026 | | Unverified | 7 |
| Attention Residuals | Mar 16, 2026 | | Unverified | 7 |
| WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning | Mar 12, 2026 | | Unverified | 7 |
| dLLM: Simple Diffusion Language Modeling | Feb 26, 2026 | | Unverified | 7 |
| Robust Inverse Graphics via Probabilistic Inference | Feb 2, 2024 | NeRF | Code Available | 7 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Apr 20, 2023 | Image Description, Language Modelling | Code Available | 7 |
| 2D Gaussian Splatting for Geometrically Accurate Radiance Fields | Mar 26, 2024 | 3DGS, Novel View Synthesis | Code Available | 7 |
| In-Context LoRA for Diffusion Transformers | Oct 31, 2024 | Image Generation | Code Available | 7 |
| SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Oct 3, 2024 | Image Generation, Quantization | Code Available | 7 |
| AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline | Oct 28, 2024 | RAG, Retrieval | Code Available | 7 |
| Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | Jan 21, 2024 | Image Generation | Code Available | 7 |
| PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation | Jan 20, 2025 | Language Modeling, Language Modelling | Code Available | 7 |
| Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | Oct 12, 2023 | Language Modelling, Large Language Model | Code Available | 7 |
| FourierKAN outperforms MLP on Text Classification Head Fine-tuning | Aug 16, 2024 | Classification, Kolmogorov-Arnold Networks | Code Available | 7 |
| One-Step Image Translation with Text-to-Image Models | Mar 18, 2024 | Denoising, Translation | Code Available | 7 |
| HealthBench: Evaluating Large Language Models Towards Improved Human Health | May 13, 2025 | Instruction Following, Multiple-choice | Code Available | 7 |
| OmniGen: Unified Image Generation | Sep 17, 2024 | Edge Detection, Image Generation | Code Available | 7 |