| Soft Condorcet Optimization for Ranking of General Agents | Oct 31, 2024 | | Code Available | 9 | 5 |
| General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model | Sep 3, 2024 | Decoder, Math | Code Available | 9 | 5 |
| Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Jun 13, 2024 | Diversity, Image Animation | Code Available | 9 | 5 |
| SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | Oct 14, 2024 | Decoder, GPU | Code Available | 9 | 5 |
| Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation | Jun 17, 2024 | | Code Available | 9 | 5 |
| PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Jun 10, 2024 | CPU, Language Modeling | Code Available | 9 | 5 |
| PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC | Feb 20, 2025 | Decision Making | Code Available | 9 | 5 |
| Moonshine: Speech Recognition for Live Transcription and Voice Commands | Oct 21, 2024 | Decoder, Position | Code Available | 9 | 5 |
| TripoSR: Fast 3D Object Reconstruction from a Single Image | Mar 4, 2024 | 3D Generation, 3D Object Reconstruction | Code Available | 9 | 5 |
| MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Jun 5, 2025 | GPU, Relation | Code Available | 9 | 5 |
| AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | Feb 9, 2025 | Large Language Model, RAG | Code Available | 9 | 5 |
| Moshi: a speech-text foundation model for real-time dialogue | Sep 17, 2024 | Action Detection, Activity Detection | Code Available | 9 | 5 |
| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Oct 14, 2024 | Audio-Visual Synchronization, GPU | Code Available | 9 | 5 |
| RWKV-7 "Goose" with Expressive Dynamic State Evolution | Mar 18, 2025 | In-Context Learning, Language Modeling | Code Available | 9 | 5 |
| LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection | Jun 5, 2024 | Decoder, object-detection | Code Available | 9 | 5 |
| OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available | 9 | 5 |
| Perception Encoder: The best visual embeddings are not at the output of the network | Apr 17, 2025 | Depth Estimation, Language Modeling | Code Available | 8 | 5 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 18, 2023 | Arithmetic Reasoning | Code Available | 8 | 5 |
| Robust Speech Recognition via Large-Scale Weak Supervision | Dec 6, 2022 | Robust Speech Recognition, speech-recognition | Code Available | 8 | 5 |
| Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition | Jul 17, 2023 | Decoder, Language Modeling | Code Available | 8 | 5 |
| GPT4All: An Ecosystem of Open Source Compressed Language Models | Nov 6, 2023 | | Code Available | 8 | 5 |
| Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models | Oct 18, 2022 | Language Modelling, Sentence | Code Available | 8 | 5 |
| DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis | Jun 2, 2022 | Document Layout Analysis, Object Detection | Code Available | 8 | 5 |
| DETRs Beat YOLOs on Real-time Object Detection | Apr 17, 2023 | 2D Object Detection, Decoder | Code Available | 8 | 5 |
| LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds | Mar 13, 2025 | 3D Human Reconstruction | Code Available | 7 | 5 |
| SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | Nov 17, 2024 | Image Generation, Quantization | Code Available | 7 | 5 |
| Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | Jan 21, 2024 | Image Generation | Code Available | 7 | 5 |
| Transparent Image Layer Diffusion using Latent Transparency | Feb 27, 2024 | | Code Available | 7 | 5 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification, Action Recognition | Code Available | 7 | 5 |
| AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline | Oct 28, 2024 | RAG, Retrieval | Code Available | 7 | 5 |
| Robust Inverse Graphics via Probabilistic Inference | Feb 2, 2024 | NeRF | Code Available | 7 | 5 |
| Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | Jun 13, 2024 | Instruction Following, Math | Code Available | 7 | 5 |
| From Bytes to Ideas: Language Modeling with Autoregressive U-Nets | Jun 17, 2025 | Language Modeling, Language Modelling | Code Available | 7 | 5 |
| One-Step Image Translation with Text-to-Image Models | Mar 18, 2024 | Denoising, Translation | Code Available | 7 | 5 |
| PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation | Jan 20, 2025 | Language Modeling, Language Modelling | Code Available | 7 | 5 |
| 2D Gaussian Splatting for Geometrically Accurate Radiance Fields | Mar 26, 2024 | 3DGS, Novel View Synthesis | Code Available | 7 | 5 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Apr 20, 2023 | Image Description, Language Modelling | Code Available | 7 | 5 |
| In-Context LoRA for Diffusion Transformers | Oct 31, 2024 | Image Generation | Code Available | 7 | 5 |
| SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Oct 3, 2024 | Image Generation, Quantization | Code Available | 7 | 5 |
| xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | Nov 4, 2024 | GPU | Code Available | 7 | 5 |
| Domain Expansion of Image Generators | Jan 12, 2023 | | Code Available | 7 | 5 |
| CALE: Continuous Arcade Learning Environment | Oct 31, 2024 | Atari Games, Benchmarking | Code Available | 7 | 5 |
| Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Feb 14, 2025 | Video Generation, Video Reconstruction | Code Available | 7 | 5 |
| FourierKAN outperforms MLP on Text Classification Head Fine-tuning | Aug 16, 2024 | Classification, Kolmogorov-Arnold Networks | Code Available | 7 | 5 |
| Prometheus: Inducing Fine-grained Evaluation Capability in Language Models | Oct 12, 2023 | Language Modelling, Large Language Model | Code Available | 7 | 5 |
| HealthBench: Evaluating Large Language Models Towards Improved Human Health | May 13, 2025 | Instruction Following, Multiple-choice | Code Available | 7 | 5 |
| OmniGen: Unified Image Generation | Sep 17, 2024 | Edge Detection, Image Generation | Code Available | 7 | 5 |
| Fast Timing-Conditioned Latent Audio Diffusion | Feb 7, 2024 | Audio Generation, GPU | Code Available | 7 | 5 |
| Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation | Jun 24, 2024 | parameter-efficient fine-tuning, Sentence | Code Available | 7 | 5 |
| PuLID: Pure and Lightning ID Customization via Contrastive Alignment | Apr 24, 2024 | Image Generation, Text to Image Generation | Code Available | 7 | 5 |