| SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | May 6, 2024 | Bug fixing, Language Modeling | Code Available | 11 |
| HybridFlow: A Flexible and Efficient RLHF Framework | Sep 28, 2024 | Large Language Model | Code Available | 11 |
| PaperBanana: Automating Academic Illustration for AI Scientists | Jan 30, 2026 | | Unverified | 9 |
| Qwen3-TTS Technical Report | Jan 22, 2026 | | Unverified | 9 |
| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Oct 14, 2024 | Audio-Visual Synchronization, GPU | Code Available | 9 |
| Moshi: a speech-text foundation model for real-time dialogue | Sep 17, 2024 | Action Detection, Activity Detection | Code Available | 9 |
| OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on | Mar 4, 2024 | Denoising, Image Generation | Code Available | 9 |
| RWKV-7 "Goose" with Expressive Dynamic State Evolution | Mar 18, 2025 | In-Context Learning, Language Modeling | Code Available | 9 |
| OpenELM: An Efficient Language Model Family with Open Training and Inference Framework | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| HART: Efficient Visual Generation with Hybrid Autoregressive Transformer | Oct 14, 2024 | Image Generation, Image Reconstruction | Code Available | 9 |
| MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer | Sep 1, 2024 | Self-Supervised Learning, text-to-speech | Code Available | 9 |
| FinRobot: AI Agent for Equity Research and Valuation with Large Language Models | Nov 13, 2024 | AI Agent | Code Available | 9 |
| Language agents achieve superhuman synthesis of scientific knowledge | Sep 10, 2024 | Articles, Information Retrieval | Code Available | 9 |
| Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework | Oct 20, 2024 | Code Completion, RAG | Code Available | 9 |
| StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models | Mar 12, 2024 | Benchmarking | Code Available | 9 |
| Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Oct 2, 2024 | Depth Estimation, GPU | Code Available | 9 |
| ORPO: Monolithic Preference Optimization without Reference Model | Mar 12, 2024 | model | Code Available | 9 |
| MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Jul 2, 2024 | GPU, Language Modelling | Code Available | 9 |
| Sapiens: Foundation for Human Vision Models | Aug 22, 2024 | 2D Human Pose Estimation, 2D Pose Estimation | Code Available | 9 |
| SkyReels-V2: Infinite-length Film Generative Model | Apr 17, 2025 | Large Language Model, model | Code Available | 9 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Feb 5, 2024 | Arithmetic Reasoning, Math | Code Available | 9 |
| DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | Jan 5, 2024 | | Code Available | 9 |
| SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | Jan 30, 2025 | Image Generation, Model Compression | Code Available | 9 |
| DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | May 7, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | Jun 14, 2024 | Question Answering, Retrieval-augmented Generation | Code Available | 9 |
| TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | Oct 9, 2024 | GPU | Code Available | 9 |
| Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion | Jul 1, 2024 | Decision Making, Prediction | Code Available | 9 |
| Liger Kernel: Efficient Triton Kernels for LLM Training | Oct 14, 2024 | Chunking, GPU | Code Available | 9 |
| CogVLM2: Visual Language Models for Image and Video Understanding | Aug 29, 2024 | MM-Vet, MVBench | Code Available | 9 |
| SuperSimpleNet: Unifying Unsupervised and Supervised Learning for Fast and Reliable Surface Defect Detection | Aug 6, 2024 | Anomaly Detection, Defect Detection | Code Available | 9 |
| Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks | Jan 25, 2024 | Segmentation | Code Available | 9 |
| StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | May 2, 2024 | motion prediction, Story Generation | Code Available | 9 |
| Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | Apr 3, 2024 | Image Generation, Image Reconstruction | Code Available | 9 |
| LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection | Jun 5, 2024 | Decoder, object-detection | Code Available | 9 |
| FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Jan 2, 2025 | GPU, Scheduling | Code Available | 9 |
| Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | Jun 3, 2024 | | Code Available | 9 |
| Symbolic Learning Enables Self-Evolving Agents | Jun 26, 2024 | | Code Available | 9 |
| Aviary: training language agents on challenging scientific tasks | Dec 30, 2024 | | Code Available | 9 |
| Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research | Nov 7, 2024 | AI Agent, Decision Making | Code Available | 9 |
| Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Feb 5, 2025 | Self-Supervised Learning, Speech Enhancement | Code Available | 9 |
| Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting | May 20, 2025 | | Code Available | 9 |
| CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark | Jan 22, 2024 | | Code Available | 9 |
| YOLO-World: Real-Time Open-Vocabulary Object Detection | Jan 30, 2024 | Instance Segmentation, Language Modeling | Code Available | 9 |
| Yi: Open Foundation Models by 01.AI | Mar 7, 2024 | Attribute, Chatbot | Code Available | 9 |
| Steering Language Models with Game-Theoretic Solvers | Jan 24, 2024 | Imitation Learning, Scheduling | Code Available | 9 |
| VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | Mar 25, 2024 | Decoder, Language Modeling | Code Available | 9 |
| (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | May 20, 2024 | Machine Translation, Translation | Code Available | 9 |
| LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model | Jun 7, 2024 | Language Modeling, Language Modelling | Code Available | 9 |
| AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | Feb 9, 2025 | Large Language Model, RAG | Code Available | 9 |
| MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Jun 5, 2025 | GPU, Relation | Code Available | 9 |