| aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Processing | Oct 17, 2024 | Attribute, Code Completion | Code Available | 7 | 5 |
| AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving | Jun 14, 2025 | | Code Available | 7 | 5 |
| MAGI-1: Autoregressive Video Generation at Scale | May 19, 2025 | Video Generation | Code Available | 7 | 5 |
| Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | May 14, 2024 | Image Generation, Language Modeling | Code Available | 7 | 5 |
| ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development | Jun 5, 2025 | Large Language Model | Code Available | 7 | 5 |
| Kimi-Audio Technical Report | Apr 25, 2025 | Audio Question Answering, Question Answering | Code Available | 7 | 5 |
| Bilateral Reference for High-Resolution Dichotomous Image Segmentation | Jan 7, 2024 | Camouflaged Object Segmentation, Dichotomous Image Segmentation | Code Available | 7 | 5 |
| EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Jan 21, 2025 | Feature Engineering, GPU | Code Available | 7 | 5 |
| AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems | Mar 9, 2025 | | Code Available | 7 | 5 |
| StarCoder 2 and The Stack v2: The Next Generation | Feb 29, 2024 | Code Completion, Code Generation | Code Available | 7 | 5 |
| Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Aug 29, 2024 | Speech Synthesis | Code Available | 7 | 5 |
| Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems | Dec 12, 2024 | | Code Available | 7 | 5 |
| Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | Jan 5, 2023 | In-Context Learning, Language Modeling | Code Available | 7 | 5 |
| DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | Oct 16, 2024 | | Code Available | 7 | 5 |
| Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases | Feb 5, 2024 | Prompt Engineering | Code Available | 7 | 5 |
| Improving Sample Quality of Diffusion Models Using Self-Attention Guidance | Oct 3, 2022 | Denoising, Diversity | Code Available | 7 | 5 |
| EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture | May 29, 2024 | Image Generation, Video Generation | Code Available | 7 | 5 |
| HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters | May 26, 2025 | Human Animation | Code Available | 7 | 5 |
| MagicQuill: An Intelligent Interactive Image Editing System | Nov 14, 2024 | Language Modeling, Language Modelling | Code Available | 7 | 5 |
| SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training | May 16, 2025 | | Code Available | 7 | 5 |
| The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding | Jun 4, 2024 | | Code Available | 7 | 5 |
| Faster Video Diffusion with Trainable Sparse Attention | May 19, 2025 | | Code Available | 7 | 5 |
| SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? | Oct 4, 2024 | Data Visualization | Code Available | 7 | 5 |
| EasySpider: A No-Code Visual System for Crawling the Web | Apr 30, 2023 | Data Integration, Marketing | Code Available | 7 | 5 |
| FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Nov 27, 2024 | Fairness, GPU | Code Available | 7 | 5 |
| MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors | Dec 16, 2024 | 3D Reconstruction, graph construction | Code Available | 7 | 5 |
| Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought | Apr 8, 2025 | Language Modeling, Language Modelling | Code Available | 7 | 5 |
| Simulating 500 million years of evolution with a language model | Dec 31, 2024 | Language Modeling, Language Modelling | Code Available | 7 | 5 |
| Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data | Oct 24, 2024 | Image Generation, Question Generation | Code Available | 7 | 5 |
| Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models | Oct 3, 2024 | | Code Available | 7 | 5 |
| rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | Jan 8, 2025 | Math | Code Available | 7 | 5 |
| Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Apr 17, 2025 | Code Generation, CPU | Code Available | 7 | 5 |
| Fast Video Generation with Sliding Tile Attention | Feb 6, 2025 | Video Generation | Code Available | 7 | 5 |
| CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos | Oct 15, 2024 | Point Tracking | Code Available | 7 | 5 |
| Learning Multi-dimensional Human Preference for Text-to-Image Generation | May 23, 2024 | Image Generation, Text to Image Generation | Code Available | 7 | 5 |
| DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation | Mar 13, 2024 | Image Generation, Prompt Engineering | Code Available | 7 | 5 |
| Mixture-of-Agents Enhances Large Language Model Capabilities | Jun 7, 2024 | Language Modeling, Language Modelling | Code Available | 7 | 5 |
| Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 7 | 5 |
| Foundation Models for Time Series Analysis: A Tutorial and Survey | Mar 21, 2024 | Survey, Time Series | Code Available | 7 | 5 |
| Scaling Vision Pre-Training to 4K Resolution | Mar 25, 2025 | 4k, Contrastive Learning | Code Available | 7 | 5 |
| DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion | Mar 3, 2025 | Music Generation | Code Available | 7 | 5 |
| Symmetry Considerations for Learning Task Symmetric Robot Policies | Mar 7, 2024 | Data Augmentation, Deep Reinforcement Learning | Code Available | 7 | 5 |
| PromptWizard: Task-Aware Prompt Optimization Framework | May 28, 2024 | Computational Efficiency, Diversity | Code Available | 7 | 5 |
| ColPali: Efficient Document Retrieval with Vision Language Models | Jun 27, 2024 | document understanding, RAG | Code Available | 7 | 5 |
| Large Language Diffusion Models | Feb 14, 2025 | In-Context Learning, Instruction Following | Code Available | 7 | 5 |
| Chameleon: Mixed-Modal Early-Fusion Foundation Models | May 16, 2024 | Image Captioning, Image Generation | Code Available | 7 | 5 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Oct 14, 2023 | Image Classification, Image Description | Code Available | 7 | 5 |
| AudioLM: a Language Modeling Approach to Audio Generation | Sep 7, 2022 | Audio Generation | Code Available | 7 | 5 |
| Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model | Mar 31, 2025 | | Code Available | 7 | 5 |
| Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Mar 27, 2024 | Image Classification, Image Comprehension | Code Available | 7 | 5 |