| Goku: Flow Based Video Generative Foundation Models | Feb 7, 2025 | Image Generation; Text to Image Generation | Code Available | 7 | 5 |
| NVILA: Efficient Frontier Visual Language Models | Dec 5, 2024 | Video Question Answering | Code Available | 7 | 5 |
| OpenVoice: Versatile Instant Voice Cloning | Dec 3, 2023 | Rhythm; Voice Cloning | Code Available | 7 | 5 |
| Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile | Feb 10, 2025 | Video Generation | Code Available | 7 | 5 |
| Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration | Apr 24, 2024 | Management; Prompt Engineering | Code Available | 7 | 5 |
| Byte Latent Transformer: Patches Scale Better Than Tokens | Dec 13, 2024 | | Code Available | 7 | 5 |
| EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation | Nov 15, 2024 | Audio-Driven Body Animation; Human Animation | Code Available | 7 | 5 |
| OmniGen2: Exploration to Advanced Multimodal Generation | Jun 23, 2025 | Image Generation; multimodal generation | Code Available | 7 | 5 |
| LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation | Feb 7, 2024 | | Code Available | 7 | 5 |
| Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | Mar 21, 2024 | Animated GIF Generation; Image Animation | Code Available | 7 | 5 |
| GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | Jul 1, 2025 | document understanding; Multimodal Reasoning | Code Available | 7 | 5 |
| Gravity-aligned Rotation Averaging with Circular Regression | Oct 16, 2024 | Mixed Reality; regression | Code Available | 7 | 5 |
| LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Feb 11, 2025 | Large Language Model; Math | Code Available | 7 | 5 |
| Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | Mar 12, 2025 | Question Answering; RAG | Code Available | 7 | 5 |
| HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | May 28, 2025 | Image Generation; Mixture-of-Experts | Code Available | 7 | 5 |
| LLM Post-Training: A Deep Dive into Reasoning Large Language Models | Feb 28, 2025 | | Code Available | 7 | 5 |
| Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Oct 10, 2024 | 4k; Image Animation | Code Available | 7 | 5 |
| LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset | Sep 21, 2023 | Chatbot; Diversity | Code Available | 7 | 5 |
| T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Mar 21, 2024 | Contrastive Learning; Descriptive | Code Available | 7 | 5 |
| HuixiangDou2: A Robustly Optimized GraphRAG Approach | Mar 9, 2025 | Retrieval; Retrieval-augmented Generation | Code Available | 7 | 5 |
| MaskSketch: Unpaired Structure-guided Masked Image Generation | Feb 10, 2023 | Conditional Image Generation; Diversity | Code Available | 7 | 5 |
| MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Jan 29, 2024 | Hallucination; Mixture-of-Experts | Code Available | 7 | 5 |
| Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction | Feb 17, 2025 | Instruction Following; Voice Cloning | Code Available | 7 | 5 |
| Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | May 23, 2024 | GSM8K; Mixture-of-Experts | Code Available | 7 | 5 |
| Step1X-Edit: A Practical Framework for General Image Editing | Apr 24, 2025 | Image Editing | Code Available | 7 | 5 |
| LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | Nov 15, 2024 | Logical Reasoning; Multimodal Reasoning | Code Available | 7 | 5 |
| Zero-shot Voice Conversion with Diffusion Transformers | Nov 15, 2024 | In-Context Learning; Voice Conversion | Code Available | 7 | 5 |
| xLSTM: Extended Long Short-Term Memory | May 7, 2024 | Language Modeling; Language Modelling | Code Available | 7 | 5 |
| Full Scaling Automation for Sustainable Development of Green Data Centers | May 1, 2023 | Cloud Computing; CPU | Code Available | 7 | 5 |
| LLaMA: Open and Efficient Foundation Language Models | Feb 27, 2023 | Arithmetic Reasoning; Code Generation | Code Available | 7 | 5 |
| Direct Preference Optimization: Your Language Model is Secretly a Reward Model | May 29, 2023 | Language Modeling; Language Modelling | Code Available | 6 | 5 |
| Vision Transformers Need Registers | Sep 28, 2023 | Object Discovery; Self-Supervised Image Classification | Code Available | 6 | 5 |
| iTransformer: Inverted Transformers Are Effective for Time Series Forecasting | Oct 10, 2023 | Time Series; Time Series Forecasting | Code Available | 6 | 5 |
| L-Eval: Instituting Standardized Evaluation for Long Context Language Models | Jul 20, 2023 | Instruction Following | Code Available | 6 | 5 |
| Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone | Oct 30, 2023 | Disentanglement | Code Available | 6 | 5 |
| RWKV: Reinventing RNNs for the Transformer Era | May 22, 2023 | Computational Efficiency; Natural Language Inference | Code Available | 6 | 5 |
| A Watermark for Large Language Models | Jan 24, 2023 | Language Modeling; Language Modelling | Code Available | 6 | 5 |
| Instant Neural Graphics Primitives with a Multiresolution Hash Encoding | Jan 16, 2022 | 3D Reconstruction; 3D Shape Reconstruction | Code Available | 6 | 5 |
| SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | Nov 18, 2022 | Quantization | Code Available | 6 | 5 |
| Mistral 7B | Oct 10, 2023 | answerability prediction; Arithmetic Reasoning | Code Available | 6 | 5 |
| Visual Instruction Tuning | Apr 17, 2023 | 1 Image, 2*2 Stitching; 3D Question Answering (3D-QA) | Code Available | 6 | 5 |
| A decoder-only foundation model for time-series forecasting | Oct 14, 2023 | Decoder; Time Series | Code Available | 6 | 5 |
| RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | Dec 1, 2023 | Hallucination; Image Captioning | Code Available | 6 | 5 |
| CVNets: High Performance Library for Computer Vision | Jun 4, 2022 | Video Understanding; Vocal Bursts Intensity Prediction | Code Available | 6 | 5 |
| Better speech synthesis through scaling | May 12, 2023 | Image Generation; Speech Synthesis | Code Available | 6 | 5 |
| YaRN: Efficient Context Window Extension of Large Language Models | Aug 31, 2023 | Position | Code Available | 6 | 5 |
| H2O Open Ecosystem for State-of-the-art Large Language Models | Oct 17, 2023 | | Code Available | 6 | 5 |
| Towards Robust Blind Face Restoration with Codebook Lookup Transformer | Jun 22, 2022 | Blind Face Restoration; Prediction | Code Available | 6 | 5 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | Apr 3, 2023 | Common Sense Reasoning; Coreference Resolution | Code Available | 6 | 5 |
| FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Jul 17, 2023 | GPU; Language Modeling | Code Available | 6 | 5 |