Open-Sora: Democratizing Efficient Video Production for All | Dec 29, 2024 | All, Image Generation | Code Available 13
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | Jan 29, 2025 | Image Generation | Code Available 11
InstantID: Zero-shot Identity-Preserving Generation in Seconds | Jan 15, 2024 | Diffusion Personalization, Diffusion Personalization Tuning Free | Code Available 11
Emerging Properties in Unified Multimodal Pretraining | May 20, 2025 | Image Editing | Code Available 9
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | Jan 30, 2025 | Image Generation, Model Compression | Code Available 9
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis | Dec 5, 2024 | Image Generation | Code Available 9
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer | Oct 14, 2024 | Image Generation, Image Reconstruction | Code Available 9
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | Oct 14, 2024 | Decoder, GPU | Code Available 9
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | Apr 3, 2024 | Image Generation, Image Reconstruction | Code Available 9
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on | Mar 4, 2024 | Denoising, Image Generation | Code Available 9
OmniGen2: Exploration to Advanced Multimodal Generation | Jun 23, 2025 | Image Generation, multimodal generation | Code Available 7
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | May 28, 2025 | Image Generation, Mixture-of-Experts | Code Available 7
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity | Mar 20, 2025 | Image Generation | Code Available 7
Goku: Flow Based Video Generative Foundation Models | Feb 7, 2025 | Image Generation, Text to Image Generation | Code Available 7
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | Nov 17, 2024 | Image Generation, Quantization | Code Available 7
In-Context LoRA for Diffusion Transformers | Oct 31, 2024 | Image Generation | Code Available 7
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data | Oct 24, 2024 | Image Generation, Question Generation | Code Available 7
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Oct 3, 2024 | Image Generation, Quantization | Code Available 7
OmniGen: Unified Image Generation | Sep 17, 2024 | Edge Detection, Image Generation | Code Available 7
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | Aug 5, 2024 | Decoder, Depth Estimation | Code Available 7
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT | Jun 5, 2024 | Image Generation, Point Cloud Generation | Code Available 7
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture | May 29, 2024 | Image Generation, Video Generation | Code Available 7
Learning Multi-dimensional Human Preference for Text-to-Image Generation | May 23, 2024 | Image Generation, Text to Image Generation | Code Available 7
Chameleon: Mixed-Modal Early-Fusion Foundation Models | May 16, 2024 | Image Captioning, Image Generation | Code Available 7
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | May 14, 2024 | Image Generation, Language Modeling | Code Available 7
PuLID: Pure and Lightning ID Customization via Contrastive Alignment | Apr 24, 2024 | Image Generation, Text to Image Generation | Code Available 7
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation | Apr 3, 2024 | Image Generation, Text to Image Generation | Code Available 7
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation | Mar 13, 2024 | Image Generation, Prompt Engineering | Code Available 7
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | Jan 21, 2024 | Image Generation | Code Available 7
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Jan 10, 2024 | GPU, Image Generation | Code Available 7
MaskSketch: Unpaired Structure-guided Masked Image Generation | Feb 10, 2023 | Conditional Image Generation, Diversity | Code Available 7
Adding Conditional Control to Text-to-Image Diffusion Models | Feb 10, 2023 | Image Generation, Layout-to-Image Generation | Code Available 7
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance | Oct 3, 2022 | Denoising, Diversity | Code Available 7
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation | Dec 19, 2023 | Denoising, Image Generation | Code Available 6
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding | Dec 7, 2023 | Diffusion Personalization, Diffusion Personalization Tuning Free | Code Available 6
Adversarial Diffusion Distillation | Nov 28, 2023 | Image Generation | Code Available 6
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages | Aug 23, 2023 | Image Generation, Image to text | Code Available 6
Better speech synthesis through scaling | May 12, 2023 | Image Generation, Speech Synthesis | Code Available 6
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model | Nov 15, 2022 | All, Disentanglement | Code Available 6
Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models | Jul 26, 2022 | Image Generation, Prompt Engineering | Code Available 6
Semi-Parametric Neural Image Synthesis | Apr 25, 2022 | Image Generation, Retrieval | Code Available 6
Pseudo Numerical Methods for Diffusion Models on Manifolds | Feb 20, 2022 | Denoising, Image Generation | Code Available 6
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset | May 14, 2025 | Image Generation | Code Available 5
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities | May 5, 2025 | Image Generation, Survey | Code Available 5
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework | Apr 16, 2025 | Image Generation | Code Available 5
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation | Apr 2, 2025 | Conditional Image Generation, Image Generation | Code Available 5
OminiControl2: Efficient Conditioning for Diffusion Transformers | Mar 11, 2025 | Conditional Image Generation, Denoising | Code Available 5
Fractal Generative Models | Feb 24, 2025 | Image Generation | Code Available 5
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models | Jan 2, 2025 | Image Generation | Code Available 5
Learning Flow Fields in Attention for Controllable Person Image Generation | Dec 11, 2024 | Attribute, Image Generation | Code Available 5