SOTAVerified

Image Generation

Image Generation (synthesis) is the task of generating new images from an existing dataset.

  • Unconditional generation refers to generating samples from the dataset without any conditioning signal, i.e. $p(y)$.
  • Conditional image generation (subtask) refers to generating samples conditionally from the dataset, based on a label, i.e. $p(y|x)$.
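
The distinction between $p(y)$ and $p(y|x)$ can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy `DATASET`, the string stand-ins for images, and the helper names `sample_unconditional` and `sample_conditional` are hypothetical. Real generative models learn these distributions rather than resampling stored examples.

```python
import random

# Toy "dataset" of (label x, sample y) pairs. Each y is a string standing in
# for an image; a real model would learn the distribution instead.
DATASET = [
    ("cat", "cat_01"), ("cat", "cat_02"),
    ("dog", "dog_01"), ("dog", "dog_02"), ("dog", "dog_03"),
]


def sample_unconditional(rng):
    """Draw y ~ p(y): the label is ignored entirely."""
    _, y = rng.choice(DATASET)
    return y


def sample_conditional(rng, x):
    """Draw y ~ p(y|x): only samples whose label equals x are eligible."""
    candidates = [y for label, y in DATASET if label == x]
    if not candidates:
        raise ValueError(f"no samples with label {x!r}")
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    print("unconditional:", sample_unconditional(rng))
    print("conditional on 'cat':", sample_conditional(rng, "cat"))
```

In this sketch the unconditional sampler can return any item, while the conditional sampler only ever returns items that carry the requested label.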

In this section, you can find state-of-the-art leaderboards for unconditional generation. For conditional generation and other types of image generation, refer to the subtasks.

(Image credit: StyleGAN)

Papers

Showing 3101-3150 of 6689 papers

| Title | Status | Hype |
| --- | --- | --- |
| DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback | - | 0 |
| VBench: Comprehensive Benchmark Suite for Video Generative Models | Code | 3 |
| When StyleGAN Meets Stable Diffusion: a W_+ Adapter for Personalized Image Generation | Code | 1 |
| Non-Visible Light Data Synthesis and Application: A Case Study for Synthetic Aperture Radar Imagery | - | 0 |
| SODA: Bottleneck Diffusion Models for Representation Learning | Code | 1 |
| HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models | - | 0 |
| DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations | Code | 1 |
| Fair Text-to-Image Diffusion via Fair Mapping | - | 0 |
| Image Inpainting via Tractable Steering of Diffusion Models | Code | 0 |
| Unlocking Spatial Comprehension in Text-to-Image Diffusion Models | - | 0 |
| A High-Quality Robust Diffusion Framework for Corrupted Dataset | Code | 0 |
| Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation | Code | 1 |
| PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation | Code | 1 |
| SEED-Bench-2: Benchmarking Multimodal Large Language Models | Code | 2 |
| Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis | Code | 1 |
| Large Language Models Suffer From Their Own Output: An Analysis of the Self-Consuming Training Loop | - | 0 |
| Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks | Code | 1 |
| COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design | - | 0 |
| Text-Driven Image Editing via Learnable Regions | Code | 2 |
| Adversarial Diffusion Distillation | Code | 6 |
| TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering | - | 0 |
| MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices | - | 0 |
| Denoising Diffusion Probabilistic Models for Image Inpainting of Cell Distributions in the Human Brain | - | 0 |
| Manifold Preserving Guided Diffusion | - | 0 |
| Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models | Code | 1 |
| Improving Denoising Diffusion Probabilistic Models via Exploiting Shared Representations | - | 0 |
| LLMGA: Multimodal Large Language Model based Generation Assistant | Code | 2 |
| Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models | Code | 1 |
| Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation | - | 0 |
| TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models | Code | 1 |
| VehicleGAN: Pair-flexible Pose Guided Image Synthesis for Vehicle Re-identification | - | 0 |
| Reinforcement Learning from Diffusion Feedback: Q* for Image Search | - | 0 |
| Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation | - | 0 |
| Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion | - | 0 |
| ViT-Lens: Towards Omni-modal Representations | Code | 1 |
| PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images | Code | 1 |
| Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation | - | 0 |
| Tell2Design: A Dataset for Language-Guided Floor Plan Generation | Code | 1 |
| Self-correcting LLM-controlled Diffusion Models | Code | 1 |
| Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images | Code | 1 |
| ET3D: Efficient Text-to-3D Generation via Multi-View Distillation | - | 0 |
| BS-Diff: Effective Bone Suppression Using Conditional Diffusion Models from Chest X-Ray Images | Code | 1 |
| Flow-Guided Diffusion for Video Inpainting | Code | 2 |
| Automatic Synthetic Data and Fine-grained Adaptive Feature Alignment for Composed Person Retrieval | Code | 1 |
| Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | Code | 0 |
| Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise | Code | 1 |
| InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser | Code | 1 |
| Synthetic Shifts to Initial Seed Vector Exposes the Brittle Nature of Latent-Based Diffusion Models | - | 0 |
| DemoFusion: Democratising High-Resolution Image Generation With No $ | Code | 4 |
| AdaDiff: Adaptive Step Selection for Fast Diffusion Models | - | 0 |

Benchmark Results
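
Most entries in this section report FID (Fréchet Inception Distance), where lower values are better. The page does not restate the metric, so the following is a reminder under the standard definition: FID is the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The symbols $\mu_r, \Sigma_r$ (feature mean and covariance for real images) and $\mu_g, \Sigma_g$ (the same for generated images) are notation chosen here, not taken from the page.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
$$

Scores are only comparable when computed on the same dataset with the same feature extractor and sample count, so values from different leaderboards should not be ranked against each other.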

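| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Improved DDPM | FID | 12.3 | - | Unverified |
| 2 | ADM | FID | 11.84 | - | Unverified |
| 3 | BigGAN-deep | FID | 8.1 | - | Unverified |
| 4 | Polarity-BigGAN | FID | 6.82 | - | Unverified |
| 5 | VQGAN+Transformer (k=mixed, p=1.0, a=0.005) | FID | 6.59 | - | Unverified |
| 6 | MaskGIT | FID | 6.18 | - | Unverified |
| 7 | VQGAN+Transformer (k=600, p=1.0, a=0.05) | FID | 5.2 | - | Unverified |
| 8 | CDM | FID | 4.88 | - | Unverified |
| 9 | ADM-G | FID | 4.59 | - | Unverified |
| 10 | RIN | FID | 4.51 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PresGAN | FID | 52.2 | - | Unverified |
| 2 | RESFLOW | FID | 48.29 | - | Unverified |
| 3 | Residual Flow | FID | 46.37 | - | Unverified |
| 4 | GLF+perceptual loss (ours) | FID | 44.6 | - | Unverified |
| 5 | ProdPoly no activation functions | FID | 40.45 | - | Unverified |
| 6 | ProdPoly no activation functions | FID | 36.77 | - | Unverified |
| 7 | ACGAN | FID | 35.47 | - | Unverified |
| 8 | DenseFlow-74-10 | FID | 34.9 | - | Unverified |
| 9 | NVAE w/ flow | FID | 32.53 | - | Unverified |
| 10 | QSNGAN | FID | 31.97 | - | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GLIDE + CLS | FID | 30.87 | - | Unverified |
| 2 | GLIDE + CLIP | FID | 30.46 | - | Unverified |
| 3 | GLIDE + CLS-FREE | FID | 29.22 | - | Unverified |
| 4 | GLIDE + CLIP + CLS + CLS-FREE | FID | 29.18 | - | Unverified |
| 5 | PGMGAN | FID | 21.73 | - | Unverified |
| 6 | CLR-GAN | FID | 20.27 | - | Unverified |
| 7 | FM | FID | 14.45 | - | Unverified |
| 8 | CT (Direct Generation, NFE=1) | FID | 13 | - | Unverified |
| 9 | CT (Direct Generation, NFE=2) | FID | 11.1 | - | Unverified |
| 10 | GLIDE +CLS | KID | 7.95 | - | Unverified |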