SOTAVerified

Image Generation

Image Generation (synthesis) is the task of generating new images from an existing dataset.

  • Unconditional generation refers to sampling images from the distribution learned from the dataset, with no conditioning input, i.e. $p(y)$.
  • Conditional image generation (subtask) refers to sampling images conditioned on a label $x$, i.e. $p(y|x)$.
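
The difference between $p(y)$ and $p(y|x)$ can be sketched with a toy example. This is only an illustration, not a generative model: it draws from the empirical distribution of a tiny hand-made dataset, with strings standing in for images. The names `DATASET`, `sample_unconditional`, and `sample_conditional` are hypothetical and chosen for this sketch.

```python
import random

# Toy "dataset" of (label x, sample y) pairs. Strings stand in for images;
# every name in this block is hypothetical and used for illustration only.
DATASET = [
    ("cat", "cat_img_0"),
    ("cat", "cat_img_1"),
    ("dog", "dog_img_0"),
    ("dog", "dog_img_1"),
    ("dog", "dog_img_2"),
]


def sample_unconditional(rng):
    """Draw y ~ p(y): pick any sample, ignoring its label."""
    _, y = rng.choice(DATASET)
    return y


def sample_conditional(label, rng):
    """Draw y ~ p(y | x = label): pick only among samples with that label."""
    candidates = [y for x, y in DATASET if x == label]
    if not candidates:
        raise ValueError(f"no samples with label {label!r}")
    return rng.choice(candidates)


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
print(sample_unconditional(rng))       # may be a cat or a dog sample
print(sample_conditional("cat", rng))  # always a cat sample
```

A real model replaces the lookup with a learned sampler, but the interface is the same: the unconditional sampler takes no label, while the conditional one takes the label as an input.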

In this section, you can find state-of-the-art leaderboards for unconditional generation. For conditional generation and other types of image generation, refer to the subtasks.

(Image credit: StyleGAN)

Papers

Showing 3651-3700 of 6689 papers

| Title | Status | Hype |
|---|---|---|
| StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning | | 0 |
| FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation | | 0 |
| What If We Recaption Billions of Web Images with LLaMA-3? | | 0 |
| DiTFastAttn: Attention Compression for Diffusion Transformer Models | | 0 |
| Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation | | 0 |
| WMAdapter: Adding WaterMark Control to Latent Diffusion Models | | 0 |
| Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models | Code | 0 |
| Diffusion Soup: Model Merging for Text-to-Image Diffusion Models | | 0 |
| Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance | | 0 |
| Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration | | 0 |
| Understanding Visual Concepts Across Models | Code | 0 |
| Progress Towards Decoding Visual Imagery via fNIRS | | 0 |
| Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? | | 0 |
| Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models | | 0 |
| Instant 3D Human Avatar Generation using Image Diffusion Models | | 0 |
| The Effect of Training Dataset Size on Discriminative and Diffusion-Based Speech Enhancement Systems | | 0 |
| Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models | | 0 |
| TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models | | 0 |
| OmniControlNet: Dual-stage Integration for Conditional Image Generation | | 0 |
| Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models | Code | 0 |
| Rapid Review of Generative AI in Smart Medical Applications | | 0 |
| GANetic Loss for Generative Adversarial Networks with a Focus on Medical Applications | Code | 0 |
| AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation | | 0 |
| Optimal Eye Surgeon: Finding Image Priors through Sparse Generators at Initialization | Code | 0 |
| PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction | Code | 0 |
| Coherent Zero-Shot Visual Instruction Generation | | 0 |
| BitsFusion: 1.99 bits Weight Quantization of Diffusion Model | | 0 |
| Diffusion-based image inpainting with internal learning | Code | 0 |
| ReDistill: Residual Encoded Distillation for Peak Memory Reduction | | 0 |
| DiffuSyn Bench: Evaluating Vision-Language Models on Real-World Complexities with Diffusion-Generated Synthetic Benchmarks | | 0 |
| Understanding the Limitations of Diffusion Concept Algebra Through Food | | 0 |
| Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter | | 0 |
| Enhancing Traffic Sign Recognition with Tailored Data Augmentation: Addressing Class Imbalance and Instance Scarcity | | 0 |
| Tackling Copyright Issues in AI Image Generation Through Originality Estimation and Genericization | Code | 0 |
| Analyzing the Feature Extractor Networks for Face Image Synthesis | Code | 0 |
| The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise | | 0 |
| Enhance Image-to-Image Generation with LLaVA-generated Prompts | | 0 |
| Plug-and-Play Diffusion Distillation | | 0 |
| I4VGen: Image as Free Stepping Stone for Text-to-Video Generation | | 0 |
| It's a Feature, Not a Bug: Measuring Creative Fluidity in Image Generators | | 0 |
| Differentially Private Fine-Tuning of Diffusion Models | | 0 |
| ATTIQA: Generalizable Image Quality Feature Extractor using Attribute-aware Pretraining | | 0 |
| Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers | Code | 0 |
| Layout Agnostic Scene Text Image Synthesis with Diffusion Models | | 0 |
| Anomaly Anything: Promptable Unseen Visual Anomaly Generation | | 0 |
| Text-guided Controllable Mesh Refinement for Interactive 3D Modeling | | 0 |
| fruit-SALAD: A Style Aligned Artwork Dataset to reveal similarity perception in image embeddings | Code | 0 |
| pOps: Photo-Inspired Diffusion Operators | | 0 |
| Dimba: Transformer-Mamba Diffusion Models | | 0 |
| ParallelEdits: Efficient Multi-object Image Editing | | 0 |
Page 74 of 134

Benchmark Results

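| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Improved DDPM | FID | 12.3 | | Unverified |
| 2 | ADM | FID | 11.84 | | Unverified |
| 3 | BigGAN-deep | FID | 8.1 | | Unverified |
| 4 | Polarity-BigGAN | FID | 6.82 | | Unverified |
| 5 | VQGAN+Transformer (k=mixed, p=1.0, a=0.005) | FID | 6.59 | | Unverified |
| 6 | MaskGIT | FID | 6.18 | | Unverified |
| 7 | VQGAN+Transformer (k=600, p=1.0, a=0.05) | FID | 5.2 | | Unverified |
| 8 | CDM | FID | 4.88 | | Unverified |
| 9 | ADM-G | FID | 4.59 | | Unverified |
| 10 | RIN | FID | 4.51 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PresGAN | FID | 52.2 | | Unverified |
| 2 | RESFLOW | FID | 48.29 | | Unverified |
| 3 | Residual Flow | FID | 46.37 | | Unverified |
| 4 | GLF+perceptual loss (ours) | FID | 44.6 | | Unverified |
| 5 | ProdPoly no activation functions | FID | 40.45 | | Unverified |
| 6 | ProdPoly no activation functions | FID | 36.77 | | Unverified |
| 7 | ACGAN | FID | 35.47 | | Unverified |
| 8 | DenseFlow-74-10 | FID | 34.9 | | Unverified |
| 9 | NVAE w/ flow | FID | 32.53 | | Unverified |
| 10 | QSNGAN | FID | 31.97 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GLIDE + CLS | FID | 30.87 | | Unverified |
| 2 | GLIDE + CLIP | FID | 30.46 | | Unverified |
| 3 | GLIDE + CLS-FREE | FID | 29.22 | | Unverified |
| 4 | GLIDE + CLIP + CLS + CLS-FREE | FID | 29.18 | | Unverified |
| 5 | PGMGAN | FID | 21.73 | | Unverified |
| 6 | CLR-GAN | FID | 20.27 | | Unverified |
| 7 | FM | FID | 14.45 | | Unverified |
| 8 | CT (Direct Generation, NFE=1) | FID | 13 | | Unverified |
| 9 | CT (Direct Generation, NFE=2) | FID | 11.1 | | Unverified |
| 10 | GLIDE +CLS | KID | 7.95 | | Unverified |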