SOTAVerified

Image Generation

Image Generation (synthesis) is the task of generating new images that resemble those in an existing dataset.

  • Unconditional generation draws samples from the dataset's distribution with no conditioning input, i.e. from $p(y)$.
  • Conditional image generation (subtask) draws samples from the dataset's distribution conditioned on a label $x$, i.e. from $p(y|x)$.
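
The difference between $p(y)$ and $p(y|x)$ can be illustrated with a minimal sketch. It assumes a hypothetical toy dataset in which a single number stands in for each image; the names DATASET, sample_unconditional and sample_conditional are illustrative only and do not come from any model listed on this page.

```python
import random

# Hypothetical toy "dataset": (label x, sample y) pairs standing in for labelled images.
DATASET = [("cat", 0.9), ("cat", 1.1), ("dog", 4.8), ("dog", 5.2), ("dog", 5.0)]


def sample_unconditional(rng):
    """Draw y from the empirical p(y): every sample is eligible, labels are ignored."""
    _, y = rng.choice(DATASET)
    return y


def sample_conditional(rng, label):
    """Draw y from the empirical p(y|x): only samples whose label equals x are eligible."""
    candidates = [y for x, y in DATASET if x == label]
    if not candidates:
        raise ValueError(f"no samples with label {label!r}")
    return rng.choice(candidates)


rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
unconditional = [sample_unconditional(rng) for _ in range(1000)]
conditional_cat = [sample_conditional(rng, "cat") for _ in range(1000)]
```

A real generative model replaces the lookup with a learned sampler, but the contract is the same: the unconditional sampler takes no input beyond randomness, while the conditional sampler also takes the label.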

In this section, you can find state-of-the-art leaderboards for unconditional generation. For conditional generation and other types of image generation, refer to the subtasks.

(Image credit: StyleGAN)

Papers

Showing 851-900 of 6689 papers

Title | Status | Hype
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation | Code | 1
AccDiffusion v2: Towards More Accurate Higher-Resolution Diffusion Extrapolation | Code | 1
Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models | Code | 1
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models | Code | 1
Fréchet Radiomic Distance (FRD): A Versatile Metric for Comparing Medical Imaging Datasets | Code | 1
Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing | Code | 1
AMO Sampler: Enhancing Text Rendering with Overshooting | Code | 1
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation | Code | 1
cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis | Code | 1
ZoomLDM: Latent Diffusion Model for multi-scale image generation | Code | 1
Image Generation Diversity Issues and How to Tame Them | Code | 1
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs | Code | 1
Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark | Code | 1
Detecting Human Artifacts from Text-to-Image Models | Code | 1
Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion | Code | 1
Stylecodes: Encoding Stylistic Information For Image Generation | Code | 1
Continuous Speculative Decoding for Autoregressive Image Generation | Code | 1
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers | Code | 1
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis | Code | 1
Image Understanding Makes for A Good Tokenizer for Image Generation | Code | 1
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation | Code | 1
EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching | Code | 1
Volumetric Conditioning Module to Control Pretrained Diffusion Models for 3D Medical Images | Code | 1
AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models | Code | 1
Shallow Diffuse: Robust and Invisible Watermarking through Low-Dimensional Subspaces in Diffusion Models | Code | 1
MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation | Code | 1
Unified Cross-Modal Image Synthesis with Hierarchical Mixture of Product-of-Experts | Code | 1
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling | Code | 1
Hierarchical Clustering for Conditional Diffusion in Image Generation | Code | 1
Elucidating the design space of language models for image generation | Code | 1
Personalized Image Generation with Large Multimodal Models | Code | 1
Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance | Code | 1
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion | Code | 1
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization | Code | 1
ChatHouseDiffusion: Prompt-Guided Generation and Editing of Floor Plans | Code | 1
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices | Code | 1
On the Effectiveness of Dataset Alignment for Fake Image Detection | Code | 1
First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending | Code | 1
Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling | Code | 1
Anatomical feature-prioritized loss for enhanced MR to CT translation | Code | 1
TULIP: Token-length Upgraded CLIP | Code | 1
Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting | Code | 1
Minority-Focused Text-to-Image Generation via Prompt Optimization | Code | 1
TextLap: Customizing Language Models for Text-to-Layout Planning | Code | 1
Personalized Visual Instruction Tuning | Code | 1
Training-free Diffusion Model Alignment with Sampling Demons | Code | 1
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach | Code | 1
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding | Code | 1
Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features | Code | 1
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Improved DDPM | FID | 12.3 | - | Unverified
2 | ADM | FID | 11.84 | - | Unverified
3 | BigGAN-deep | FID | 8.1 | - | Unverified
4 | Polarity-BigGAN | FID | 6.82 | - | Unverified
5 | VQGAN+Transformer (k=mixed, p=1.0, a=0.005) | FID | 6.59 | - | Unverified
6 | MaskGIT | FID | 6.18 | - | Unverified
7 | VQGAN+Transformer (k=600, p=1.0, a=0.05) | FID | 5.2 | - | Unverified
8 | CDM | FID | 4.88 | - | Unverified
9 | ADM-G | FID | 4.59 | - | Unverified
10 | RIN | FID | 4.51 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PresGAN | FID | 52.2 | - | Unverified
2 | RESFLOW | FID | 48.29 | - | Unverified
3 | Residual Flow | FID | 46.37 | - | Unverified
4 | GLF+perceptual loss (ours) | FID | 44.6 | - | Unverified
5 | ProdPoly no activation functions | FID | 40.45 | - | Unverified
6 | ProdPoly no activation functions | FID | 36.77 | - | Unverified
7 | ACGAN | FID | 35.47 | - | Unverified
8 | DenseFlow-74-10 | FID | 34.9 | - | Unverified
9 | NVAE w/ flow | FID | 32.53 | - | Unverified
10 | QSNGAN | FID | 31.97 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GLIDE + CLS | FID | 30.87 | - | Unverified
2 | GLIDE + CLIP | FID | 30.46 | - | Unverified
3 | GLIDE + CLS-FREE | FID | 29.22 | - | Unverified
4 | GLIDE + CLIP + CLS + CLS-FREE | FID | 29.18 | - | Unverified
5 | PGMGAN | FID | 21.73 | - | Unverified
6 | CLR-GAN | FID | 20.27 | - | Unverified
7 | FM | FID | 14.45 | - | Unverified
8 | CT (Direct Generation, NFE=1) | FID | 13 | - | Unverified
9 | CT (Direct Generation, NFE=2) | FID | 11.1 | - | Unverified
10 | GLIDE + CLS | KID | 7.95 | - | Unverified