SOTAVerified

Image Generation

Image Generation (synthesis) is the task of generating new images, typically by learning the distribution of an existing dataset.

  • Unconditional generation refers to sampling from the learned data distribution alone, i.e. $y \sim p(y)$, where $y$ is an image.
  • Conditional image generation (a subtask) refers to sampling conditioned on side information such as a class label $x$, i.e. $y \sim p(y|x)$.

In this section, you can find state-of-the-art leaderboards for unconditional generation. For conditional generation and other types of image generation, refer to the subtasks.
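The distinction above can be sketched with a toy model. This is a hypothetical illustration, not code from the page: a two-component Gaussian mixture stands in for a learned generative model over "images" $y$, with a class label $x \in \{0, 1\}$. Conditional generation samples $y \sim p(y|x)$ for a fixed $x$; unconditional generation first draws $x \sim p(x)$ and then $y \sim p(y|x)$, which is exactly sampling from the marginal $p(y)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": p(y|x) is an isotropic Gaussian whose mean depends on the label x,
# and p(x) is a uniform prior over the two labels.
means = {0: np.array([-2.0, 0.0]), 1: np.array([2.0, 0.0])}
prior = {0: 0.5, 1: 0.5}

def sample_conditional(x, n):
    """Conditional generation: draw n samples y ~ p(y|x) for a fixed label x."""
    return rng.normal(loc=means[x], scale=1.0, size=(n, 2))

def sample_unconditional(n):
    """Unconditional generation: draw y ~ p(y) = sum_x p(x) p(y|x)
    by ancestral sampling (first a label, then an image given that label)."""
    labels = rng.choice(list(prior), p=list(prior.values()), size=n)
    return np.stack([rng.normal(loc=means[x], scale=1.0) for x in labels])
```

Real models replace the Gaussian with a GAN, VAE, or diffusion sampler, but the structure (marginal vs. label-conditioned sampling) is the same.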

(Image credit: StyleGAN)

Papers

Showing 3051–3100 of 6689 papers

| Title | Status | Hype |
|---|---|---|
| ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet | Code | 1 |
| Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation | Code | 0 |
| Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler | | 0 |
| BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models | Code | 1 |
| Learning Energy-based Model via Dual-MCMC Teaching | | 0 |
| GPT4Point: A Unified Framework for Point-Language Understanding and Generation | Code | 2 |
| GeNIe: Generative Hard Negative Images Through Diffusion | Code | 1 |
| MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures | | 0 |
| Generator Born from Classifier | | 0 |
| Analyzing and Improving the Training Dynamics of Diffusion Models | Code | 2 |
| FaceStudio: Put Your Face Everywhere in Seconds | | 0 |
| InstructBooth: Instruction-following Personalized Text-to-Image Generation | | 0 |
| Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images | | 0 |
| A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics | Code | 0 |
| UniGS: Unified Representation for Image Generation and Segmentation | Code | 3 |
| DiffiT: Diffusion Vision Transformers for Image Generation | Code | 2 |
| Style Aligned Image Generation via Shared Attention | Code | 3 |
| GIVT: Generative Infinite-Vocabulary Transformers | Code | 1 |
| A multi-channel cycleGAN for CBCT to CT synthesis | | 0 |
| Fully Spiking Denoising Diffusion Implicit Models | Code | 1 |
| ResEnsemble-DDPM: Residual Denoising Diffusion Probabilistic Models for Ensemble Learning | | 0 |
| Meta ControlNet: Enhancing Task Adaptation via Meta Learning | Code | 1 |
| CT Reconstruction using Diffusion Posterior Sampling conditioned on a Nonlinear Measurement Model | | 0 |
| Stable Messenger: Steganography for Message-Concealed Image Generation | | 0 |
| Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models | | 0 |
| Generating Images of the M87* Black Hole Using GANs | Code | 0 |
| Token Fusion: Bridging the Gap between Token Pruning and Token Merging | | 0 |
| LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models | | 0 |
| Ultra-Resolution Cascaded Diffusion Model for Gigapixel Image Synthesis in Histopathology | | 0 |
| Pipeline Enabling Zero-shot Classification for Bangla Handwritten Grapheme | | 0 |
| Enhancing Diffusion Models with 3D Perspective Geometry Constraints | | 0 |
| DeepCache: Accelerating Diffusion Models for Free | Code | 2 |
| Generative models for visualising abstract social processes: Guiding streetview image synthesis of StyleGAN2 with indices of deprivation | | 0 |
| DFU: scale-robust diffusion model for zero-shot super-resolution image generation | Code | 0 |
| Rethinking FID: Towards a Better Evaluation Metric for Image Generation | Code | 1 |
| S2ST: Image-to-Image Translation in the Seed Space of Latent Diffusion | | 0 |
| HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models | | 0 |
| Fast ODE-based Sampling for Diffusion Models in Around 5 Steps | Code | 2 |
| CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | | 0 |
| CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model | Code | 1 |
| Diffusion Models Without Attention | | 0 |
| Detailed Human-Centric Text Description-Driven Large Scene Synthesis | | 0 |
| HiPA: Enabling One-Step Text-to-Image Diffusion Models via High-Frequency-Promoting Adaptation | | 0 |
| ZeST-NeRF: Using temporal aggregation for Zero-Shot Temporal NeRFs | | 0 |
| Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis | Code | 0 |
| Anatomy and Physiology of Artificial Intelligence in PET Imaging | | 0 |
| MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation | | 0 |
| Few-shot Image Generation via Style Adaptation and Content Preservation | | 0 |
| ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation | Code | 1 |
| M^2Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation | Code | 1 |
Page 62 of 134

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Improved DDPM | FID | 12.3 | | Unverified |
| 2 | ADM | FID | 11.84 | | Unverified |
| 3 | BigGAN-deep | FID | 8.1 | | Unverified |
| 4 | Polarity-BigGAN | FID | 6.82 | | Unverified |
| 5 | VQGAN+Transformer (k=mixed, p=1.0, a=0.005) | FID | 6.59 | | Unverified |
| 6 | MaskGIT | FID | 6.18 | | Unverified |
| 7 | VQGAN+Transformer (k=600, p=1.0, a=0.05) | FID | 5.2 | | Unverified |
| 8 | CDM | FID | 4.88 | | Unverified |
| 9 | ADM-G | FID | 4.59 | | Unverified |
| 10 | RIN | FID | 4.51 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PresGAN | FID | 52.2 | | Unverified |
| 2 | RESFLOW | FID | 48.29 | | Unverified |
| 3 | Residual Flow | FID | 46.37 | | Unverified |
| 4 | GLF+perceptual loss (ours) | FID | 44.6 | | Unverified |
| 5 | ProdPoly no activation functions | FID | 40.45 | | Unverified |
| 6 | ProdPoly no activation functions | FID | 36.77 | | Unverified |
| 7 | ACGAN | FID | 35.47 | | Unverified |
| 8 | DenseFlow-74-10 | FID | 34.9 | | Unverified |
| 9 | NVAE w/ flow | FID | 32.53 | | Unverified |
| 10 | QSNGAN | FID | 31.97 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GLIDE + CLS | FID | 30.87 | | Unverified |
| 2 | GLIDE + CLIP | FID | 30.46 | | Unverified |
| 3 | GLIDE + CLS-FREE | FID | 29.22 | | Unverified |
| 4 | GLIDE + CLIP + CLS + CLS-FREE | FID | 29.18 | | Unverified |
| 5 | PGMGAN | FID | 21.73 | | Unverified |
| 6 | CLR-GAN | FID | 20.27 | | Unverified |
| 7 | FM | FID | 14.45 | | Unverified |
| 8 | CT (Direct Generation, NFE=1) | FID | 13 | | Unverified |
| 9 | CT (Direct Generation, NFE=2) | FID | 11.1 | | Unverified |
| 10 | GLIDE + CLS | KID | 7.95 | | Unverified |
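Nearly every leaderboard entry above uses FID (Fréchet Inception Distance), which fits a Gaussian to Inception-v3 features of real and generated images and measures the Fréchet distance $\|\mu_1 - \mu_2\|^2 + \mathrm{Tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{1/2}\big)$ between the two fits (lower is better). A minimal sketch of that formula, assuming the feature statistics have already been computed (the feature-extraction step is omitted); the matrix square root uses the symmetric equivalent $\mathrm{Tr}\big((\Sigma_2^{1/2} \Sigma_1 \Sigma_2^{1/2})^{1/2}\big)$ so plain eigendecomposition suffices:

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric positive-semidefinite matrix
    via eigendecomposition; tiny negative eigenvalues from rounding
    are clipped to zero."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)
    return (v * np.sqrt(w)) @ v.T

def fid(mu1, sigma1, mu2, sigma2):
    """FID between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    s2_half = _sqrtm_psd(sigma2)
    # Tr((S1 S2)^{1/2}) computed via the symmetric product, which is PSD.
    tr_covmean = np.trace(_sqrtm_psd(s2_half @ sigma1 @ s2_half))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * tr_covmean)
```

Identical statistics give FID 0; a pure mean shift with equal covariances gives exactly the squared distance between the means, e.g. `fid(np.zeros(3), np.eye(3), np.array([2.0, 0.0, 0.0]), np.eye(3))` is 4.0. KID (row 10 of the last table) is a related kernel-based metric computed on the same features.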