Video Generation

(Various video generation tasks. GIF credit: MAGVIT)

Papers

Showing 1–50 of 1466 papers

Title	Date	Tasks	Status	Hype	Score
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k	Mar 12, 2025	Video Generation	Code Available	14	5
Open-Sora: Democratizing Efficient Video Production for All	Dec 29, 2024	All, Image Generation	Code Available	13	5
Open-Sora Plan: Open-Source Large Video Generation Model	Nov 28, 2024	Video Generation	Code Available	11	5
HunyuanVideo: A Systematic Framework For Large Video Generative Models	Dec 3, 2024	Video Alignment, Video Generation	Code Available	11	5
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer	Aug 12, 2024	Text-to-Video Generation, Video Alignment	Code Available	11	5
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control	Jul 3, 2024	Computational Efficiency, Face Reenactment	Code Available	11	5
Wan: Open and Advanced Large-Scale Video Generative Models	Mar 26, 2025	Video Editing, Video Generation	Code Available	11	5
LTX-Video: Realtime Video Latent Diffusion	Dec 30, 2024	Denoising, GPU	Code Available	9	5
SkyReels-V2: Infinite-length Film Generative Model	Apr 17, 2025	Large Language Model, model	Code Available	9	5
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models	Jan 17, 2024	Text-to-Video Generation, Video Generation	Code Available	9	5
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling	Oct 14, 2024	Audio-Visual Synchronization, GPU	Code Available	9	5
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation	May 2, 2024	motion prediction, Story Generation	Code Available	9	5
MAGI-1: Autoregressive Video Generation at Scale	May 19, 2025	Video Generation	Code Available	7	5
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance	Mar 21, 2024	Animated GIF Generation, Image Animation	Code Available	7	5
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation	Oct 10, 2024	4k, Image Animation	Code Available	7	5
Aligning Anime Video Generation with Human Feedback	Apr 14, 2025	Video Generation	Code Available	7	5
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	Feb 14, 2025	Video Generation, Video Reconstruction	Code Available	7	5
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization	Nov 17, 2024	Image Generation, Quantization	Code Available	7	5
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration	Oct 3, 2024	Image Generation, Quantization	Code Available	7	5
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation	Nov 15, 2024	Audio-Driven Body Animation, Human Animation	Code Available	7	5
SageAttention2++: A More Efficient Implementation of SageAttention2	May 27, 2025	Quantization, Video Generation	Code Available	7	5
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers	Mar 15, 2024	Text Generation, Video Generation	Code Available	7	5
Real-Time Video Generation with Pyramid Attention Broadcast	Aug 22, 2024	Video Generation	Code Available	7	5
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation	May 28, 2025	Human Animation, Instruction Following	Code Available	7	5
Pyramidal Flow Matching for Efficient Video Generative Modeling	Oct 8, 2024	GPU, Text-to-Video Generation	Code Available	7	5
Goku: Flow Based Video Generative Foundation Models	Feb 7, 2025	Image Generation, Text to Image Generation	Code Available	7	5
AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era	Dec 13, 2024	Image to Video Generation, Video Generation	Code Available	7	5
Fast Video Generation with Sliding Tile Attention	Feb 6, 2025	Video Generation	Code Available	7	5
DragAnything: Motion Control for Anything using Entity Representation	Mar 12, 2024	Object, Video Generation	Code Available	7	5
VACE: All-in-One Video Creation and Editing	Mar 10, 2025	All, Human-Domain Subject-to-Video	Code Available	7	5
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability	May 27, 2024	Autonomous Driving, Video Generation	Code Available	7	5
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile	Feb 10, 2025	Video Generation	Code Available	7	5
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture	May 29, 2024	Image Generation, Video Generation	Code Available	7	5
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models	Nov 28, 2023	Video Generation	Code Available	6	5
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers	May 29, 2022	Text-to-Video Generation, Video Generation	Code Available	6	5
DanceGRPO: Unleashing GRPO on Visual Generation	May 12, 2025	Denoising, reinforcement-learning	Code Available	5	5
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators	Apr 7, 2024	Text-to-Video Generation, Video Generation	Code Available	5	5
Matrix-Game: Interactive World Foundation Model	Jun 23, 2025	Minecraft, model	Code Available	5	5
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation	Jun 26, 2024	Text-to-Video Generation, Video Generation	Code Available	5	5
StableAnimator: High-Quality Identity-Preserving Human Image Animation	Nov 26, 2024	Denoising, Face Reenactment	Code Available	5	5
Show-o2: Improved Native Unified Multimodal Models	Jun 18, 2025	Language Modeling, Language Modelling	Code Available	5	5
Latte: Latent Diffusion Transformer for Video Generation	Jan 5, 2024	Text-to-Video Generation, Video Generation	Code Available	5	5
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions	Jun 6, 2024	Video Captioning, Video Generation	Code Available	5	5
ControlNeXt: Powerful and Efficient Control for Image and Video Generation	Aug 12, 2024	Video Generation	Code Available	5	5
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation	May 7, 2025	Human-Domain Subject-to-Video, Single-Domain Subject-to-Video	Code Available	5	5
Consistency Models	Mar 2, 2023	Colorization, Image Generation	Code Available	5	5
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control	Mar 5, 2025	Novel View Synthesis, Video Generation	Code Available	5	5
OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation	Jun 2, 2025	Data Augmentation, Human Animation	Code Available	5	5
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos	Sep 3, 2024	Depth Estimation, Diversity	Code Available	5	5
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework	Mar 20, 2024	Image to Video Generation, Text-to-Video Generation	Code Available	5	5


Datasets: UCF-101; BAIR Robot Pushing; Sky Time-lapse; UCF-101 16 frames, 64x64, Unconditional; UCF-101 16 frames, Unconditional, Single GPU; LAION-400M; Taichi; UCF-101 16 frames, 128x128, Unconditional; Kinetics-600 12 frames, 64x64; How2Sign; Kinetics-600 12 frames, 128x128; Kinetics-600 48 frames, 64x64

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	MCVD	FVD16	2,460	—	Unverified
2	VDM	FVD16	1,396	—	Unverified
3	TGAN-v2 (128x128)	FVD16	1,209	—	Unverified
4	MCVD (64x64)	FVD16	1,143	—	Unverified
5	MoCoGAN-HD (256x256, unconditional)	FVD16	700	—	Unverified
6	MagicVideo (256x256, text-conditional)	FVD16	699	—	Unverified
7	TATS (256x256)	FVD16	635	—	Unverified
8	FIFO-Diffusion	FVD128	596.64	—	Unverified
9	DIGAN (128x128, unconditional)	FVD16	577	—	Unverified
10	LVDM (256x256, unconditional)	FVD16	552	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MoCoGAN	FVD score	503	—	Unverified
2	Baseline (from LVT)	FVD score	320.9	—	Unverified
3	SVG-FP (from FVD)	FVD score	315.5	—	Unverified
4	CDNA (from FVD)	FVD score	296.5	—	Unverified
5	SV2P (from FVD)	FVD score	262.5	—	Unverified
6	SVG-LP (from vRNN)	FVD score	256.62	—	Unverified
7	WAM	FVD score	159.6	—	Unverified
8	VRNN 1L	FVD score	149.22	—	Unverified
9	SAVP (from vRNN)	FVD score	143.43	—	Unverified
10	Hier-VRNN	FVD score	143.4	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MoCoGAN-HD (128x128)	FVD 16	183.6	—	Unverified
2	TATS (128x128)	FVD 16	132.6	—	Unverified
3	Long-video GAN (256x256)	FVD 16	116.5	—	Unverified
4	DIGAN (128x128)	FVD 16	114.6	—	Unverified
5	Long-video GAN (128x128)	FVD 16	107.5	—	Unverified
6	LVDM (256x256)	FVD 16	95.2	—	Unverified
7	DDMI	FVD 16	66.25	—	Unverified
8	Latte + LeanVAE	FVD 16	49.59	—	Unverified
9	StyleSV (256x256)	FVD 16	49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Video Diffusion Model	Inception Score	57	—	Unverified
2	TGAN-ODE	Inception Score	15.2	—	Unverified
3	TGAN-F	Inception Score	13.62	—	Unverified
4	MoCoGAN	Inception Score	12.42	—	Unverified
5	MoCoGAN-MDP	Inception Score	11.86	—	Unverified
6	TGAN-SVC	Inception Score	11.85	—	Unverified
7	VGAN	Inception Score	8.18	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TGAN-F	Inception Score	22.91	—	Unverified
2	TGANv2	Inception Score	21.45	—	Unverified
3	TGANv2-ODE	Inception Score	21.02	—	Unverified
4	MoCoGAN	Inception Score	12.42	—	Unverified
5	MoCoGAN-MDP	Inception Score	11.86	—	Unverified
6	TGAN-SVC	Inception Score	11.85	—	Unverified
7	VGAN	Inception Score	8.18	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Imagen original (constant=6)	CLIP R-Precision	92.12	—	Unverified
2	Imagen fully distilled (oscillate (15,1))	CLIP R-Precision	90.97	—	Unverified
3	Imagen distilled (constant=6)	CLIP R-Precision	90.88	—	Unverified
4	Imagen original (oscillate(15,1))	CLIP R-Precision	89.91	—	Unverified
5	Imagen fully distilled (constant=6)	CLIP R-Precision	89.68	—	Unverified
6	Imagen distilled (oscillate (15,1))	CLIP R-Precision	88.78	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DIGAN (256x256)	FVD16	156.7	—	Unverified
2	MoCoGAN-HD (128x128)	FVD16	144.7	—	Unverified
3	DIGAN (128x128)	FVD16	128.1	—	Unverified
4	LVDM (256x256)	FVD16	99	—	Unverified
5	TATS (128x128)	FVD16	94.6	—	Unverified
6	StyleSV (256x256)	FVD16	82.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TGANv2 (2020)	Inception Score	28.87	—	Unverified
2	DVD-GAN	Inception Score	27.38	—	Unverified
3	VideoGPT	Inception Score	24.69	—	Unverified
4	TGANv2	Inception Score	24.34	—	Unverified
5	TGAN-F	Inception Score	22.91	—	Unverified
6	TGANv2-ODE	Inception Score	21.02	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DVD-GAN	FVD	31.1	—	Unverified
2	MAGVIT	FVD	9.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	INR-V	FVD16	144	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DVD-GAN	FID	2.16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DVD-GAN	FID	12.92	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DiT-XL/2 + CVAE-FT-SE	FID	8.59	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	VideoAssembler (Zero-Shot, 256x256, class-conditional)	FVD16	252	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PG-SWGAN-3D	FID	404.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	StyleSV	FVD16	207.2	—	Unverified