SOTAVerified

Text-to-Video Generation

My grandmother told me that when she was a student, she had a boyfriend. At 18 he had to leave for military service; she did not wait for him and married someone else. When my grandmother was 58 or 59, a man (her first love) sent her a friend request on a social network, and they started talking... Within six months, they decided to meet. The train journey took two days, and they finally met. They have now been living together for two years, and they visit us from time to time. I now realize that their love for each other never ended.

Papers

Showing 1-50 of 201 papers

Title | Status | Hype
Open-Sora: Democratizing Efficient Video Production for All | Code | 13
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Code | 11
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models | Code | 9
Pyramidal Flow Matching for Efficient Video Generative Modeling | Code | 7
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers | Code | 6
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Code | 5
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators | Code | 5
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text | Code | 5
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Code | 5
Latte: Latent Diffusion Transformer for Video Generation | Code | 5
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation | Code | 5
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models | Code | 4
TransPixeler: Advancing Text-to-Video Generation with Transparency | Code | 4
Identity-Preserving Text-to-Video Generation by Frequency Decomposition | Code | 4
MotionClone: Training-Free Motion Cloning for Controllable Video Generation | Code | 4
CameraCtrl: Enabling Camera Control for Text-to-Video Generation | Code | 4
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Code | 4
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators | Code | 4
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation | Code | 4
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | Code | 4
FlashVideo:Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Code | 3
GameGen-X: Interactive Open-world Game Video Generation | Code | 3
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective | Code | 3
VideoTetris: Towards Compositional Text-to-Video Generation | Code | 3
FIFO-Diffusion: Generating Infinite Videos from Text without Training | Code | 3
From Sora What We Can See: A Survey of Text-to-Video Generation | Code | 3
Lumiere: A Space-Time Diffusion Model for Video Generation | Code | 3
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Code | 3
ModelScope Text-to-Video Technical Report | Code | 3
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos | Code | 3
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Code | 2
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers | Code | 2
Generate Any Scene: Evaluating and Improving Text-to-Vision Generation with Scene Graph Programming | Code | 2
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation | Code | 2
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation | Code | 2
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Code | 2
GenAI Arena: An Open Evaluation Platform for Generative Models | Code | 2
Video Diffusion Models: A Survey | Code | 2
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models | Code | 2
FreeInit: Bridging Initialization Gap in Video Diffusion Models | Code | 2
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter | Code | 2
VideoComposer: Compositional Video Synthesis with Motion Controllability | Code | 2
Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning | Code | 2
ControlVideo: Training-free Controllable Text-to-Video Generation | Code | 2
CelebV-Text: A Large-Scale Facial Text-Video Dataset | Code | 2
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | Code | 2
MAGVIT: Masked Generative Video Transformer | Code | 2
Latent Video Diffusion Models for High-Fidelity Long Video Generation | Code | 2
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MagicVideo | FVD | 998 | | Unverified
2 | VideoComposer | FVD | 580 | | Unverified
3 | ModelScopeT2V | FVD | 550 | | Unverified
4 | Show-1 | FVD | 538 | | Unverified
5 | TF-T2V | FVD | 441 | | Unverified
6 | HiGen | FVD | 406 | | Unverified
7 | PixelDance | FVD | 381 | | Unverified
8 | VideoPoet | FVD | 213 | | Unverified
9 | Video-LaVIT | FVD | 188.36 | | Unverified
10 | Snap Video (288×288) | FVD | 110.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MagicVideo (Zero-shot, 256x256) | FVD16 | 699 | | Unverified
2 | Video LDM (Zero-shot, 320x512) | FVD16 | 550.61 | | Unverified
3 | LAVIE (Zero-shot, 320x512) | FVD16 | 526.3 | | Unverified
4 | PYoCo (Zero-shot, 64x64) | FVD16 | 355.19 | | Unverified
5 | VideoPoet | FVD16 | 355 | | Unverified
6 | Lumiere (Zero-shot, 1024x1024) | FVD16 | 332.49 | | Unverified
7 | Snap Video (Zero-shot, 288×288) | FVD16 | 260.1 | | Unverified
8 | W.A.L.T 3B | FVD16 | 258.1 | | Unverified
9 | PixelDance (Zero-shot, 256x256) | FVD16 | 242.82 | | Unverified
10 | Snap Video (Zero-shot, 512x288) | FVD16 | 200.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCrafter2 | Visual Quality | 54.82 | | Unverified
2 | Show-1 | Visual Quality | 53.74 | | Unverified
3 | VideoCrafter1 | Visual Quality | 53.08 | | Unverified
4 | Lavie | Visual Quality | 52.83 | | Unverified
5 | ModelScope | Visual Quality | 52.47 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAGVIT | FVD | 79.1 | | Unverified
2 | MAGVIT | FVD | 28.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | NUWA (128×128) | Accuracy | 77.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoFactory | FVD | 292.35 | | Unverified