SOTAVerified

Text-to-Video Generation

My grandmother told me that when she was a student, she had a boyfriend. At 18, he had to leave for military service; she did not wait for him and married someone else. When my grandmother was 58 or 59, a man (her first love) sent her a friend request on a social network, and they started talking... Within six months, they decided to meet. The train journey took two days, and they finally met. They have now been living together for two years and visit us from time to time. I realize now that their love for each other never ended.

Papers

Showing 151–200 of 201 papers

Title | Status | Hype
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation | - | 0
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline | Code | 1
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning | - | 0
Make Pixels Dance: High-Dynamic Video Generation | - | 0
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning | - | 0
FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation | Code | 1
REGIS: Refining Generated Videos via Iterative Stylistic Redesigning | Code | 0
VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning | - | 0
POS: A Prompts Optimization Suite for Augmenting Text-to-Video Generation | - | 0
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation | Code | 5
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Code | 1
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation | Code | 1
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation | Code | 1
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Code | 3
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models | Code | 1
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator | Code | 1
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation | Code | 1
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation | - | 0
Dual-Stream Diffusion Net for Text-to-Video Generation | - | 0
ModelScope Text-to-Video Technical Report | Code | 3
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | Code | 0
VideoComposer: Compositional Video Synthesis with Motion Controllability | Code | 2
Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning | Code | 2
DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation | Code | 1
ControlVideo: Training-free Controllable Text-to-Video Generation | Code | 2
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | Code | 1
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models | - | 0
Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation | - | 0
Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models | Code | 1
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | Code | 1
Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation | - | 0
Generative Disco: Text-to-Video Generation for Music Visualization | Code | 1
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos | Code | 3
CelebV-Text: A Large-Scale Facial Text-Video Dataset | Code | 2
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators | Code | 4
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation | Code | 4
Structure and Content-Guided Video Synthesis with Diffusion Models | - | 0
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | Code | 2
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | Code | 4
MAGVIT: Masked Generative Video Transformer | Code | 2
Latent Video Diffusion Models for High-Fidelity Long Video Generation | Code | 2
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | Code | 1
MagicVideo: Efficient Video Generation With Latent Diffusion Models | - | 0
Make-A-Video: Text-to-Video Generation without Text-Video Data | Code | 1
FlexLip: A Controllable Text-to-Lip System | - | 0
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers | Code | 6
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration | Code | 1
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | Code | 1
Video Generation from Text Employing Latent Path Construction for Temporal Modeling | - | 0
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MagicVideo | FVD | 998 | - | Unverified
2 | VideoComposer | FVD | 580 | - | Unverified
3 | ModelScopeT2V | FVD | 550 | - | Unverified
4 | Show-1 | FVD | 538 | - | Unverified
5 | TF-T2V | FVD | 441 | - | Unverified
6 | HiGen | FVD | 406 | - | Unverified
7 | PixelDance | FVD | 381 | - | Unverified
8 | VideoPoet | FVD | 213 | - | Unverified
9 | Video-LaVIT | FVD | 188.36 | - | Unverified
10 | Snap Video (288×288) | FVD | 110.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MagicVideo (Zero-shot, 256x256) | FVD16 | 699 | - | Unverified
2 | Video LDM (Zero-shot, 320x512) | FVD16 | 550.61 | - | Unverified
3 | LAVIE (Zero-shot, 320x512) | FVD16 | 526.3 | - | Unverified
4 | PYoCo (Zero-shot, 64x64) | FVD16 | 355.19 | - | Unverified
5 | VideoPoet | FVD16 | 355 | - | Unverified
6 | Lumiere (Zero-shot, 1024x1024) | FVD16 | 332.49 | - | Unverified
7 | Snap Video (Zero-shot, 288×288) | FVD16 | 260.1 | - | Unverified
8 | W.A.L.T 3B | FVD16 | 258.1 | - | Unverified
9 | PixelDance (Zero-shot, 256x256) | FVD16 | 242.82 | - | Unverified
10 | Snap Video (Zero-shot, 512x288) | FVD16 | 200.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCrafter2 | Visual Quality | 54.82 | - | Unverified
2 | Show-1 | Visual Quality | 53.74 | - | Unverified
3 | VideoCrafter1 | Visual Quality | 53.08 | - | Unverified
4 | Lavie | Visual Quality | 52.83 | - | Unverified
5 | ModelScope | Visual Quality | 52.47 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MAGVIT | FVD | 79.1 | - | Unverified
2 | MAGVIT | FVD | 28.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | NUWA (128×128) | Accuracy | 77.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoFactory | FVD | 292.35 | - | Unverified