SOTAVerified

Text-to-Video Generation

My grandmother told me that when she was a student, she had a boyfriend. At 18, he had to leave for military service; she did not wait for him and married someone else. When my grandmother was 58 or 59, a man (her first love) sent her a friend request on a social network, and they started talking... Within six months, they decided to meet. The train journey took two days, and they finally met. They have now been living together for two years and visit us from time to time. I realize now that their love for each other never ceased.

Papers

Showing 51–100 of 201 papers

| Title | Status | Hype |
|---|---|---|
| VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis | Code | 1 |
| FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline | Code | 1 |
| IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis | Code | 1 |
| Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator | Code | 1 |
| Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | Code | 1 |
| VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation | Code | 1 |
| AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Code | 1 |
| ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation | Code | 1 |
| Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | Code | 1 |
| LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Code | 1 |
| TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation | Code | 1 |
| Generative Disco: Text-to-Video Generation for Music Visualization | Code | 1 |
| FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation | Code | 1 |
| LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models | Code | 1 |
| DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation | Code | 1 |
| EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Code | 1 |
| Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models | Code | 1 |
| Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation | Code | 1 |
| Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning | Code | 1 |
| VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs | Code | 1 |
| SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Code | 1 |
| PEEKABOO: Interactive Video Generation via Masked-Diffusion | Code | 1 |
| OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models | Code | 1 |
| NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion | Code | 1 |
| MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration | Code | 1 |
| MotionCrafter: One-Shot Motion Customization of Diffusion Models | Code | 1 |
| MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Code | 1 |
| VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Code | 1 |
| GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions | Code | 1 |
| Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | Code | 1 |
| Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation | Code | 1 |
| Make-A-Video: Text-to-Video Generation without Text-Video Data | Code | 1 |
| A Recipe for Scaling up Text-to-Video Generation with Text-free Videos | Code | 0 |
| VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation | Code | 0 |
| VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting | Code | 0 |
| Magic 1-For-1: Generating One Minute Video Clips within One Minute | Code | 0 |
| Sync-DRAW: Automatic Video Generation using Deep Recurrent Attentive Architectures | Code | 0 |
| InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | Code | 0 |
| InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO | Code | 0 |
| Enabling Versatile Controls for Video Diffusion Models | Code | 0 |
| Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations | Code | 0 |
| Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation | Code | 0 |
| Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | Code | 0 |
| RecTable: Fast Modeling Tabular Data with Rectified Flow | Code | 0 |
| MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion | Code | 0 |
| Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task | Code | 0 |
| REGIS: Refining Generated Videos via Iterative Stylistic Redesigning | Code | 0 |
| GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration | | 0 |
| Gender Bias in Text-to-Video Generation Models: A case study of Sora | | 0 |
| DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control | | 0 |
Page 2 of 5

Benchmark Results

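| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MagicVideo | FVD | 998 | | Unverified |
| 2 | VideoComposer | FVD | 580 | | Unverified |
| 3 | ModelScopeT2V | FVD | 550 | | Unverified |
| 4 | Show-1 | FVD | 538 | | Unverified |
| 5 | TF-T2V | FVD | 441 | | Unverified |
| 6 | HiGen | FVD | 406 | | Unverified |
| 7 | PixelDance | FVD | 381 | | Unverified |
| 8 | VideoPoet | FVD | 213 | | Unverified |
| 9 | Video-LaVIT | FVD | 188.36 | | Unverified |
| 10 | Snap Video (288×288) | FVD | 110.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MagicVideo (Zero-shot, 256x256) | FVD16 | 699 | | Unverified |
| 2 | Video LDM (Zero-shot, 320x512) | FVD16 | 550.61 | | Unverified |
| 3 | LAVIE (Zero-shot, 320x512) | FVD16 | 526.3 | | Unverified |
| 4 | PYoCo (Zero-shot, 64x64) | FVD16 | 355.19 | | Unverified |
| 5 | VideoPoet | FVD16 | 355 | | Unverified |
| 6 | Lumiere (Zero-shot, 1024x1024) | FVD16 | 332.49 | | Unverified |
| 7 | Snap Video (Zero-shot, 288×288) | FVD16 | 260.1 | | Unverified |
| 8 | W.A.L.T 3B | FVD16 | 258.1 | | Unverified |
| 9 | PixelDance (Zero-shot, 256x256) | FVD16 | 242.82 | | Unverified |
| 10 | Snap Video (Zero-shot, 512x288) | FVD16 | 200.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VideoCrafter2 | Visual Quality | 54.82 | | Unverified |
| 2 | Show-1 | Visual Quality | 53.74 | | Unverified |
| 3 | VideoCrafter1 | Visual Quality | 53.08 | | Unverified |
| 4 | Lavie | Visual Quality | 52.83 | | Unverified |
| 5 | ModelScope | Visual Quality | 52.47 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MAGVIT | FVD | 79.1 | | Unverified |
| 2 | MAGVIT | FVD | 28.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | NUWA (128×128) | Accuracy | 77.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VideoFactory | FVD | 292.35 | | Unverified |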