| Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Jan 31, 2025 | DenoisingVideo Alignment | CodeCode Available | 1 |
| Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | Jan 1, 2025 | Action RecognitionScene Recognition | CodeCode Available | 0 |
| Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance | Dec 24, 2024 | Audio GenerationVideo Alignment | —Unverified | 0 |
| HunyuanVideo: A Systematic Framework For Large Video Generative Models | Dec 3, 2024 | Video AlignmentVideo Generation | CodeCode Available | 11 |
| Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Dec 3, 2024 | ObjectOffline RL | —Unverified | 0 |
| Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | Nov 22, 2024 | Autonomous DrivingText-to-Video Generation | CodeCode Available | 0 |
| VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Nov 22, 2024 | Text-to-Video GenerationVideo Alignment | —Unverified | 0 |
| Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content | Oct 10, 2024 | Video AlignmentVideo Generation | —Unverified | 0 |
| T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Oct 8, 2024 | Video AlignmentVideo Generation | CodeCode Available | 3 |
| Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment | Sep 22, 2024 | Contrastive Learningcross-modal alignment | —Unverified | 0 |