| Large Motion Video Autoencoding with Cross-modal Video VAE | Dec 23, 2024 | Video Generation | —Unverified | 0 |
| SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | Dec 22, 2024 | Data Augmentation, Fault Diagnosis | —Unverified | 0 |
| Adapting Image-to-Video Diffusion Models for Large-Motion Frame Interpolation | Dec 22, 2024 | Video Frame Interpolation, Video Generation | —Unverified | 0 |
| VAST 1.0: A Unified Framework for Controllable and Consistent Video Generation | Dec 21, 2024 | Video Generation | —Unverified | 0 |
| Follow-Your-MultiPose: Tuning-Free Multi-Character Text-to-Video Generation via Pose Guidance | Dec 21, 2024 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models | Dec 21, 2024 | Quantization, Video Generation | —Unverified | 0 |
| DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization | Dec 20, 2024 | Computational Efficiency, Diversity | —Unverified | 0 |
| CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training | Dec 20, 2024 | parameter-efficient fine-tuning, Video Generation | Code Available | 0 |
| Consistent Human Image and Video Generation with Spatially Conditioned Diffusion | Dec 19, 2024 | Computational Efficiency, Denoising | Code Available | 0 |
| AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation | Dec 19, 2024 | Video Generation, Video Synchronization | —Unverified | 0 |
| Parallelized Autoregressive Visual Generation | Dec 19, 2024 | Video Generation | —Unverified | 0 |
| DirectorLLM for Human-Centric Video Generation | Dec 19, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| VideoDPO: Omni-Preference Alignment for Video Diffusion Generation | Dec 18, 2024 | Image Generation, Text-to-Video Generation | —Unverified | 0 |
| FlexCache: Flexible Approximate Cache System for Video Diffusion | Dec 18, 2024 | Video Generation | —Unverified | 0 |
| SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation | Dec 18, 2024 | Optical Flow Estimation, Video Generation | —Unverified | 0 |
| AKiRa: Augmentation Kit on Rays for optical video generation | Dec 18, 2024 | Video Generation | —Unverified | 0 |
| ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping | Dec 18, 2024 | Object, Video Generation | —Unverified | 0 |
| CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices | Dec 17, 2024 | Action Recognition, Motion Estimation | —Unverified | 0 |
| MotionBridge: Dynamic Video Inbetweening with Flexible Controls | Dec 17, 2024 | Video Editing, Video Generation | —Unverified | 0 |
| Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation | Dec 17, 2024 | Story Completion, Video Generation | —Unverified | 0 |
| InterDyn: Controllable Interactive Dynamics with Video Diffusion Models | Dec 16, 2024 | Video Generation | —Unverified | 0 |
| Can video generation replace cinematographers? Research on the cinematic language of generated video | Dec 16, 2024 | Video Generation | —Unverified | 0 |
| VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting | Dec 16, 2024 | Informativeness, Large Language Model | Code Available | 0 |
| DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes | Dec 15, 2024 | Denoising, Video Generation | —Unverified | 0 |
| GenLit: Reformulating Single-Image Relighting as Video Generation | Dec 15, 2024 | Image Generation, Image Relighting | —Unverified | 0 |
| MSC: Multi-Scale Spatio-Temporal Causal Attention for Autoregressive Video Diffusion | Dec 13, 2024 | Video Generation | —Unverified | 0 |
| TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation | Dec 13, 2024 | Image to Video Generation, Object | —Unverified | 0 |
| LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity | Dec 13, 2024 | GPU, Mamba | —Unverified | 0 |
| SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device | Dec 13, 2024 | Denoising, Image Generation | —Unverified | 0 |
| Mojito: Motion Trajectory and Intensity Control for Video Generation | Dec 12, 2024 | Computational Efficiency, Optical Flow Estimation | —Unverified | 0 |
| InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption | Dec 12, 2024 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| Video Creation by Demonstration | Dec 12, 2024 | Video Generation | —Unverified | 0 |
| T-SVG: Text-Driven Stereoscopic Video Generation | Dec 12, 2024 | Depth Estimation, Text-to-Video Generation | —Unverified | 0 |
| Enhancing Facial Consistency in Conditional Video Generation via Facial Landmark Transformation | Dec 12, 2024 | Video Generation | —Unverified | 0 |
| OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation | Dec 12, 2024 | Image to Video Generation, Video Generation | —Unverified | 0 |
| LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors | Dec 12, 2024 | 3D Reconstruction, Image to 3D | —Unverified | 0 |
| UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer | Dec 12, 2024 | Video Generation | Code Available | 0 |
| FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model | Dec 11, 2024 | Representation Learning, Video Generation | —Unverified | 0 |
| SweetTokenizer: Semantic-Aware Spatial-Temporal Tokenizer for Compact Visual Discretization | Dec 11, 2024 | Image Reconstruction, Representation Learning | —Unverified | 0 |
| Physical Informed Driving World Model | Dec 11, 2024 | 3D Object Detection, Autonomous Driving | —Unverified | 0 |
| Multi-Shot Character Consistency for Text-to-Video Generation | Dec 10, 2024 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics | Dec 10, 2024 | Image Generation, Video Generation | —Unverified | 0 |
| StyleMaster: Stylize Your Video with Artistic Generation and Translation | Dec 10, 2024 | Contrastive Learning, Style Transfer | —Unverified | 0 |
| STIV: Scalable Text and Image Conditioned Video Generation | Dec 10, 2024 | Video Generation, Video Prediction | —Unverified | 0 |
| From Slow Bidirectional to Fast Autoregressive Video Diffusion Models | Dec 10, 2024 | GPU, Video Generation | —Unverified | 0 |
| 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation | Dec 10, 2024 | Video Generation | —Unverified | 0 |
| SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations | Dec 9, 2024 | Video Generation | —Unverified | 0 |
| MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation | Dec 8, 2024 | Contrastive Learning, Image to Video Generation | —Unverified | 0 |
| Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation | Dec 8, 2024 | Point Tracking, Video Generation | —Unverified | 0 |
| Accelerating Video Diffusion Models via Distribution Matching | Dec 8, 2024 | Denoising, Video Generation | —Unverified | 0 |