| VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Nov 22, 2024 | Text-to-Video Generation, Video Alignment | Unverified | 0 |
| Understanding World or Predicting Future? A Comprehensive Survey of World Models | Nov 21, 2024 | Autonomous Driving, Decision Making | Unverified | 0 |
| MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control | Nov 21, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| TaQ-DiT: Time-aware Quantization for Diffusion Transformers | Nov 21, 2024 | Denoising, Model Compression | Unverified | 0 |
| What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality | Nov 20, 2024 | Video Generation | Unverified | 0 |
| Towards motion from video diffusion models | Nov 19, 2024 | Video Generation | Unverified | 0 |
| Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting | Nov 19, 2024 | 3D Generation, GPU | Unverified | 0 |
| Medical Video Generation for Disease Progression Simulation | Nov 18, 2024 | Prognosis, Video Generation | Unverified | 0 |
| SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input | Nov 18, 2024 | Novel View Synthesis, Video Generation | Unverified | 0 |
| Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge | Nov 18, 2024 | Video Generation | Unverified | 0 |
| AnimateAnything: Consistent and Controllable Animation for Video Generation | Nov 16, 2024 | Video Generation | Unverified | 0 |
| ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models | Nov 16, 2024 | Hallucination, Video Generation | Unverified | 0 |
| VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation | Nov 14, 2024 | Denoising, Robot Manipulation | Unverified | 0 |
| Motion Control for Enhanced Complex Action Video Generation | Nov 13, 2024 | Motion Generation, Video Generation | Unverified | 0 |
| A Survey on Vision Autoregressive Model | Nov 13, 2024 | 3D Generation, Benchmarking | Unverified | 0 |
| EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation | Nov 13, 2024 | Video Generation | Unverified | 0 |
| Artificial Intelligence for Biomedical Video Generation | Nov 12, 2024 | Data Augmentation, Video Generation | Code Available | 0 |
| I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength | Nov 10, 2024 | Video Generation | Unverified | 0 |
| A Survey of Emerging Approaches and Advances in Video Generation | Nov 9, 2024 | Image to Video Generation, Language Modeling | Unverified | 0 |
| WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making | Nov 8, 2024 | Decision Making, Video Generation | Unverified | 0 |
| DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion | Nov 7, 2024 | 3D Generation, Denoising | Unverified | 0 |
| SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | Nov 7, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration | Nov 7, 2024 | Video Generation | Unverified | 0 |
| TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation | Nov 5, 2024 | Image to Video Generation, Misinformation | Unverified | 0 |
| Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey | Nov 5, 2024 | 3D Scene Reconstruction, Autonomous Driving | Unverified | 0 |
| How Far is Video Generation from World Model: A Physical Law Perspective | Nov 4, 2024 | Video Generation | Unverified | 0 |
| Adaptive Caching for Faster Video Generation with Diffusion Transformers | Nov 4, 2024 | Denoising, Video Generation | Unverified | 0 |
| Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation | Nov 3, 2024 | Mamba, Optical Flow Estimation | Unverified | 0 |
| Fashion-VDM: Video Diffusion Model for Virtual Try-On | Oct 31, 2024 | Video Generation, Virtual Try-on | Unverified | 0 |
| Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Oct 31, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| LumiSculpt: A Consistency Lighting Control Network for Video Generation | Oct 30, 2024 | Video Generation | Unverified | 0 |
| SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation | Oct 30, 2024 | Video Generation | Unverified | 0 |
| Investigating Memorization in Video Diffusion Models | Oct 29, 2024 | Memorization, Video Generation | Unverified | 0 |
| ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation | Oct 27, 2024 | Video Generation | Unverified | 0 |
| GiVE: Guiding Visual Encoder to Perceive Overlooked Information | Oct 26, 2024 | Object, Question Answering | Unverified | 0 |
| MarDini: Masked Autoregressive Diffusion for Video Generation at Scale | Oct 26, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality | Oct 25, 2024 | Video Generation | Unverified | 0 |
| Framer: Interactive Frame Interpolation | Oct 24, 2024 | Image Morphing, Video Generation | Unverified | 0 |
| VISAGE: Video Synthesis using Action Graphs for Surgery | Oct 23, 2024 | Video Generation | Unverified | 0 |
| WorldSimBench: Towards Video Generation Models as World Simulators | Oct 23, 2024 | Autonomous Driving, Robot Manipulation | Unverified | 0 |
| 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors | Oct 21, 2024 | 3DGS, Decoder | Unverified | 0 |
| EVA: An Embodied World Model for Future Video Anticipation | Oct 20, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| FrameBridge: Improving Image-to-Video Generation with Bridge Models | Oct 20, 2024 | Image Animation, Image to Video Generation | Unverified | 0 |
| Fundus to Fluorescein Angiography Video Generation as a Retinal Generative Foundation Model | Oct 17, 2024 | Disease Prediction, Generative Adversarial Network | Unverified | 0 |
| DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation | Oct 17, 2024 | 3DGS, 4D reconstruction | Unverified | 0 |
| AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations | Oct 17, 2024 | Decoder, Quantization | Unverified | 0 |
| VidPanos: Generative Panoramic Videos from Casual Panning Videos | Oct 17, 2024 | Image Stitching, Video Generation | Unverified | 0 |
| DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | Oct 17, 2024 | Video Generation | Unverified | 0 |
| DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships | Oct 14, 2024 | Video Generation | Unverified | 0 |
| Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention | Oct 14, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |