| Improving the Diffusability of Autoencoders | Feb 20, 2025 | Decoder, Image Generation | Unverified | 0 |
| MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation | Feb 18, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | Feb 18, 2025 | Benchmarking, Text Generation | Unverified | 0 |
| MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation | Feb 16, 2025 | Video Generation | Unverified | 0 |
| RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control | Feb 14, 2025 | 3D Scene Reconstruction, Depth Estimation | Unverified | 0 |
| GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | Feb 13, 2025 | Contrastive Learning, Video Generation | Unverified | 0 |
| CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation | Feb 12, 2025 | Object, Text-to-Video Generation | Unverified | 0 |
| AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance | Feb 12, 2025 | Video Generation | Unverified | 0 |
| FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis | Feb 12, 2025 | Motion Synthesis, Optical Flow Estimation | Unverified | 0 |
| Articulate That Object Part (ATOP): 3D Part Articulation from Text and Motion Personalization | Feb 11, 2025 | Image Generation, Motion Generation | Unverified | 0 |
| Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos | Feb 11, 2025 | Contrastive Learning, Image Retrieval | Unverified | 0 |
| Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation | Feb 11, 2025 | Gesture Generation, Video Generation | Unverified | 0 |
| VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation | Feb 11, 2025 | Image to Video Generation, Object | Unverified | 0 |
| Magic 1-For-1: Generating One Minute Video Clips within One Minute | Feb 11, 2025 | Image Generation, Image to Video Generation | Code Available | 0 |
| Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling | Feb 11, 2025 | Video Generation | Unverified | 0 |
| CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers | Feb 10, 2025 | Image Generation, Video Generation | Unverified | 0 |
| Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists | Feb 10, 2025 | Video Editing, Video Generation | Unverified | 0 |
| A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction | Feb 8, 2025 | Model Optimization, Optical Flow Estimation | Code Available | 0 |
| HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation | Feb 7, 2025 | Form, Pose Transfer | Unverified | 0 |
| UniCP: A Unified Caching and Pruning Framework for Efficient Video Generation | Feb 6, 2025 | Computational Efficiency, Video Generation | Unverified | 0 |
| Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression | Feb 6, 2025 | Computational Efficiency, Video Generation | Unverified | 0 |
| UniForm: A Unified Multi-Task Diffusion Transformer for Audio-Video Generation | Feb 6, 2025 | Audio Generation, Diversity | Unverified | 0 |
| MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation | Feb 6, 2025 | Image to Video Generation, Video Editing | Unverified | 0 |
| Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach | Feb 5, 2025 | Video Generation | Unverified | 0 |
| MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent | Feb 5, 2025 | Image to Video Generation, Motion Generation | Unverified | 0 |
| FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise | Feb 5, 2025 | Video Generation | Unverified | 0 |
| Harness Local Rewards for Global Benefits: Effective Text-to-Video Generation Alignment with Patch-level Reward Models | Feb 4, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| IPO: Iterative Preference Optimization for Text-to-Video Generation | Feb 4, 2025 | Large Language Model, Text-to-Video Generation | Unverified | 0 |
| VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models | Feb 4, 2025 | Motion Generation, motion prediction | Unverified | 0 |
| OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models | Feb 3, 2025 | Human Animation, Human-Object Interaction Detection | Unverified | 0 |
| MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | Feb 3, 2025 | Benchmarking, Fairness | Unverified | 0 |
| Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity | Feb 3, 2025 | Video Generation | Unverified | 0 |
| Secure & Personalized Music-to-Video Generation via CHARCHA | Feb 3, 2025 | Rhythm, Video Generation | Unverified | 0 |
| Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer | Feb 2, 2025 | Reinforcement Learning (RL), Video Generation | Unverified | 0 |
| HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment | Feb 2, 2025 | Video Generation | Unverified | 0 |
| Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation | Feb 1, 2025 | Image Generation, Video Generation | Unverified | 0 |
| Shape from Semantics: 3D Shape Generation from Multi-View Semantics | Feb 1, 2025 | 3D geometry, 3D Shape Generation | Unverified | 0 |
| Every Image Listens, Every Image Dances: Music-Driven Image Animation | Jan 30, 2025 | Image Animation, Video Generation | Unverified | 0 |
| Improving Video Generation with Human Feedback | Jan 23, 2025 | Video Generation | Unverified | 0 |
| Taming Teacher Forcing for Masked Autoregressive Video Generation | Jan 21, 2025 | Video Generation, Video Prediction | Unverified | 0 |
| GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video | Jan 20, 2025 | Video Classification, Video Generation | Unverified | 0 |
| EMO2: End-Effector Guided Audio-Driven Avatar Video Generation | Jan 18, 2025 | Gesture Generation, Video Generation | Unverified | 0 |
| RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation | Jan 17, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Learnings from Scaling Visual Tokenizers for Reconstruction and Generation | Jan 16, 2025 | Decoder, Image Generation | Unverified | 0 |
| VideoWorld: Exploring Knowledge Learning from Unlabeled Videos | Jan 16, 2025 | Video Generation | Unverified | 0 |
| Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion | Jan 15, 2025 | Denoising, Video Denoising | Unverified | 0 |
| RepVideo: Rethinking Cross-Layer Representation for Video Generation | Jan 15, 2025 | Video Generation | Unverified | 0 |
| Comprehensive Subjective and Objective Evaluation Method for Text-generated Video | Jan 15, 2025 | Video Generation | Unverified | 0 |
| GameFactory: Creating New Games with Generative Interactive Videos | Jan 14, 2025 | Domain Generalization, Minecraft | Unverified | 0 |
| Diffusion Adversarial Post-Training for One-Step Video Generation | Jan 14, 2025 | Video Generation | Unverified | 0 |