| Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey | Nov 26, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 |
| AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation | Nov 26, 2024 | Human-Object Interaction Detection, Object | Unverified | 0 |
| Free^2Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models | Nov 26, 2024 | Reinforcement Learning (RL), Text-to-Video Generation | Unverified | 0 |
| Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints | Nov 26, 2024 | Denoising, Image Generation | Code Available | 2 |
| StableAnimator: High-Quality Identity-Preserving Human Image Animation | Nov 26, 2024 | Denoising, Face Reenactment | Code Available | 5 |
| PhysMotion: Physics-Grounded Dynamics From a Single Image | Nov 26, 2024 | Video Generation | Unverified | 0 |
| AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM | Nov 26, 2024 | Benchmarking, Text-to-Video Generation | Code Available | 1 |
| Identity-Preserving Text-to-Video Generation by Frequency Decomposition | Nov 26, 2024 | Human-Domain Subject-to-Video, Image to Video Generation | Code Available | 4 |
| PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation | Nov 26, 2024 | Video Generation | Unverified | 0 |
| InTraGen: Trajectory-controlled Video Generation for Object Interactions | Nov 25, 2024 | Object, Video Generation | Code Available | 1 |
| Pathways on the Image Manifold: Image Editing via Video Generation | Nov 25, 2024 | Text-based Image Editing, Video Generation | Unverified | 0 |
| Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing | Nov 25, 2024 | Denoising, Video Generation | Code Available | 2 |
| Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric | Nov 25, 2024 | Video Generation, Video Quality Assessment | Unverified | 0 |
| DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation | Nov 25, 2024 | Large Language Model, Motion Planning | Unverified | 0 |
| LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis | Nov 24, 2024 | Diversity, Image Animation | Unverified | 0 |
| Importance-Based Token Merging for Efficient Image and Video Generation | Nov 23, 2024 | Image Generation, Video Generation | Unverified | 0 |
| Optical-Flow Guided Prompt Optimization for Coherent Video Generation | Nov 23, 2024 | Optical Flow Estimation, Video Generation | Unverified | 0 |
| Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | Nov 22, 2024 | Autonomous Driving, Text-to-Video Generation | Code Available | 0 |
| MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation | Nov 22, 2024 | Video Generation | Code Available | 2 |
| VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Nov 22, 2024 | Text-to-Video Generation, Video Alignment | Unverified | 0 |
| TaQ-DiT: Time-aware Quantization for Diffusion Transformers | Nov 21, 2024 | Denoising, Model Compression | Unverified | 0 |
| Understanding World or Predicting Future? A Comprehensive Survey of World Models | Nov 21, 2024 | Autonomous Driving, Decision Making | Unverified | 0 |
| StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart | Nov 21, 2024 | Video Generation | Code Available | 1 |
| MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control | Nov 21, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| What You See Is What Matters: A Novel Visual and Physics-Based Metric for Evaluating Video Generation Quality | Nov 20, 2024 | Video Generation | Unverified | 0 |
| VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models | Nov 20, 2024 | Benchmarking, Image Generation | Code Available | 5 |
| REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents | Nov 20, 2024 | GPU, Video Generation | Code Available | 3 |
| Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting | Nov 19, 2024 | 3D Generation, GPU | Unverified | 0 |
| Towards motion from video diffusion models | Nov 19, 2024 | Video Generation | Unverified | 0 |
| PoM: Efficient Image and Video Generation with the Polynomial Mixer | Nov 19, 2024 | Video Generation | Code Available | 1 |
| SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input | Nov 18, 2024 | Novel View Synthesis, Video Generation | Unverified | 0 |
| Medical Video Generation for Disease Progression Simulation | Nov 18, 2024 | Prognosis, Video Generation | Unverified | 0 |
| Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge | Nov 18, 2024 | Video Generation | Unverified | 0 |
| SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | Nov 17, 2024 | Image Generation, Quantization | Code Available | 7 |
| AnimateAnything: Consistent and Controllable Animation for Video Generation | Nov 16, 2024 | Video Generation | Unverified | 0 |
| ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models | Nov 16, 2024 | Hallucination, Video Generation | Unverified | 0 |
| OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models | Nov 15, 2024 | Optical Flow Estimation, Text-to-Video Generation | Code Available | 1 |
| EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation | Nov 15, 2024 | Audio-Driven Body Animation, Human Animation | Code Available | 7 |
| VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation | Nov 14, 2024 | Denoising, Robot Manipulation | Unverified | 0 |
| Motion Control for Enhanced Complex Action Video Generation | Nov 13, 2024 | Motion Generation, Video Generation | Unverified | 0 |
| A Survey on Vision Autoregressive Model | Nov 13, 2024 | 3D Generation, Benchmarking | Unverified | 0 |
| EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation | Nov 13, 2024 | Video Generation | Unverified | 0 |
| Artificial Intelligence for Biomedical Video Generation | Nov 12, 2024 | Data Augmentation, Video Generation | Code Available | 0 |
| I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength | Nov 10, 2024 | Video Generation | Unverified | 0 |
| A Survey of Emerging Approaches and Advances in Video Generation | Nov 9, 2024 | Image to Video Generation, Language Modeling | Unverified | 0 |
| Autoregressive Models in Vision: A Survey | Nov 8, 2024 | 3D Generation, Image Generation | Code Available | 4 |
| WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making | Nov 8, 2024 | Decision Making, Video Generation | Unverified | 0 |
| Taming Rectified Flow for Inversion and Editing | Nov 7, 2024 | Image Generation, Text-to-Image Generation | Code Available | 4 |
| StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration | Nov 7, 2024 | Video Generation | Unverified | 0 |
| SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | Nov 7, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |