| Boosting Camera Motion Control for Video Diffusion Transformers | Oct 14, 2024 | Video Generation | Unverified | 0 |
| Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities | Oct 11, 2024 | Denoising, Image Quality Assessment | Unverified | 0 |
| HARIVO: Harnessing Text-to-Image Models for Video Generation | Oct 10, 2024 | Diversity, Video Generation | Unverified | 0 |
| Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content | Oct 10, 2024 | Video Alignment, Video Generation | Unverified | 0 |
| Scaling Laws For Diffusion Transformers | Oct 10, 2024 | Image Generation, Text to Image Generation | Unverified | 0 |
| Animating the Past: Reconstruct Trilobite via Video Generation | Oct 10, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion | Oct 10, 2024 | Denoising, parameter-efficient fine-tuning | Code Available | 0 |
| Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Oct 9, 2024 | Video Generation | Code Available | 0 |
| ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler | Oct 8, 2024 | GPU, Video Generation | Unverified | 0 |
| BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way | Oct 8, 2024 | Decoder, Text-to-Video Generation | Unverified | 0 |
| GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation | Oct 8, 2024 | Multi-Task Learning, Robot Manipulation | Unverified | 0 |
| The Dawn of Video Generation: Preliminary Explorations with SORA-like Models | Oct 7, 2024 | Video Generation | Unverified | 0 |
| ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction | Oct 7, 2024 | multimodal generation, Story Generation | Unverified | 0 |
| Realizing Video Summarization from the Path of Language-based Semantic Understanding | Oct 6, 2024 | Mixture-of-Experts, Video Generation | Unverified | 0 |
| Noise Crystallization and Liquid Noise: Zero-shot Video Generation using Image Diffusion Models | Oct 5, 2024 | Image Generation, Style Transfer | Unverified | 0 |
| People are poorly equipped to detect AI-powered voice clones | Oct 3, 2024 | Video Generation | Unverified | 0 |
| Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Oct 3, 2024 | Video Generation | Unverified | 0 |
| COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation | Oct 2, 2024 | Decoder, Position | Unverified | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | Sep 30, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Sep 30, 2024 | Benchmarking, Disparity Estimation | Code Available | 0 |
| Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions | Sep 27, 2024 | Denoising, Gaussian Processes | Unverified | 0 |
| Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation | Sep 26, 2024 | Self-Supervised Learning, SSIM | Unverified | 0 |
| Pose-Guided Fine-Grained Sign Language Video Generation | Sep 25, 2024 | Image Generation, Optical Flow Estimation | Unverified | 0 |
| Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation | Sep 24, 2024 | Robot Manipulation, Video Generation | Unverified | 0 |
| Technical Report: Competition Solution For Modelscope-Sora | Sep 24, 2024 | Text-to-Video Generation, Video Description | Unverified | 0 |
| Advancing Video Quality Assessment for AIGC | Sep 23, 2024 | Image Generation, Text Generation | Unverified | 0 |
| Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | Sep 23, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Video-to-Audio Generation with Fine-grained Temporal Semantics | Sep 23, 2024 | Audio Generation, Video Generation | Unverified | 0 |
| FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset | Sep 23, 2024 | Image Generation, Unconditional Video Generation | Unverified | 0 |
| Dormant: Defending against Pose-driven Human Image Animation | Sep 22, 2024 | Image Animation, Video Generation | Code Available | 0 |
| JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation | Sep 21, 2024 | Video Generation | Unverified | 0 |
| JoyHallo: Digital human model for Mandarin | Sep 20, 2024 | model, Text Generation | Unverified | 0 |
| Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation | Sep 19, 2024 | Denoising, Video Generation | Unverified | 0 |
| The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives | Sep 17, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| OSV: One Step is Enough for High-Quality Image to Video Generation | Sep 17, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion | Sep 11, 2024 | Portrait Animation, Talking Head Generation | Code Available | 0 |
| DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures | Sep 11, 2024 | Diversity, Talking Head Generation | Unverified | 0 |
| G3PT: Unleash the power of Autoregressive Modeling in 3D Generation via Cross-scale Querying Transformer | Sep 10, 2024 | 3D Generation, Video Generation | Unverified | 0 |
| MyGo: Consistent and Controllable Multi-View Driving Video Generation with Camera Control | Sep 10, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation | Sep 9, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency | Sep 4, 2024 | Video Generation | Unverified | 0 |
| DiVE: DiT-based Video Generation with Enhanced Control | Sep 3, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention | Sep 3, 2024 | Human Animation, Video Generation | Unverified | 0 |
| OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model | Sep 2, 2024 | GPU, Video Generation | Unverified | 0 |
| Compositional 3D-aware Video Generation with LLM Director | Aug 31, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers | Aug 30, 2024 | GPU, Image Generation | Unverified | 0 |
| DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving | Aug 29, 2024 | Autonomous Driving, Denoising | Unverified | 0 |
| One-Shot Learning Meets Depth Diffusion in Multi-Object Videos | Aug 29, 2024 | One-Shot Learning, Video Generation | Unverified | 0 |
| Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation | Aug 29, 2024 | All, Video Generation | Unverified | 0 |
| GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model | Aug 28, 2024 | Autonomous Driving, Data Augmentation | Unverified | 0 |