| HARIVO: Harnessing Text-to-Image Models for Video Generation | Oct 10, 2024 | Diversity, Video Generation | Unverified | 0 |
| Progressive Autoregressive Video Diffusion Models | Oct 10, 2024 | Denoising, Video Denoising | Code Available | 2 |
| Scaling Laws For Diffusion Transformers | Oct 10, 2024 | Image Generation, Text to Image Generation | Unverified | 0 |
| Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis | Oct 9, 2024 | Video Generation | Code Available | 0 |
| GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation | Oct 8, 2024 | Multi-Task Learning, Robot Manipulation | Unverified | 0 |
| SeeClear: Semantic Distillation Enhances Pixel Condensation for Video Super-Resolution | Oct 8, 2024 | Super-Resolution, Video Generation | Code Available | 1 |
| Pyramidal Flow Matching for Efficient Video Generative Modeling | Oct 8, 2024 | GPU, Text-to-Video Generation | Code Available | 7 |
| ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler | Oct 8, 2024 | GPU, Video Generation | Unverified | 0 |
| TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation | Oct 8, 2024 | Video Generation | Code Available | 2 |
| BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way | Oct 8, 2024 | Decoder, Text-to-Video Generation | Unverified | 0 |
| T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Oct 8, 2024 | Video Alignment, Video Generation | Code Available | 3 |
| Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation | Oct 7, 2024 | Prompt Engineering, Video Generation | Code Available | 2 |
| The Dawn of Video Generation: Preliminary Explorations with SORA-like Models | Oct 7, 2024 | Video Generation | Unverified | 0 |
| Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality | Oct 7, 2024 | Video Generation | Code Available | 1 |
| ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction | Oct 7, 2024 | multimodal generation, Story Generation | Unverified | 0 |
| Realizing Video Summarization from the Path of Language-based Semantic Understanding | Oct 6, 2024 | Mixture-of-Experts, Video Generation | Unverified | 0 |
| Noise Crystallization and Liquid Noise: Zero-shot Video Generation using Image Diffusion Models | Oct 5, 2024 | Image Generation, Style Transfer | Unverified | 0 |
| Accelerating Diffusion Transformers with Token-wise Feature Caching | Oct 5, 2024 | Video Generation | Code Available | 3 |
| ECHOPulse: ECG controlled echocardio-grams video generation | Oct 4, 2024 | Video Generation | Code Available | 1 |
| Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach | Oct 4, 2024 | Image Generation, Image to Video Generation | Code Available | 1 |
| People are poorly equipped to detect AI-powered voice clones | Oct 3, 2024 | Video Generation | Unverified | 0 |
| SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Oct 3, 2024 | Image Generation, Quantization | Code Available | 7 |
| Loong: Generating Minute-level Long Videos with Autoregressive Language Models | Oct 3, 2024 | Video Generation | Unverified | 0 |
| COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation | Oct 2, 2024 | Decoder, Position | Unverified | 0 |
| MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation | Oct 2, 2024 | Video Generation | Code Available | 1 |
| Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining | Oct 1, 2024 | Atari Games, model | Code Available | 1 |
| ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | Sep 30, 2024 | Benchmarking, Disparity Estimation | Code Available | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | Sep 30, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| Replace Anyone in Videos | Sep 30, 2024 | Video Generation, Video Inpainting | Code Available | 4 |
| PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | Sep 27, 2024 | Image to Video Generation, Video Generation | Code Available | 3 |
| Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions | Sep 27, 2024 | Denoising, Gaussian Processes | Unverified | 0 |
| Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation | Sep 26, 2024 | Self-Supervised Learning, SSIM | Unverified | 0 |
| A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation | Sep 26, 2024 | Inductive Bias, Video Generation | Code Available | 1 |
| Pose-Guided Fine-Grained Sign Language Video Generation | Sep 25, 2024 | Image Generation, Optical Flow Estimation | Unverified | 0 |
| Technical Report: Competition Solution For Modelscope-Sora | Sep 24, 2024 | Text-to-Video Generation, Video Description | Unverified | 0 |
| Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation | Sep 24, 2024 | Robot Manipulation, Video Generation | Unverified | 0 |
| FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset | Sep 23, 2024 | Image Generation, Unconditional Video Generation | Unverified | 0 |
| Video-to-Audio Generation with Fine-grained Temporal Semantics | Sep 23, 2024 | Audio Generation, Video Generation | Unverified | 0 |
| Advancing Video Quality Assessment for AIGC | Sep 23, 2024 | Image Generation, Text Generation | Unverified | 0 |
| Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | Sep 23, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Dormant: Defending against Pose-driven Human Image Animation | Sep 22, 2024 | Image Animation, Video Generation | Code Available | 0 |
| JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation | Sep 21, 2024 | Video Generation | Unverified | 0 |
| JoyHallo: Digital human model for Mandarin | Sep 20, 2024 | model, Text Generation | Unverified | 0 |
| Infrared Small Target Detection in Satellite Videos: A New Dataset and A Novel Recurrent Feature Refinement Framework | Sep 19, 2024 | Motion Compensation, Video Generation | Code Available | 1 |
| Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation | Sep 19, 2024 | Denoising, Video Generation | Unverified | 0 |
| The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives | Sep 17, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| OSV: One Step is Enough for High-Quality Image to Video Generation | Sep 17, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures | Sep 11, 2024 | Diversity, Talking Head Generation | Unverified | 0 |
| EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion | Sep 11, 2024 | Portrait Animation, Talking Head Generation | Code Available | 0 |
| Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Sep 11, 2024 | 3D Generation, 3D Reconstruction | Code Available | 3 |