| Paper | Date | Tasks | Code | Stars |
|---|---|---|---|---|
| JOG3R: Towards 3D-Consistent Video Generators | Jan 2, 2025 | Camera Pose Estimation, Pose Estimation | Unverified | 0 |
| VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Jan 2, 2025 | Talking Head Generation, Video Generation | Unverified | 0 |
| Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception | Jan 1, 2025 | Image Captioning, Image Generation | Unverified | 0 |
| EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation | Jan 1, 2025 | Image Generation, Text-to-Video Generation | Unverified | 0 |
| I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models | Jan 1, 2025 | Adversarial Attack, Image to Video Generation | Unverified | 0 |
| DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion | Jan 1, 2025 | Autonomous Driving, Denoising | Unverified | 0 |
| Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs | Jan 1, 2025 | Multiple-choice, Video Generation | Unverified | 0 |
| ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way | Jan 1, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Satellite to GroundScape - Large-scale Consistent Ground View Generation from Satellite Views | Jan 1, 2025 | Denoising, Video Generation | Unverified | 0 |
| PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution | Jan 1, 2025 | 4k, Super-Resolution | Unverified | 0 |
| IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner | Jan 1, 2025 | Motion Generation, Text-to-Video Generation | Unverified | 0 |
| Video-Bench: Human-Aligned Video Generation Benchmark | Jan 1, 2025 | Large Language Model, Video Generation | Unverified | 0 |
| GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking | Jan 1, 2025 | Novel View Synthesis, Point Tracking | Unverified | 0 |
| MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation | Jan 1, 2025 | Portrait Animation, Video Generation | Unverified | 0 |
| Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling | Jan 1, 2025 | Motion Generation, Video Generation | Unverified | 0 |
| Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement | Jan 1, 2025 | Gesture Generation, Motion Generation | Unverified | 0 |
| Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation | Jan 1, 2025 | text annotation, Video Generation | Unverified | 0 |
| Dynamic Camera Poses and Where to Find Them | Jan 1, 2025 | Point Tracking, Pose Estimation | Unverified | 0 |
| STDD: Spatio-Temporal Dual Diffusion for Video Generation | Jan 1, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | Jan 1, 2025 | Code Generation, Image Generation | Unverified | 0 |
| DreamDrive: Generative 4D Scene Modeling from Street View Images | Dec 31, 2024 | Autonomous Driving, Neural Rendering | Unverified | 0 |
| Gender Bias in Text-to-Video Generation Models: A case study of Sora | Dec 30, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| LTX-Video: Realtime Video Latent Diffusion | Dec 30, 2024 | Denoising, GPU | Code Available | 9 |
| Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling | Dec 30, 2024 | Retrieval-augmented Generation, Story Visualization | Unverified | 0 |
| ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation | Dec 30, 2024 | Image Matting, Video Generation | Unverified | 0 |
| Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Dec 30, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation | Dec 30, 2024 | Video Generation, Video Quality Assessment | Code Available | 3 |
| Open-Sora: Democratizing Efficient Video Production for All | Dec 29, 2024 | All, Image Generation | Code Available | 13 |
| Generative Video Propagation | Dec 27, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models | Dec 27, 2024 | Video Generation | Code Available | 0 |
| DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT | Dec 27, 2024 | Autonomous Driving, Video Generation | Code Available | 3 |
| Accelerating Diffusion Transformers with Dual Feature Caching | Dec 25, 2024 | Video Generation | Code Available | 3 |
| Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation | Dec 24, 2024 | Video Generation | Unverified | 0 |
| ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation | Dec 24, 2024 | Human-Object Interaction Detection, Video Generation | Unverified | 0 |
| DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers | Dec 24, 2024 | NavSim, Trajectory Planning | Unverified | 0 |
| DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation | Dec 24, 2024 | Video Editing, Video Generation | Code Available | 3 |
| VidTwin: Video VAE with Decoupled Structure and Dynamics | Dec 23, 2024 | Decoder, Video Generation | Code Available | 3 |
| FFA Sora, video generation as fundus fluorescein angiography simulator | Dec 23, 2024 | Privacy Preserving, Question Answering | Unverified | 0 |
| Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory | Dec 23, 2024 | Video Generation | Unverified | 0 |
| Large Motion Video Autoencoding with Cross-modal Video VAE | Dec 23, 2024 | Video Generation | Unverified | 0 |
| SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults | Dec 22, 2024 | Data Augmentation, Fault Diagnosis | Unverified | 0 |
| Adapting Image-to-Video Diffusion Models for Large-Motion Frame Interpolation | Dec 22, 2024 | Video Frame Interpolation, Video Generation | Unverified | 0 |
| TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models | Dec 21, 2024 | Quantization, Video Generation | Unverified | 0 |
| Follow-Your-MultiPose: Tuning-Free Multi-Character Text-to-Video Generation via Pose Guidance | Dec 21, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| VAST 1.0: A Unified Framework for Controllable and Consistent Video Generation | Dec 21, 2024 | Video Generation | Unverified | 0 |
| CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training | Dec 20, 2024 | parameter-efficient fine-tuning, Video Generation | Code Available | 0 |
| DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization | Dec 20, 2024 | Computational Efficiency, Diversity | Unverified | 0 |
| Consistent Human Image and Video Generation with Spatially Conditioned Diffusion | Dec 19, 2024 | Computational Efficiency, Denoising | Code Available | 0 |
| Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations | Dec 19, 2024 | Contrastive Learning, Image Reconstruction | Code Available | 3 |
| DirectorLLM for Human-Centric Video Generation | Dec 19, 2024 | Language Modeling, Language Modelling | Unverified | 0 |