| 3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering | Jan 14, 2025 | Novel View Synthesis, Video Generation | —Unverified | 0 |
| LayerAnimate: Layer-specific Control for Animation | Jan 14, 2025 | Video Generation | —Unverified | 0 |
| BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations | Jan 13, 2025 | Object, Text-to-Video Generation | —Unverified | 0 |
| Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss | Jan 13, 2025 | Feature Correlation, Video Generation | —Unverified | 0 |
| Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning | Jan 11, 2025 | Video Editing, Video Generation | —Unverified | 0 |
| HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs platform with Heterogeneous AI Accelerators | Jan 11, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| VideoAuteur: Towards Long Narrative Video Generation | Jan 10, 2025 | Video Generation | —Unverified | 0 |
| Multi-subject Open-set Personalization in Video Generation | Jan 10, 2025 | Video Generation | —Unverified | 0 |
| MEt3R: Measuring Multi-View Consistency in Generated Images | Jan 10, 2025 | Image Generation, Video Generation | —Unverified | 0 |
| Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces | Jan 9, 2025 | Video Generation | —Unverified | 0 |
| LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition | Jan 8, 2025 | Lip Reading, speech-recognition | —Unverified | 0 |
| ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning | Jan 8, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| Tuning-Free Long Video Generation via Global-Local Collaborative Diffusion | Jan 8, 2025 | Denoising, Diversity | —Unverified | 0 |
| Motion-Aware Generative Frame Interpolation | Jan 7, 2025 | Video Generation | —Unverified | 0 |
| Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising | Jan 6, 2025 | Denoising, Video Generation | —Unverified | 0 |
| Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation | Jan 6, 2025 | Image to Video Generation, Object | —Unverified | 0 |
| License Plate Images Generation with Diffusion Models | Jan 6, 2025 | License Plate Recognition, Synthetic Data Generation | —Unverified | 0 |
| GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking | Jan 5, 2025 | Novel View Synthesis, Point Tracking | —Unverified | 0 |
| JOG3R: Towards 3D-Consistent Video Generators | Jan 2, 2025 | Camera Pose Estimation, Pose Estimation | —Unverified | 0 |
| VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Jan 2, 2025 | Talking Head Generation, Video Generation | —Unverified | 0 |
| Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions | Jan 2, 2025 | Form, Video Generation | —Unverified | 0 |
| Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation | Jan 1, 2025 | text annotation, Video Generation | —Unverified | 0 |
| GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking | Jan 1, 2025 | Novel View Synthesis, Point Tracking | —Unverified | 0 |
| Video-Bench: Human-Aligned Video Generation Benchmark | Jan 1, 2025 | Large Language Model, Video Generation | —Unverified | 0 |
| Dynamic Camera Poses and Where to Find Them | Jan 1, 2025 | Point Tracking, Pose Estimation | —Unverified | 0 |
| IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner | Jan 1, 2025 | Motion Generation, Text-to-Video Generation | —Unverified | 0 |
| Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform | Jan 1, 2025 | Code Generation, Image Generation | —Unverified | 0 |
| Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement | Jan 1, 2025 | Gesture Generation, Motion Generation | —Unverified | 0 |
| Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception | Jan 1, 2025 | Image Captioning, Image Generation | —Unverified | 0 |
| PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution | Jan 1, 2025 | 4k, Super-Resolution | —Unverified | 0 |
| DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion | Jan 1, 2025 | Autonomous Driving, Denoising | —Unverified | 0 |
| Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling | Jan 1, 2025 | Motion Generation, Video Generation | —Unverified | 0 |
| STDD: Spatio-Temporal Dual Diffusion for Video Generation | Jan 1, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs | Jan 1, 2025 | Multiple-choice, Video Generation | —Unverified | 0 |
| ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way | Jan 1, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| Satellite to GroundScape - Large-scale Consistent Ground View Generation from Satellite Views | Jan 1, 2025 | Denoising, Video Generation | —Unverified | 0 |
| I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models | Jan 1, 2025 | Adversarial Attack, Image to Video Generation | —Unverified | 0 |
| MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation | Jan 1, 2025 | Portrait Animation, Video Generation | —Unverified | 0 |
| EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation | Jan 1, 2025 | Image Generation, Text-to-Video Generation | —Unverified | 0 |
| DreamDrive: Generative 4D Scene Modeling from Street View Images | Dec 31, 2024 | Autonomous Driving, Neural Rendering | —Unverified | 0 |
| Gender Bias in Text-to-Video Generation Models: A case study of Sora | Dec 30, 2024 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling | Dec 30, 2024 | Retrieval-augmented Generation, Story Visualization | —Unverified | 0 |
| ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation | Dec 30, 2024 | Image Matting, Video Generation | —Unverified | 0 |
| Generative Video Propagation | Dec 27, 2024 | Image to Video Generation, Video Generation | —Unverified | 0 |
| VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models | Dec 27, 2024 | Video Generation | Code Available | 0 |
| DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers | Dec 24, 2024 | NavSim, Trajectory Planning | —Unverified | 0 |
| ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation | Dec 24, 2024 | Human-Object Interaction Detection, Video Generation | —Unverified | 0 |
| Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation | Dec 24, 2024 | Video Generation | —Unverified | 0 |
| FFA Sora, video generation as fundus fluorescein angiography simulator | Dec 23, 2024 | Privacy Preserving, Question Answering | —Unverified | 0 |
| Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory | Dec 23, 2024 | Video Generation | —Unverified | 0 |