| OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation | May 26, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 4 |
| Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM | May 26, 2025 | Image to Video Generation, Video Generation | Unverified | 0 |
| From Single Images to Motion Policies via Video-Generation Environment Representations | May 25, 2025 | Depth Estimation, Monocular Depth Estimation | Unverified | 0 |
| SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | May 25, 2025 | Video Editing, Video Generation | Unverified | 0 |
| WorldEval: World Model as Real-World Robot Policies Evaluator | May 25, 2025 | Robot Manipulation, Video Generation | Unverified | 0 |
| Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation | May 24, 2025 | Semantic Similarity, Semantic Textual Similarity | Unverified | 0 |
| VORTA: Efficient Video Diffusion via Routing Sparse Attention | May 24, 2025 | Video Generation | Code Available | 1 |
| DVD-Quant: Data-free Video Diffusion Transformers Quantization | May 24, 2025 | Data Free Quantization, Quantization | Code Available | 1 |
| ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos | May 24, 2025 | Action Generation, Autonomous Driving | Unverified | 0 |
| InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO | May 23, 2025 | Text-to-Video Generation, Video Generation | Code Available | 0 |
| WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | May 23, 2025 | Sand, Scene Generation | Unverified | 0 |
| Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts | May 22, 2025 | Dialogue Generation, Large Language Model | Unverified | 0 |
| Training-Free Efficient Video Generation via Dynamic Token Carving | May 22, 2025 | Denoising, Video Generation | Code Available | 3 |
| MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM | May 22, 2025 | 3D Generation, Video Generation | Unverified | 0 |
| Challenger: Affordable Adversarial Driving Video Generation | May 21, 2025 | Autonomous Driving, Video Generation | Unverified | 0 |
| Generative AI for Autonomous Driving: A Review | May 21, 2025 | Autonomous Driving, Image Generation | Unverified | 0 |
| AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection | May 21, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0 |
| Interspatial Attention for Efficient 4D Human Video Generation | May 21, 2025 | Video Generation | Unverified | 0 |
| CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation | May 21, 2025 | Video Generation | Code Available | 1 |
| Programmatic Video Prediction Using Large Language Models | May 20, 2025 | Autonomous Driving, Prediction | Code Available | 0 |
| LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer | May 20, 2025 | Image to Video Generation, Video Generation | Unverified | 0 |
| Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | May 20, 2025 | GPU, Video Generation | Code Available | 2 |
| Hunyuan-Game: Industrial-grade Intelligent Game Creation Model | May 20, 2025 | Image Generation, Image to Video Generation | Unverified | 0 |
| DreamGen: Unlocking Generalization in Robot Learning through Video World Models | May 19, 2025 | Video Generation | Code Available | 4 |
| BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | May 19, 2025 | Binary Classification, DeepFake Detection | Code Available | 1 |
| Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking | May 19, 2025 | Image Generation, Mamba | Unverified | 0 |
| MAGI-1: Autoregressive Video Generation at Scale | May 19, 2025 | Video Generation | Code Available | 7 |
| FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance | May 19, 2025 | Action Generation, Human action generation | Unverified | 0 |
| Video-GPT via Next Clip Diffusion | May 18, 2025 | Denoising, Image Animation | Code Available | 1 |
| DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance | May 17, 2025 | Video Generation | Code Available | 2 |
| FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge | May 17, 2025 | Image Generation, Scheduling | Code Available | 1 |
| LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | May 17, 2025 | Benchmarking, Question Answering | Code Available | 1 |
| VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption | May 17, 2025 | Decoder, Position | Unverified | 0 |
| Face Consistency Benchmark for GenAI Video | May 16, 2025 | Video Generation | Unverified | 0 |
| MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | May 15, 2025 | Image Animation, Video Generation | Code Available | 3 |
| ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars | May 15, 2025 | Image Stylization, Video Generation | Unverified | 0 |
| Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios | May 14, 2025 | Marketing, Video Generation | Unverified | 0 |
| Generating time-consistent dynamics with discriminator-guided image diffusion models | May 14, 2025 | Video Generation | Unverified | 0 |
| Generative AI for Autonomous Driving: Frontiers and Opportunities | May 13, 2025 | Autonomous Driving, Video Generation | Code Available | 3 |
| Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model | May 12, 2025 | Video Generation | Code Available | 1 |
| Generative Pre-trained Autoregressive Diffusion Transformer | May 12, 2025 | Few-Shot Learning, Video Generation | Unverified | 0 |
| ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models | May 12, 2025 | Video Generation | Unverified | 0 |
| DanceGRPO: Unleashing GRPO on Visual Generation | May 12, 2025 | Denoising, reinforcement-learning | Code Available | 5 |
| DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | May 11, 2025 | parameter-efficient fine-tuning, Video Alignment | Unverified | 0 |
| BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation | May 11, 2025 | Video Generation | Unverified | 0 |
| ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images | May 10, 2025 | Denoising, Video Generation | Unverified | 0 |
| T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models | May 8, 2025 | Instruction Following, Text-to-Video Generation | Unverified | 0 |
| HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | May 7, 2025 | Human-Domain Subject-to-Video, Single-Domain Subject-to-Video | Code Available | 5 |
| Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights | May 6, 2025 | Video Generation | Unverified | 0 |
| A Unit Enhancement and Guidance Framework for Audio-Driven Avatar Video Generation | May 6, 2025 | Human Animation, Video Generation | Unverified | 0 |