| Taming Diffusion Transformer for Real-Time Mobile Video Generation | Jul 17, 2025 | Video Generation | Unverified | 0 |
| LoViC: Efficient Long Video Generation with Context Compression | Jul 17, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Leveraging Pre-Trained Visual Models for AI-Generated Video Detection | Jul 17, 2025 | Misinformation, Video Generation | Unverified | 0 |
| World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving | Jul 17, 2025 | Accident Anticipation, Autonomous Driving | Unverified | 0 |
| I^2-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting | Jul 12, 2025 | Autonomous Driving, Computational Efficiency | Code Available | 2 |
| Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective | Jul 11, 2025 | Video Generation | Code Available | 0 |
| Scaling RL to Long Videos | Jul 10, 2025 | Reinforcement Learning (RL), Spatial Reasoning | Code Available | 0 |
| Martian World Models: Controllable Video Synthesis with Physically Accurate 3D Reconstructions | Jul 10, 2025 | Video Generation | Unverified | 0 |
| A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality | Jul 9, 2025 | Diversity, Video Generation | Unverified | 0 |
| FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation | Jul 9, 2025 | Descriptive, Text Generation | Unverified | 0 |
| Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation | Jul 8, 2025 | Video Generation | Unverified | 0 |
| Omni-Video: Democratizing Unified Video Understanding and Generation | Jul 8, 2025 | Video Generation, Video Understanding | Code Available | 2 |
| PresentAgent: Multimodal Agent for Presentation Video Generation | Jul 5, 2025 | text-to-speech, Text to Speech | Code Available | 2 |
| AnyI2V: Animating Any Conditional Image with Motion Control | Jul 3, 2025 | Style Transfer, Video Generation | Unverified | 0 |
| Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation | Jul 3, 2025 | Diversity, Video Generation | Code Available | 2 |
| LLM-based Realistic Safety-Critical Driving Video Generation | Jul 2, 2025 | Autonomous Driving, Autonomous Vehicles | Unverified | 0 |
| Geometry-aware 4D Video Generation for Robot Manipulation | Jul 1, 2025 | Robot Manipulation, Video Generation | Unverified | 0 |
| Epona: Autoregressive Diffusion World Model for Autonomous Driving | Jun 30, 2025 | Autonomous Driving, model | Code Available | 3 |
| RoboScape: Physics-informed Embodied World Model | Jun 29, 2025 | 3D geometry, Depth Estimation | Code Available | 0 |
| HieraSurg: Hierarchy-Aware Diffusion Model for Surgical Video Generation | Jun 26, 2025 | Panoptic Segmentation, Segmentation | Unverified | 0 |
| Video Virtual Try-on with Conditional Diffusion Transformer Inpainter | Jun 26, 2025 | Video Generation, Video Inpainting | Unverified | 0 |
| ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models | Jun 26, 2025 | Spatial Reasoning, Video Generation | Unverified | 0 |
| DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing | Jun 26, 2025 | Video Editing, Video Generation | Unverified | 0 |
| Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models | Jun 26, 2025 | Texture Synthesis, Video Generation | Unverified | 0 |
| SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture | Jun 26, 2025 | Denoising, Singing Voice Synthesis | Unverified | 0 |
| Video Perception Models for 3D Scene Synthesis | Jun 25, 2025 | 3D Reconstruction, Image Generation | Unverified | 0 |
| BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | Jun 25, 2025 | Artifact Detection, Benchmarking | Unverified | 0 |
| MinD: Unified Visual Imagination and Control via Hierarchical World Models | Jun 23, 2025 | Video Generation, Video Prediction | Unverified | 0 |
| RDPO: Real Data Preference Optimization for Physics Consistency Video Generation | Jun 23, 2025 | Video Generation | Unverified | 0 |
| OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation | Jun 23, 2025 | Human Animation, Video Generation | Unverified | 0 |
| Matrix-Game: Interactive World Foundation Model | Jun 23, 2025 | Minecraft, model | Code Available | 5 |
| BulletGen: Improving 4D Reconstruction with Bullet-Time Generation | Jun 23, 2025 | 4D reconstruction, Depth Estimation | Unverified | 0 |
| Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | Jun 20, 2025 | Temporal Sequences, Video Generation | Unverified | 0 |
| PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models | Jun 19, 2025 | Image Generation, Quantization | Unverified | 0 |
| Sekai: A Video Dataset towards World Exploration | Jun 18, 2025 | Video Generation | Unverified | 0 |
| Show-o2: Improved Native Unified Multimodal Models | Jun 18, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| VideoMAR: Autoregressive Video Generation with Continuous Tokens | Jun 17, 2025 | GPU, Image Generation | Unverified | 0 |
| Causally Steered Diffusion for Automated Video Counterfactual Generation | Jun 17, 2025 | counterfactual, Video Editing | Code Available | 0 |
| STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation | Jun 16, 2025 | Autonomous Driving, Denoising | Unverified | 0 |
| UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions | Jun 16, 2025 | 4k, 8k | Unverified | 0 |
| iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer | Jun 15, 2025 | Object, Video Generation | Unverified | 0 |
| AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation | Jun 12, 2025 | Video Generation | Code Available | 3 |
| GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning | Jun 12, 2025 | GPU, Video Generation | Unverified | 0 |
| M4V: Multi-Modal Mamba for Text-to-Video Generation | Jun 12, 2025 | Mamba, Text-to-Video Generation | Unverified | 0 |
| DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers | Jun 12, 2025 | Data Augmentation, Marketing | Unverified | 0 |
| GenWorld: Towards Detecting AI-generated Real-world Simulation Videos | Jun 12, 2025 | Video Generation | Unverified | 0 |
| PlayerOne: Egocentric World Simulator | Jun 11, 2025 | Video Generation | Unverified | 0 |
| FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Image-text Retrieval, Question Answering | Code Available | 2 |
| Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Jun 10, 2025 | 3D Lane Detection, 3D Object Detection | Code Available | 3 |
| MagCache: Fast Video Generation with Magnitude-Aware Cache | Jun 10, 2025 | SSIM, Video Generation | Code Available | 3 |