| Esoteric Language Models: Bridging Autoregressive and Masked Diffusion LLMs | Feb 21, 2026 | | —Unverified | 2 |
| OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs | Mar 5, 2026 | | —Unverified | 2 |
| SegviGen: Repurposing 3D Generative Model for Part Segmentation | Mar 17, 2026 | | —Unverified | 2 |
| BPMN Assistant: An LLM-Based Approach to Business Process Modeling | Jan 22, 2026 | | —Unverified | 2 |
| Shaping capabilities with token-level data filtering | Jan 30, 2026 | | —Unverified | 2 |
| SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models | Mar 17, 2026 | | —Unverified | 2 |
| Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision | Jan 27, 2026 | | —Unverified | 2 |
| WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories | Mar 2, 2026 | | —Unverified | 2 |
| compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data | Feb 6, 2026 | | —Unverified | 2 |
| Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange | Mar 15, 2026 | | —Unverified | 2 |
| SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation | Feb 24, 2026 | | —Unverified | 2 |
| Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs | Jan 27, 2026 | | —Unverified | 2 |
| D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI | Mar 3, 2026 | | —Unverified | 2 |
| Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing | Jan 28, 2026 | | —Unverified | 2 |
| MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing | Mar 5, 2026 | | —Unverified | 2 |
| FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation | Feb 13, 2026 | | —Unverified | 2 |
| DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation | Jan 29, 2026 | | —Unverified | 2 |
| A Survey on Efficient Vision-Language-Action Models | Feb 2, 2026 | | —Unverified | 2 |
| Rethinking Video Generation Model for the Embodied World | Jan 21, 2026 | | —Unverified | 2 |
| Context Forcing: Consistent Autoregressive Video Generation with Long Context | Feb 5, 2026 | | —Unverified | 2 |
| Exploring Reasoning Reward Model for Agents | Jan 29, 2026 | | —Unverified | 2 |
| FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation | Feb 6, 2026 | | —Unverified | 2 |
| HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising | Mar 9, 2026 | | —Unverified | 2 |
| TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation | Mar 19, 2026 | | —Unverified | 2 |
| daVinci-Dev: Agent-native Mid-training for Software Engineering | Jan 27, 2026 | | —Unverified | 2 |
| MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning | Mar 3, 2026 | | —Unverified | 2 |
| XSkill: Continual Learning from Experience and Skills in Multimodal Agents | Mar 13, 2026 | | —Unverified | 2 |
| RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind | Feb 25, 2026 | | —Unverified | 2 |
| Generative Visual Code Mobile World Models | Feb 2, 2026 | | —Unverified | 2 |
| ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation | Feb 12, 2026 | | —Unverified | 2 |
| Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play? | Feb 28, 2026 | | —Unverified | 2 |
| LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? | Feb 26, 2026 | | —Unverified | 2 |
| Lost in Stories: Consistency Bugs in Long Story Generation by LLMs | Mar 6, 2026 | | —Unverified | 2 |
| Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy | Mar 2, 2026 | | —Unverified | 2 |
| Efficient Autoregressive Video Diffusion with Dummy Head | Jan 28, 2026 | | —Unverified | 2 |
| WildActor: Unconstrained Identity-Preserving Video Generation | Mar 9, 2026 | | —Unverified | 2 |
| Olaf-World: Orienting Latent Actions for Video World Modeling | Feb 10, 2026 | | —Unverified | 2 |
| Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models | Feb 25, 2026 | | —Unverified | 2 |
| ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference | Feb 14, 2026 | | —Unverified | 2 |
| Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation | Feb 2, 2026 | | —Unverified | 2 |
| LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction | Mar 14, 2026 | | —Unverified | 2 |
| Q-learning with Adjoint Matching | Jan 23, 2026 | | —Unverified | 2 |
| Proact-VL: A Proactive VideoLLM for Real-Time AI Companions | Mar 3, 2026 | | —Unverified | 2 |
| ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation | Feb 12, 2026 | | —Unverified | 2 |
| Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models | Jan 29, 2026 | | —Unverified | 2 |
| Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis | Feb 3, 2026 | | —Unverified | 2 |
| Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory | Feb 3, 2026 | | —Unverified | 2 |
| Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics | Feb 7, 2026 | | —Unverified | 2 |
| RealWonder: Real-Time Physical Action-Conditioned Video Generation | Mar 5, 2026 | | —Unverified | 2 |
| EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding | Mar 4, 2026 | | —Unverified | 2 |