| Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress | Mar 18, 2026 | | —Unverified | 0 |
| FineViT: Progressively Unlocking Fine-Grained Perception with Dense Recaptions | Mar 18, 2026 | | —Unverified | 0 |
| A Progressive Visual-Logic-Aligned Framework for Ride-Hailing Adjudication | Mar 18, 2026 | | —Unverified | 0 |
| Grid Spatial Understanding: A Dataset for Textual Spatial Reasoning over Grids, Embodied Settings, and Coordinate Structures | Mar 18, 2026 | | —Unverified | 0 |
| Learning Permutation Distributions via Reflected Diffusion on Ranks | Mar 18, 2026 | | —Unverified | 0 |
| Argument Reconstruction as Supervision for Critical Thinking in LLMs | Mar 18, 2026 | | —Unverified | 0 |
| A 3D Reconstruction Benchmark for Asset Inspection | Mar 18, 2026 | | —Unverified | 0 |
| MCoT-MVS: Multi-level Vision Selection by Multi-modal Chain-of-Thought Reasoning for Composed Image Retrieval | Mar 18, 2026 | | —Unverified | 0 |
| Variational Kernel Design for Internal Noise: Gaussian Chaos Noise, Representation Compatibility, and Reliable Deep Learning | Mar 18, 2026 | | —Unverified | 0 |
| Material Magic Wand: Material-Aware Grouping of 3D Parts in Untextured Meshes | Mar 18, 2026 | | —Unverified | 0 |
| Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift | Mar 18, 2026 | | —Unverified | 0 |
| SafeTutors: Benchmarking Pedagogical Safety in AI Tutoring Systems | Mar 18, 2026 | | —Unverified | 0 |
| Shot-Aware Frame Sampling for Video Understanding | Mar 18, 2026 | | —Unverified | 0 |
| Cohomological Obstructions to Global Counterfactuals: A Sheaf-Theoretic Foundation for Generative Causal Models | Mar 18, 2026 | | —Unverified | 0 |
| CRE-T1 Preview Technical Report: Beyond Contrastive Learning for Reasoning-Intensive Retrieval | Mar 18, 2026 | | —Unverified | 0 |
| Toward Phonology-Guided Sign Language Motion Generation: A Diffusion Baseline and Conditioning Analysis | Mar 18, 2026 | | —Unverified | 0 |
| Harnessing the Power of Foundation Models for Accurate Material Classification | Mar 18, 2026 | | —Unverified | 0 |
| Rapid Neural Network Prediction of Linear Block Copolymer Free Energies | Mar 18, 2026 | | —Unverified | 0 |
| Large-Scale 3D Ground-Motion Synthesis with Physics-Inspired Latent Operator Flow Matching | Mar 18, 2026 | | —Unverified | 0 |
| Structured SIR: Efficient and Expressive Importance-Weighted Inference for High-Dimensional Image Registration | Mar 18, 2026 | | —Unverified | 0 |
| Joint Degradation-Aware Arbitrary-Scale Super-Resolution for Variable-Rate Extreme Image Compression | Mar 18, 2026 | | —Unverified | 0 |
| Mutually Causal Semantic Distillation Network for Zero-Shot Learning | Mar 18, 2026 | | —Unverified | 0 |
| Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare | Mar 18, 2026 | | —Unverified | 0 |
| From Digital Twins to World Models: Opportunities, Challenges, and Applications for Mobile Edge General Intelligence | Mar 18, 2026 | | —Unverified | 0 |
| Data-driven model order reduction for structures with piecewise linear nonlinearity using dynamic mode decomposition | Mar 18, 2026 | | —Unverified | 0 |
| ECHO: Towards Emotionally Appropriate and Contextually Aware Interactive Head Generation | Mar 18, 2026 | | —Unverified | 0 |
| ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression | Mar 18, 2026 | | —Unverified | 0 |
| Baguan-TS: A Sequence-Native In-Context Learning Model for Time Series Forecasting with Covariates | Mar 18, 2026 | | —Unverified | 0 |
| AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement | Mar 18, 2026 | | —Unverified | 0 |
| AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization | Mar 18, 2026 | | —Unverified | 0 |
| Efficient Soft Actor-Critic with LLM-Based Action-Level Guidance for Continuous Control | Mar 18, 2026 | | —Unverified | 0 |
| Revisiting Cross-Attention Mechanisms: Leveraging Beneficial Noise for Domain-Adaptive Learning | Mar 18, 2026 | | —Unverified | 0 |
| Humans and transformer LMs: Abstraction drives language learning | Mar 18, 2026 | | —Unverified | 0 |
| Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization | Mar 18, 2026 | | —Unverified | 0 |
| Learning When to Attend: Conditional Memory Access for Long-Context LLMs | Mar 18, 2026 | | —Unverified | 0 |
| Inducing Epistemological Humility in Large Language Models: A Targeted SFT Approach to Reducing Hallucination | Mar 18, 2026 | | —Unverified | 0 |
| Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality | Mar 18, 2026 | | —Unverified | 0 |
| Detecting the Machine: A Comprehensive Benchmark of AI-Generated Text Detectors Across Architectures, Domains, and Adversarial Conditions | Mar 18, 2026 | | —Unverified | 0 |
| Translation Invariance of Neural Operators for the FitzHugh-Nagumo Model | Mar 18, 2026 | | —Unverified | 0 |
| Mirror Descent on Riemannian Manifolds | Mar 18, 2026 | | —Unverified | 0 |
| MM-OVSeg: Multimodal Optical-SAR Fusion for Open-Vocabulary Segmentation in Remote Sensing | Mar 18, 2026 | | —Unverified | 0 |
| AdapTS: Lightweight Teacher-Student Approach for Multi-Class and Continual Visual Anomaly Detection | Mar 18, 2026 | | —Unverified | 0 |
| Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing | Mar 18, 2026 | | —Unverified | 0 |
| Informative Semi-Factuals for XAI: The Elaborated Explanations that People Prefer | Mar 18, 2026 | | —Unverified | 0 |
| Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models | Mar 18, 2026 | | —Unverified | 0 |
| ProGVC: Progressive-based Generative Video Compression via Auto-Regressive Context Modeling | Mar 18, 2026 | | —Unverified | 0 |
| Face anonymization preserving facial expressions and photometric realism | Mar 18, 2026 | | —Unverified | 0 |
| Gaussian Process Limit Reveals Structural Benefits of Graph Transformers | Mar 18, 2026 | | —Unverified | 0 |
| PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery | Mar 18, 2026 | | —Unverified | 0 |
| HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness | Mar 18, 2026 | | —Unverified | 0 |