| SurveillanceVQA-589K: A Benchmark for Comprehensive Surveillance Video-Language Understanding with Large Models | May 19, 2025 | Causal Inference, Decision Making | Unverified | 0 |
| VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection | May 19, 2025 | Autonomous Driving, Language Modeling | Unverified | 0 |
| IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment | May 19, 2025 | Language Modeling | Unverified | 0 |
| The Traitors: Deception and Trust in Multi-Agent Language Model Simulations | May 19, 2025 | Language Modeling | Code Available | 0 |
| ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling | May 19, 2025 | Graph Generation, Knowledge Distillation | Unverified | 0 |
| R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model | May 19, 2025 | Language Modeling | Unverified | 0 |
| On the Thinking-Language Modeling Gap in Large Language Models | May 19, 2025 | Language Modeling | Unverified | 0 |
| G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning | May 19, 2025 | Language Modeling | Code Available | 2 |
| 3D Visual Illusion Depth Estimation | May 19, 2025 | Common Sense Reasoning, Depth Estimation | Code Available | 1 |
| SpatialLLM: From Multi-modality Data to Urban Spatial Intelligence | May 19, 2025 | Language Modeling | Code Available | 0 |