| VideoSAVi: Self-Aligned Video Language Models without Human Supervision | Dec 1, 2024 | EgoSchema, MVBench | —Unverified | 0 | 0 |
| VisionArena: 230K Real World User-VLM Conversations with Preference Labels | Dec 11, 2024 | Chatbot, Spatial Reasoning | —Unverified | 0 | 0 |
| Vision-Integrated LLMs for Autonomous Driving Assistance : Human Performance Comparison and Trust Evaluation | Feb 6, 2025 | Autonomous Driving, Decision Making | —Unverified | 0 | 0 |
| Visual Agentic AI for Spatial Reasoning with a Dynamic API | Feb 10, 2025 | Program Synthesis, Spatial Reasoning | —Unverified | 0 | 0 |
| VisualEchoes: Spatial Image Representation Learning through Echolocation | May 4, 2020 | Depth Estimation, Monocular Depth Estimation | —Unverified | 0 | 0 |
| Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces | May 30, 2025 | Spatial Reasoning | —Unverified | 0 | 0 |
| Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Nov 15, 2024 | Descriptive, Object | —Unverified | 0 | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | —Unverified | 0 | 0 |
| VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search | Apr 12, 2025 | Spatial Reasoning | —Unverified | 0 | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | —Unverified | 0 | 0 |