SOTAVerified

Multimodal & Vision-Language

171 tasks

Papers in this area

Showing 1-10 of 10 papers

Title | Status | Hype
EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent |  | 0
Visual Place Recognition for Large-Scale UAV Applications |  | 0
Transformer-based Spatial Grounding: A Comprehensive Survey |  | 0
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding |  | 0
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark |  | 0
LaViPlan : Language-Guided Visual Path Planning with RLVR |  | 0
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities |  | 0
AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation |  | 0
LoViC: Efficient Long Video Generation with Context Compression |  | 0
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval |  | 0