Learning Cross-modal Context Graph for Visual Grounding | Feb 13, 2020 | Graph Matching, Graph Neural Network | Code Available | 1
Learning Cross-modal Context Graph for Visual Grounding | Nov 20, 2019 | Graph Matching, Graph Neural Network | Code Available | 1
A Fast and Accurate One-Stage Approach to Visual Grounding | Aug 18, 2019 | Referring Expression, Referring Expression Comprehension | Code Available | 1
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | Aug 6, 2019 | Image Retrieval, Question Answering | Code Available | 1
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition | Jul 15, 2025 | 3D visual grounding, Visual Grounding | Unverified | 0
A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | Jul 9, 2025 | 3D visual grounding, Autonomous Navigation | Unverified | 0
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation | Jul 9, 2025 | Backdoor Attack, Visual Grounding | Unverified | 0
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding | Jun 27, 2025 | 3D visual grounding, Natural Language Queries | Unverified | 0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR) | Code Available | 0
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | Jun 26, 2025 | counterfactual, Counterfactual Reasoning | Unverified | 0
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding | Jun 26, 2025 | 3D visual grounding, Large Language Model | Unverified | 0
GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning | Jun 22, 2025 | Answer Generation, Decision Making | Unverified | 0
I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs | Jun 17, 2025 | 3D visual grounding, Contrastive Learning | Unverified | 0
Unified Representation Space for 3D Visual Grounding | Jun 17, 2025 | 3D visual grounding, Contrastive Learning | Unverified | 0
Semantic Localization Guiding Segment Anything Model For Reference Remote Sensing Image Segmentation | Jun 12, 2025 | Image Segmentation, Segmentation | Unverified | 0
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments | Jun 9, 2025 | Benchmarking, Navigate | Unverified | 0
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | Jun 5, 2025 | cross-modal alignment, Dense Captioning | Unverified | 0
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning | Jun 5, 2025 | Math, Visual Grounding | Unverified | 0
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes | Jun 5, 2025 | 3D visual grounding, Object | Unverified | 0
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | Jun 4, 2025 | Multimodal Reasoning, Reasoning Segmentation | Unverified | 0
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents | Jun 3, 2025 | Visual Grounding | Unverified | 0
MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs | Jun 2, 2025 | Instruction Following, Text Generation | Unverified | 0
D2AF: A Dual-Driven Annotation and Filtering Framework for Visual Grounding | May 30, 2025 | Diversity, Pseudo Label | Unverified | 0
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation | May 29, 2025 | Question Answering, RAG | Unverified | 0
Zero-Shot 3D Visual Grounding from Vision-Language Models | May 28, 2025 | 3D visual grounding, Visual Grounding | Unverified | 0
Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration | May 27, 2025 | Hallucination, Visual Grounding | Unverified | 0
Two Causally Related Needles in a Video Haystack | May 26, 2025 | Video Understanding, Visual Grounding | Unverified | 0
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model | May 26, 2025 | Diagnostic, Reinforcement Learning (RL) | Code Available | 0
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation | May 24, 2025 | Mathematical Reasoning, Multimodal Reasoning | Unverified | 0
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models | May 23, 2025 | Diagnostic, Hallucination | Unverified | 0
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays | May 23, 2025 | Diagnostic, Question Answering | Code Available | 0
Redemption Score: An Evaluation Framework to Rank Image Captions While Redeeming Image Semantics and Language Pragmatics | May 22, 2025 | Image Captioning, text similarity | Unverified | 0
Training-Free Reasoning and Reflection in MLLMs | May 22, 2025 | Decoder, Multimodal Reasoning | Unverified | 0
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding | May 21, 2025 | Visual Grounding | Unverified | 0
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | May 20, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0
TinyRS-R1: Compact Multimodal Language Model for Remote Sensing | May 17, 2025 | Language Modeling, Language Modelling | Unverified | 0
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings | May 17, 2025 | Image to text, Information Retrieval | Code Available | 0
MedSG-Bench: A Benchmark for Medical Image Sequences Grounding | May 17, 2025 | Visual Grounding, Visual Question Answering (VQA) | Unverified | 0
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | May 16, 2025 | Benchmarking, Ethics | Code Available | 0
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI | May 9, 2025 | 4k, Domain Generalization | Code Available | 0
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding | May 8, 2025 | 3D visual grounding, cross-modal alignment | Unverified | 0
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding | May 7, 2025 | 3D visual grounding, Graph Attention | Code Available | 0
3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment | May 3, 2025 | Sentence, Visual Grounding | Unverified | 0
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? | Apr 27, 2025 | Visual Grounding, Visual Storytelling | Unverified | 0
Revisiting Data Auditing in Large Vision-Language Models | Apr 25, 2025 | Visual Grounding | Unverified | 0
Visual Intention Grounding for Egocentric Assistants | Apr 18, 2025 | Object, Visual Grounding | Unverified | 0
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts | Apr 14, 2025 | Benchmarking, Object | Unverified | 0
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding | Apr 13, 2025 | 3D visual grounding, Data Augmentation | Code Available | 0
DSM: Building A Diverse Semantic Map for 3D Visual Grounding | Apr 11, 2025 | 3D visual grounding, Scene Understanding | Unverified | 0
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations | Apr 10, 2025 | Spatial Reasoning, Visual Grounding | Unverified | 0