SOTAVerified

Referring Expression

The referring expression task places a bounding box around the object instance in an image that corresponds to a provided description.
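As a rough illustration of what "placing a bounding box" means in practice, the sketch below scores one predicted box against a ground-truth box using intersection-over-union (IoU). The `(x_min, y_min, x_max, y_max)` box format, the function names, and the 0.5 threshold are assumptions chosen for illustration; this page does not specify an evaluation protocol, and individual benchmarks may use other criteria.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Hypothetical correctness rule: IoU at or above the threshold."""
    return iou(pred, gt) >= threshold


# Hypothetical example: box predicted for a description vs. the annotated box.
predicted = (10.0, 10.0, 110.0, 110.0)
ground_truth = (20.0, 20.0, 120.0, 120.0)
print(is_correct(predicted, ground_truth))
```

Averaging `is_correct` over all description-image pairs in a dataset would give an accuracy figure under these assumptions.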

Papers

Showing 301–350 of 364 papers

| Title | Status | Hype |
|---|---|---|
| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Code | 0 |
| Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples | Code | 0 |
| Continual Referring Expression Comprehension via Dual Modular Memorization | Code | 0 |
| A Joint Speaker-Listener-Reinforcer Model for Referring Expressions | Code | 0 |
| Whether you can locate or not? Interactive Referring Expression Generation | Code | 0 |
| NeuralREG: An end-to-end approach to referring expression generation | Code | 0 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Code | 0 |
| Modeling Context Between Objects for Referring Expression Understanding | Code | 0 |
| MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing | Code | 0 |
| CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Code | 0 |
| CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension | Code | 0 |
| MAttNet: Modular Attention Network for Referring Expression Comprehension | Code | 0 |
| Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations? | Code | 0 |
| Referring Expression Comprehension Using Language Adaptive Inference | Code | 0 |
| Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters | Code | 0 |
| Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Code | 0 |
| Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge | Code | 0 |
| Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach | Code | 0 |
| Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding | Code | 0 |
| Using Syntax to Ground Referring Expressions in Natural Images | Code | 0 |
| Referring Expression Generation Using Entity Profiles | Code | 0 |
| Generation and Comprehension of Unambiguous Object Descriptions | Code | 0 |
| Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers | Code | 0 |
| Referring Expression Object Segmentation with Caption-Aware Consistency | Code | 0 |
| A Real-time Global Inference Network for One-stage Referring Expression Comprehension | Code | 0 |
| Improving Quality and Efficiency in Plan-based Neural Data-to-Text Generation | Code | 0 |
| Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | Code | 0 |
| WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation | Code | 0 |
| A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Code | 0 |
| Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos | Code | 0 |
| Learning To Segment Every Referring Object Point by Point | Code | 0 |
| Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | Code | 0 |
| 'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges | Code | 0 |
| Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding | Code | 0 |
| Language-Conditioned Feature Pyramids for Visual Selection Tasks | Code | 0 |
| REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments | Code | 0 |
| Towards Language-guided Visual Recognition via Dynamic Convolutions | Code | 0 |
| Resilience through Scene Context in Visual Referring Expression Generation | Code | 0 |
| Towards Omni-supervised Referring Expression Segmentation | Code | 0 |
| Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Code | 0 |
| Revisiting Counterfactual Problems in Referring Expression Comprehension | Code | 0 |
| Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities | Code | 0 |
| Language Adaptive Weight Generation for Multi-task Visual Grounding | Code | 0 |
| Enriching the WebNLG corpus | Code | 0 |
| Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding | Code | 0 |
| Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation | Code | 0 |
| InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation | Code | 0 |
| Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding | Code | 0 |
| Improving Contrastive Learning for Referring Expression Counting | Code | 0 |
| Grounding Referring Expressions in Images by Variational Context | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Random | Acc@0.5m | 14.6 | - | Unverified |