SOTAVerified

Referring Expression Comprehension

Papers

Showing 51–100 of 167 papers

Title | Status | Hype
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | Code | 1
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
RefDrone: A Challenging Benchmark for Referring Expression Comprehension in Drone Scenes | Code | 1
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D | Code | 1
Referring Transformer: A One-step Approach to Multi-task Visual Grounding | Code | 1
Talk2Car: Taking Control of Your Self-Driving Car | Code | 1
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension | Code | 1
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation | Code | 1
TransVG: End-to-End Visual Grounding with Transformers | Code | 1
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering | Code | 1
Tune-An-Ellipse: CLIP Has Potential to Find What You Want | Code | 1
Unifying Vision-and-Language Tasks via Text Generation | Code | 1
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Code | 1
UNITER: UNiversal Image-TExt Representation Learning | Code | 1
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | Code | 1
VL-BERT: Pre-training of Generic Visual-Linguistic Representations | Code | 1
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment | Code | 1
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension | Code | 1
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions | Code | 1
Learning Visual Grounding from Generative Vision and Language Model | | 0
Revisiting Multi-Modal LLM Evaluation | | 0
ScanFormer: Referring Expression Comprehension by Iteratively Scanning | | 0
Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | | 0
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding | | 0
Dynamic Graph Attention for Referring Expression Comprehension | | 0
Dynamic Inference With Grounding Based Vision and Language Models | | 0
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension | | 0
Differentiated Relevances Embedding for Group-based Referring Expression Comprehension | | 0
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | | 0
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | | 0
Exploring Spatial Language Grounding Through Referring Expressions | | 0
FindIt: Generalized Localization with Natural Language Queries | | 0
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks | | 0
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | | 0
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping | | 0
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | | 0
Synthetic Visual Genome | | 0
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | 0
Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding | | 0
Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension | | 0
Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension | | 0
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension | | 0
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | | 0
Video Referring Expression Comprehension via Transformer with Content-conditioned Query | | 0
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | | 0
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | | 0
Language-Mediated, Object-Centric Representation Learning | | 0
Text-driven Affordance Learning from Egocentric Vision | | 0
Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | | 0

No leaderboard results yet.