| Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | Jan 1, 2025 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension | Nov 22, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Oct 21, 2024 | Instruction Following, Object Detection | Unverified | 0 |
| Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression | Sep 5, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Aug 20, 2024 | Autonomous Vehicles, Computational Efficiency | Code Available | 0 |
| Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding, Optical Character Recognition | Unverified | 0 |
| MaskInversion: Localized Embeddings via Optimization of Explainability Maps | Jul 29, 2024 | Image Generation, Referring Expression | Unverified | 0 |
| Learning Visual Grounding from Generative Vision and Language Model | Jul 18, 2024 | Attribute, Language Modeling | Unverified | 0 |
| The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge | Jul 6, 2024 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| M²IST: Multi-Modal Interactive Side-Tuning for Efficient Referring Expression Comprehension | Jul 1, 2024 | GPU, Referring Expression | Unverified | 0 |