| Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Mar 27, 2024 | Image Classification; Image Comprehension | Code Available | 7 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Oct 14, 2023 | Image Classification; Image Description | Code Available | 7 |
| Improved Baselines with Visual Instruction Tuning | Oct 5, 2023 | Factual Inconsistency Detection in Chart Captioning; Image Classification | Code Available | 6 |
| Visual Instruction Tuning | Apr 17, 2023 | 1 Image, 2*2 Stitching; 3D Question Answering (3D-QA) | Code Available | 6 |
| Efficient Multimodal Learning from Data-centric Perspective | Feb 18, 2024 | Image Classification; Referring Expression Comprehension | Code Available | 5 |
| LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | Jun 1, 2023 | Image Classification; Instruction Following | Code Available | 4 |
| MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Dec 28, 2023 | AutoML; CPU | Code Available | 3 |
| Frontiers in Intelligent Colonoscopy | Oct 22, 2024 | Image Captioning | Code Available | 2 |
| Elysium: Exploring Object-level Perception in Videos via MLLM | Mar 25, 2024 | Object; Object Tracking | Code Available | 2 |
| GLaMM: Pixel Grounding Large Multimodal Model | Nov 6, 2023 | Conversational Question Answering; Image Captioning | Code Available | 2 |
| Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Sep 26, 2024 | image-classification; Image Classification | Code Available | 1 |
| Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception | Mar 5, 2024 | Language Modeling; Language Modelling | Code Available | 1 |
| Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image Captioning; In-Context Learning | Code Available | 1 |
| Modeling Context in Referring Expressions | Jul 31, 2016 | Referring Expression; Referring expression generation | Code Available | 1 |
| Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation | Apr 22, 2025 | Referring Expression; Referring expression generation | Code Available | 0 |
| Grounding Language in Multi-Perspective Referential Communication | Oct 4, 2024 | Referring Expression; Referring expression generation | Code Available | 0 |
| Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding | Sep 9, 2024 | Image Retrieval; Referring Expression | Code Available | 0 |
| Resilience through Scene Context in Visual Referring Expression Generation | Apr 18, 2024 | Referring Expression; Referring expression generation | Code Available | 0 |
| Intrinsic Task-based Evaluation for Referring Expression Generation | Feb 12, 2024 | Referring Expression; Referring expression generation | Unverified | 0 |
| Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Nov 21, 2023 | Image Segmentation; Language Modelling | Code Available | 0 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Sep 10, 2023 | Coreference Resolution; Image Retrieval | Code Available | 0 |
| Whether you can locate or not? Interactive Referring Expression Generation | Aug 19, 2023 | Referring Expression; Referring Expression Comprehension | Code Available | 0 |
| DisCLIP: Open-Vocabulary Referring Expression Generation | May 30, 2023 | Referring Expression; Referring expression generation | Unverified | 0 |
| Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples | May 24, 2023 | Diagnostic; Referring Expression | Code Available | 0 |
| Assessing Neural Referential Form Selectors on a Realistic Multilingual Dataset | Oct 10, 2022 | Form; Referring Expression | Unverified | 0 |
| Referring Expressions with Rational Speech Act Framework: A Probabilistic Approach | May 16, 2022 | Deep Learning; Referring Expression | Unverified | 0 |
| Non-neural Models Matter: A Re-evaluation of Neural Referring Expression Generation Systems | Mar 15, 2022 | BIG-bench Machine Learning; Referring Expression | Unverified | 0 |
| Using Referring Expression Generation to Model Literary Style | Dec 1, 2021 | model; Referring Expression | Unverified | 0 |
| Decoupling Pragmatics: Discriminative Decoding for Referring Expression Generation | Oct 1, 2021 | Diversity; Image Captioning | Unverified | 0 |
| What can Neural Referential Form Selectors Learn? | Aug 15, 2021 | Form; Position | Unverified | 0 |
| Enriching the E2E dataset | Aug 1, 2021 | Referring Expression; Referring expression generation | Code Available | 0 |
| Perspective-corrected Spatial Referring Expression Generation for Human-Robot Interaction | Apr 4, 2021 | Diversity; Referring Expression | Unverified | 0 |
| Visual Question Answering based on Local-Scene-Aware Referring Expression Generation | Jan 22, 2021 | Question Answering; Referring Expression | Unverified | 0 |
| Improving the Naturalness and Diversity of Referring Expression Generation models using Minimum Risk Training | Dec 1, 2020 | Diversity; Referring Expression | Unverified | 0 |
| OMEGA : A probabilistic approach to referring expression generation in a virtual environment | Dec 1, 2020 | Referring Expression; Referring expression generation | Unverified | 0 |
| Referring to what you know and do not know: Making Referring Expression Generation Models Generalize To Unseen Entities | Dec 1, 2020 | Decoder; Referring Expression | Unverified | 0 |
| Generating Quantified Referring Expressions through Attention-Driven Incremental Perception | Dec 1, 2020 | Referring Expression; Referring expression generation | Unverified | 0 |
| CoNAN: A Complementary Neighboring-based Attention Network for Referring Expression Generation | Dec 1, 2020 | Object; Referring Expression | Unverified | 0 |
| Lessons from Computational Modelling of Reference Production in Mandarin and English | Nov 14, 2020 | Referring Expression; Referring expression generation | Unverified | 0 |
| Fuzzy Logic for Vagueness Management in Referring Expression Generation | Sep 1, 2020 | Management; Referring Expression | Unverified | 0 |
| Toward Forgetting-Sensitive Referring Expression Generation for Integrated Robot Architectures | Jul 16, 2020 | Referring Expression; Referring expression generation | Unverified | 0 |
| Informativity in Image Captions vs. Referring Expressions | Jun 1, 2020 | Image Captioning; Object | Unverified | 0 |
| MuDoCo: Corpus for Multidomain Coreference Resolution and Referring Expression Generation | May 1, 2020 | coreference-resolution; Coreference Resolution | Unverified | 0 |
| A case study on context-bound referring expression generation | Oct 1, 2019 | Referring Expression; Referring expression generation | Unverified | 0 |
| Improving Quality and Efficiency in Plan-based Neural Data-to-Text Generation | Sep 22, 2019 | Data-to-Text Generation; Referring Expression | Code Available | 0 |
| Referring Expression Generation Using Entity Profiles | Sep 4, 2019 | Referring Expression; Referring expression generation | Code Available | 0 |
| Augmenting Robot Knowledge Consultants with Distributed Short Term Memory | Nov 26, 2018 | Referring Expression; Referring expression generation | Unverified | 0 |
| Adapting Descriptions of People to the Point of View of a Moving Observer | Nov 1, 2018 | Position; Referring Expression | Unverified | 0 |
| Enriching the WebNLG corpus | Nov 1, 2018 | Machine Translation; Referring Expression | Code Available | 0 |
| Decoding Strategies for Neural Referring Expression Generation | Nov 1, 2018 | Image Captioning; Machine Translation | Unverified | 0 |