LVLM-COUNT: Enhancing the Counting Ability of Large Vision-Language Models

2024-12-01

Muhammad Fetrat Qharabagh, Mohammadreza Ghofrani, Kimon Fountoulakis

Code

github.com/mrghofrani/lvlm-count
Official · In paper · PyTorch · ★ 9

Abstract

Counting is a fundamental operation for various visual tasks in real-life applications, requiring both object recognition and robust counting capabilities. Despite their advanced visual perception, large vision-language models (LVLMs) struggle with counting tasks, especially when the number of objects exceeds the counts commonly encountered during training. We enhance LVLMs' counting abilities using a divide-and-conquer approach that breaks a counting problem into sub-counting tasks. Our method employs a mechanism that prevents objects from being bisected, and thus counted repeatedly, as happens in a naive divide-and-conquer approach. Unlike prior methods, which do not generalize well to counting datasets they have not been trained on, our method performs well on new datasets without any additional training or fine-tuning. We demonstrate that our approach enhances the counting capability of LVLMs across various datasets and benchmarks.
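
The double-counting problem that the abstract attributes to naive divide-and-conquer can be illustrated with a toy sketch. This is not the authors' implementation: objects are reduced to hypothetical 1D horizontal extents, `count_in_region` is a stand-in for a sub-counter (such as an LVLM queried on a crop), and `safe_cut` is one assumed way to pick a dividing line that avoids bisecting objects.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (left, right) horizontal extent of one object.
Box = Tuple[int, int]


def count_in_region(boxes: List[Box], lo: int, hi: int) -> int:
    """Stand-in sub-counter: counts every object at least partly visible in [lo, hi)."""
    return sum(1 for left, right in boxes if left < hi and right > lo)


def naive_count(boxes: List[Box], width: int) -> int:
    """Split at the midpoint regardless of where the objects lie."""
    mid = width // 2
    return count_in_region(boxes, 0, mid) + count_in_region(boxes, mid, width)


def safe_cut(boxes: List[Box], width: int) -> Optional[int]:
    """Cut position closest to the midpoint that passes through no object.

    Returns None when every interior position would bisect some object.
    """
    mid = width // 2
    candidates = [
        x for x in range(1, width)
        if not any(left < x < right for left, right in boxes)
    ]
    return min(candidates, key=lambda x: abs(x - mid), default=None)


def object_aware_count(boxes: List[Box], width: int) -> int:
    """Split only where no object is bisected; fall back to a single region."""
    cut = safe_cut(boxes, width)
    if cut is None:
        return count_in_region(boxes, 0, width)
    return count_in_region(boxes, 0, cut) + count_in_region(boxes, cut, width)


boxes = [(5, 15), (40, 60), (80, 95)]   # three objects; the middle one spans x = 50
print(naive_count(boxes, 100))          # 4: the middle object is counted in both halves
print(object_aware_count(boxes, 100))   # 3: the cut moves to x = 40, outside every object
```

In this sketch the sub-counts sum to the true total only because no object straddles the cut, which is the property the abstract's bisection-prevention mechanism is described as providing.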

Tasks

Object Recognition
