SOTAVerified

Image Comprehension

Papers

Showing 11-20 of 49 papers

Title | Status | Hype
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration | Code | 1
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter | Code | 1
RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Code | 1
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension | Code | 1
Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs | Code | 1
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension | Code | 1
MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Code | 0
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Code | 0
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Code | 0
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | Code | 0

No leaderboard results yet.