SOTAVerified

Image Comprehension

Papers

Showing 41–49 of 49 papers

Title | Status | Hype
What Large Language Models Bring to Text-rich VQA? | - | 0
On the Performance of Multimodal Language Models | - | 0
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Code | 0
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens | - | 0
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension | Code | 1
Hierarchical Open-vocabulary Universal Image Segmentation | Code | 2
JourneyDB: A Benchmark for Generative Image Understanding | Code | 2
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter | Code | 1
An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension | - | 0

No leaderboard results yet.