SOTAVerified

Image Comprehension

Papers

Showing 21-30 of 49 papers

| Title | Status | Hype |
|---|---|---|
| InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Code | 0 |
| MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Code | 0 |
| MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval | Code | 0 |
| Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens | | 0 |
| Unveiling Glitches: A Deep Dive into Image Encoding Bugs within CLIP | | 0 |
| What Large Language Models Bring to Text-rich VQA? | | 0 |
| Multiplane Prior Guided Few-Shot Aerial Scene Rendering | | 0 |
| An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension | | 0 |
| Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension | | 0 |
| GeoLocator: a location-integrated large multimodal model for inferring geo-privacy | | 0 |
Page 3 of 5

No leaderboard results yet.