
Image Comprehension

Papers

Showing 26-49 of 49 papers

Title | Status | Hype
--- | --- | ---
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | Code | 0
Unveiling Glitches: A Deep Dive into Image Encoding Bugs within CLIP | | 0
VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning | Code | 0
Multiplane Prior Guided Few-Shot Aerial Scene Rendering | | 0
Enhancing Large Vision Language Models with Self-Training on Image Comprehension | Code | 2
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement | Code | 2
MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification | Code | 0
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Code | 7
Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models | | 0
EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | Code | 2
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | | 0
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | | 0
Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine | | 0
CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Code | 0
GeoLocator: a location-integrated large multimodal model for inferring geo-privacy | | 0
What Large Language Models Bring to Text-rich VQA? | | 0
On the Performance of Multimodal Language Models | | 0
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Code | 0
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens | | 0
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension | Code | 1
Hierarchical Open-vocabulary Universal Image Segmentation | Code | 2
JourneyDB: A Benchmark for Generative Image Understanding | Code | 2
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter | Code | 1
An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension | | 0
Page 2 of 2

No leaderboard results yet.