SOTAVerified

Image Comprehension

Papers

Showing 26–49 of 49 papers

Title | Status | Hype
Multiplane Prior Guided Few-Shot Aerial Scene Rendering |  | 0
An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension |  | 0
Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension |  | 0
GeoLocator: a location-integrated large multimodal model for inferring geo-privacy |  | 0
CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation |  | 0
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs |  | 0
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM |  | 0
FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs |  | 0
Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine |  | 0
IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web |  | 0
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models |  | 0
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA |  | 0
Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation |  | 0
On the Performance of Multimodal Language Models |  | 0
RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving |  | 0
Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models |  | 0
RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models |  | 0
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models |  | 0
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition |  | 0
Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges |  | 0
Teach Multimodal LLMs to Comprehend Electrocardiographic Images |  | 0
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens |  | 0
Unveiling Glitches: A Deep Dive into Image Encoding Bugs within CLIP |  | 0
What Large Language Models Bring to Text-rich VQA? |  | 0

No leaderboard results yet.