SOTAVerified

TextVQA

Papers

Showing 1–10 of 47 papers

Title | Status | Hype
CogVLM2: Visual Language Models for Image and Video Understanding | Code | 9
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Code | 5
CogVLM: Visual Expert for Pretrained Language Models | Code | 5
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Code | 3
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | Code | 3
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models | Code | 3
Towards VQA Models That Can Read | Code | 3
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding | Code | 2
What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph | Code | 2
Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models | Code | 2

No leaderboard results yet.