
TextVQA

Papers

Showing 11–20 of 47 papers

Title | Status | Hype
HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models | — | 0
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy | — | 0
CogVLM2: Visual Language Models for Image and Video Understanding | Code | 9
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | — | 0
FlexAttention for Efficient High-Resolution Vision-Language Models | — | 0
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | — | 0
Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models | Code | 2
OmniFusion Technical Report | Code | 0
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | Code | 3
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | Code | 0
Page 2 of 5

Leaderboard

No leaderboard results yet.