SOTAVerified

TextVQA

Papers

Showing 31-40 of 47 papers

Title | Status | Hype
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | | 0
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | | 0
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy | | 0
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models | | 0
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA | | 0
FlexAttention for Efficient High-Resolution Vision-Language Models | | 0
Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | | 0
HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models | | 0
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA | | 0
Making the V in Text-VQA Matter | | 0

No leaderboard results yet.