SOTAVerified

TextVQA

Papers

Showing 26–47 of 47 papers

Title	Date	Tasks	Status	Hype	Score
OmniFusion Technical Report	Apr 9, 2024	MM-Vet, TextVQA	Code Available	0	5
VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization	Feb 12, 2024	In-Context Learning, TextVQA	Code Available	0	5
Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model	Jun 24, 2021	Decoder, Language Modeling	Unverified	0	0
Analysing the Robustness of Vision-Language-Models to Common Corruptions	Apr 18, 2025	TextVQA	Unverified	0	0
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs	Jun 6, 2024	Language Modelling, Large Language Model	Unverified	0	0
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model	Aug 21, 2024	Computational Efficiency, Language Modeling	Unverified	0	0
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy	Nov 23, 2024	Instruction Following, MME	Unverified	0	0
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models	May 28, 2025	Mixture-of-Experts, MME	Unverified	0	0
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA	Oct 13, 2023	Graph Learning, Object	Unverified	0	0
FlexAttention for Efficient High-Resolution Vision-Language Models	Jul 29, 2024	TextVQA	Unverified	0	0
Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture	Nov 11, 2021	Graph Attention, Question Answering	Unverified	0	0
HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models	Dec 11, 2024	TextVQA	Unverified	0	0
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA	Apr 4, 2023	Answer Generation, Language Modelling	Unverified	0	0
Making the V in Text-VQA Matter	Aug 1, 2023	Optical Character Recognition (OCR), TextVQA	Unverified	0	0
Multiple-Question Multiple-Answer Text-VQA	Nov 15, 2023	Decoder, Denoising	Unverified	0	0
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering	Dec 16, 2022	Optical Character Recognition, Optical Character Recognition (OCR)	Unverified	0	0
Sentence Attention Blocks for Answer Grounding	Sep 20, 2023	Question Answering, Sentence	Unverified	0	0
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps	Dec 9, 2020	Decoder, Image Captioning	Unverified	0	0
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text	May 12, 2021	Optical Character Recognition, Optical Character Recognition (OCR)	Unverified	0	0
TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance	May 29, 2025	Image Super-Resolution, Optical Character Recognition	Unverified	0	0
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering	Sep 21, 2022	Image Captioning, Optical Character Recognition (OCR)	Unverified	0	0
Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering	Mar 24, 2022	Optical Character Recognition, Optical Character Recognition (OCR)	Unverified	0	0

No leaderboard results yet.