TextVQA

Papers

Showing 11–20 of 47 papers

Title	Date	Tasks	Status	Hype
HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models	Dec 11, 2024	TextVQA	Unverified	0
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy	Nov 23, 2024	Instruction Following, MME	Unverified	0
CogVLM2: Visual Language Models for Image and Video Understanding	Aug 29, 2024	MM-Vet, MVBench	Code Available	9
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model	Aug 21, 2024	Computational Efficiency, Language Modeling	Unverified	0
FlexAttention for Efficient High-Resolution Vision-Language Models	Jul 29, 2024	TextVQA	Unverified	0
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs	Jun 6, 2024	Language Modelling, Large Language Model	Unverified	0
Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models	Jun 3, 2024	Image Captioning, Language Modelling	Code Available	2
OmniFusion Technical Report	Apr 9, 2024	MM-Vet, TextVQA	Code Available	0
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images	Mar 18, 2024	Long-Context Understanding, TextVQA	Code Available	3
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering	Mar 14, 2024	Optical Character Recognition, Optical Character Recognition (OCR)	Code Available	0


No leaderboard results yet.