SOTAVerified

TextVQA

Papers

Showing 1–10 of 47 papers

Title	Date	Tasks	Status	Hype
CogVLM2: Visual Language Models for Image and Video Understanding	Aug 29, 2024	MM-Vet; MVBench	Code Available	9
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document	Mar 7, 2024	document understanding; Key Information Extraction	Code Available	5
CogVLM: Visual Expert for Pretrained Language Models	Nov 6, 2023	1 Image, 2*2 Stitching; FS-MEVQA	Code Available	5
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition	Dec 12, 2024	EgoSchema	Code Available	3
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images	Mar 18, 2024	Long-Context Understanding; TextVQA	Code Available	3
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models	Mar 5, 2024	TextVQA; Visual Question Answering	Code Available	3
Towards VQA Models That Can Read	Apr 18, 2019	TextVQA; Visual Question Answering (VQA)	Code Available	3
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding	Jan 14, 2025	image-classification; Image Classification	Code Available	2
What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph	Jan 4, 2025	TextVQA	Code Available	2
Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models	Jun 3, 2024	Image Captioning; Language Modelling	Code Available	2

No leaderboard results yet.