SOTAVerified

TextVQA

Papers

Showing 1–25 of 47 papers

Title | Status | Hype
Mitigating Object Hallucinations via Sentence-Level Early Intervention | Code | 1
TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance | | 0
EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models | | 0
Analysing the Robustness of Vision-Language-Models to Common Corruptions | | 0
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models | Code | 0
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding | Code | 2
What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph | Code | 2
InstructOCR: Instruction Boosting Scene Text Spotting | Code | 0
Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues | Code | 0
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition | Code | 3
HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models | | 0
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy | | 0
CogVLM2: Visual Language Models for Image and Video Understanding | Code | 9
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model | | 0
FlexAttention for Efficient High-Resolution Vision-Language Models | | 0
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | | 0
Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models | Code | 2
OmniFusion Technical Report | Code | 0
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | Code | 3
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | Code | 0
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Code | 5
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models | Code | 3
VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization | Code | 0
Towards a Unified Multimodal Reasoning Framework | Code | 0
Multiple-Question Multiple-Answer Text-VQA | | 0

No leaderboard results yet.