SOTAVerified

Document Understanding

Document understanding covers document classification, layout analysis, information extraction, and document question answering (DocQA).

Papers

Showing 101-150 of 309 papers

Title | Status | Hype
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Code | 0
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Code | 1
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition | - | 0
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents | Code | 1
Hypergraph based Understanding for Document Semantic Entity Recognition | Code | 0
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding | Code | 2
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | Code | 2
ColPali: Efficient Document Retrieval with Vision Language Models | Code | 7
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming | - | 0
DrVideo: Document Retrieval Based Long Video Understanding | - | 0
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | Code | 0
Enhancing Question Answering on Charts Through Effective Pre-training Tasks | - | 0
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications | - | 0
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Code | 1
Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | - | 0
Notes on Applicability of GPT-4 to Document Understanding | - | 0
Focus Anywhere for Fine-grained Multi-page Document Understanding | Code | 5
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Code | 0
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | Code | 0
CREPE: Coordinate-Aware End-to-End Document Parser | - | 0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Code | 0
Machine Unlearning for Document Classification | Code | 0
A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents | - | 0
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach | - | 0
HRVDA: High-Resolution Visual Document Assistant | - | 0
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding | Code | 0
BuDDIE: A Business Document Dataset for Multi-task Information Extraction | - | 0
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | Code | 0
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence | - | 0
Visually Guided Generative Text-Layout Pre-training for Document Intelligence | Code | 2
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | - | 0
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Code | 0
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document | Code | 5
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models | - | 0
3MVRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding | Code | 0
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding | Code | 1
Read and Think: An Efficient Step-wise Multimodal Language Model for Document Understanding and Reasoning | - | 0
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning | - | 0
LAPDoc: Layout-Aware Prompting for Documents | - | 0
Financial Report Chunking for Effective Retrieval Augmented Generation | Code | 0
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents | - | 0
On the Affinity, Rationality, and Diversity of Hierarchical Topic Modeling | Code | 1
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions | Code | 2
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning | Code | 3
Long Context Compression with Activation Beacon | Code | 0
Multimodal weighted graph representation for information extraction from visually rich documents. | Code | 0
DocGraphLM: Documental Graph Language Model for Information Extraction | - | 0
On Scaling Up a Multilingual Vision and Language Model | - | 0
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition | Code | 0
DocLLM: A layout-aware generative language model for multimodal document understanding | - | 0
Page 3 of 7

No leaderboard results yet.