NVLM: Open Frontier-Class Multimodal LLMs Sep 17, 2024 Math Multimodal Reasoning
Unverified 0 Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR Sep 14, 2024 3D Classification Optical Character Recognition
Unverified 0 PdfTable: A Unified Toolkit for Deep Learning-Based Table Extraction Sep 8, 2024 Deep Learning Document Layout Analysis
Unverified 0 UNIT: Unifying Image and Text Recognition in One Vision Encoder Sep 6, 2024 Decoder Optical Character Recognition (OCR)
Unverified 0 Confidence-Aware Document OCR Error Detection Sep 6, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Sep 5, 2024 document understanding GPU
Unverified 0 Post-OCR Text Correction for Bulgarian Historical Documents Aug 31, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Code Available 0 CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models Aug 30, 2024 Articles named-entity-recognition
Code Available 0 ChartEye: A Deep Learning Framework for Chart Information Extraction Aug 28, 2024 Chart Understanding Classification
Unverified 0 Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail Aug 28, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text Aug 27, 2024 Data Augmentation Optical Character Recognition
Unverified 0 Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation Aug 27, 2024 Information Retrieval Instance Segmentation
Unverified 0 Platypus: A Generalized Specialist Model for Reading Text in Various Forms Aug 27, 2024 Handwritten Text Recognition Optical Character Recognition (OCR)
Unverified 0 FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting Aug 27, 2024 Benchmarking Decoder
Code Available 0 MMR: Evaluating Reading Ability of Large Multimodal Models Aug 26, 2024 Font Recognition MMR total
Unverified 0 Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset Aug 24, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Aug 22, 2024 Language Modeling Language Modelling
Unverified 0 Large Language Models for Page Stream Segmentation Aug 21, 2024 Decoder Optical Character Recognition
Unverified 0 Handwritten Code Recognition for Pen-and-Paper CS Education Aug 7, 2024 Hallucination Language Modeling
Code Available 0 Advancing Post-OCR Correction: A Comparative Study of Synthetic Data Aug 5, 2024 Optical Character Recognition (OCR) Synthetic Data Generation
Code Available 0 PIXELMOD: Improving Soft Moderation of Visual Misleading Information on Twitter Jul 30, 2024 Misinformation Optical Character Recognition
Code Available 0 ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema Jul 26, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 VILA^2: VILA Augmented VILA Jul 24, 2024 Hallucination Optical Character Recognition (OCR)
Unverified 0 Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction Jul 22, 2024 Data Augmentation Optical Character Recognition (OCR)
Unverified 0 PLayerTV: Advanced Player Tracking and Identification for Automatic Soccer Highlight Clips Jul 22, 2024 object-detection Object Detection
Unverified 0 Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 Jul 19, 2024 Audio Generation Audio Synthesis
Unverified 0 Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition Jul 18, 2024 Decoder Handwriting Recognition
Unverified 0 Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation Jul 9, 2024 Decoder Image Generation
Code Available 0 Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition Jul 9, 2024 Contrastive Learning Optical Character Recognition (OCR)
Unverified 0 High-Throughput Phenotyping using Computer Vision and Machine Learning Jul 8, 2024 Image Segmentation Optical Character Recognition
Code Available 0 Semantic Segmentation for Real-World and Synthetic Vehicle's Forward-Facing Camera Images Jul 7, 2024 Domain Adaptation Optical Character Recognition (OCR)
Unverified 0 Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge Jul 5, 2024 Instance Segmentation Optical Character Recognition (OCR)
Unverified 0 Optimizing Nepali PDF Extraction: A Comparative Study of Parser and OCR Technologies Jul 5, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Code Available 0 Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction Jul 4, 2024 Language Modeling Language Modelling
Code Available 0 Proposal Report for the 2nd SciCAP Competition 2024 Jul 2, 2024 Document Summarization Optical Character Recognition (OCR)
Unverified 0 Mind the Gap: Analyzing Lacunae with Transformer-Based Transcription Jun 28, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation Jun 25, 2024 Computational Efficiency Optical Character Recognition (OCR)
Code Available 0 News Deja Vu: Connecting Past and Present with Semantic Search Jun 21, 2024 Articles Optical Character Recognition (OCR)
Unverified 0 GUI Action Narrator: Where and When Did That Action Take Place? Jun 19, 2024 Optical Character Recognition (OCR) Video Captioning
Unverified 0 Unifying Multimodal Retrieval via Document Screenshot Embedding Jun 17, 2024 Language Modelling Natural Questions
Unverified 0 Enhancing Question Answering on Charts Through Effective Pre-training Tasks Jun 14, 2024 document understanding Optical Character Recognition (OCR)
Unverified 0 OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst Jun 14, 2024 Image Captioning Language Modeling
Unverified 0 M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation Jun 12, 2024 Document Level Machine Translation Document Translation
Code Available 0 Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval Jun 11, 2024 Image Retrieval Image to text
Unverified 0 Scaling Automatic Extraction of Pseudocode Jun 7, 2024 Code Generation Optical Character Recognition
Unverified 0 Improving Text Generation on Images with Synthetic Captions Jun 1, 2024 Optical Character Recognition (OCR) Text Generation
Unverified 0 Towards Unified Multi-granularity Text Detection with Interactive Attention May 30, 2024 Document Layout Analysis Optical Character Recognition (OCR)
Unverified 0 Notes on Applicability of GPT-4 to Document Understanding May 28, 2024 document understanding Optical Character Recognition (OCR)
Unverified 0 RealitySummary: Exploring On-Demand Mixed Reality Text Summarization and Question Answering using Large Language Models May 28, 2024 Document Enhancement Mixed Reality
Unverified 0 Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities May 25, 2024 Boundary Detection Optical Character Recognition
Unverified 0