| A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text | Aug 27, 2024 | Data Augmentation, Optical Character Recognition | Unverified | 0 |
| Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach | Aug 27, 2024 | License Plate Recognition, Optical Character Recognition | Code Available | 1 |
| Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset | Aug 24, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Aug 22, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Large Language Models for Page Stream Segmentation | Aug 21, 2024 | Decoder, Optical Character Recognition | Unverified | 0 |
| ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area | Aug 14, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning | Aug 10, 2024 | Hallucination, Optical Character Recognition | Code Available | 11 |
| Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding, Optical Character Recognition | Unverified | 0 |
| Handwritten Code Recognition for Pen-and-Paper CS Education | Aug 7, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Aug 1, 2024 | Attribute, Optical Character Recognition | Code Available | 1 |
| PIXELMOD: Improving Soft Moderation of Visual Misleading Information on Twitter | Jul 30, 2024 | Misinformation, Optical Character Recognition | Code Available | 0 |
| ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema | Jul 26, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation | Jul 26, 2024 | named-entity-recognition, Named Entity Recognition | Unverified | 0 |
| PLayerTV: Advanced Player Tracking and Identification for Automatic Soccer Highlight Clips | Jul 22, 2024 | object-detection, Object Detection | Unverified | 0 |
| Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition | Jul 18, 2024 | Decoder, Handwriting Recognition | Unverified | 0 |
| Task-driven single-image super-resolution reconstruction of document scans | Jul 12, 2024 | Image Super-Resolution, Optical Character Recognition | Unverified | 0 |
| Toward accessible comics for blind and low vision readers | Jul 11, 2024 | Optical Character Recognition, Prompt Engineering | Unverified | 0 |
| Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation | Jul 9, 2024 | Decoder, Image Generation | Code Available | 0 |
| High-Throughput Phenotyping using Computer Vision and Machine Learning | Jul 8, 2024 | Image Segmentation, Optical Character Recognition | Code Available | 0 |
| Optimizing Nepali PDF Extraction: A Comparative Study of Parser and OCR Technologies | Jul 5, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| Mind the Gap: Analyzing Lacunae with Transformer-Based Transcription | Jun 28, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst | Jun 14, 2024 | Image Captioning, Language Modeling | Unverified | 0 |
| M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation | Jun 12, 2024 | Document Level Machine Translation, Document Translation | Code Available | 0 |
| VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text | Jun 10, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Scaling Automatic Extraction of Pseudocode | Jun 7, 2024 | Code Generation, Optical Character Recognition | Unverified | 0 |
| CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset | Jun 6, 2024 | object-detection, Object Detection | Code Available | 1 |
| Generalized Jersey Number Recognition Using Multi-task Learning With Orientation-guided Weight Refinement | Jun 3, 2024 | Jersey Number Recognition, Multi-Task Learning | Unverified | 0 |
| Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities | May 25, 2024 | Boundary Detection, Optical Character Recognition | Unverified | 0 |
| Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| Transfer Learning Approach for Railway Technical Map (RTM) Component Identification | May 21, 2024 | Management, object-detection | Unverified | 0 |
| GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | May 6, 2024 | Contrastive Learning, document understanding | Code Available | 0 |
| DELINE8K: A Synthetic Data Pipeline for the Semantic Segmentation of Historical Documents | Apr 30, 2024 | 8k, Diversity | Code Available | 0 |
| Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available | 0 |
| ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images | Apr 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer | Apr 19, 2024 | Decoder, Optical Character Recognition | Unverified | 0 |
| Resilience of Large Language Models for Noisy Instructions | Apr 15, 2024 | Automatic Speech Recognition, Optical Character Recognition | Unverified | 0 |
| TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model | Apr 14, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Making Old Kurdish Publications Processable by Augmenting Available Optical Character Recognition Engines | Apr 9, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |
| NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement | Apr 8, 2024 | Binarization, Document Enhancement | Code Available | 2 |
| PEaCE: A Chemistry-Oriented Dataset for Optical Character Recognition on Scientific Documents | Mar 23, 2024 | Articles, Optical Character Recognition | Code Available | 1 |
| Advancing Multilingual Handwritten Numeral Recognition with Attention-driven Transfer Learning | Mar 18, 2024 | Handwritten Digit Recognition, Optical Character Recognition | Code Available | 0 |
| OCR is All you need: Importing Multi-Modality into Image-based Defect Detection System | Mar 18, 2024 | All, Decision Making | Unverified | 0 |
| Advanced Knowledge Extraction of Physical Design Drawings, Translation and conversion to CAD formats using Deep Learning | Mar 17, 2024 | Edge Detection, Line Detection | Unverified | 0 |
| Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering | Mar 14, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 0 |
| Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking | Mar 13, 2024 | Chinese Spell Checking, In-Context Learning | Unverified | 0 |
| LOCR: Location-Guided Transformer for Optical Character Recognition | Mar 4, 2024 | Marketing, Optical Character Recognition | Unverified | 0 |
| Large Language Models for Simultaneous Named Entity Extraction and Spelling Correction | Mar 1, 2024 | Decoder, Optical Character Recognition | Unverified | 0 |
| ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | Mar 1, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available | 1 |
| Representing Online Handwriting for Recognition in Large Vision-Language Models | Feb 23, 2024 | Handwriting Recognition, Optical Character Recognition | Unverified | 0 |
| Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing | Feb 12, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified | 0 |