Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach | Jan 15, 2024 | Optical Character Recognition (OCR) | Unverified 0
An Empirical Study of Scaling Law for Scene Text Recognition | Jan 1, 2024 | Optical Character Recognition (OCR), Scene Text Recognition | Code Available 2
Efficient Multi-domain Text Recognition Deep Neural Network Parameterization with Residual Adapters | Jan 1, 2024 | Multi-Task Learning, Optical Character Recognition | Code Available 0
Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition | Dec 31, 2023 | Decoder, Language Modeling | Unverified 0
An Empirical Study of Scaling Law for OCR | Dec 29, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available 1
Chaurah: A Smart Raspberry Pi based Parking System | Dec 28, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Unverified 0
Advancements and Challenges in Arabic Optical Character Recognition: A Comprehensive Survey | Dec 19, 2023 | Articles, Optical Character Recognition | Unverified 0
TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement | Dec 18, 2023 | Optical Character Recognition (OCR), Table Detection | Unverified 0
When Graph Data Meets Multimodal: A New Paradigm for Graph Understanding and Reasoning | Dec 16, 2023 | Optical Character Recognition (OCR) | Code Available 1
Privacy-Aware Document Visual Question Answering | Dec 15, 2023 | document understanding, Federated Learning | Code Available 1
Information Extraction from Unstructured data using Augmented-AI and Computer Vision | Dec 15, 2023 | Optical Character Recognition (OCR) | Unverified 0
Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation | Dec 13, 2023 | Optical Character Recognition (OCR) | Unverified 0
Multimodal Sentiment Analysis: Perceived vs Induced Sentiments | Dec 12, 2023 | Multimodal Sentiment Analysis, Optical Character Recognition (OCR) | Unverified 0
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models | Dec 11, 2023 | Chart Understanding, Decoder | Code Available 3
UPOCR: Towards Unified Pixel-Level OCR Interface | Dec 5, 2023 | Decoder, Optical Character Recognition | Unverified 0
Enhancing Vehicle Entrance and Parking Management: Deep Learning Solutions for Efficiency and Security | Dec 5, 2023 | Face Detection, License Plate Recognition | Unverified 0
DocReal: Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control Point Prediction | Dec 1, 2023 | Optical Character Recognition (OCR) | Code Available 1
Pipeline Enabling Zero-shot Classification for Bangla Handwritten Grapheme | Dec 1, 2023 | Bangla Text Detection, Classification | Unverified 0
Automatic Recognition of Learning Resource Category in a Digital Library | Nov 28, 2023 | document-image-classification, Document Image Classification | Code Available 0
Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks | Nov 28, 2023 | Adversarial Attack, Optical Character Recognition | Unverified 0
SUT: a new multi-purpose synthetic dataset for Farsi document image analysis | Nov 27, 2023 | Document Classification, document-image-classification | Code Available 0
Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents | Nov 27, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available 0
Data Generation for Post-OCR correction of Cyrillic handwriting | Nov 27, 2023 | Handwriting generation, Handwritten Text Recognition | Code Available 1
Similar Document Template Matching Algorithm | Nov 21, 2023 | Fraud Detection, Optical Character Recognition (OCR) | Unverified 0
ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing | Nov 20, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available 0
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding | Nov 20, 2023 | document understanding, Language Modeling | Unverified 0
Efficient End-to-End Visual Document Understanding with Rationale Distillation | Nov 16, 2023 | document understanding, Image to text | Unverified 0
DECDM: Document Enhancement using Cycle-Consistent Diffusion Models | Nov 16, 2023 | Data Augmentation, Denoising | Unverified 0
Multiple-Question Multiple-Answer Text-VQA | Nov 15, 2023 | Decoder, Denoising | Unverified 0
Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset | Nov 14, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available 0
What Large Language Models Bring to Text-rich VQA? | Nov 13, 2023 | Image Comprehension, Optical Character Recognition (OCR) | Unverified 0
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency | Nov 9, 2023 | document understanding, Key Information Extraction | Unverified 0
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts | Nov 9, 2023 | Optical Character Recognition (OCR), Safety Alignment | Code Available 1
AnyText: Multilingual Visual Text Generation And Editing | Nov 6, 2023 | Image Generation, Optical Character Recognition (OCR) | Code Available 4
On Manipulating Scene Text in the Wild with Diffusion Models | Nov 1, 2023 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available 0
DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding | Oct 29, 2023 | Answer Generation, Chart Question Answering | Code Available 0
Exploring OCR Capabilities of GPT-4V(ision): A Quantitative and In-depth Evaluation | Oct 25, 2023 | Handwritten Text Recognition, Key Information Extraction | Code Available 1
GenKIE: Robust Generative Multimodal Document Key Information Extraction | Oct 24, 2023 | Decoder, Key Information Extraction | Code Available 1
PHD: Pixel-Based Language Modeling of Historical Documents | Oct 22, 2023 | Language Modeling, Language Modelling | Code Available 0
MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition | Oct 20, 2023 | named-entity-recognition, Named Entity Recognition | Unverified 0
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond | Oct 19, 2023 | Document AI, Document Layout Analysis | Unverified 0
Towards reducing hallucination in extracting information from financial reports using Large Language Models | Oct 16, 2023 | Hallucination, Optical Character Recognition | Unverified 0
EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge | Oct 16, 2023 | Image Retrieval, Language Modeling | Unverified 0
DSG: An End-to-End Document Structure Generator | Oct 13, 2023 | Optical Character Recognition (OCR) | Code Available 1
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA | Oct 13, 2023 | Graph Learning, Object | Unverified 0
Invisible Threats: Backdoor Attack in OCR Systems | Oct 12, 2023 | Backdoor Attack, Optical Character Recognition | Unverified 0
Solution for SMART-101 Challenge of ICCV Multi-modal Algorithmic Reasoning Task 2023 | Oct 10, 2023 | Decoder, object-detection | Unverified 0
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model | Oct 8, 2023 | Decoder, Language Modeling | Code Available 1
Persis: A Persian Font Recognition Pipeline Using Convolutional Neural Networks | Oct 8, 2023 | Binarization, CPU | Code Available 1
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition | Oct 8, 2023 | Image to text, Optical Character Recognition (OCR) | Code Available 1