Hespi: A pipeline for automatically detecting information from herbarium specimen sheets | Oct 11, 2024 | Handwritten Text Recognition HTR | Code Available | 1
Automated Quality Control System for Canned Tuna Production using Artificial Vision | Oct 8, 2024 | GPU Optical Character Recognition (OCR) | Unverified | 0
Mero Nagarikta: Advanced Nepali Citizenship Data Extractor with Deep Learning-Powered Text Detection and OCR | Oct 8, 2024 | object-detection Object Detection | Unverified | 0
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens | Oct 7, 2024 | Language Modeling Language Modelling | Code Available | 2
Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | Oct 5, 2024 | Benchmarking Chart Understanding | Unverified | 0
Khattat: Enhancing Readability and Concept Representation of Semantic Typography | Oct 1, 2024 | Language Modeling Language Modelling | Unverified | 0
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | Sep 30, 2024 | Mixture-of-Experts Optical Character Recognition (OCR) | Unverified | 0
JaPOC: Japanese Post-OCR Correction Benchmark using Vouchers | Sep 30, 2024 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering | Sep 30, 2024 | Optical Character Recognition (OCR) Question Answering | Code Available | 0
Scrambled text: training Language Models to correct OCR errors using synthetic data | Sep 29, 2024 | Articles Language Modeling | Code Available | 0
See then Tell: Enhancing Key Information Extraction with Vision Grounding | Sep 29, 2024 | Image to text Key Information Extraction | Unverified | 0
CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials | Sep 27, 2024 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
MinerU: An Open-Source Solution for Precise Document Content Extraction | Sep 27, 2024 | Diversity Optical Character Recognition (OCR) | Code Available | 16
JoyType: A Robust Design for Multilingual Visual Text Creation | Sep 26, 2024 | Image Generation Optical Character Recognition (OCR) | Unverified | 0
General Detection-based Text Line Recognition | Sep 25, 2024 | HTR Optical Character Recognition (OCR) | Code Available | 2
MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features | Sep 25, 2024 | Optical Character Recognition Optical Character Recognition (OCR) | Code Available | 0
Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents | Sep 25, 2024 | named-entity-recognition Named Entity Recognition | Code Available | 0
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Sep 21, 2024 | Benchmarking Depth Estimation | Unverified | 0
One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit NLP Tasks | Sep 20, 2024 | All Dependency Parsing | Code Available | 1
NVLM: Open Frontier-Class Multimodal LLMs | Sep 17, 2024 | Math Multimodal Reasoning | Unverified | 0
Computer Vision Intelligence Test Modeling and Generation: A Case Study on Smart OCR | Sep 14, 2024 | 3D Classification Optical Character Recognition | Unverified | 0
PdfTable: A Unified Toolkit for Deep Learning-Based Table Extraction | Sep 8, 2024 | Deep Learning Document Layout Analysis | Code Available | 0
UNIT: Unifying Image and Text Recognition in One Vision Encoder | Sep 6, 2024 | Decoder Optical Character Recognition (OCR) | Unverified | 0
Confidence-Aware Document OCR Error Detection | Sep 6, 2024 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding | Sep 5, 2024 | document understanding GPU | Code Available | 0
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark | Sep 4, 2024 | Optical Character Recognition (OCR) | Code Available | 4
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model | Sep 3, 2024 | Decoder Math | Code Available | 9
Post-OCR Text Correction for Bulgarian Historical Documents | Aug 31, 2024 | Optical Character Recognition Optical Character Recognition (OCR) | Code Available | 0
CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | Aug 30, 2024 | Articles named-entity-recognition | Code Available | 0
ChartEye: A Deep Learning Framework for Chart Information Extraction | Aug 28, 2024 | Chart Understanding Classification | Unverified | 0
Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | Aug 28, 2024 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Aug 27, 2024 | document understanding Optical Character Recognition (OCR) | Code Available | 1
Platypus: A Generalized Specialist Model for Reading Text in Various Forms | Aug 27, 2024 | Handwritten Text Recognition Optical Character Recognition (OCR) | Code Available | 0
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting | Aug 27, 2024 | Benchmarking Decoder | Code Available | 0
A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text | Aug 27, 2024 | Data Augmentation Optical Character Recognition | Unverified | 0
Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation | Aug 27, 2024 | Information Retrieval Instance Segmentation | Unverified | 0
Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach | Aug 27, 2024 | License Plate Recognition Optical Character Recognition | Code Available | 1
MMR: Evaluating Reading Ability of Large Multimodal Models | Aug 26, 2024 | Font Recognition MMR total | Unverified | 0
Ancient but Digitized: Developing Handwritten Optical Character Recognition for East Syriac Script Through Creating KHAMIS Dataset | Aug 24, 2024 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Aug 22, 2024 | Language Modeling Language Modelling | Unverified | 0
Large Language Models for Page Stream Segmentation | Aug 21, 2024 | Decoder Optical Character Recognition | Unverified | 0
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area | Aug 14, 2024 | Language Modeling Language Modelling | Code Available | 2
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning | Aug 10, 2024 | Hallucination Optical Character Recognition | Code Available | 11
Handwritten Code Recognition for Pen-and-Paper CS Education | Aug 7, 2024 | Hallucination Language Modeling | Code Available | 0
Advancing Post-OCR Correction: A Comparative Study of Synthetic Data | Aug 5, 2024 | Optical Character Recognition (OCR) Synthetic Data Generation | Code Available | 0
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Aug 3, 2024 | Hallucination Multiple-choice | Code Available | 12
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Aug 1, 2024 | Attribute Optical Character Recognition | Code Available | 1
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | Aug 1, 2024 | Math MM-Vet | Code Available | 3
PIXELMOD: Improving Soft Moderation of Visual Misleading Information on Twitter | Jul 30, 2024 | Misinformation Optical Character Recognition | Code Available | 0
Image-text matching for large-scale book collections | Jul 29, 2024 | Image-text matching Optical Character Recognition (OCR) | Code Available | 1