ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema Jul 26, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 VILA^2: VILA Augmented VILA Jul 24, 2024 Hallucination Optical Character Recognition (OCR)
Unverified 0 PLayerTV: Advanced Player Tracking and Identification for Automatic Soccer Highlight Clips Jul 22, 2024 object-detection Object Detection
Unverified 0 Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction Jul 22, 2024 Data Augmentation Optical Character Recognition (OCR)
Unverified 0 Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 Jul 19, 2024 Audio Generation Audio Synthesis
Unverified 0 Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition Jul 18, 2024 Decoder Handwriting Recognition
Unverified 0 VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding Jul 17, 2024 document understanding Optical Character Recognition (OCR)
Code Available 1 Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation Jul 9, 2024 Decoder Image Generation
Code Available 0 Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition Jul 9, 2024 Contrastive Learning Optical Character Recognition (OCR)
Unverified 0 High-Throughput Phenotyping using Computer Vision and Machine Learning Jul 8, 2024 Image Segmentation Optical Character Recognition
Code Available 0 Semantic Segmentation for Real-World and Synthetic Vehicle's Forward-Facing Camera Images Jul 7, 2024 Domain Adaptation Optical Character Recognition (OCR)
Unverified 0 FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding Jul 6, 2024 Optical Character Recognition (OCR) Visual Question Answering (VQA)
Code Available 1 Optimizing Nepali PDF Extraction: A Comparative Study of Parser and OCR Technologies Jul 5, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Code Available 0 Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge Jul 5, 2024 Instance Segmentation Optical Character Recognition (OCR)
Unverified 0 Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction Jul 4, 2024 Language Modeling Language Modelling
Code Available 0 Proposal Report for the 2nd SciCAP Competition 2024 Jul 2, 2024 Document Summarization Optical Character Recognition (OCR)
Unverified 0 A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding Jul 2, 2024 document understanding Key Information Extraction
Code Available 2 MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations Jul 1, 2024 Benchmarking document understanding
Code Available 2 Mind the Gap: Analyzing Lacunae with Transformer-Based Transcription Jun 28, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Unverified 0 DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation Jun 25, 2024 Computational Efficiency Optical Character Recognition (OCR)
Code Available 0 MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data Jun 24, 2024 Data Augmentation Optical Character Recognition (OCR)
Code Available 5 News Deja Vu: Connecting Past and Present with Semantic Search Jun 21, 2024 Articles Optical Character Recognition (OCR)
Unverified 0 GUI Action Narrator: Where and When Did That Action Take Place? Jun 19, 2024 Optical Character Recognition (OCR) Video Captioning
Unverified 0 Unifying Multimodal Retrieval via Document Screenshot Embedding Jun 17, 2024 Language Modelling Natural Questions
Unverified 0 GUICourse: From General Vision Language Models to Versatile GUI Agents Jun 17, 2024 Natural Language Visual Grounding Optical Character Recognition (OCR)
Code Available 2 OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst Jun 14, 2024 Image Captioning Language Modeling
Unverified 0 Enhancing Question Answering on Charts Through Effective Pre-training Tasks Jun 14, 2024 document understanding Optical Character Recognition (OCR)
Unverified 0 M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation Jun 12, 2024 Document Level Machine Translation Document Translation
Code Available 0 Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval Jun 11, 2024 Image Retrieval Image to text
Unverified 0 Scaling Automatic Extraction of Pseudocode Jun 7, 2024 Code Generation Optical Character Recognition
Unverified 0 CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset Jun 6, 2024 object-detection Object Detection
Code Available 1 Improving Text Generation on Images with Synthetic Captions Jun 1, 2024 Optical Character Recognition (OCR) Text Generation
Unverified 0 Towards Unified Multi-granularity Text Detection with Interactive Attention May 30, 2024 Document Layout Analysis Optical Character Recognition (OCR)
Unverified 0 RealitySummary: Exploring On-Demand Mixed Reality Text Summarization and Question Answering using Large Language Models May 28, 2024 Document Enhancement Mixed Reality
Unverified 0 Notes on Applicability of GPT-4 to Document Understanding May 28, 2024 document understanding Optical Character Recognition (OCR)
Unverified 0 Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities May 25, 2024 Boundary Detection Optical Character Recognition
Unverified 0 Focus Anywhere for Fine-grained Multi-page Document Understanding May 23, 2024 document understanding Optical Character Recognition (OCR)
Code Available 5 Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition May 23, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Available 2 From Text to Pixel: Advancing Long-Context Understanding in MLLMs May 23, 2024 Language Modeling Language Modelling
Code Available 1 Transfer Learning Approach for Railway Technical Map (RTM) Component Identification May 21, 2024 Management object-detection
Unverified 0 GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding May 6, 2024 Contrastive Learning document understanding
Code Available 0 Callico: a Versatile Open-Source Document Image Annotation Platform May 2, 2024 Document Layout Analysis HTR
Unverified 0 CREPE: Coordinate-Aware End-to-End Document Parser May 1, 2024 document understanding Optical Character Recognition (OCR)
Unverified 0 DELINE8K: A Synthetic Data Pipeline for the Semantic Segmentation of Historical Documents Apr 30, 2024 8k Diversity
Code Available 0 Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism Apr 29, 2024 document understanding GPU
Code Available 0 ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images Apr 29, 2024 Optical Character Recognition Optical Character Recognition (OCR)
Code Available 1 How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Apr 25, 2024 4k Language Modeling
Unverified 0 Mixed Text Recognition with Efficient Parameter Fine-Tuning and Transformer Apr 19, 2024 Decoder Optical Character Recognition
Unverified 0 Improvement in Semantic Address Matching using Natural Language Processing Apr 17, 2024 Optical Character Recognition (OCR)
Unverified 0 MathWriting: A Dataset For Handwritten Mathematical Expression Recognition Apr 16, 2024 Form Optical Character Recognition (OCR)
Unverified 0