| Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | May 29, 2025 | Handwritten Mathematical Expression Recognition, Language Modeling | Code Available | 1 |
| TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance | May 29, 2025 | Image Super-Resolution, Optical Character Recognition | — Unverified | 0 |
| MT^3: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning | May 26, 2025 | Document Understanding, Machine Translation | — Unverified | 0 |
| Words as Geometric Features: Estimating Homography using Optical Character Recognition as Compressed Image Representation | May 25, 2025 | Anomaly Detection, Homography Estimation | — Unverified | 0 |
| How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads | May 21, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | — Unverified | 0 |
| Every Pixel Tells a Story: End-to-End Urdu Newspaper OCR | May 20, 2025 | Articles, Image Super-Resolution | — Unverified | 0 |
| Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues? | May 19, 2025 | Logical Reasoning, Optical Character Recognition | Code Available | 1 |
| LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | May 18, 2025 | Logical Reasoning, Multimodal Reasoning | Code Available | 1 |
| Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline | May 16, 2025 | Abstractive Text Summarization, Language Modeling | Code Available | 0 |
| PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language | May 15, 2025 | Benchmarking, Optical Character Recognition | Code Available | 0 |