| Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception | Oct 16, 2024 | Binary Classification, Chunking | Code Available | 3 |
| EAFormer: Scene Text Segmentation with Edge-Aware Transformers | Jul 24, 2024 | Decoder, Segmentation | Code Available | 3 |
| Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation | Jan 31, 2024 | Hierarchical Text Segmentation, parameter-efficient fine-tuning | Code Available | 3 |
| ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations | Feb 16, 2025 | Text Segmentation | Code Available | 1 |
| Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model | Jan 6, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| WAS: Dataset and Methods for Artistic Text Segmentation | Jul 31, 2024 | Decoder, Diversity | Code Available | 1 |
| Filtered Semi-Markov CRF | Nov 29, 2023 | named-entity-recognition, Named Entity Recognition | Code Available | 1 |
| PSSTRNet: Progressive Segmentation-guided Scene Text Removal Network | Jun 13, 2023 | Decoder, Segmentation | Code Available | 1 |
| CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization | May 27, 2023 | Binarization, Image Enhancement | Code Available | 1 |
| Three-stage binarization of color document images based on discrete wavelet transform and generative adversarial networks | Nov 29, 2022 | Avg, Binarization | Code Available | 1 |
| DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Nov 28, 2022 | object-detection, Object Detection | Code Available | 1 |
| Self-supervised Character-to-Character Distillation for Text Recognition | Nov 1, 2022 | Data Augmentation, Representation Learning | Code Available | 1 |
| Toward Unifying Text Segmentation and Long Document Summarization | Oct 28, 2022 | Articles, Document Summarization | Code Available | 1 |
| Self-supervised Implicit Glyph Attention for Text Recognition | Mar 7, 2022 | Scene Text Recognition, Text Segmentation | Code Available | 1 |
| WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition | Oct 7, 2021 | Label Error Detection, Optical Character Recognition | Code Available | 1 |
| Structural Text Segmentation of Legal Documents | Dec 7, 2020 | Change Detection, Information Retrieval | Code Available | 1 |
| Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach | Nov 27, 2020 | Segmentation, Style Transfer | Code Available | 1 |
| Chapter Captor: Text Segmentation in Novels | Nov 9, 2020 | Segmentation, Text Segmentation | Code Available | 1 |
| Text Segmentation by Cross Segment Attention | Apr 30, 2020 | Discourse Segmentation, Information Retrieval | Code Available | 1 |
| Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation | Jan 3, 2020 | Cross-Lingual Word Embeddings, Multi-Task Learning | Code Available | 1 |
| CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases | Oct 27, 2016 | Joint Entity and Relation Extraction, Relation | Code Available | 1 |
| Khmer Word Segmentation Using Conditional Random Fields | Oct 15, 2015 | Segmentation, Text Segmentation | Code Available | 1 |
| The impact of fine tuning in LLaMA on hallucinations for named entity extraction in legal documentation | Jun 10, 2025 | Text Segmentation | Unverified | 0 |
| BP-Seg: A graphical model approach to unsupervised and non-contiguous text segmentation using belief propagation | May 22, 2025 | Segmentation, Text Segmentation | Unverified | 0 |
| BR-TaxQA-R: A Dataset for Question Answering with References for Brazilian Personal Income Tax Law, including case law | May 21, 2025 | Answer Generation, Question Answering | Unverified | 0 |