Chunking

Chunking, also known as shallow parsing, identifies contiguous, non-overlapping spans of tokens that form syntactic units such as noun phrases or verb phrases.

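Example, using BIO tags (B- marks the first token of a chunk, I- a later token in the same chunk, and O a token outside any chunk):

| Vinken | , | 61 | years | old |
| --- | --- | --- | --- | --- |
| B-NP | I-NP | I-NP | I-NP | I-NP |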
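To make the tag scheme concrete, here is a minimal Python sketch that decodes a BIO tag sequence into chunk spans. The function name `bio_to_spans` and the `(type, start, end)` span layout are illustrative assumptions, not the API of any particular toolkit. The sketch also assumes a lenient convention for ill-formed input: an I- tag that cannot continue an open chunk of the same type starts a new chunk.

```python
from typing import List, Optional, Tuple

# A chunk span: (chunk type, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO chunk tags such as "B-NP", "I-NP", "O" into spans.

    Lenient convention (an assumption of this sketch): an I- tag that does
    not continue an open chunk of the same type starts a new chunk.
    """
    spans: List[Span] = []
    current_type: Optional[str] = None  # type of the chunk currently open
    start = 0  # index where the open chunk began
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, chunk_type = "O", None
        else:
            prefix, chunk_type = tag.split("-", 1)
        # Close the open chunk unless this token is an I- of the same type.
        continues = prefix == "I" and chunk_type == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i))
            current_type = None
        # Open a new chunk on B-, or on an I- that could not continue one.
        if prefix == "B" or (prefix == "I" and current_type is None):
            current_type = chunk_type
            start = i
    # Flush a chunk that runs to the end of the sentence.
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


if __name__ == "__main__":
    words = ["Vinken", ",", "61", "years", "old"]
    tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]
    for chunk_type, s, e in bio_to_spans(tags):
        print(chunk_type, " ".join(words[s:e]))  # prints: NP Vinken , 61 years old
```

With the tags B-NP I-NP I-NP I-NP I-NP, the function returns a single NP span covering all five tokens. Stricter decoders reject ill-formed sequences instead of repairing them; the lenient choice here is only one option.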

Papers

| Title | Date | Tasks | Status | Score |
| --- | --- | --- | --- | --- |
| SEER: Self-Aligned Evidence Extraction for Retrieval-Augmented Generation | Oct 15, 2024 | Chunking, RAG | Code Available | 5 |
| Reconstructing Context: Evaluating Advanced Chunking Strategies for Retrieval-Augmented Generation | Apr 28, 2025 | Chunking, RAG | Code Available | 5 |
| ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs | Oct 22, 2024 | Chunking, Hallucination | Code Available | 5 |
| Punctuation Restoration Improves Structure Understanding Without Supervision | Feb 13, 2024 | Chunking, Language Modeling | Code Available | 5 |
| Opening the black box of language acquisition | Feb 18, 2024 | Chunking, Language Acquisition | Code Available | 5 |
| Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning | May 17, 2025 | All, Chunking | Code Available | 5 |
| Query-Based Keyphrase Extraction from Long Documents | May 11, 2022 | Chunking, Keyphrase Extraction | Code Available | 5 |
| Open Information Extraction via Chunks | May 5, 2023 | Chunking, Open Information Extraction | Code Available | 5 |
| Augmenting Neural Networks with First-order Logic | Jun 14, 2019 | Chunking, Natural Language Inference | Code Available | 5 |
| Chunking Historical German | May 1, 2021 | Chunking, POS | Code Available | 5 |
| NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering | Feb 15, 2025 | Chunking, Information Retrieval | Code Available | 5 |
| Neural Models for Sequence Chunking | Jan 15, 2017 | Chunking, Natural Language Understanding | Code Available | 5 |
| A Tree Search Algorithm for Sequence Labeling | Apr 29, 2018 | Chunking, Decision Making | Code Available | 5 |
| Neural Sequence Segmentation as Determining the Leftmost Segments | Apr 15, 2021 | Chunking, Part-Of-Speech Tagging | Code Available | 5 |
| NNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit | Aug 24, 2017 | Chunking, named-entity-recognition | Code Available | 5 |
| Semi-supervised sequence tagging with bidirectional language models | Apr 29, 2017 | Chunking, named-entity-recognition | Code Available | 5 |
| Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation | Jun 1, 2024 | Chunking, RAG | Code Available | 5 |
| A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks | Nov 5, 2016 | Chunking, Multi-Task Learning | Code Available | 5 |
| LLM-TA: An LLM-Enhanced Thematic Analysis Pipeline for Transcripts from Parents of Children with Congenital Heart Disease | Feb 3, 2025 | Chunking, Prompt Engineering | Code Available | 5 |
| AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation | Mar 14, 2025 | Abstractive Text Summarization, Chunking | Code Available | 5 |
| Named Entity Recognition in Tweets: An Experimental Study | Jul 1, 2011 | Chunking, named-entity-recognition | Code Available | 5 |
| Large-scale image segmentation based on distributed clustering algorithms | Jun 21, 2021 | Chunking, Clustering | Code Available | 5 |
| Keystroke dynamics as signal for shallow syntactic parsing | Oct 11, 2016 | CCG Supertagging, Chunking | Code Available | 5 |
| Language-Agnostic Syllabification with Neural Sequence Labeling | Sep 29, 2019 | Chunking, named-entity-recognition | Code Available | 5 |
| J2N -- Nominal Adjective Identification and its Application | Sep 22, 2024 | Chunking, coreference-resolution | Code Available | 5 |
| Building Odia Shallow Parser | Apr 19, 2022 | Chunking, Machine Translation | Code Available | 5 |
| Geo-Encoder: A Chunk-Argument Bi-Encoder Framework for Chinese Geographic Re-Ranking | Sep 4, 2023 | Chunking, Multi-Task Learning | Code Available | 5 |
| Natural Language Processing (almost) from Scratch | Mar 2, 2011 | Chunking, named-entity-recognition | Code Available | 5 |
| Boundary-based MWE segmentation with text partitioning | Aug 5, 2016 | Chunking, Segmentation | Code Available | 5 |
| FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP | Jun 1, 2019 | Chunking, Named Entity Recognition (NER) | Code Available | 5 |
| Integrating Supertag Features into Neural Discontinuous Constituent Parsing | Oct 11, 2024 | Chunking, Dependency Parsing | Code Available | 5 |
| CAG: Chunked Augmented Generation for Google Chrome's Built-in Gemini Nano | Dec 24, 2024 | Chunking | Code Available | 5 |
| Financial Report Chunking for Effective Retrieval Augmented Generation | Feb 5, 2024 | Chunking, document understanding | Code Available | 5 |
| KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease | Mar 6, 2025 | Chunking, Language Modeling | Code Available | 5 |
| A Feature-Rich Vietnamese Named-Entity Recognition Model | Mar 12, 2018 | Chunking, model | Code Available | 5 |
| Large scale visual place recognition with sub-linear storage growth | Oct 23, 2018 | Chunking, feature selection | Code Available | 5 |
| Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans | Jun 1, 2022 | Chunking, NER | Code Available | 5 |
| FlexChunk: Enabling 100M×100M Out-of-Core SpMV (~1.8 min, ~1.7 GB RAM) with Near-Linear Scaling | Apr 5, 2025 | Chunking, Nature-Inspired Optimization Algorithm | Code Available | 5 |
| Experiential Explanations for Reinforcement Learning | Oct 10, 2022 | Chunking, counterfactual | Code Available | 5 |
| Evaluation of Word Vector Representations by Subspace Alignment | Sep 1, 2015 | Chunking, Named Entity Recognition (NER) | Code Available | 5 |
| Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models | Jun 1, 2025 | Chunking, Multi-hop Question Answering | Code Available | 5 |
| Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data? | Nov 22, 2017 | Chunking, NER | Code Available | 5 |
| Evaluating Relaxations of Logic for Neural Networks: A Comprehensive Study | Jul 28, 2021 | Chunking, Inductive Bias | Code Available | 5 |
| ChuLo: Chunk-Level Key Information Representation for Long Document Processing | Oct 14, 2024 | Chunking, Classification | Code Available | 5 |
| Gated Task Interaction Framework for Multi-task Sequence Tagging | Sep 29, 2019 | Chunking, Multi-Task Learning | Code Available | 5 |
| BIRA: Improved Predictive Exchange Word Clustering | Jun 1, 2016 | Chunking, Clustering | Code Available | 5 |
| Design Challenges and Misconceptions in Neural Sequence Labeling | Jun 12, 2018 | Chunking, Misconceptions | Code Available | 5 |
| Chunking: Continual Learning is not just about Distribution Shift | Oct 3, 2023 | Chunking, Continual Learning | Code Available | 5 |
| Def2Vec: Extensible Word Embeddings from Dictionary Definitions | Dec 16, 2023 | Chunking, named-entity-recognition | Code Available | 5 |
| Discourse Sense Classification from Scratch using Focused RNNs | Aug 1, 2016 | Chunking, Classification | Code Available | 5 |

Datasets: CoNLL-2000, Penn Treebank, CoNLL 2003 (German), CoNLL 2003 (English), CoNLL 2003

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | Exact Span F1 | 97.3 | — | Unverified |
| 2 | BERT-CRF (Replicated in AdaSeq) | Exact Span F1 | 97.18 | — | Unverified |
| 3 | ELMo + MAT + Multi-Task | Exact Span F1 | 97.04 | — | Unverified |
| 4 | CVT+Multi-Task+Large | Exact Span F1 | 96.98 | — | Unverified |
| 5 | ELMo + Multi-Task | Exact Span F1 | 96.83 | — | Unverified |
| 6 | Flair | Exact Span F1 | 96.72 | — | Unverified |
| 7 | SeqVAT | Exact Span F1 | 95.45 | — | Unverified |
| 8 | Adversarial Training | Exact Span F1 | 95.25 | — | Unverified |
| 9 | BiLSTM-CRF | Exact Span F1 | 95.18 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | F1 score | 97.3 | — | Unverified |
| 2 | Flair embeddings | F1 score | 96.72 | — | Unverified |
| 3 | JMT | F1 score | 95.77 | — | Unverified |
| 4 | Low supervision | F1 score | 95.57 | — | Unverified |
| 5 | IntNet + BiLSTM-CRF | F1 score | 95.29 | — | Unverified |
| 6 | Suzuki and Isozaki | F1 score | 95.15 | — | Unverified |
| 7 | NCRF++ | F1 score | 95.06 | — | Unverified |
| 8 | BI-LSTM-CRF (Senna) (ours) | F1 score | 94.46 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | F1 | 95 | — | Unverified |
| 2 | Wang et al., 2020 | F1 | 94.4 | — | Unverified |
| 3 | AIN | F1 | 94.04 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Wang et al., 2020 | F1 | 92 | — | Unverified |
| 2 | AIN | F1 | 91.71 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Def2Vec | AUC | 93.07 | — | Unverified |
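Most of the leaderboards above report span-level F1, and one is labeled Exact Span F1. As a hedged illustration, assuming the conventional CoNLL-style definition in which a predicted chunk counts as correct only when its type and both boundaries match a gold chunk, the metric can be sketched as follows. The function `exact_span_f1`, the span tuple layout, and the toy spans are hypothetical; nothing here reproduces a leaderboard figure.

```python
from collections import Counter
from typing import Iterable, Tuple

# A labeled span: (sentence id, chunk type, start, end exclusive).
# Including the sentence id keeps spans from different sentences distinct.
Span = Tuple[int, str, int, int]


def exact_span_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching.

    A predicted span is a true positive only if a gold span has the same
    sentence id, type, start, and end. Partial overlaps earn no credit.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Multiset intersection: each gold span can be matched at most once.
    true_positives = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = true_positives / n_pred if n_pred else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(0, "NP", 0, 2), (0, "VP", 2, 3), (0, "NP", 3, 5)]
    # The last prediction stops one token early, so it earns no credit.
    pred = [(0, "NP", 0, 2), (0, "VP", 2, 3), (0, "NP", 3, 4)]
    p, r, f = exact_span_f1(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints: P=0.667 R=0.667 F1=0.667
```

Because a partial overlap scores zero, a single boundary error counts as both a false positive and a false negative, which is what makes exact span F1 a strict metric.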