Chunking

Chunking, also known as shallow parsing, identifies contiguous, non-overlapping spans of tokens that form syntactic units such as noun phrases or verb phrases.

Example (B- marks the first token of a chunk, I- a token inside it; NP is a noun phrase):

| Vinken | , | 61 | years | old |
| --- | --- | --- | --- | --- |
| B-NP | I-NP | I-NP | I-NP | I-NP |
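The tagged example can be turned back into chunk spans with a few lines of code. The following is a minimal sketch, assuming BIO-style tags (`B-X`, `I-X`, `O`); the function name `bio_to_spans` is hypothetical, and the lenient handling of a stray `I-` tag is a design choice of this sketch, not something the page specifies.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an I- tag carrying the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.append((label, start, i))  # close the open span
            start, label = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # Assumption: an I- tag with no matching open span starts a new span.
                start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))  # close a span that runs to the end
    return spans


tokens = ["Vinken", ",", "61", "years", "old"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]
for label, start, end in bio_to_spans(tags):
    print(label, tokens[start:end])
```

Here all five tokens fall into a single NP span covering positions 0 to 5.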

Papers


Showing 1–50 of 447 papers

Title	Date	Tasks	Status	Hype	Score
Liger Kernel: Efficient Triton Kernels for LLM Training	Oct 14, 2024	Chunking, GPU	Code Available	9	5
TrustRAG: An Information Assistant with Retrieval Augmented Generation	Feb 19, 2025	Answer Generation, Chunking	Code Available	5	5
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success	Feb 27, 2025	Action Generation, Chunking	Code Available	5	5
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation	Aug 8, 2024	Chunking, Fact Checking	Code Available	4	5
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models	Sep 7, 2024	Chunking, Retrieval	Code Available	3	5
MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System	Mar 12, 2025	Chunking, Computational Efficiency	Code Available	3	5
Real-Time Execution of Action Chunking Flow Policies	Jun 9, 2025	Chunking, Vision-Language-Action	Code Available	3	5
Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception	Oct 16, 2024	Binary Classification, Chunking	Code Available	3	5
Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation	Jun 10, 2024	Chunking, Speech Separation	Code Available	3	5
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree	Jun 18, 2025	Chunking, Code Generation	Code Available	2	5
Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models	Jun 3, 2024	Chunking, Mamba	Code Available	2	5
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling	Aug 30, 2024	Chunking	Code Available	2	5
LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering	Oct 23, 2024	Chunking, Question Answering	Code Available	2	5
DadmaTools: Natural Language Processing Toolkit for Persian Language	Jul 1, 2022	Chunking, Constituency Parsing	Code Available	2	5
Autoregressive Action Sequence Learning for Robotic Manipulation	Oct 4, 2024	Chunking, Language Modeling	Code Available	2	5
LumberChunker: Long-Form Narrative Document Segmentation	Jun 25, 2024	Chunking, Form	Code Available	2	5
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning	Jun 12, 2025	Answer Generation, Chunking	Code Available	2	5
tsflex: flexible time series processing & feature extraction	Nov 24, 2021	Chunking, Time Series	Code Available	1	5
TeleOracle: Fine-Tuned Retrieval-Augmented Generation with Long-Context Support for Network	Nov 4, 2024	Chunking, Language Modelling	Code Available	1	5
Unsupervised Technical Domain Terms Extraction using Term Extractor	Jan 22, 2021	Chunking, Task 2	Code Available	1	5
Fast and Accurate Factual Inconsistency Detection Over Long Documents	Oct 19, 2023	Chunking, Natural Language Inference	Code Available	1	5
Review highlights: opinion mining on reviews: a hybrid model for rule selection in aspect extraction	Oct 18, 2017	Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA)	Code Available	1	5
AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network	Sep 17, 2020	Chunking, Variational Inference	Code Available	1	5
S2 Chunking: A Hybrid Framework for Document Segmentation Through Integrated Spatial and Semantic Analysis	Jan 8, 2025	Articles, Chunking	Code Available	1	5
Sparse Modular Activation for Efficient Sequence Modeling	Jun 19, 2023	Chunking, Language Modeling	Code Available	1	5
Semi-supervised Multitask Learning for Sequence Labeling	Apr 24, 2017	Chunking, Grammatical Error Detection	Code Available	1	5
Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension	May 16, 2020	Chunking, Machine Reading Comprehension	Code Available	1	5
Paradigm Shift in Natural Language Processing	Sep 26, 2021	Chunking, NER	Code Available	1	5
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos	Mar 9, 2025	Action Localization, Boundary Detection	Code Available	1	5
On LLM-Enhanced Mixed-Type Data Imputation with High-Order Message Passing	Jan 4, 2025	Chunking, Imputation	Code Available	1	5
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs	Feb 25, 2025	Benchmarking, Chunking	Code Available	1	5
CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity	Oct 16, 2024	Chunking, Diversity	Code Available	1	5
NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems	Dec 20, 2021	BIG-bench Machine Learning, Chunking	Code Available	1	5
NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering	May 26, 2025	Chunking, Large Language Model	Code Available	1	5
Chat3GPP: An Open-Source Retrieval-Augmented Generation Framework for 3GPP Documents	Jan 20, 2025	Chunking, RAG	Code Available	1	5
Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System	Jun 21, 2024	Chunking, Contact-rich Manipulation	Code Available	1	5
ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths	Jun 12, 2022	Chunking, Document Classification	Code Available	1	5
Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards	Aug 21, 2024	Chunking, Computational Efficiency	Code Available	1	5
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications	Oct 12, 2021	Action Detection, Activity Detection	Code Available	1	5
Capturing Global Informativeness in Open Domain Keyphrase Extraction	Apr 28, 2020	Chunking, Informativeness	Code Available	1	5
Automated Concatenation of Embeddings for Structured Prediction	Oct 10, 2020	Aspect Extraction, Chunking	Code Available	1	5
Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings	May 30, 2025	Chunking, Computational Efficiency	Code Available	1	5
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum	May 21, 2024	2k, 8k	Code Available	1	5
Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning	May 8, 2021	Chinese Named Entity Recognition, Chunking	Code Available	1	5
Recurrent Attention Networks for Long-text Modeling	Jun 12, 2023	Chunking	Code Available	1	5
Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks	Jul 21, 2017	Chunking, Event Detection	Code Available	1	5
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation	May 22, 2025	Chunking	Code Available	1	5
Attamba: Attending To Multi-Token States	Nov 26, 2024	Chunking, State Space Models	Code Available	1	5
FlexChunk: Enabling 100M×100M Out-of-Core SpMV (~1.8 min, ~1.7 GB RAM) with Near-Linear Scaling	Apr 5, 2025	Chunking, Nature-Inspired Optimization Algorithm	Code Available	0	5
FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP	Jun 1, 2019	Chunking, Named Entity Recognition (NER)	Code Available	0	5



Datasets: CoNLL-2000, Penn Treebank, CoNLL 2003 (German), CoNLL 2003 (English), CoNLL 2003

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACE	Exact Span F1	97.3	—	Unverified
2	BERT-CRF (Replicated in AdaSeq)	Exact Span F1	97.18	—	Unverified
3	ELMo + MAT + Multi-Task	Exact Span F1	97.04	—	Unverified
4	CVT+Multi-Task+Large	Exact Span F1	96.98	—	Unverified
5	ELMo + Multi-Task	Exact Span F1	96.83	—	Unverified
6	Flair	Exact Span F1	96.72	—	Unverified
7	SeqVAT	Exact Span F1	95.45	—	Unverified
8	Adversarial Training	Exact Span F1	95.25	—	Unverified
9	BiLSTM-CRF	Exact Span F1	95.18	—	Unverified
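The table above reports Exact Span F1 but the page does not define it. A common reading, assumed here, is micro-averaged F1 over chunks, where a predicted chunk counts as correct only if its label and both boundaries match a gold chunk exactly. The sketch below illustrates that reading; the name `exact_span_f1` and the `(label, start, end)` span format are hypothetical choices for illustration, and the result is on a 0 to 1 scale, whereas the table uses 0 to 100.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def exact_span_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> float:
    """Micro-averaged F1 over spans; gold and pred hold one span set per sentence."""
    # A true positive is a predicted span identical to a gold span in the same sentence.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    if tp == 0:
        return 0.0  # also covers empty gold or empty predictions
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


gold = [{("NP", 0, 5)}, {("NP", 0, 2), ("VP", 2, 3)}]
pred = [{("NP", 0, 5)}, {("NP", 0, 1), ("VP", 2, 3)}]  # second NP has a wrong boundary
print(round(exact_span_f1(gold, pred), 4))
```

In this toy case two of three predicted spans match two of three gold spans, so precision and recall are both 2/3. A span with one wrong boundary earns no partial credit under this definition.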

#	Model	Metric	Claimed	Verified	Status
1	ACE	F1 score	97.3	—	Unverified
2	Flair embeddings	F1 score	96.72	—	Unverified
3	JMT	F1 score	95.77	—	Unverified
4	Low supervision	F1 score	95.57	—	Unverified
5	IntNet + BiLSTM-CRF	F1 score	95.29	—	Unverified
6	Suzuki and Isozaki	F1 score	95.15	—	Unverified
7	NCRF++	F1 score	95.06	—	Unverified
8	BI-LSTM-CRF (Senna) (ours)	F1 score	94.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACE	F1	95	—	Unverified
2	Wang et al., 2020	F1	94.4	—	Unverified
3	AIN	F1	94.04	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Wang et al., 2020	F1	92	—	Unverified
2	AIN	F1	91.71	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Def2Vec	AUC	93.07	—	Unverified