Chunking

Chunking, also known as shallow parsing, identifies contiguous spans of tokens that form syntactic units such as noun phrases or verb phrases.

Example:

| Vinken | , | 61 | years | old |
| --- | --- | --- | --- | --- |
| B-NP | I-NP | I-NP | I-NP | I-NP |
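
The tags in the example follow the BIO scheme: `B-` opens a chunk, `I-` continues it, and `O` marks a token outside any chunk. As a minimal sketch of how such tags map back to spans, the function below groups them into `(label, start, end)` triples. The name `bio_to_spans` and the lenient handling of a stray `I-` tag (treated as opening a new chunk) are assumptions made for illustration, not part of any particular toolkit.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # A new chunk starts at any B- tag, or at an I- tag whose label
        # differs from the chunk currently open (lenient assumption).
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if tag == "O" or starts_new:
            # Close the chunk that was open, if any.
            if label is not None:
                spans.append((label, start, i))
            start, label = (i, tag[2:]) if starts_new else (None, None)
    # Close a chunk that runs to the end of the sentence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Vinken", ",", "61", "years", "old"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]
for label, start, end in bio_to_spans(tags):
    print(label, tokens[start:end])
```

For the example sentence this yields a single noun-phrase chunk covering all five tokens.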

Papers


Showing 1–50 of 447 papers

Title	Date	Tasks	Status	Hype
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling	Jul 10, 2025	Chunking	Unverified	0
CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs	Jul 9, 2025	Chunking, RAG	Unverified	0
CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation	Jun 24, 2025	Chunking, Vision-Language-Action	Unverified	0
Can LLMs Replace Humans During Code Chunking?	Jun 24, 2025	Chunking	Unverified	0
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree	Jun 18, 2025	Chunking, Code Generation	Code Available	2
Chunk Twice, Embed Once: A Systematic Study of Segmentation and Representation Trade-offs in Chemistry-Aware Retrieval-Augmented Generation	Jun 13, 2025	Chunking, RAG	Unverified	0
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning	Jun 12, 2025	Answer Generation, Chunking	Code Available	2
Knowledge Compression via Question Generation: Enhancing Multihop Document Retrieval without Fine-tuning	Jun 9, 2025	Chunking, Question Generation	Unverified	0
Real-Time Execution of Action Chunking Flow Policies	Jun 9, 2025	Chunking, Vision-Language-Action	Code Available	3
Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models	Jun 1, 2025	Chunking, Multi-hop Question Answering	Code Available	0
LID Models are Actually Accent Classifiers: Implications and Solutions for LID on Accented Speech	May 31, 2025	Chunking	Unverified	0
Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings	May 30, 2025	Chunking, Computational Efficiency	Code Available	1
Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning	May 30, 2025	Chunking, graph construction	Unverified	0
Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis	May 27, 2025	Chunking, Information Retrieval	Code Available	0
NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering	May 26, 2025	Chunking, Large Language Model	Code Available	1
Retrieval-Augmented Generation for Service Discovery: Chunking Strategies and Benchmarking	May 25, 2025	Benchmarking, Chunking	Unverified	0
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation	May 22, 2025	Chunking	Code Available	1
HASH-RAG: Bridging Deep Hashing with Retriever for Efficient, Fine Retrieval and Augmented Generation	May 22, 2025	Chunking, Deep Hashing	Unverified	0
Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning	May 17, 2025	All, Chunking	Code Available	0
Concept-Guided Interpretability via Neural Chunking	May 16, 2025	Chunking	Unverified	0
Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency	May 13, 2025	Chunking, RAG	Unverified	0
Recognizing Ornaments in Vocal Indian Art Music with Active Annotation	May 7, 2025	Chunking, Genre classification	Unverified	0
A New HOPE: Domain-agnostic Automatic Evaluation of Text Chunking	May 4, 2025	Chunking, RAG	Unverified	0
Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs	May 3, 2025	Chunking, Question Answering	Unverified	0
CHORUS: Zero-shot Hierarchical Retrieval and Orchestration for Generating Linear Programming Code	May 2, 2025	Chunking, Code Generation	Unverified	0
Reconstructing Context: Evaluating Advanced Chunking Strategies for Retrieval-Augmented Generation	Apr 28, 2025	Chunking, RAG	Code Available	0
A Baseline for Self-state Identification and Classification in Mental Health Data: CLPsych 2025 Task	Apr 18, 2025	Attribute, Binary Classification	Unverified	0
Bridging Industrial Expertise and XR with LLM-Powered Conversational Agents	Apr 7, 2025	Chunking, RAG	Unverified	0
FlexChunk: Enabling 100M×100M Out-of-Core SpMV (~1.8 min, ~1.7 GB RAM) with Near-Linear Scaling	Apr 5, 2025	Chunking, Nature-Inspired Optimization Algorithm	Code Available	0
Bi-LAT: Bilateral Control-Based Imitation Learning via Natural Language and Action Chunking with Transformers	Apr 2, 2025	Chunking, Imitation Learning	Unverified	0
Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment	Apr 2, 2025	Chunking, Diagnostic	Unverified	0
ParallelFlow: Parallelizing Linear Transformers via Flow Discretization	Apr 1, 2025	Chunking, State Space Models	Unverified	0
Text Chunking for Document Classification for Urban System Management using Large Language Models	Mar 31, 2025	Chunking, Document Classification	Code Available	0
Niyama : Breaking the Silos of LLM Inference Serving	Mar 28, 2025	Chunking, Fairness	Unverified	0
CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation	Mar 25, 2025	Chunking, RAG	Unverified	0
SLIDE: Sliding Localized Information for Document Extraction	Mar 23, 2025	Chunking, graph construction	Unverified	0
Learning Bimanual Manipulation via Action Chunking and Inter-Arm Coordination with Transformers	Mar 18, 2025	Chunking, Imitation Learning	Unverified	0
AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications	Mar 17, 2025	Chunking, GPU	Unverified	0
From Dionysius Emerges Apollo -- Learning Patterns and Abstractions from Perceptual Sequences	Mar 14, 2025	Chunking	Unverified	0
AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation	Mar 14, 2025	Abstractive Text Summarization, Chunking	Code Available	0
MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System	Mar 12, 2025	Chunking, Computational Efficiency	Code Available	3
The Pitfalls of Imitation Learning when Actions are Continuous	Mar 12, 2025	Chunking, Imitation Learning	Unverified	0
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos	Mar 9, 2025	Action Localization, Boundary Detection	Code Available	1
KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease	Mar 6, 2025	Chunking, Language Modeling	Code Available	0
Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns	Mar 5, 2025	Chunking	Unverified	0
AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking	Mar 4, 2025	Chunking, General Knowledge	Unverified	0
Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding	Mar 4, 2025	Chunking, Vision-Language-Action	Unverified	0
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success	Feb 27, 2025	Action Generation, Chunking	Code Available	5
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs	Feb 25, 2025	Benchmarking, Chunking	Code Available	1
TrustRAG: An Information Assistant with Retrieval Augmented Generation	Feb 19, 2025	Answer Generation, Chunking	Code Available	5


Datasets: CoNLL-2000, Penn Treebank, CoNLL 2003 (German), CoNLL 2003 (English), CoNLL 2003

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACE	Exact Span F1	97.3	—	Unverified
2	BERT-CRF (Replicated in AdaSeq)	Exact Span F1	97.18	—	Unverified
3	ELMo + MAT + Multi-Task	Exact Span F1	97.04	—	Unverified
4	CVT+Multi-Task+Large	Exact Span F1	96.98	—	Unverified
5	ELMo + Multi-Task	Exact Span F1	96.83	—	Unverified
6	Flair	Exact Span F1	96.72	—	Unverified
7	SeqVAT	Exact Span F1	95.45	—	Unverified
8	Adversarial Training	Exact Span F1	95.25	—	Unverified
9	BiLSTM-CRF	Exact Span F1	95.18	—	Unverified
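
The Exact Span F1 metric in the table above is conventionally computed by counting a predicted chunk as correct only when its label and both boundaries match a gold chunk exactly. The sketch below shows that usual micro-averaged definition under stated assumptions: the function name `exact_span_f1` and the `(label, start, end)` span representation are illustrative, the result is on a 0 to 1 scale rather than the 0 to 100 scale used in the table, and individual leaderboard entries may have been scored with different evaluation scripts.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def exact_span_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> float:
    """Micro-averaged F1 over sentences with exact-match span scoring."""
    # A predicted span counts only if the same (label, start, end)
    # triple appears in the gold set for the same sentence.
    correct = sum(len(g & p) for g, p in zip(gold, pred))
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    if correct == 0:
        return 0.0
    precision = correct / n_pred
    recall = correct / n_gold
    return 2 * precision * recall / (precision + recall)


# Two sentences: the first is predicted perfectly, the second has
# both chunk boundaries wrong, so neither of its spans is credited.
gold = [{("NP", 0, 5)}, {("NP", 0, 1), ("VP", 1, 3)}]
pred = [{("NP", 0, 5)}, {("NP", 0, 2), ("VP", 2, 3)}]
print(round(exact_span_f1(gold, pred), 4))
```

Partial overlap earns no credit under this definition, which is why a single shifted boundary in the second sentence costs two spans.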

#	Model	Metric	Claimed	Verified	Status
1	ACE	F1 score	97.3	—	Unverified
2	Flair embeddings	F1 score	96.72	—	Unverified
3	JMT	F1 score	95.77	—	Unverified
4	Low supervision	F1 score	95.57	—	Unverified
5	IntNet + BiLSTM-CRF	F1 score	95.29	—	Unverified
6	Suzuki and Isozaki	F1 score	95.15	—	Unverified
7	NCRF++	F1 score	95.06	—	Unverified
8	BI-LSTM-CRF (Senna) (ours)	F1 score	94.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACE	F1	95	—	Unverified
2	Wang et al., 2020	F1	94.4	—	Unverified
3	AIN	F1	94.04	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Wang et al., 2020	F1	92	—	Unverified
2	AIN	F1	91.71	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Def2Vec	AUC	93.07	—	Unverified