Chunking

Chunking, also known as shallow parsing, identifies contiguous spans of tokens that form syntactic units such as noun phrases or verb phrases.

Example:

| Vinken | , | 61 | years | old |
| --- | --- | --- | --- | --- |
| B-NP | I-NP | I-NP | I-NP | I-NP |
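
The tags in the example follow the BIO scheme: `B-` opens a chunk, `I-` continues it, and `O` (not shown in the example) marks a token outside any chunk. As a minimal sketch of how such tags map back to spans, the function below groups tokens into chunks. The name `bio_to_chunks` is hypothetical and not part of any library, and treating a stray `I-` tag as the start of a new chunk is an assumption.

```python
from typing import List, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (label, text) chunks according to their BIO tags."""
    chunks: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        # "B-NP" -> ("B", "-", "NP"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new chunk begins (assumption: a stray "I-" also starts one).
            if current_tokens:
                chunks.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            # Continuation of the chunk that is currently open.
            current_tokens.append(token)
        else:
            # "O": the token is outside any chunk, so close the open chunk.
            if current_tokens:
                chunks.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []

    if current_tokens:
        chunks.append((current_label, " ".join(current_tokens)))
    return chunks


tokens = ["Vinken", ",", "61", "years", "old"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]
print(bio_to_chunks(tokens, tags))
# → [('NP', 'Vinken , 61 years old')]
```

With the tags from the example, all five tokens fall into a single noun-phrase chunk.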

Papers


Showing 101–125 of 447 papers

Title	Date	Tasks	Status	Hype
Two eyes, Two views, and finally, One summary! Towards Multi-modal Multi-tasking Knowledge-Infused Medical Dialogue Summarization	Jul 21, 2024	Chunking, Conversation Summarization	Code Available	0
CUSIDE-array: A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations	Jul 13, 2024	Chunking, speech-recognition	Unverified	0
Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations	Jul 5, 2024	Chunking, Few-Shot Learning	Code Available	0
LumberChunker: Long-Form Narrative Document Segmentation	Jun 25, 2024	Chunking, Form	Code Available	2
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs	Jun 21, 2024	4k, Chunking	Unverified	0
Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System	Jun 21, 2024	Chunking, Contact-rich Manipulation	Code Available	1
Leveraging Large Language Models for Web Scraping	Jun 12, 2024	Chunking, RAG	Unverified	0
Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation	Jun 10, 2024	Chunking, Speech Separation	Code Available	3
Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models	Jun 3, 2024	Chunking, Mamba	Code Available	2
Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation	Jun 1, 2024	Chunking, RAG	Code Available	0
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum	May 21, 2024	2k, 8k	Code Available	1
Equipping Transformer with Random-Access Reading for Long-Context Understanding	May 21, 2024	Chunking, Long-Context Understanding	Unverified	0
PathOCL: Path-Based Prompt Augmentation for OCL Generation with GPT-4	May 21, 2024	Chunking, valid	Unverified	0
ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers	May 9, 2024	Chunking, Imitation Learning	Unverified	0
Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning	May 6, 2024	Chunking, Navigate	Code Available	0
Multi-view Content-aware Indexing for Long Document Retrieval	Apr 23, 2024	Chunking, Question Answering	Unverified	0
Improving Retrieval for RAG based Question Answering Models on Financial Documents	Mar 23, 2024	Chunking, Question Answering	Unverified	0
Opening the black box of language acquisition	Feb 18, 2024	Chunking, Language Acquisition	Code Available	0
BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models	Feb 18, 2024	Chunking, Language Modeling	Unverified	0
Grounding Language Model with Chunking-Free In-Context Retrieval	Feb 15, 2024	Chunking, Language Modeling	Unverified	0
Punctuation Restoration Improves Structure Understanding Without Supervision	Feb 13, 2024	Chunking, Language Modeling	Code Available	0
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT	Feb 12, 2024	Benchmarking, Chunking	Unverified	0
Financial Report Chunking for Effective Retrieval Augmented Generation	Feb 5, 2024	Chunking, document understanding	Code Available	0
Def2Vec: Extensible Word Embeddings from Dictionary Definitions	Dec 16, 2023	Chunking, named-entity-recognition	Code Available	0
Releasing the CRaQAn (Coreference Resolution in Question-Answering): An open-source dataset and dataset creation methodology using instruction-following models	Nov 27, 2023	Chunking, coreference-resolution	Unverified	0


Datasets: CoNLL-2000, Penn Treebank, CoNLL 2003 (German), CoNLL 2003 (English), CoNLL 2003

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	ACE	Exact Span F1	97.3	—	Unverified
2	BERT-CRF (Replicated in AdaSeq)	Exact Span F1	97.18	—	Unverified
3	ELMo + MAT + Multi-Task	Exact Span F1	97.04	—	Unverified
4	CVT+Multi-Task+Large	Exact Span F1	96.98	—	Unverified
5	ELMo + Multi-Task	Exact Span F1	96.83	—	Unverified
6	Flair	Exact Span F1	96.72	—	Unverified
7	SeqVAT	Exact Span F1	95.45	—	Unverified
8	Adversarial Training	Exact Span F1	95.25	—	Unverified
9	BiLSTM-CRF	Exact Span F1	95.18	—	Unverified
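
The listing does not define the "Exact Span F1" metric used in the table above. A common reading, assumed here, is the CoNLL-style chunk score: a predicted chunk counts as correct only if its label and both boundaries match a gold chunk exactly. The sketch below computes that score for a single tag sequence. The helper names `extract_spans` and `exact_span_f1` are hypothetical, and the result is a fraction in [0, 1], whereas the leaderboard reports percentages.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect labeled spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any chunk that is still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and tag_label == label
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            if prefix in ("B", "I"):
                start, label = i, tag_label
            else:
                start, label = None, None
    return spans


def exact_span_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over spans that match exactly in label, start, and end."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = ["B-NP", "I-NP", "O", "B-VP", "B-NP"]
pred = ["B-NP", "I-NP", "O", "B-VP", "I-VP"]
print(round(exact_span_f1(gold, pred), 2))
# → 0.4
```

In this example the prediction gets only the first noun phrase right: precision is 1/2, recall is 1/3, and F1 is 0.4. For a full corpus, the correct, predicted, and gold span counts would normally be summed over all sentences before computing precision and recall.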

#	Model	Metric	Claimed	Verified	Status
1	ACE	F1 score	97.3	—	Unverified
2	Flair embeddings	F1 score	96.72	—	Unverified
3	JMT	F1 score	95.77	—	Unverified
4	Low supervision	F1 score	95.57	—	Unverified
5	IntNet + BiLSTM-CRF	F1 score	95.29	—	Unverified
6	Suzuki and Isozaki	F1 score	95.15	—	Unverified
7	NCRF++	F1 score	95.06	—	Unverified
8	BI-LSTM-CRF (Senna) (ours)	F1 score	94.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ACE	F1	95	—	Unverified
2	Wang et al., 2020	F1	94.4	—	Unverified
3	AIN	F1	94.04	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Wang et al., 2020	F1	92	—	Unverified
2	AIN	F1	91.71	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Def2Vec	AUC	93.07	—	Unverified