SOTAVerified

Chunking

Chunking, also known as shallow parsing, identifies contiguous spans of tokens that form syntactic units such as noun phrases or verb phrases.

Example:

| Vinken | , | 61 | years | old |
| --- | --- | --- | --- | --- |
| B-NP | I-NP | I-NP | I-NP | I-NP |
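
The tags in the example follow the BIO scheme: `B-` opens a chunk, `I-` continues it, and `O` marks a token outside any chunk. As a minimal sketch of how such tags map back to spans, the function below groups them into `(label, start, end)` triples. The name `bio_to_spans` is hypothetical, and treating a stray `I-` tag as the start of a new chunk is an assumption made here for leniency, not something this page specifies.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # "B-NP" -> prefix "B", kind "NP"; "O" -> prefix "O", kind "".
        prefix, _, kind = tag.partition("-")
        continues = prefix == "I" and kind == label
        if not continues:
            # Close the chunk that was open, if any.
            if label is not None:
                spans.append((label, start, i))
            # Assumption: an "I-" tag with no matching open chunk starts a new one.
            if prefix in ("B", "I"):
                start, label = i, kind
            else:
                start, label = None, None
    # Close a chunk that runs to the end of the sentence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Vinken", ",", "61", "years", "old"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]
for label, start, end in bio_to_spans(tags):
    print(label, tokens[start:end])
```

For the example row above, this yields a single noun-phrase chunk covering all five tokens.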

Papers

Showing 1-50 of 447 papers

| Title | Status | Hype |
| --- | --- | --- |
| Liger Kernel: Efficient Triton Kernels for LLM Training | Code | 9 |
| Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Code | 5 |
| TrustRAG: An Information Assistant with Retrieval Augmented Generation | Code | 5 |
| Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | Code | 4 |
| Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models | Code | 3 |
| Real-Time Execution of Action Chunking Flow Policies | Code | 3 |
| Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception | Code | 3 |
| MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | Code | 3 |
| Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation | Code | 3 |
| TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Code | 2 |
| cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree | Code | 2 |
| Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models | Code | 2 |
| Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling | Code | 2 |
| LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering | Code | 2 |
| DadmaTools: Natural Language Processing Toolkit for Persian Language | Code | 2 |
| Autoregressive Action Sequence Learning for Robotic Manipulation | Code | 2 |
| LumberChunker: Long-Form Narrative Document Segmentation | Code | 2 |
| tsflex: flexible time series processing & feature extraction | Code | 1 |
| TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1 |
| Unsupervised Technical Domain Terms Extraction using Term Extractor | Code | 1 |
| Sparse Modular Activation for Efficient Sequence Modeling | Code | 1 |
| S2 Chunking: A Hybrid Framework for Document Segmentation Through Integrated Spatial and Semantic Analysis | Code | 1 |
| Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension | Code | 1 |
| Semi-supervised Multitask Learning for Sequence Labeling | Code | 1 |
| TeleOracle: Fine-Tuned Retrieval-Augmented Generation with Long-Context Support for Network | Code | 1 |
| Review highlights: opinion mining on reviews: a hybrid model for rule selection in aspect extraction | Code | 1 |
| Fast and Accurate Factual Inconsistency Detection Over Long Documents | Code | 1 |
| AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network | Code | 1 |
| Paradigm Shift in Natural Language Processing | Code | 1 |
| Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks | Code | 1 |
| Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs | Code | 1 |
| Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings | Code | 1 |
| CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity | Code | 1 |
| NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems | Code | 1 |
| Chat3GPP: An Open-Source Retrieval-Augmented Generation Framework for 3GPP Documents | Code | 1 |
| Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System | Code | 1 |
| ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths | Code | 1 |
| Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Code | 1 |
| BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | Code | 1 |
| Capturing Global Informativeness in Open Domain Keyphrase Extraction | Code | 1 |
| Automated Concatenation of Embeddings for Structured Prediction | Code | 1 |
| Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning | Code | 1 |
| NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering | Code | 1 |
| On LLM-Enhanced Mixed-Type Data Imputation with High-Order Message Passing | Code | 1 |
| Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum | Code | 1 |
| Recurrent Attention Networks for Long-text Modeling | Code | 1 |
| ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation | Code | 1 |
| Attamba: Attending To Multi-Token States | Code | 1 |
| Fine-Grained Error Analysis and Fair Evaluation of Labeled Spans | Code | 0 |
| FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP | Code | 0 |


Benchmark Results

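| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | Exact Span F1 | 97.3 | | Unverified |
| 2 | BERT-CRF (Replicated in AdaSeq) | Exact Span F1 | 97.18 | | Unverified |
| 3 | ELMo + MAT + Multi-Task | Exact Span F1 | 97.04 | | Unverified |
| 4 | CVT+Multi-Task+Large | Exact Span F1 | 96.98 | | Unverified |
| 5 | ELMo + Multi-Task | Exact Span F1 | 96.83 | | Unverified |
| 6 | Flair | Exact Span F1 | 96.72 | | Unverified |
| 7 | SeqVAT | Exact Span F1 | 95.45 | | Unverified |
| 8 | Adversarial Training | Exact Span F1 | 95.25 | | Unverified |
| 9 | BiLSTM-CRF | Exact Span F1 | 95.18 | | Unverified |
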
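
This page does not define the Exact Span F1 metric. A common reading, assumed here, is that a predicted chunk counts as correct only when its label and both boundaries match a gold chunk exactly. The sketch below computes that score for sets of `(label, start, end)` spans; the function name `exact_span_f1` and the toy spans are illustrative, not taken from any of the listed papers.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def exact_span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """F1 over spans; a prediction is correct only on an exact match."""
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data: the predicted VP chunk ends one token early, so it earns no credit.
gold = {("NP", 0, 5), ("VP", 5, 7), ("NP", 7, 9)}
pred = {("NP", 0, 5), ("VP", 5, 6), ("NP", 7, 9)}
print(round(100 * exact_span_f1(gold, pred), 2))  # prints 66.67
```

A corpus-level score would typically pool the correct, predicted, and gold counts across all sentences before computing precision and recall, rather than averaging per-sentence F1 values.
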
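| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | F1 score | 97.3 | | Unverified |
| 2 | Flair embeddings | F1 score | 96.72 | | Unverified |
| 3 | JMT | F1 score | 95.77 | | Unverified |
| 4 | Low supervision | F1 score | 95.57 | | Unverified |
| 5 | IntNet + BiLSTM-CRF | F1 score | 95.29 | | Unverified |
| 6 | Suzuki and Isozaki | F1 score | 95.15 | | Unverified |
| 7 | NCRF++ | F1 score | 95.06 | | Unverified |
| 8 | BI-LSTM-CRF (Senna) (ours) | F1 score | 94.46 | | Unverified |
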
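| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | F1 | 95 | | Unverified |
| 2 | Wang et al., 2020 | F1 | 94.4 | | Unverified |
| 3 | AIN | F1 | 94.04 | | Unverified |
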
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Wang et al., 2020 | F1 | 92 | | Unverified |
| 2 | AIN | F1 | 91.71 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Def2Vec | AUC | 93.07 | | Unverified |