
Chunking

Chunking, also known as shallow parsing, identifies contiguous spans of tokens that form syntactic units such as noun phrases or verb phrases.

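Example (tags use the BIO scheme: B-X marks the first token of a chunk of type X, I-X a token inside it; NP denotes a noun phrase):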

| Vinken | , | 61 | years | old |
| --- | --- | --- | --- | --- |
| B-NP | I-NP | I-NP | I-NP | I-NP |
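
To make the B-/I- notation concrete, here is a minimal sketch that decodes a BIO tag sequence into typed chunk spans. The function name `bio_to_spans` and the `(type, start, end)` span format are hypothetical, illustrative choices rather than a standard API. The handling of an `I-` tag that does not continue an open chunk (it starts a new chunk) is an assumed lenient convention, similar in spirit to the CoNLL evaluation script.

```python
from typing import List, Optional, Tuple

# Hypothetical span format: (chunk type, start index, end index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags such as "B-NP", "I-NP", "O" into chunk spans."""
    spans: List[Span] = []
    chunk_type: Optional[str] = None  # type of the currently open chunk
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, label = tag.split("-", 1)
        # Close the open chunk unless this tag is I- of the same type.
        if chunk_type is not None and not (prefix == "I" and label == chunk_type):
            spans.append((chunk_type, start, i))
            chunk_type = None
        # Open a new chunk on B-X, or on a stray I-X (lenient assumption).
        if prefix == "B" or (prefix == "I" and chunk_type is None):
            chunk_type, start = label, i
    # Flush a chunk that runs to the end of the sentence.
    if chunk_type is not None:
        spans.append((chunk_type, start, len(tags)))
    return spans


print(bio_to_spans(["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]))  # [('NP', 0, 5)]
```

A five-token sequence tagged `B-NP I-NP I-NP I-NP I-NP` therefore decodes to a single noun-phrase chunk covering all five tokens. Working with spans rather than raw tags is convenient because chunk-level metrics compare spans, not individual tags.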

Papers

Showing 1-50 of 447 papers

| Title | Status | Hype |
| --- | --- | --- |
| Liger Kernel: Efficient Triton Kernels for LLM Training | Code | 9 |
| Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Code | 5 |
| TrustRAG: An Information Assistant with Retrieval Augmented Generation | Code | 5 |
| Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | Code | 4 |
| Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation | Code | 3 |
| MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | Code | 3 |
| Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception | Code | 3 |
| Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models | Code | 3 |
| Real-Time Execution of Action Chunking Flow Policies | Code | 3 |
| TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Code | 2 |
| Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models | Code | 2 |
| LumberChunker: Long-Form Narrative Document Segmentation | Code | 2 |
| Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling | Code | 2 |
| Autoregressive Action Sequence Learning for Robotic Manipulation | Code | 2 |
| LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering | Code | 2 |
| cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree | Code | 2 |
| DadmaTools: Natural Language Processing Toolkit for Persian Language | Code | 2 |
| tsflex: flexible time series processing & feature extraction | Code | 1 |
| TeleOracle: Fine-Tuned Retrieval-Augmented Generation with Long-Context Support for Network | Code | 1 |
| Unsupervised Technical Domain Terms Extraction using Term Extractor | Code | 1 |
| Semi-supervised Multitask Learning for Sequence Labeling | Code | 1 |
| Recurrent Attention Networks for Long-text Modeling | Code | 1 |
| Attamba: Attending To Multi-Token States | Code | 1 |
| ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation | Code | 1 |
| Review highlights: opinion mining on reviews: a hybrid model for rule selection in aspect extraction | Code | 1 |
| S2 Chunking: A Hybrid Framework for Document Segmentation Through Integrated Spatial and Semantic Analysis | Code | 1 |
| Sparse Modular Activation for Efficient Sequence Modeling | Code | 1 |
| Automated Concatenation of Embeddings for Structured Prediction | Code | 1 |
| TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1 |
| Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension | Code | 1 |
| On LLM-Enhanced Mixed-Type Data Imputation with High-Order Message Passing | Code | 1 |
| Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks | Code | 1 |
| Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Code | 1 |
| Capturing Global Informativeness in Open Domain Keyphrase Extraction | Code | 1 |
| Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System | Code | 1 |
| Paradigm Shift in Natural Language Processing | Code | 1 |
| Fast and Accurate Factual Inconsistency Detection Over Long Documents | Code | 1 |
| AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network | Code | 1 |
| Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings | Code | 1 |
| Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum | Code | 1 |
| Chat3GPP: An Open-Source Retrieval-Augmented Generation Framework for 3GPP Documents | Code | 1 |
| ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths | Code | 1 |
| BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | Code | 1 |
| CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity | Code | 1 |
| NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems | Code | 1 |
| NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering | Code | 1 |
| Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning | Code | 1 |
| Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs | Code | 1 |
| An Experimental Investigation of Part-Of-Speech Taggers for Vietnamese | | 0 |
| An Experimental Comparison of Active Learning Strategies for Partially Labeled Sequences | | 0 |

Benchmark Results

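| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | Exact Span F1 | 97.3 | | Unverified |
| 2 | BERT-CRF (Replicated in AdaSeq) | Exact Span F1 | 97.18 | | Unverified |
| 3 | ELMo + MAT + Multi-Task | Exact Span F1 | 97.04 | | Unverified |
| 4 | CVT+Multi-Task+Large | Exact Span F1 | 96.98 | | Unverified |
| 5 | ELMo + Multi-Task | Exact Span F1 | 96.83 | | Unverified |
| 6 | Flair | Exact Span F1 | 96.72 | | Unverified |
| 7 | SeqVAT | Exact Span F1 | 95.45 | | Unverified |
| 8 | Adversarial Training | Exact Span F1 | 95.25 | | Unverified |
| 9 | BiLSTM-CRF | Exact Span F1 | 95.18 | | Unverified |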
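
Assuming the Exact Span F1 metric in the table above is the usual chunk-level F1 (a predicted chunk counts as correct only if its type and both boundaries exactly match a gold chunk), it can be sketched as follows. `exact_span_f1` and the `(type, start, end)` span tuples are hypothetical names chosen for illustration; this is not the evaluation script used by any listed model, and it assumes micro-averaging over all sentences.

```python
from typing import List, Set, Tuple

# Hypothetical span format: (chunk type, start index, end index exclusive).
Span = Tuple[str, int, int]


def exact_span_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> float:
    """Micro-averaged exact-match span F1 over a corpus of sentences.

    A predicted span is a true positive only if an identical
    (type, start, end) tuple appears in the gold set for that sentence.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # exact matches only; no credit for partial overlap
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One sentence: the last predicted NP has the wrong right boundary.
gold = [{("NP", 0, 2), ("VP", 2, 3), ("NP", 3, 5)}]
pred = [{("NP", 0, 2), ("VP", 2, 3), ("NP", 3, 4)}]
print(round(exact_span_f1(gold, pred), 4))  # 0.6667
```

Because matching is exact, a chunk that is off by a single token counts as both a false positive and a false negative, which is why boundary errors weigh heavily on this metric.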
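
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | F1 score | 97.3 | | Unverified |
| 2 | Flair embeddings | F1 score | 96.72 | | Unverified |
| 3 | JMT | F1 score | 95.77 | | Unverified |
| 4 | Low supervision | F1 score | 95.57 | | Unverified |
| 5 | IntNet + BiLSTM-CRF | F1 score | 95.29 | | Unverified |
| 6 | Suzuki and Isozaki | F1 score | 95.15 | | Unverified |
| 7 | NCRF++ | F1 score | 95.06 | | Unverified |
| 8 | BI-LSTM-CRF (Senna) (ours) | F1 score | 94.46 | | Unverified |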
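
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | F1 | 95 | | Unverified |
| 2 | Wang et al., 2020 | F1 | 94.4 | | Unverified |
| 3 | AIN | F1 | 94.04 | | Unverified |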

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Wang et al., 2020 | F1 | 92 | | Unverified |
| 2 | AIN | F1 | 91.71 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Def2Vec | AUC | 93.07 | | Unverified |