SOTAVerified

Chunking

Chunking, also known as shallow parsing, identifies contiguous spans of tokens that form syntactic units such as noun phrases or verb phrases.

Example:

| Vinken | , | 61 | years | old |
| --- | --- | --- | --- | --- |
| B-NP | I-NP | I-NP | I-NP | I-NP |
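
The tags in the example follow the BIO scheme: `B-` opens a chunk, `I-` continues it, and `O` (not shown above) marks a token outside any chunk. A minimal sketch of how such a tag sequence could be turned back into spans is given below. The function name `bio_to_spans` is hypothetical, and treating a stray `I-` tag as the start of a new chunk is an assumption made here for leniency, not something the page specifies.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (label, start, end) spans; end is exclusive.

    Assumption: an I- tag with no open chunk of the same type starts a
    new chunk instead of raising an error.
    """
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open chunk ends at O, at any B- tag, or at an I- tag of another type.
        closes = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if closes:
            if label is not None:
                spans.append((label, start, i))
            start, label = None, None
        # A chunk opens at B-, or at an I- tag when nothing is open.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Vinken", ",", "61", "years", "old"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP"]
for label, start, end in bio_to_spans(tags):
    print(label, tokens[start:end])
```

For the example sentence, assuming the first token carries `B-NP`, this yields a single noun-phrase chunk covering all five tokens.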

Papers

Showing 1–50 of 447 papers

| Title | Status | Hype |
| --- | --- | --- |
| Dynamic Chunking for End-to-End Hierarchical Sequence Modeling | | 0 |
| CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs | | 0 |
| CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation | | 0 |
| Can LLMs Replace Humans During Code Chunking? | | 0 |
| cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree | Code | 2 |
| Chunk Twice, Embed Once: A Systematic Study of Segmentation and Representation Trade-offs in Chemistry-Aware Retrieval-Augmented Generation | | 0 |
| TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Code | 2 |
| Knowledge Compression via Question Generation: Enhancing Multihop Document Retrieval without Fine-tuning | | 0 |
| Real-Time Execution of Action Chunking Flow Policies | Code | 3 |
| Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models | Code | 0 |
| LID Models are Actually Accent Classifiers: Implications and Solutions for LID on Accented Speech | | 0 |
| Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings | Code | 1 |
| Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning | | 0 |
| Rethinking Chunk Size For Long-Document Retrieval: A Multi-Dataset Analysis | Code | 0 |
| NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering | Code | 1 |
| Retrieval-Augmented Generation for Service Discovery: Chunking Strategies and Benchmarking | | 0 |
| ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation | Code | 1 |
| HASH-RAG: Bridging Deep Hashing with Retriever for Efficient, Fine Retrieval and Augmented Generation | | 0 |
| Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning | Code | 0 |
| Concept-Guided Interpretability via Neural Chunking | | 0 |
| Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency | | 0 |
| Recognizing Ornaments in Vocal Indian Art Music with Active Annotation | | 0 |
| A New HOPE: Domain-agnostic Automatic Evaluation of Text Chunking | | 0 |
| Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs | | 0 |
| CHORUS: Zero-shot Hierarchical Retrieval and Orchestration for Generating Linear Programming Code | | 0 |
| Reconstructing Context: Evaluating Advanced Chunking Strategies for Retrieval-Augmented Generation | Code | 0 |
| A Baseline for Self-state Identification and Classification in Mental Health Data: CLPsych 2025 Task | | 0 |
| Bridging Industrial Expertise and XR with LLM-Powered Conversational Agents | | 0 |
| FlexChunk: Enabling 100M×100M Out-of-Core SpMV (~1.8 min, ~1.7 GB RAM) with Near-Linear Scaling | Code | 0 |
| Bi-LAT: Bilateral Control-Based Imitation Learning via Natural Language and Action Chunking with Transformers | | 0 |
| Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment | | 0 |
| ParallelFlow: Parallelizing Linear Transformers via Flow Discretization | | 0 |
| Text Chunking for Document Classification for Urban System Management using Large Language Models | Code | 0 |
| Niyama : Breaking the Silos of LLM Inference Serving | | 0 |
| CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation | | 0 |
| SLIDE: Sliding Localized Information for Document Extraction | | 0 |
| Learning Bimanual Manipulation via Action Chunking and Inter-Arm Coordination with Transformers | | 0 |
| AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | | 0 |
| From Dionysius Emerges Apollo -- Learning Patterns and Abstractions from Perceptual Sequences | | 0 |
| AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation | Code | 0 |
| MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | Code | 3 |
| The Pitfalls of Imitation Learning when Actions are Continuous | | 0 |
| TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1 |
| KidneyTalk-open: No-code Deployment of a Private Large Language Model with Medical Documentation-Enhanced Knowledge Database for Kidney Disease | Code | 0 |
| Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns | | 0 |
| AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking | | 0 |
| Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding | | 0 |
| Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Code | 5 |
| Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs | Code | 1 |
| TrustRAG: An Information Assistant with Retrieval Augmented Generation | Code | 5 |

Benchmark Results

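| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | Exact Span F1 | 97.3 | | Unverified |
| 2 | BERT-CRF (Replicated in AdaSeq) | Exact Span F1 | 97.18 | | Unverified |
| 3 | ELMo + MAT + Multi-Task | Exact Span F1 | 97.04 | | Unverified |
| 4 | CVT+Multi-Task+Large | Exact Span F1 | 96.98 | | Unverified |
| 5 | ELMo + Multi-Task | Exact Span F1 | 96.83 | | Unverified |
| 6 | Flair | Exact Span F1 | 96.72 | | Unverified |
| 7 | SeqVAT | Exact Span F1 | 95.45 | | Unverified |
| 8 | Adversarial Training | Exact Span F1 | 95.25 | | Unverified |
| 9 | BiLSTM-CRF | Exact Span F1 | 95.18 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | F1 score | 97.3 | | Unverified |
| 2 | Flair embeddings | F1 score | 96.72 | | Unverified |
| 3 | JMT | F1 score | 95.77 | | Unverified |
| 4 | Low supervision | F1 score | 95.57 | | Unverified |
| 5 | IntNet + BiLSTM-CRF | F1 score | 95.29 | | Unverified |
| 6 | Suzuki and Isozaki | F1 score | 95.15 | | Unverified |
| 7 | NCRF++ | F1 score | 95.06 | | Unverified |
| 8 | BI-LSTM-CRF (Senna) (ours) | F1 score | 94.46 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ACE | F1 | 95 | | Unverified |
| 2 | Wang et al., 2020 | F1 | 94.4 | | Unverified |
| 3 | AIN | F1 | 94.04 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Wang et al., 2020 | F1 | 92 | | Unverified |
| 2 | AIN | F1 | 91.71 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Def2Vec | AUC | 93.07 | | Unverified |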