| Liger Kernel: Efficient Triton Kernels for LLM Training | Oct 14, 2024 | Chunking, GPU | Code Available | 9 |
| Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success | Feb 27, 2025 | Action Generation, Chunking | Code Available | 5 |
| TrustRAG: An Information Assistant with Retrieval Augmented Generation | Feb 19, 2025 | Answer Generation, Chunking | Code Available | 5 |
| Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | Aug 8, 2024 | Chunking, Fact Checking | Code Available | 4 |
| Real-Time Execution of Action Chunking Flow Policies | Jun 9, 2025 | Chunking, Vision-Language-Action | Code Available | 3 |
| MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | Mar 12, 2025 | Chunking, Computational Efficiency | Code Available | 3 |
| Meta-Chunking: Learning Text Segmentation and Semantic Completion via Logical Perception | Oct 16, 2024 | Binary Classification, Chunking | Code Available | 3 |
| Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models | Sep 7, 2024 | Chunking, Retrieval | Code Available | 3 |
| Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation | Jun 10, 2024 | Chunking, Speech Separation | Code Available | 3 |
| cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree | Jun 18, 2025 | Chunking, Code Generation | Code Available | 2 |
| TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Jun 12, 2025 | Answer Generation, Chunking | Code Available | 2 |
| LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering | Oct 23, 2024 | Chunking, Question Answering | Code Available | 2 |
| Autoregressive Action Sequence Learning for Robotic Manipulation | Oct 4, 2024 | Chunking, Language Modeling | Code Available | 2 |
| Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling | Aug 30, 2024 | Chunking | Code Available | 2 |
| LumberChunker: Long-Form Narrative Document Segmentation | Jun 25, 2024 | Chunking, Form | Code Available | 2 |
| Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models | Jun 3, 2024 | Chunking, Mamba | Code Available | 2 |
| DadmaTools: Natural Language Processing Toolkit for Persian Language | Jul 1, 2022 | Chunking, Constituency Parsing | Code Available | 2 |
| Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings | May 30, 2025 | Chunking, Computational Efficiency | Code Available | 1 |
| NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering | May 26, 2025 | Chunking, Large Language Model | Code Available | 1 |
| ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation | May 22, 2025 | Chunking | Code Available | 1 |
| TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Mar 9, 2025 | Action Localization, Boundary Detection | Code Available | 1 |
| Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs | Feb 25, 2025 | Benchmarking, Chunking | Code Available | 1 |
| Chat3GPP: An Open-Source Retrieval-Augmented Generation Framework for 3GPP Documents | Jan 20, 2025 | Chunking, RAG | Code Available | 1 |
| S2 Chunking: A Hybrid Framework for Document Segmentation Through Integrated Spatial and Semantic Analysis | Jan 8, 2025 | Articles, Chunking | Code Available | 1 |
| On LLM-Enhanced Mixed-Type Data Imputation with High-Order Message Passing | Jan 4, 2025 | Chunking, Imputation | Code Available | 1 |
| Attamba: Attending To Multi-Token States | Nov 26, 2024 | Chunking, State Space Models | Code Available | 1 |
| TeleOracle: Fine-Tuned Retrieval-Augmented Generation with Long-Context Support for Network | Nov 4, 2024 | Chunking, Language Modelling | Code Available | 1 |
| CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity | Oct 16, 2024 | Chunking, Diversity | Code Available | 1 |
| Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards | Aug 21, 2024 | Chunking, Computational Efficiency | Code Available | 1 |
| Learning Variable Compliance Control From a Few Demonstrations for Bimanual Robot with Haptic Feedback Teleoperation System | Jun 21, 2024 | Chunking, Contact-rich Manipulation | Code Available | 1 |
| Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum | May 21, 2024 | 2k, 8k | Code Available | 1 |
| Fast and Accurate Factual Inconsistency Detection Over Long Documents | Oct 19, 2023 | Chunking, Natural Language Inference | Code Available | 1 |
| Sparse Modular Activation for Efficient Sequence Modeling | Jun 19, 2023 | Chunking, Language Modeling | Code Available | 1 |
| Recurrent Attention Networks for Long-text Modeling | Jun 12, 2023 | Chunking | Code Available | 1 |
| ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths | Jun 12, 2022 | Chunking, Document Classification | Code Available | 1 |
| NetKet 3: Machine Learning Toolbox for Many-Body Quantum Systems | Dec 20, 2021 | BIG-bench Machine Learning, Chunking | Code Available | 1 |
| tsflex: flexible time series processing & feature extraction | Nov 24, 2021 | Chunking, Time Series | Code Available | 1 |
| BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications | Oct 12, 2021 | Action Detection, Activity Detection | Code Available | 1 |
| Paradigm Shift in Natural Language Processing | Sep 26, 2021 | Chunking, NER | Code Available | 1 |
| Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning | May 8, 2021 | Chinese Named Entity Recognition, Chunking | Code Available | 1 |
| Unsupervised Technical Domain Terms Extraction using Term Extractor | Jan 22, 2021 | Chunking, Task 2 | Code Available | 1 |
| Automated Concatenation of Embeddings for Structured Prediction | Oct 10, 2020 | Aspect Extraction, Chunking | Code Available | 1 |
| AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network | Sep 17, 2020 | Chunking, Variational Inference | Code Available | 1 |
| Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension | May 16, 2020 | Chunking, Machine Reading Comprehension | Code Available | 1 |
| Capturing Global Informativeness in Open Domain Keyphrase Extraction | Apr 28, 2020 | Chunking, Informativeness | Code Available | 1 |
| Review highlights: opinion mining on reviews: a hybrid model for rule selection in aspect extraction | Oct 18, 2017 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | Code Available | 1 |
| Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks | Jul 21, 2017 | Chunking, Event Detection | Code Available | 1 |
| Semi-supervised Multitask Learning for Sequence Labeling | Apr 24, 2017 | Chunking, Grammatical Error Detection | Code Available | 1 |
| Dynamic Chunking for End-to-End Hierarchical Sequence Modeling | Jul 10, 2025 | Chunking | Unverified | 0 |
| CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs | Jul 9, 2025 | Chunking, RAG | Unverified | 0 |