| LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Mar 19, 2024 | GSM8K, Language Modelling | Code Available | 9 |
| Recurrent Context Compression: Efficiently Expanding the Context Window of LLM | Jun 10, 2024 | Long-Context Understanding, Question Answering | Code Available | 2 |
| Neural Retrievers are Biased Towards LLM-Generated Content | Oct 31, 2023 | Information Retrieval, Retrieval | Code Available | 1 |
| A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models | Aug 20, 2017 | GPU, Text Compression | Code Available | 1 |
| FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression | Sep 25, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| LLMZip: Lossless Text Compression using Large Language Models | Jun 6, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| TokAlign: Efficient Vocabulary Adaptation via Token Alignment | Jun 4, 2025 | Sentence, Text Compression | Code Available | 1 |
| L3TC: Leveraging RWKV for Learned Lossless Low-Complexity Text Compression | Dec 21, 2024 | Data Compression, Text Compression | Code Available | 1 |
| BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training | Sep 6, 2024 | Text Compression | Code Available | 1 |
| Sequential Recurrent Neural Networks for Language Modeling | Mar 23, 2017 | Language Modeling, Language Modelling | —Unverified | 0 |
| An Enhanced Text Compression Approach Using Transformer-based Language Models | Dec 15, 2024 | de-en, Text Compression | —Unverified | 0 |
| A Neural Network Approach for Mixing Language Models | Aug 23, 2017 | Text Compression | —Unverified | 0 |
| Approximating Human-Like Few-shot Learning with GPT-based Compression | Aug 14, 2023 | Data Compression, Few-Shot Learning | —Unverified | 0 |
| A study for Image compression using Re-Pair algorithm | Jan 30, 2019 | Image Compression, Text Compression | —Unverified | 0 |
| Beyond Text Compression: Evaluating Tokenizers Across Scales | Jun 3, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression | Sep 7, 2019 | Language Modelling, Reading Comprehension | —Unverified | 0 |
| EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression | Aug 25, 2023 | Keyphrase Extraction, Language Modeling | —Unverified | 0 |
| Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method | May 12, 2025 | Semantic Compression, Semantic Similarity | —Unverified | 0 |
| Measuring Information Distortion in Hierarchical Ultra long Novel Generation: The Optimal Expansion Ratio | May 18, 2025 | Text Compression | —Unverified | 0 |
| Optimal alphabet for single text compression | Jan 13, 2022 | Text Compression | —Unverified | 0 |
| Semantic Text Compression for Classification | Sep 19, 2023 | Classification, Decoder | —Unverified | 0 |
| Text Compression-aided Transformer Encoding | Feb 11, 2021 | Text Compression | —Unverified | 0 |
| Text Compression for Efficient Language Generation | Mar 14, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Text Compression for Sentiment Analysis via Evolutionary Algorithms | Sep 20, 2017 | Data Compression, Evolutionary Algorithms | —Unverified | 0 |
| Theoretical Analysis of Byte-Pair Encoding | Nov 13, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |