| IntellectSeeker: A Personalized Literature Management System with the Probabilistic Model and Large Language Model | Dec 10, 2024 | Articles, Few-Shot Learning | Code Available | 0 |
| Theoretical Analysis of Byte-Pair Encoding | Nov 13, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression | Sep 25, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| AlphaZip: Neural Network-Enhanced Lossless Text Compression | Sep 23, 2024 | Benchmarking, Data Compression | Code Available | 0 |
| BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training | Sep 6, 2024 | Text Compression | Code Available | 1 |
| XCompress: LLM assisted Python-based text compression toolkit | Aug 12, 2024 | Benchmarking, Language Modeling | Code Available | 0 |
| Recurrent Context Compression: Efficiently Expanding the Context Window of LLM | Jun 10, 2024 | Long-Context Understanding, Question Answering | Code Available | 2 |
| Variational Bayesian Methods for a Tree-Structured Stick-Breaking Process Mixture of Gaussians by Application of the Bayes Codes for Context Tree Models | May 1, 2024 | Computational Efficiency, Text Compression | Unverified | 0 |
| LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Mar 19, 2024 | GSM8K, Language Modelling | Code Available | 9 |
| Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance | Mar 10, 2024 | Language Modeling, Language Modelling | Unverified | 0 |