| Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs | Apr 16, 2024 | Long-Context Understanding, Token Reduction | Code Available | 1 |
| ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers | Jun 14, 2024 | Segmentation, Semantic Segmentation | Code Available | 1 |
| Token Cropr: Faster ViTs for Quite a Few Tasks | Dec 1, 2024 | image-classification, Image Classification | Code Available | 1 |
| Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters | Nov 5, 2024 | Token Reduction, Visual Reasoning | Code Available | 1 |
| Window Token Concatenation for Efficient Visual Large Language Models | Apr 5, 2025 | Token Reduction | Code Available | 1 |
| ZipR1: Reinforcing Token Sparsity in MLLMs | Apr 23, 2025 | Token Reduction | Unverified | 0 |
| AdaFV: Rethinking of Visual-Language alignment for VLM acceleration | Jan 16, 2025 | Token Reduction | Unverified | 0 |
| Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | Jun 5, 2025 | GPU, Text-to-Video Generation | Unverified | 0 |
| AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration | Dec 16, 2024 | Denoising, Token Reduction | Unverified | 0 |
| Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems | Oct 3, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Deploying Foundation Model Powered Agent Services: A Survey | Dec 18, 2024 | model, Model Compression | Unverified | 0 |
| DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | May 20, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | Apr 23, 2025 | Token Reduction, Video Understanding | Unverified | 0 |
| Dynamic Token Reduction during Generation for Vision Language Models | Jan 24, 2025 | Decoder, Token Reduction | Unverified | 0 |
| EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation | May 16, 2025 | Diversity, RAG | Unverified | 0 |
| Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features | Apr 1, 2025 | Token Reduction | Unverified | 0 |
| Efficient Multi-modal Large Language Models via Visual Token Grouping | Nov 26, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| FIT-RAG: Black-Box RAG with Factual Information and Token Reduction | Mar 21, 2024 | Open-Domain Question Answering, Question Answering | Unverified | 0 |
| freePruner: A Training-free Approach for Large Multimodal Model Acceleration | Nov 23, 2024 | Quantization, Question Answering | Unverified | 0 |
| Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method | May 12, 2025 | Semantic Compression, Semantic Similarity | Unverified | 0 |
| ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition | Dec 21, 2024 | Efficient ViTs, Token Reduction | Unverified | 0 |
| Knowing When to Stop: Dynamic Context Cutoff for Large Language Models | Feb 3, 2025 | Token Reduction | Unverified | 0 |
| Learning Free Token Reduction for Multi-Modal Large Language Models | Jan 29, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Local Information Matters: Inference Acceleration For Grounded Conversation Generation Models Through Adaptive Local-Aware Token Pruning | Mar 31, 2025 | Semantic Segmentation, Token Reduction | Unverified | 0 |
| MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction | Feb 2, 2025 | Hallucination, Token Reduction | Unverified | 0 |