| Learning Compact Vision Tokens for Efficient Large Multimodal Models | Jun 8, 2025 | Multimodal Reasoning, Token Reduction | Code Available | 1 |
| Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration | Jun 6, 2025 | Depth Estimation, object-detection | Unverified | 0 |
| Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | Jun 5, 2025 | GPU, Text-to-Video Generation | Unverified | 0 |
| Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings | Jun 5, 2025 | Retrieval, Token Reduction | Unverified | 0 |
| SiLVR: A Simple Language-based Video Reasoning Framework | May 30, 2025 | Math, MME | Code Available | 1 |
| One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | May 29, 2025 | Contrastive Learning, Text Retrieval | Code Available | 2 |
| VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models | May 28, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models | May 26, 2025 | Token Reduction | Code Available | 1 |
| The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training | May 25, 2025 | Reinforcement Learning (RL), Token Reduction | Unverified | 0 |
| Not All Tokens Are What You Need In Thinking | May 23, 2025 | All, Token Reduction | Code Available | 0 |
| Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality | May 23, 2025 | In-Context Learning, Token Reduction | Code Available | 3 |
| Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning | May 22, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | May 22, 2025 | Token Reduction | Code Available | 1 |
| Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM | May 21, 2025 | Decoder, Token Reduction | Code Available | 1 |
| DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | May 20, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference | May 18, 2025 | Token Reduction | Unverified | 0 |
| EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation | May 16, 2025 | Diversity, RAG | Unverified | 0 |
| Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration | May 16, 2025 | Denoising, Token Reduction | Code Available | 0 |
| Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method | May 12, 2025 | Semantic Compression, Semantic Similarity | Unverified | 0 |
| ZipR1: Reinforcing Token Sparsity in MLLMs | Apr 23, 2025 | Token Reduction | Unverified | 0 |
| DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | Apr 23, 2025 | Token Reduction, Video Understanding | Unverified | 0 |
| Dynamic Compressing Prompts for Efficient Inference of Large Language Models | Apr 15, 2025 | Token Reduction | Code Available | 0 |
| PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Apr 11, 2025 | Clustering, Language Modeling | Code Available | 2 |
| Window Token Concatenation for Efficient Visual Large Language Models | Apr 5, 2025 | Token Reduction | Code Available | 1 |
| Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features | Apr 1, 2025 | Token Reduction | Unverified | 0 |
| Local Information Matters: Inference Acceleration For Grounded Conversation Generation Models Through Adaptive Local-Aware Token Pruning | Mar 31, 2025 | Semantic Segmentation, Token Reduction | Unverified | 0 |
| Faster Parameter-Efficient Tuning with Token Redundancy Reduction | Mar 26, 2025 | Token Reduction | Code Available | 0 |
| Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models | Mar 21, 2025 | Computational Efficiency, Token Reduction | Unverified | 0 |
| Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | Mar 14, 2025 | GPU, Mamba | Unverified | 0 |
| Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens | Mar 11, 2025 | Decoder, Image Generation | Code Available | 0 |
| When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Mar 10, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study | Mar 9, 2025 | Quantization, Token Reduction | Unverified | 0 |
| BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression | Mar 4, 2025 | Large Language Model, Machine Translation | Code Available | 0 |
| Knowing When to Stop: Dynamic Context Cutoff for Large Language Models | Feb 3, 2025 | Token Reduction | Unverified | 0 |
| MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction | Feb 2, 2025 | Hallucination, Token Reduction | Unverified | 0 |
| Learning Free Token Reduction for Multi-Modal Large Language Models | Jan 29, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Dynamic Token Reduction during Generation for Vision Language Models | Jan 24, 2025 | Decoder, Token Reduction | Unverified | 0 |
| AdaFV: Rethinking of Visual-Language alignment for VLM acceleration | Jan 16, 2025 | Token Reduction | Unverified | 0 |
| FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance | Jan 5, 2025 | Token Reduction | Code Available | 1 |
| Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks | Jan 1, 2025 | Computational Efficiency, Diversity | Code Available | 0 |
| Cached Adaptive Token Merging: Dynamic Token Reduction and Redundant Computation Elimination in Diffusion Model | Jan 1, 2025 | Denoising, Token Reduction | Code Available | 0 |
| Cross-Layer Cache Aggregation for Token Reduction in Ultra-Fine-Grained Image Recognition | Dec 31, 2024 | Fine-Grained Image Recognition, Token Reduction | Code Available | 0 |
| FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models | Dec 30, 2024 | Question Answering, Token Reduction | Code Available | 2 |
| ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition | Dec 21, 2024 | Efficient ViTs, Token Reduction | Unverified | 0 |
| Deploying Foundation Model Powered Agent Services: A Survey | Dec 18, 2024 | model, Model Compression | Unverified | 0 |
| Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training | Dec 17, 2024 | Mamba, Token Reduction | Code Available | 1 |
| AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration | Dec 16, 2024 | Denoising, Token Reduction | Unverified | 0 |
| Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers | Dec 13, 2024 | Token Reduction | Code Available | 0 |
| Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation | Dec 13, 2024 | Token Reduction | Code Available | 1 |
| TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation | Dec 10, 2024 | General Knowledge, Text Generation | Unverified | 0 |