SOTAVerified

Token Reduction

Papers

Showing 1-50 of 78 papers

Title | Status | Hype
Learning Compact Vision Tokens for Efficient Large Multimodal Models | Code | 1
Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration | | 0
Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | | 0
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings | | 0
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models | | 0
FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models | Code | 1
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training | | 0
Not All Tokens Are What You Need In Thinking | Code | 0
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality | Code | 3
Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning | | 0
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Code | 1
Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM | Code | 1
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | | 0
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference | | 0
EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation | | 0
Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration | Code | 0
Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method | | 0
ZipR1: Reinforcing Token Sparsity in MLLMs | | 0
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | | 0
Dynamic Compressing Prompts for Efficient Inference of Large Language Models | Code | 0
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Code | 2
Window Token Concatenation for Efficient Visual Large Language Models | Code | 1
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features | | 0
Local Information Matters: Inference Acceleration For Grounded Conversation Generation Models Through Adaptive Local-Aware Token Pruning | | 0
Faster Parameter-Efficient Tuning with Token Redundancy Reduction | Code | 0
Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models | | 0
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | | 0
Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens | Code | 0
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Code | 2
Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study | | 0
BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression | Code | 0
Knowing When to Stop: Dynamic Context Cutoff for Large Language Models | | 0
MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction | | 0
Learning Free Token Reduction for Multi-Modal Large Language Models | | 0
Dynamic Token Reduction during Generation for Vision Language Models | | 0
AdaFV: Rethinking of Visual-Language alignment for VLM acceleration | | 0
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance | Code | 1
Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks | Code | 0
Cached Adaptive Token Merging: Dynamic Token Reduction and Redundant Computation Elimination in Diffusion Model | Code | 0
Cross-Layer Cache Aggregation for Token Reduction in Ultra-Fine-Grained Image Recognition | Code | 0
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models | Code | 2
ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition | | 0
Deploying Foundation Model Powered Agent Services: A Survey | | 0
Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training | Code | 1
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration | | 0
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers | Code | 0
Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation | Code | 1
TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation | | 0
Page 1 of 2

No leaderboard results yet.