SOTAVerified

Token Reduction

Papers

Showing 1–50 of 78 papers

Title | Status | Hype
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality | Code | 3
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Code | 3
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models | Code | 2
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Code | 2
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Code | 2
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Code | 2
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Code | 2
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
Learning Compact Vision Tokens for Efficient Large Multimodal Models | Code | 1
Which Tokens to Use? Investigating Token Reduction in Vision Transformers | Code | 1
Rethinking Token Reduction for State Space Models | Code | 1
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Code | 1
Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training | Code | 1
Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM | Code | 1
AdaViT: Adaptive Tokens for Efficient Vision Transformer | Code | 1
Bridging Local Details and Global Context in Text-Attributed Graphs | Code | 1
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models | Code | 1
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Code | 1
FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models | Code | 1
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance | Code | 1
Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers | Code | 1
TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference | Code | 1
Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation | Code | 1
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model | Code | 1
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs | Code | 1
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers | Code | 1
Token Cropr: Faster ViTs for Quite a Few Tasks | Code | 1
Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters | Code | 1
Window Token Concatenation for Efficient Visual Large Language Models | Code | 1
ZipR1: Reinforcing Token Sparsity in MLLMs | | 0
AdaFV: Rethinking of Visual-Language alignment for VLM acceleration | | 0
Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | | 0
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration | | 0
Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems | | 0
Deploying Foundation Model Powered Agent Services: A Survey | | 0
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | | 0
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | | 0
Dynamic Token Reduction during Generation for Vision Language Models | | 0
EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation | | 0
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features | | 0
Efficient Multi-modal Large Language Models via Visual Token Grouping | | 0
FIT-RAG: Black-Box RAG with Factual Information and Token Reduction | | 0
freePruner: A Training-free Approach for Large Multimodal Model Acceleration | | 0
Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method | | 0
ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition | | 0
Knowing When to Stop: Dynamic Context Cutoff for Large Language Models | | 0
Learning Free Token Reduction for Multi-Modal Large Language Models | | 0
Local Information Matters: Inference Acceleration For Grounded Conversation Generation Models Through Adaptive Local-Aware Token Pruning | | 0
MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction | | 0
Page 1 of 2

No leaderboard results yet.