SOTAVerified

Token Reduction

Papers

Showing 1-25 of 78 papers

Title | Status | Hype
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality | Code | 3
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Code | 3
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models | Code | 2
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Code | 2
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Code | 2
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Code | 2
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Code | 2
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
AdaViT: Adaptive Tokens for Efficient Vision Transformer | Code | 1
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance | Code | 1
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Code | 1
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models | Code | 1
Rethinking Token Reduction for State Space Models | Code | 1
Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM | Code | 1
Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers | Code | 1
Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters | Code | 1
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Code | 1
FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models | Code | 1
Learning Compact Vision Tokens for Efficient Large Multimodal Models | Code | 1
Bridging Local Details and Global Context in Text-Attributed Graphs | Code | 1
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers | Code | 1
Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation | Code | 1
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model | Code | 1
Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training | Code | 1

No leaderboard results yet.