SOTAVerified

Token Reduction

Papers

Showing 1–50 of 78 papers

| Title | Status | Hype |
| --- | --- | --- |
| Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality | Code | 3 |
| LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Code | 3 |
| One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2 |
| PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Code | 2 |
| When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Code | 2 |
| FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models | Code | 2 |
| Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Code | 2 |
| LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Code | 2 |
| Learning Compact Vision Tokens for Efficient Large Multimodal Models | Code | 1 |
| SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1 |
| FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models | Code | 1 |
| CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Code | 1 |
| Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM | Code | 1 |
| Window Token Concatenation for Efficient Visual Large Language Models | Code | 1 |
| FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance | Code | 1 |
| Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training | Code | 1 |
| Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation | Code | 1 |
| Token Cropr: Faster ViTs for Quite a Few Tasks | Code | 1 |
| Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters | Code | 1 |
| Rethinking Token Reduction for State Space Models | Code | 1 |
| FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model | Code | 1 |
| Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Code | 1 |
| Bridging Local Details and Global Context in Text-Attributed Graphs | Code | 1 |
| ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers | Code | 1 |
| Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs | Code | 1 |
| Which Tokens to Use? Investigating Token Reduction in Vision Transformers | Code | 1 |
| Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers | Code | 1 |
| PuMer: Pruning and Merging Tokens for Efficient Vision Language Models | Code | 1 |
| AdaViT: Adaptive Tokens for Efficient Vision Transformer | Code | 1 |
| TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference | Code | 1 |
| Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration | | 0 |
| Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings | | 0 |
| Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | | 0 |
| VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models | | 0 |
| The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training | | 0 |
| Not All Tokens Are What You Need In Thinking | Code | 0 |
| Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning | | 0 |
| DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | | 0 |
| STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference | | 0 |
| Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration | Code | 0 |
| EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation | | 0 |
| Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method | | 0 |
| ZipR1: Reinforcing Token Sparsity in MLLMs | | 0 |
| DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | | 0 |
| Dynamic Compressing Prompts for Efficient Inference of Large Language Models | Code | 0 |
| Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features | | 0 |
| Local Information Matters: Inference Acceleration For Grounded Conversation Generation Models Through Adaptive Local-Aware Token Pruning | | 0 |
| Faster Parameter-Efficient Tuning with Token Redundancy Reduction | Code | 0 |
| Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models | | 0 |
| Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | | 0 |
Page 1 of 2

No leaderboard results yet.