SOTAVerified

Token Reduction

Papers

Showing 1–50 of 78 papers

| Title | Status | Hype |
|---|---|---|
| LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | Code | 3 |
| Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality | Code | 3 |
| FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models | Code | 2 |
| LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Code | 2 |
| Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Code | 2 |
| PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Code | 2 |
| One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2 |
| When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning | Code | 2 |
| Bridging Local Details and Global Context in Text-Attributed Graphs | Code | 1 |
| ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers | Code | 1 |
| Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers | Code | 1 |
| Which Tokens to Use? Investigating Token Reduction in Vision Transformers | Code | 1 |
| CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Code | 1 |
| TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference | Code | 1 |
| Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs | Code | 1 |
| Faster Vision Mamba is Rebuilt in Minutes via Merged Token Re-training | Code | 1 |
| Token Cropr: Faster ViTs for Quite a Few Tasks | Code | 1 |
| Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM | Code | 1 |
| SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1 |
| Rethinking Token Reduction for State Space Models | Code | 1 |
| PuMer: Pruning and Merging Tokens for Efficient Vision Language Models | Code | 1 |
| Window Token Concatenation for Efficient Visual Large Language Models | Code | 1 |
| FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models | Code | 1 |
| Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation | Code | 1 |
| FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model | Code | 1 |
| FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance | Code | 1 |
| Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Code | 1 |
| Learning Compact Vision Tokens for Efficient Large Multimodal Models | Code | 1 |
| Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters | Code | 1 |
| AdaViT: Adaptive Tokens for Efficient Vision Transformer | Code | 1 |
| Dynamic Compressing Prompts for Efficient Inference of Large Language Models | Code | 0 |
| Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration | Code | 0 |
| BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression | Code | 0 |
| Cached Adaptive Token Merging: Dynamic Token Reduction and Redundant Computation Elimination in Diffusion Model | Code | 0 |
| Cross-Layer Cache Aggregation for Token Reduction in Ultra-Fine-Grained Image Recognition | Code | 0 |
| Faster Parameter-Efficient Tuning with Token Redundancy Reduction | Code | 0 |
| HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition | Code | 0 |
| Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens | Code | 0 |
| Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers | Code | 0 |
| Not All Tokens Are What You Need In Thinking | Code | 0 |
| Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks | Code | 0 |
| Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | — | 0 |
| Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method | — | 0 |
| freePruner: A Training-free Approach for Large Multimodal Model Acceleration | — | 0 |
| Local Information Matters: Inference Acceleration For Grounded Conversation Generation Models Through Adaptive Local-Aware Token Pruning | — | 0 |
| FIT-RAG: Black-Box RAG with Factual Information and Token Reduction | — | 0 |
| MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction | — | 0 |
| AdaFV: Rethinking of Visual-Language alignment for VLM acceleration | — | 0 |
| Efficient Multi-modal Large Language Models via Visual Token Grouping | — | 0 |
| Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features | — | 0 |

No leaderboard results yet.