SOTAVerified

Token Reduction

Papers

Showing 1–25 of 78 papers

Title | Status | Hype
Learning Compact Vision Tokens for Efficient Large Multimodal Models | Code | 1
Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration | — | 0
Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | — | 0
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings | — | 0
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models | — | 0
FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models | Code | 1
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training | — | 0
Not All Tokens Are What You Need In Thinking | Code | 0
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality | Code | 3
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms | Code | 1
Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning | — | 0
Streamline Without Sacrifice -- Squeeze out Computation Redundancy in LMM | Code | 1
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | — | 0
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference | — | 0
EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation | — | 0
Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration | Code | 0
Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method | — | 0
ZipR1: Reinforcing Token Sparsity in MLLMs | — | 0
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | — | 0
Dynamic Compressing Prompts for Efficient Inference of Large Language Models | Code | 0
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models | Code | 2
Window Token Concatenation for Efficient Visual Large Language Models | Code | 1
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features | — | 0

No leaderboard results yet.