LLM Inference Unveiled: Survey and Roofline Model Insights | Feb 26, 2024 | Knowledge Distillation, Language Modelling | Code available (4)
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation | Feb 16, 2024 | Knowledge Distillation, Quantization | Code available (4)
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks | Feb 6, 2024 | Quantization | Code available (4)
Large Language Models for Time Series: A Survey | Feb 2, 2024 | Quantization, Survey | Code available (4)
Fast Inference of Mixture-of-Experts Language Models with Offloading | Dec 28, 2023 | Mixture-of-Experts, Quantization | Code available (4)
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO | Nov 8, 2023 | Quantization, Text Generation | Code available (4)
Efficient Post-training Quantization with FP8 Formats | Sep 26, 2023 | image-classification, Image Classification | Code available (4)
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs | Sep 11, 2023 | Quantization | Code available (4)
INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation | Jun 13, 2023 | Language Modeling, Language Modelling | Code available (4)
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | Jan 2, 2023 | Common Sense Reasoning, Language Modelling | Code available (4)
The case for 4-bit precision: k-bit Inference Scaling Laws | Dec 19, 2022 | Quantization | Code available (4)
FP8 Formats for Deep Learning | Sep 12, 2022 | Deep Learning, Quantization | Code available (4)
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models | Mar 14, 2022 | CPU, Quantization | Code available (4)
Link and code: Fast indexing with graphs and compact regression codes | Apr 26, 2018 | Image Similarity Search, Quantization | Code available (4)
Billion-scale similarity search with GPUs | Feb 28, 2017 | GPU, Image Similarity Search | Code available (4)
Polysemous codes | Sep 7, 2016 | Quantization | Code available (4)
Highly Compressed Tokenizer Can Generate Without Training | Jun 9, 2025 | Image Generation, Quantization | Code available (3)
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation | May 8, 2025 | Quantization | Code available (3)
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs | Feb 20, 2025 | Quantization | Code available (3)
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization | Feb 4, 2025 | Quantization | Code available (3)
HAC++: Towards 100X Compression of 3D Gaussian Splatting | Jan 21, 2025 | 3DGS, Attribute | Code available (3)
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Jan 2, 2025 | Contrastive Learning, Key Detection | Code available (3)
A Survey on Large Language Model Acceleration based on KV Cache Management | Dec 27, 2024 | Language Modeling, Language Modelling | Code available (3)
A Survey on Inference Optimization Techniques for Mixture of Experts Models | Dec 18, 2024 | Computational Efficiency, Distributed Computing | Code available (3)
VidTok: A Versatile and Open-Source Video Tokenizer | Dec 17, 2024 | Quantization, SSIM | Code available (3)
APOLLO: SGD-like Memory, AdamW-level Performance | Dec 6, 2024 | GPU, Quantization | Code available (3)
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation | Dec 4, 2024 | Image Generation, Image Reconstruction | Code available (3)
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation | Dec 2, 2024 | Image Reconstruction, Quantization | Code available (3)
Scaling Transformers for Low-Bitrate High-Quality Speech Coding | Nov 29, 2024 | Quantization | Code available (3)
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem | Nov 26, 2024 | GPU, Language Modeling | Code available (3)
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer | Nov 4, 2024 | Quantization, Representation Learning | Code available (3)
Data Generation for Hardware-Friendly Post-Training Quantization | Oct 29, 2024 | Data Augmentation, GPU | Code available (3)
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training | Oct 25, 2024 | Language Modeling, Language Modelling | Code available (3)
DPLM-2: A Multimodal Diffusion Protein Language Model | Oct 17, 2024 | Language Modeling, Language Modelling | Code available (3)
Latent Action Pretraining from Videos | Oct 15, 2024 | Quantization, Robot Manipulation | Code available (3)
FlatQuant: Flatness Matters for LLM Quantization | Oct 12, 2024 | Quantization | Code available (3)
ImageFolder: Autoregressive Image Generation with Folded Tokens | Oct 2, 2024 | Image Generation, Image Reconstruction | Code available (3)
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control | Sep 24, 2024 | Clustering, Language Modelling | Code available (3)
BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec | Sep 9, 2024 | Quantization | Code available (3)
TinyAgent: Function Calling at the Edge | Sep 1, 2024 | Language Modelling, Quantization | Code available (3)
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Aug 30, 2024 | Audio Compression, Audio Generation | Code available (3)
ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Aug 16, 2024 | GPU, Model Compression | Code available (3)
Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields | Aug 7, 2024 | 3DGS, Model Compression | Code available (3)
Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection | Jul 30, 2024 | object-detection, Object Detection | Code available (3)
Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Jul 15, 2024 | Quantization | Code available (3)
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | Jul 10, 2024 | GPU, Quantization | Code available (3)
Image and Video Tokenization with Binary Spherical Quantization | Jun 11, 2024 | Decoder, Image Generation | Code available (3)
CV-VAE: A Compatible Video VAE for Latent Generative Video Models | May 30, 2024 | Quantization | Code available (3)
Ditto: Quantization-aware Secure Inference of Transformers upon MPC | May 9, 2024 | Quantization | Code available (3)
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Apr 22, 2024 | Common Sense Reasoning, GPU | Code available (3)