Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design May 2, 2024 Model Compression Neural Network Compression
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey May 1, 2024 Quantization
An empirical study of LLaMA3 quantization: from LLMs to MLLMs Apr 22, 2024 Language Modelling Large Language Model
decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points Apr 19, 2024 Quantization
MAexp: A Generic Platform for RL-based Multi-Agent Exploration Apr 19, 2024 Diversity Multi-agent Reinforcement Learning
AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval Apr 9, 2024 All Information Retrieval
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution Apr 4, 2024 Image Super-Resolution Quantization
Efficient Multi-Vector Dense Retrieval Using Bit Vectors Apr 3, 2024 Quantization Retrieval
Transformer based Pluralistic Image Completion with Reduced Information Loss Mar 31, 2024 Decoder Image Inpainting
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization Mar 19, 2024 Quantization
CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification Mar 14, 2024 Classification Crowd Counting
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM Mar 8, 2024 Quantization
QAQ: Quality Adaptive Quantization for LLM KV Cache Mar 7, 2024 Quantization Question Answering
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect Mar 6, 2024 Quantization
Evaluating Quantized Large Language Models Feb 28, 2024 Mamba Quantization
Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech Feb 26, 2024 Quantization Speech Enhancement
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs Feb 16, 2024 Quantization
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains Feb 15, 2024 Few-Shot Learning Medical Question Answering
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference Feb 15, 2024 GPU Quantization
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention Feb 8, 2024 MMLU Quantization
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning Feb 6, 2024 Image Generation Model Compression
LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection Jan 29, 2024 3D Object Detection Autonomous Vehicles
Residual Quantization with Implicit Neural Codebooks Jan 26, 2024 Data Compression Quantization
LeanVec: Searching vectors faster by making them fit Dec 26, 2023 Cross-Modal Retrieval Dimensionality Reduction
TinySAM: Pushing the Envelope for Efficient Segment Anything Model Dec 21, 2023 Knowledge Distillation Quantization
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis Dec 17, 2023 Quantization Singing Voice Synthesis
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks Dec 14, 2023 Abstractive Text Summarization Code Generation
CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization Nov 30, 2023 3DGS NeRF
LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS Nov 28, 2023 Knowledge Distillation NeRF
Compact 3D Gaussian Representation for Radiance Field Nov 22, 2023 3DGS Model Compression
Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication Nov 16, 2023 Quantization
Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation Nov 15, 2023 Quantization Recommendation Systems
Efficient LLM Inference on CPUs Nov 1, 2023 Quantization
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving Oct 29, 2023 GPU Quantization
LLM-FP4: 4-Bit Floating-Point Quantized Transformers Oct 25, 2023 Common Sense Reasoning Quantization
BitNet: Scaling 1-bit Transformers for Large Language Models Oct 17, 2023 Language Modeling Language Modelling
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models Oct 12, 2023 Natural Language Understanding Quantization
ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers Sep 28, 2023 GPU Instruction Following
Transformer-VQ: Linear-Time Transformers via Vector Quantization Sep 28, 2023 8k Decoder
EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization Sep 20, 2023 Knowledge Distillation object-detection
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models Aug 31, 2023 Decoder Language Modeling
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models Aug 25, 2023 Common Sense Reasoning Computational Efficiency
Dataset Quantization Aug 21, 2023 Dataset Distillation object-detection
QuIP: 2-Bit Quantization of Large Language Models With Guarantees Jul 25, 2023 Quantization
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression Jun 5, 2023 GPU Language Modelling
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training May 31, 2023 Language Modelling Quantization
EMP-SSL: Towards Self-Supervised Learning in One Training Epoch Apr 8, 2023 Quantization Self-Supervised Learning
Similarity search in the blink of an eye with compressed indices Apr 7, 2023 Quantization
Binarized Neural Machine Translation Feb 9, 2023 Binarization Machine Translation
Q-Diffusion: Quantizing Diffusion Models Feb 8, 2023 Image Generation Noise Estimation