Quantization

Quantization is a promising technique for reducing the computational cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
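To make the float-to-fixed-point replacement concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It only illustrates the basic number mapping, not the adaptive-precision method of the source paper; the function names `quantize_int8` and `dequantize` and the sample values are hypothetical, chosen for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The scale maps the largest magnitude onto 127; use 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [qi * scale for qi in q]


# Illustrative values, not taken from any paper listed on this page.
weights = [0.5, -1.27, 0.031, 1.0]
q, scale = quantize_int8(weights)          # q is [50, -127, 3, 100], scale is about 0.01
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error per value is bounded by half the scale, which is why smaller value ranges (or more bits, e.g. int16) give a closer approximation.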

Papers

Showing 1–50 of 4925 papers

Title	Date	Tasks	Status	Hype
Qwen2 Technical Report	Jul 15, 2024	Arithmetic Reasoning, GSM8K	Code Available	13
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision	Jul 11, 2024	GPU, Quantization	Code Available	12
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System	Feb 8, 2025	Decoder, Language Modeling	Code Available	11
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models	Dec 13, 2024	In-Context Learning, Quantization	Code Available	11
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning	Aug 10, 2024	Hallucination, Optical Character Recognition	Code Available	11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens	Jul 7, 2024	Language Modelling, Large Language Model	Code Available	11
OpenVLA: An Open-Source Vision-Language-Action Model	Jun 13, 2024	Imitation Learning, Language Modelling	Code Available	9
SageAttention2++: A More Efficient Implementation of SageAttention2	May 27, 2025	Quantization, Video Generation	Code Available	7
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model	Apr 17, 2025	Code Generation, CPU	Code Available	7
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization	Nov 17, 2024	Image Generation, Quantization	Code Available	7
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation	Oct 10, 2024	4k, Image Animation	Code Available	7
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration	Oct 3, 2024	Image Generation, Quantization	Code Available	7
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration	Apr 24, 2024	Management, Prompt Engineering	Code Available	7
Chronos: Learning the Language of Time Series	Mar 12, 2024	Gaussian Processes, Language Modeling	Code Available	7
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations	Jan 3, 2024	Diversity, Quantization	Code Available	7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers	Oct 31, 2022	GPU, Language Modelling	Code Available	7
SqueezeLLM: Dense-and-Sparse Quantization	Jun 13, 2023	GPU, Quantization	Code Available	6
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration	Jun 1, 2023	Autonomous Driving, Cloud Computing	Code Available	6
QLoRA: Efficient Finetuning of Quantized LLMs	May 23, 2023	Chatbot, GPU	Code Available	6
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models	Nov 18, 2022	Quantization	Code Available	6
GLM-130B: An Open Bilingual Pre-trained Model	Oct 5, 2022	Language Modeling, Language Modelling	Code Available	6
Quantized Training of Gradient Boosting Decision Trees	Jul 20, 2022	Quantization	Code Available	6
SCBench: A KV Cache-Centric Analysis of Long-Context Methods	Dec 13, 2024	Mamba, Quantization	Code Available	5
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale	Aug 22, 2024	Chatbot, Instruction Following	Code Available	5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Aug 21, 2024	GPU, Quantization	Code Available	5
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients	Jul 11, 2024	Quantization	Code Available	5
Autoregressive Image Generation without Vector Quantization	Jun 17, 2024	Image Generation, Quantization	Code Available	5
SpinQuant: LLM quantization with learned rotations	May 26, 2024	Quantization	Code Available	5
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression	May 23, 2024	Quantization	Code Available	5
SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks	Apr 15, 2024	Quantization	Code Available	5
Extreme Compression of Large Language Models via Additive Quantization	Jan 11, 2024	CPU, GPU	Code Available	5
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving	Oct 11, 2023	Language Modeling, Language Modelling	Code Available	5
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models	Sep 26, 2023	Quantization	Code Available	5
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation	Sep 19, 2022	Decoder, Image Generation	Code Available	5
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications	Sep 7, 2022	GPU, Object Detection	Code Available	5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale	Aug 15, 2022	GPU, Language Modelling	Code Available	5
BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster	Apr 3, 2022	AutoML, Distributed Computing	Code Available	5
Scaling Law for Quantization-Aware Training	May 20, 2025	Quantization	Code Available	4
UniTok: A Unified Tokenizer for Visual Generation and Understanding	Feb 27, 2025	Quantization	Code Available	4
Autoregressive Video Generation without Vector Quantization	Dec 18, 2024	Image Generation, Prediction	Code Available	4
Taming Scalable Visual Tokenizer for Autoregressive Image Generation	Dec 3, 2024	Image Generation, Image Reconstruction	Code Available	4
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models	Nov 7, 2024	GPU, Quantization	Code Available	4
BitNet a4.8: 4-bit Activations for 1-bit LLMs	Nov 7, 2024	Quantization	Code Available	4
SNAC: Multi-Scale Neural Audio Codec	Oct 18, 2024	Audio Compression, Audio Generation	Code Available	4
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads	Oct 14, 2024	GPU, Quantization	Code Available	4
Restructuring Vector Quantization with the Rotation Trick	Oct 8, 2024	Quantization	Code Available	4
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models	Sep 25, 2024	Quantization	Code Available	4
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge	Jun 25, 2024	Computational Efficiency, CPU	Code Available	4
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit	May 9, 2024	Benchmarking, Computational Efficiency	Code Available	4
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving	May 7, 2024	GPU, Language Modelling	Code Available	4

Datasets: ImageNet, CIFAR-10, Wiki-40B, AgeDB-30, CFP-FP, COCO (Common Objects in Context), IJB-B, IJB-C, Knowledge-based:, LFW

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	FQ-ViT (ViT-L)	Top-1 Accuracy (%)	85.03	—	Unverified
2	FQ-ViT (ViT-B)	Top-1 Accuracy (%)	83.31	—	Unverified
3	FQ-ViT (Swin-B)	Top-1 Accuracy (%)	82.97	—	Unverified
4	FQ-ViT (Swin-S)	Top-1 Accuracy (%)	82.71	—	Unverified
5	FQ-ViT (DeiT-B)	Top-1 Accuracy (%)	81.2	—	Unverified
6	FQ-ViT (Swin-T)	Top-1 Accuracy (%)	80.51	—	Unverified
7	FQ-ViT (DeiT-S)	Top-1 Accuracy (%)	79.17	—	Unverified
8	Xception W8A8	Top-1 Accuracy (%)	78.97	—	Unverified
9	ADLIK-MO-ResNet50-W4A4	Top-1 Accuracy (%)	77.88	—	Unverified
10	ADLIK-MO-ResNet50-W3A4	Top-1 Accuracy (%)	77.34	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	3DCNN_VIVA_3	MAP	160,327.04	—	Unverified
2	DTQ	MAP	0.79	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	OutEffHop-Bert_base	Perplexity	6.3	—	Unverified
2	OutEffHop-Bert_base	Perplexity	6.21	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		Accuracy	98.13	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		Accuracy	92.92	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SSD ResNet50 V1 FPN 640x640	MAP	34.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		TAR @ FAR=1e-4	95.13	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		TAR @ FAR=1e-4	96.38	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	3DCNN_VIVA_5	All	84,809,664	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		Accuracy	99.8	—	Unverified