Quantization

Quantization is a promising technique for reducing the computational cost of neural network training by replacing high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
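
To make the float-to-fixed-point replacement concrete, here is a minimal sketch in plain Python. It assumes a symmetric, per-tensor int8 scheme with a single scale factor, which is only one common way to quantize and is not taken from the paper cited above. The helper names `quantize_int8` and `dequantize` and the sample `weights` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to int8 codes (illustrative helper)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Made-up example weights, not data from any listed paper.
weights = [0.42, -1.27, 0.05, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per element is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each value is stored as one small integer plus a shared scale, so the precision lost is at most half a quantization step per element under this scheme.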

Papers


Showing 226–250 of 4925 papers

Title	Date	Tasks	Status	Hype	Score
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation	Mar 27, 2025	Image Generation, Quantization	Code Available	2	5
HAQ: Hardware-Aware Automated Quantization with Mixed Precision	Nov 21, 2018	Quantization, Reinforcement Learning	Code Available	2	5
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices	Mar 9, 2021	BIG-bench Machine Learning, Diagnostic	Code Available	2	5
GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration	Apr 3, 2025	GPU, Quantization	Code Available	2	5
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%	Jun 17, 2024	image-classification, Image Classification	Code Available	2	5
An Empirical Study of Qwen3 Quantization	May 4, 2025	Natural Language Understanding, Quantization	Code Available	2	5
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance	May 11, 2025	Language Modeling, Language Modelling	Code Available	2	5
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect	Mar 6, 2024	Quantization	Code Available	2	5
Similarity search in the blink of an eye with compressed indices	Apr 7, 2023	Quantization	Code Available	2	5
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval	Jul 17, 2024	Decoder, Image Enhancement	Code Available	2	5
Efficient LLM Inference on CPUs	Nov 1, 2023	Quantization	Code Available	2	5
An empirical study of LLaMA3 quantization: from LLMs to MLLMs	Apr 22, 2024	Language Modelling, Large Language Model	Code Available	2	5
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax	Apr 29, 2025	Quantization	Code Available	2	5
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference	Jul 4, 2022	Quantization	Code Available	2	5
Compressing Large Language Models using Low Rank and Low Precision Decomposition	May 29, 2024	Quantization	Code Available	2	5
AnalogNAS-Bench: A NAS Benchmark for Analog In-Memory Computing	Jun 23, 2025	Neural Architecture Search, Quantization	Code Available	2	5
From Tiny Machine Learning to Tiny Deep Learning: A Survey	Jun 21, 2025	AutoML, Model Optimization	Code Available	2	5
GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting	Jan 26, 2025	Quantization	Code Available	2	5
FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching	May 26, 2025	Quantization, Speech Enhancement	Code Available	2	5
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention	Feb 8, 2024	MMLU, Quantization	Code Available	2	5
FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference	Jan 13, 2021	Code Generation, Deep Learning	Code Available	2	5
Compressing Volumetric Radiance Fields to 1 MB	Nov 29, 2022	Model Compression, NeRF	Code Available	2	5
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM	Mar 8, 2024	Quantization	Code Available	2	5
CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification	Mar 14, 2024	Classification, Crowd Counting	Code Available	2	5
CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization	Nov 30, 2023	3DGS, NeRF	Code Available	2	5



Datasets: ImageNet, CIFAR-10, Wiki-40B, AgeDB-30, CFP-FP, COCO (Common Objects in Context), IJB-B, IJB-C, Knowledge-based:LFW

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	FQ-ViT (ViT-L)	Top-1 Accuracy (%)	85.03	—	Unverified
2	FQ-ViT (ViT-B)	Top-1 Accuracy (%)	83.31	—	Unverified
3	FQ-ViT (Swin-B)	Top-1 Accuracy (%)	82.97	—	Unverified
4	FQ-ViT (Swin-S)	Top-1 Accuracy (%)	82.71	—	Unverified
5	FQ-ViT (DeiT-B)	Top-1 Accuracy (%)	81.2	—	Unverified
6	FQ-ViT (Swin-T)	Top-1 Accuracy (%)	80.51	—	Unverified
7	FQ-ViT (DeiT-S)	Top-1 Accuracy (%)	79.17	—	Unverified
8	Xception W8A8	Top-1 Accuracy (%)	78.97	—	Unverified
9	ADLIK-MO-ResNet50-W4A4	Top-1 Accuracy (%)	77.88	—	Unverified
10	ADLIK-MO-ResNet50-W3A4	Top-1 Accuracy (%)	77.34	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	3DCNN_VIVA_3	MAP	160,327.04	—	Unverified
2	DTQ	MAP	0.79	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	OutEffHop-Bert_base	Perplexity	6.3	—	Unverified
2	OutEffHop-Bert_base	Perplexity	6.21	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		Accuracy	98.13	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		Accuracy	92.92	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SSD ResNet50 V1 FPN 640x640	MAP	34.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		TAR @ FAR=1e-4	95.13	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		TAR @ FAR=1e-4	96.38	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	3DCNN_VIVA_5	All	84,809,664	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		Accuracy	99.8	—	Unverified