Quantization

Quantization is a promising technique to reduce the computation cost of neural network training, which can replace high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–25 of 4925 papers

Title	Date	Tasks	Status	Hype
Qwen2 Technical Report	Jul 15, 2024	Arithmetic ReasoningGSM8K	CodeCode Available	13
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision	Jul 11, 2024	GPUQuantization	CodeCode Available	12
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System	Feb 8, 2025	DecoderLanguage Modeling	CodeCode Available	11
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models	Dec 13, 2024	In-Context LearningQuantization	CodeCode Available	11
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning	Aug 10, 2024	HallucinationOptical Character Recognition	CodeCode Available	11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens	Jul 7, 2024	Language ModellingLarge Language Model	CodeCode Available	11
OpenVLA: An Open-Source Vision-Language-Action Model	Jun 13, 2024	Imitation LearningLanguage Modelling	CodeCode Available	9
SageAttention2++: A More Efficient Implementation of SageAttention2	May 27, 2025	QuantizationVideo Generation	CodeCode Available	7
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model	Apr 17, 2025	Code GenerationCPU	CodeCode Available	7
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization	Nov 17, 2024	Image GenerationQuantization	CodeCode Available	7
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation	Oct 10, 2024	4kImage Animation	CodeCode Available	7
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration	Oct 3, 2024	Image GenerationQuantization	CodeCode Available	7
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration	Apr 24, 2024	ManagementPrompt Engineering	CodeCode Available	7
Chronos: Learning the Language of Time Series	Mar 12, 2024	Gaussian ProcessesLanguage Modeling	CodeCode Available	7
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations	Jan 3, 2024	DiversityQuantization	CodeCode Available	7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers	Oct 31, 2022	GPULanguage Modelling	CodeCode Available	7
SqueezeLLM: Dense-and-Sparse Quantization	Jun 13, 2023	GPUQuantization	CodeCode Available	6
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration	Jun 1, 2023	Autonomous DrivingCloud Computing	CodeCode Available	6
QLoRA: Efficient Finetuning of Quantized LLMs	May 23, 2023	ChatbotGPU	CodeCode Available	6
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models	Nov 18, 2022	Quantization	CodeCode Available	6
GLM-130B: An Open Bilingual Pre-trained Model	Oct 5, 2022	Language ModelingLanguage Modelling	CodeCode Available	6
Quantized Training of Gradient Boosting Decision Trees	Jul 20, 2022	Quantization	CodeCode Available	6
SCBench: A KV Cache-Centric Analysis of Long-Context Methods	Dec 13, 2024	MambaQuantization	CodeCode Available	5
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale	Aug 22, 2024	ChatbotInstruction Following	CodeCode Available	5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Aug 21, 2024	GPUQuantization	CodeCode Available	5

Show:10 25 50

← PrevPage 1 of 197Next →

All datasets ImageNet CIFAR-10 Wiki-40B AgeDB-30 CFP-FP COCO (Common Objects in Context)IJB-B IJB-C Knowledge-based:LFW

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	FQ-ViT (ViT-L)	Top-1 Accuracy (%)	85.03	—	Unverified
2	FQ-ViT (ViT-B)	Top-1 Accuracy (%)	83.31	—	Unverified
3	FQ-ViT (Swin-B)	Top-1 Accuracy (%)	82.97	—	Unverified
4	FQ-ViT (Swin-S)	Top-1 Accuracy (%)	82.71	—	Unverified
5	FQ-ViT (DeiT-B)	Top-1 Accuracy (%)	81.2	—	Unverified
6	FQ-ViT (Swin-T)	Top-1 Accuracy (%)	80.51	—	Unverified
7	FQ-ViT (DeiT-S)	Top-1 Accuracy (%)	79.17	—	Unverified
8	Xception W8A8	Top-1 Accuracy (%)	78.97	—	Unverified
9	ADLIK-MO-ResNet50-W4A4	Top-1 Accuracy (%)	77.88	—	Unverified
10	ADLIK-MO-ResNet50-W3A4	Top-1 Accuracy (%)	77.34	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	3DCNN_VIVA_3	MAP	160,327.04	—	Unverified
2	DTQ	MAP	0.79	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	OutEffHop-Bert_base	Perplexity	6.3	—	Unverified
2	OutEffHop-Bert_base	Perplexity	6.21	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		Accuracy	98.13	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		Accuracy	92.92	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SSD ResNet50 V1 FPN 640x640	MAP	34.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		TAR @ FAR=1e-4	95.13	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		TAR @ FAR=1e-4	96.38	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	3DCNN_VIVA_5	All	84,809,664	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1		Accuracy	99.8	—	Unverified