SOTAVerified

Quantization

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
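
As a rough illustration of the float-to-fixed-point replacement described above, the sketch below applies plain symmetric per-tensor int8 quantization to a short list of values. It is not the adaptive-precision method of the cited paper; the function names `quantize_int8` and `dequantize` and the sample `weights` are hypothetical, chosen only for this example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; use 1.0 for an all-zero input (assumption).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical float32-style values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision given up in exchange for cheaper integer arithmetic and storage.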

Papers

Showing 651-700 of 4925 papers

Title | Status | Hype
ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals | Code | 1
A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3
Autoregressive Video Generation without Vector Quantization | Code | 4
Self-control: A Better Conditional Mechanism for Masked Autoregressive Model | - | 0
On the Compression of Language Models for Code: An Empirical Study on CodeBERT | - | 0
More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression | - | 0
VidTok: A Versatile and Open-Source Video Tokenizer | Code | 3
Apollo-Forecast: Overcoming Aliasing and Inference Speed Challenges in Language Models for Time Series Forecasting | - | 0
Fast and Slow Gradient Approximation for Binary Neural Network Optimization | Code | 0
Quantifying Climate Change Impacts on Renewable Energy Generation: A Super-Resolution Recurrent Diffusion Model | - | 0
QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models | - | 0
FinLoRA: Finetuning Quantized Financial Large Language Models Using Low-Rank Adaptation | - | 0
CSR:Achieving 1 Bit Key-Value Cache via Sparse Representation | - | 0
Relation-Guided Adversarial Learning for Data-free Knowledge Transfer | Code | 1
MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models | Code | 1
VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression | - | 0
Nanoscaling Floating-Point (NxFP): NanoMantissa, Adaptive Microexponents, and Code Recycling for Direct-Cast Compression of Large Language Models | - | 0
ProFe: Communication-Efficient Decentralized Federated Learning via Distillation and Prototypes | - | 0
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs | - | 0
Efficient Quantization-Aware Training on Segment Anything Model in Medical Images and Its Deployment | Code | 0
Enhancing Off-Grid One-Bit DOA Estimation with Learning-Based Sparse Bayesian Approach for Non-Uniform Sparse Array | - | 0
Adaptive Quantization Resolution and Power Control for Federated Learning over Cell-free Networks | - | 0
TinySubNets: An efficient and low capacity continual learning strategy | Code | 0
Memory-Efficient 4-bit Preconditioned Stochastic Optimization | - | 0
Progressive Compression with Universally Quantized Diffusion Models | - | 0
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | - | 0
VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization | - | 0
TTAQ: Towards Stable Post-training Quantization in Continuous Domain Adaptation | - | 0
MVQ:Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization | - | 0
SCBench: A KV Cache-Centric Analysis of Long-Context Methods | Code | 5
Panacea: Novel DNN Accelerator using Accuracy-Preserving Asymmetric Quantization and Energy-Saving Bit-Slice Sparsity | - | 0
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Code | 11
DQA: An Efficient Method for Deep Quantization of Deep Neural Network Activations | - | 0
Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries | Code | 1
CRVQ: Channel-relaxed Vector Quantization for Extreme Compression of LLMs | - | 0
Optimising TinyML with Quantization and Distillation of Transformer and Mamba Models for Indoor Localisation on Edge Devices | - | 0
On Round-Off Errors and Gaussian Blur in Superresolution and in Image Registration | - | 0
Breaking the Bias: Recalibrating the Attention of Industrial Anomaly Detection | - | 0
TurboAttention: Efficient Attention Approximation For High Throughputs LLMs | - | 0
Low-Rank Correction for Quantized LLMs | - | 0
Machine learning-driven conservative-to-primitive conversion in hybrid piecewise polytropic and tabulated equations of state | - | 0
Post-Training Non-Uniform Quantization for Convolutional Neural Networks | - | 0
QuantFormer: Learning to Quantize for Neural Activity Forecasting in Mouse Visual Cortex | - | 0
FP=xINT:A Low-Bit Series Expansion Algorithm for Post-Training Quantization | - | 0
Compression for Better: A General and Stable Lossless Compression Framework | - | 0
Federated Split Learning with Model Pruning and Gradient Quantization in Wireless Networks | - | 0
Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion | - | 0
Taming Sensitive Weights : Noise Perturbation Fine-tuning for Robust LLM Quantization | - | 0
Fuzzy Norm-Explicit Product Quantization for Recommender Systems | - | 0
SizeGS: Size-aware Compression of 3D Gaussians with Hierarchical Mixed Precision Quantization | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FQ-ViT (ViT-L) | Top-1 Accuracy (%) | 85.03 | - | Unverified
2 | FQ-ViT (ViT-B) | Top-1 Accuracy (%) | 83.31 | - | Unverified
3 | FQ-ViT (Swin-B) | Top-1 Accuracy (%) | 82.97 | - | Unverified
4 | FQ-ViT (Swin-S) | Top-1 Accuracy (%) | 82.71 | - | Unverified
5 | FQ-ViT (DeiT-B) | Top-1 Accuracy (%) | 81.2 | - | Unverified
6 | FQ-ViT (Swin-T) | Top-1 Accuracy (%) | 80.51 | - | Unverified
7 | FQ-ViT (DeiT-S) | Top-1 Accuracy (%) | 79.17 | - | Unverified
8 | Xception W8A8 | Top-1 Accuracy (%) | 78.97 | - | Unverified
9 | ADLIK-MO-ResNet50-W4A4 | Top-1 Accuracy (%) | 77.88 | - | Unverified
10 | ADLIK-MO-ResNet50-W3A4 | Top-1 Accuracy (%) | 77.34 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_3 | MAP | 160,327.04 | - | Unverified
2 | DTQ | MAP | 0.79 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | OutEffHop-Bert_base | Perplexity | 6.3 | - | Unverified
2 | OutEffHop-Bert_base | Perplexity | 6.21 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | Accuracy | 98.13 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | Accuracy | 92.92 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSD ResNet50 V1 FPN 640x640 | MAP | 34.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | TAR @ FAR=1e-4 | 95.13 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | TAR @ FAR=1e-4 | 96.38 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_5 | All | 84,809,664 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | Accuracy | 99.8 | - | Unverified