SOTAVerified

Quantization

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
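
As a rough illustration of the idea described above, the following sketch maps a list of floating-point values to int8 codes using a single shared scale (symmetric, per-tensor quantization) and then maps them back. This is a generic textbook scheme chosen for illustration, not the method of the cited paper; the function names `quantize_int8` and `dequantize` and the sample values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] with one shared scale.

    Assumption: symmetric per-tensor quantization, used here only as an
    illustration; it is not taken from the cited paper.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero when every value is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical sample values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

print(codes)     # [50, -127, 0, 100]
print(restored)  # close to the originals, within half a quantization step
```

Each restored value differs from its original by at most half of `scale`, which is the precision given up in exchange for cheaper integer storage and arithmetic.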

Papers

Showing 1401-1450 of 4925 papers

Title | Status | Hype
GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples | Code | 0
Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning | Code | 0
Deep Neural Network Compression with Single and Multiple Level Quantization | Code | 0
Exploring the Trade-Offs: Quantization Methods, Task Difficulty, and Model Size in Large Language Models From Edge to Giant | Code | 0
Genie: Show Me the Data for Quantization | Code | 0
A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings | Code | 0
Deep Metric Learning to Rank | Code | 0
General Point Model Pretraining with Autoencoding and Autoregressive | Code | 0
GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs | Code | 0
GT-SVQ: A Linear-Time Graph Transformer for Node Classification Using Spiking Vector Quantization | Code | 0
Deep Log-Likelihood Ratio Quantization | Code | 0
All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification | Code | 0
Deep Learning with Low Precision by Half-wave Gaussian Quantization | Code | 0
Generalized Learning Vector Quantization for Classification in Randomized Neural Networks and Hyperdimensional Computing | Code | 0
Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment | Code | 0
GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models | Code | 0
An Integrated Approach to Produce Robust Models with High Efficiency | Code | 0
Effective Communication with Dynamic Feature Compression | Code | 0
A Comprehensive Evaluation of Quantization Strategies for Large Language Models | Code | 0
Deep Learning-Based Quantization of L-Values for Gray-Coded Modulation | Code | 0
Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference | Code | 0
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance | Code | 0
FTT-NAS: Discovering Fault-Tolerant Convolutional Neural Architecture | Code | 0
Generalized Relevance Learning Grassmann Quantization | Code | 0
Guetzli: Perceptually Guided JPEG Encoder | Code | 0
FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers | Code | 0
Autoregressive Co-Training for Learning Discrete Speech Representations | Code | 0
Deep Learning as a Mixed Convex-Combinatorial Optimization Problem | Code | 0
Floating-Point Quantization Analysis of Multi-Layer Perceptron Artificial Neural Networks | Code | 0
FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization | Code | 0
FLoCoRA: Federated learning compression with low-rank adaptation | Code | 0
Foundations of Large Language Model Compression -- Part 1: Weight Quantization | Code | 0
Deep Image Compression via End-to-End Learning | Code | 0
Focused Quantization for Sparse CNNs | Code | 0
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models | Code | 0
FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs | Code | 0
Flexible framework for audio reconstruction | Code | 0
A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance | Code | 0
Deep Hashing via Householder Quantization | Code | 0
Flexible Mixed Precision Quantization for Learned Image Compression | Code | 0
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design | Code | 0
Filtering Empty Camera Trap Images in Embedded Systems | Code | 0
Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes | Code | 0
Fed-QSSL: A Framework for Personalized Federated Learning under Bitwidth and Data Heterogeneity | Code | 0
FEDZIP: A Compression Framework for Communication-Efficient Federated Learning | Code | 0
Deep Convolutional AutoEncoder-based Lossy Image Compression | Code | 0
Federated Learning via Plurality Vote | Code | 0
Find the Lady: Permutation and Re-Synchronization of Deep Neural Networks | Code | 0
Efficient Cross-Modal Retrieval via Deep Binary Hashing and Quantization | Code | 0
Deep Compressive Autoencoder for Action Potential Compression in Large-Scale Neural Recording | Code | 0
Page 29 of 99

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FQ-ViT (ViT-L) | Top-1 Accuracy (%) | 85.03 | | Unverified
2 | FQ-ViT (ViT-B) | Top-1 Accuracy (%) | 83.31 | | Unverified
3 | FQ-ViT (Swin-B) | Top-1 Accuracy (%) | 82.97 | | Unverified
4 | FQ-ViT (Swin-S) | Top-1 Accuracy (%) | 82.71 | | Unverified
5 | FQ-ViT (DeiT-B) | Top-1 Accuracy (%) | 81.2 | | Unverified
6 | FQ-ViT (Swin-T) | Top-1 Accuracy (%) | 80.51 | | Unverified
7 | FQ-ViT (DeiT-S) | Top-1 Accuracy (%) | 79.17 | | Unverified
8 | Xception W8A8 | Top-1 Accuracy (%) | 78.97 | | Unverified
9 | ADLIK-MO-ResNet50-W4A4 | Top-1 Accuracy (%) | 77.88 | | Unverified
10 | ADLIK-MO-ResNet50-W3A4 | Top-1 Accuracy (%) | 77.34 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_3 | MAP | 160,327.04 | | Unverified
2 | DTQ | MAP | 0.79 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | OutEffHop-Bert_base | Perplexity | 6.3 | | Unverified
2 | OutEffHop-Bert_base | Perplexity | 6.21 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | Accuracy | 98.13 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | Accuracy | 92.92 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSD ResNet50 V1 FPN 640x640 | MAP | 34.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | TAR @ FAR=1e-4 | 95.13 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | TAR @ FAR=1e-4 | 96.38 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_5 | All | 84,809,664 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | Accuracy | 99.8 | | Unverified