SOTAVerified

Quantization

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
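The float-to-fixed-point replacement described above can be illustrated with symmetric per-tensor int8 quantization. This is a minimal generic sketch in NumPy, not code from the cited paper; the function names and the choice of a single per-tensor scale are assumptions for illustration.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float32 values
    onto the integer grid [-127, 127] with a single scale factor.
    (Illustrative sketch; real schemes may use per-channel scales.)"""
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:          # guard: all-zero tensor
        scale = 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 codes."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.3, -0.01], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
# rounding error is bounded by half a quantization step (scale / 2)
```

The int8 codes are what the low-cost fixed-point arithmetic operates on; the scale is carried alongside so results can be mapped back to the float domain.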

Papers

Showing 1351-1400 of 4925 papers

Title | Status | Hype
Harnessing Large Language Models Locally: Empirical Results and Implications for AI PC | Code | 0
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off | Code | 0
Hardware Acceleration for Real-Time Wildfire Detection Onboard Drone Networks | Code | 0
DQRM: Deep Quantized Recommendation Models | Code | 0
HERO: Hessian-Enhanced Robust Optimization for Unifying and Improving Generalization and Quantization Performance | Code | 0
Deep Triplet Quantization | Code | 0
Mirror Descent View for Neural Network Quantization | Code | 0
GT-SVQ: A Linear-Time Graph Transformer for Node Classification Using Spiking Vector Quantization | Code | 0
Deep Task-Based Analog-to-Digital Conversion | Code | 0
GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples | Code | 0
Mixed-Precision Quantization and Parallel Implementation of Multispectral Riemannian Classification for Brain-Machine Interfaces | Code | 0
Mixed-Precision Quantization for Deep Vision Models with Integer Quadratic Programming | Code | 0
Guetzli: Perceptually Guided JPEG Encoder | Code | 0
Bag of Tricks for Optimizing Transformer Efficiency | Code | 0
DeepShift: Towards Multiplication-Less Neural Networks | Code | 0
GraNNite: Enabling High-Performance Execution of Graph Neural Networks on Resource-Constrained Neural Processing Units | Code | 0
GQFedWAvg: Optimization-Based Quantized Federated Learning in General Edge Computing Systems | Code | 0
Deep reverse tone mapping | Code | 0
Deep residual network for steganalysis of digital images | Code | 0
Deep Recurrent Quantization for Generating Sequential Binary Codes | Code | 0
Goten: GPU-Outsourcing Trusted Execution of Neural Network Training and Prediction | Code | 0
Hardening DNNs against Transfer Attacks during Network Compression using Greedy Adversarial Pruning | Code | 0
Deep Priority Hashing | Code | 0
Deep Optimized Multiple Description Image Coding via Scalar Quantization Learning | Code | 0
Genie: Show Me the Data for Quantization | Code | 0
General Point Model Pretraining with Autoencoding and Autoregressive | Code | 0
Generalized Relevance Learning Grassmann Quantization | Code | 0
Generalized Learning Vector Quantization for Classification in Randomized Neural Networks and Hyperdimensional Computing | Code | 0
A2Q+: Improving Accumulator-Aware Weight Quantization | Code | 0
Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning | Code | 0
Deep Neural Network Compression with Single and Multiple Level Quantization | Code | 0
Exploring the Trade-Offs: Quantization Methods, Task Difficulty, and Model Size in Large Language Models From Edge to Giant | Code | 0
FTT-NAS: Discovering Fault-Tolerant Convolutional Neural Architecture | Code | 0
A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings | Code | 0
Deep Metric Learning to Rank | Code | 0
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization | Code | 0
GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models | Code | 0
Deep Log-Likelihood Ratio Quantization | Code | 0
All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification | Code | 0
Deep Learning with Low Precision by Half-wave Gaussian Quantization | Code | 0
Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment | Code | 0
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design | Code | 0
A Comprehensive Evaluation of Quantization Strategies for Large Language Models | Code | 0
ECQ^x: Explainability-Driven Quantization for Low-Bit and Sparse DNNs | Code | 0
Foundations of Large Language Model Compression -- Part 1: Weight Quantization | Code | 0
FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers | Code | 0
Neural Network Assisted Lifting Steps For Improved Fully Scalable Lossy Image Compression in JPEG 2000 | Code | 0
FLoCoRA: Federated learning compression with low-rank adaptation | Code | 0
Deep Learning-Based Quantization of L-Values for Gray-Coded Modulation | Code | 0
Floating-Point Quantization Analysis of Multi-Layer Perceptron Artificial Neural Networks | Code | 0
Page 28 of 99

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FQ-ViT (ViT-L) | Top-1 Accuracy (%) | 85.03 | – | Unverified
2 | FQ-ViT (ViT-B) | Top-1 Accuracy (%) | 83.31 | – | Unverified
3 | FQ-ViT (Swin-B) | Top-1 Accuracy (%) | 82.97 | – | Unverified
4 | FQ-ViT (Swin-S) | Top-1 Accuracy (%) | 82.71 | – | Unverified
5 | FQ-ViT (DeiT-B) | Top-1 Accuracy (%) | 81.2 | – | Unverified
6 | FQ-ViT (Swin-T) | Top-1 Accuracy (%) | 80.51 | – | Unverified
7 | FQ-ViT (DeiT-S) | Top-1 Accuracy (%) | 79.17 | – | Unverified
8 | Xception W8A8 | Top-1 Accuracy (%) | 78.97 | – | Unverified
9 | ADLIK-MO-ResNet50-W4A4 | Top-1 Accuracy (%) | 77.88 | – | Unverified
10 | ADLIK-MO-ResNet50-W3A4 | Top-1 Accuracy (%) | 77.34 | – | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_3 | MAP | 160,327.04 | – | Unverified
2 | DTQ | MAP | 0.79 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | OutEffHop-Bert_base | Perplexity | 6.3 | – | Unverified
2 | OutEffHop-Bert_base | Perplexity | 6.21 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | – | Accuracy | 98.13 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | – | Accuracy | 92.92 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSD ResNet50 V1 FPN 640x640 | MAP | 34.3 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | – | TAR @ FAR=1e-4 | 95.13 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | – | TAR @ FAR=1e-4 | 96.38 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_5 | All | 84,809,664 | – | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | – | Accuracy | 99.8 | – | Unverified