SOTAVerified

Quantization

Quantization is a promising technique for reducing the computational cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
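To make the float-to-fixed-point mapping in the definition concrete, here is a minimal sketch of affine int8 quantization in NumPy. This is an illustration only, not the method of the cited paper or of any paper listed below; the helper names `quantize_int8` and `dequantize` are hypothetical.

```python
import numpy as np

def quantize_int8(x):
    """Affine quantization: map a float32 array onto the int8 grid [-128, 127]."""
    scale = (x.max() - x.min()) / 255.0            # size of one int8 step in float units
    zero_point = np.round(-x.min() / scale) - 128  # int8 code that represents 0.0
    q = np.round(x / scale + zero_point)
    return np.clip(q, -128, 127).astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float32 values from int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.linspace(-1.0, 1.0, 5, dtype=np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize(q, scale, zp)
# Round-trip error is bounded by one quantization step (the scale).
```

The round trip is lossy, which is why the papers below study how to choose scales, zero points, and bit widths so that accuracy survives the precision loss.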

Papers

Showing 851–900 of 4,925 papers

Title | Status | Hype
Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval | Code | 1
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models | Code | 1
DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation | Code | 1
Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation | Code | 1
DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models | Code | 1
Design Methodology for Deep Out-of-Distribution Detectors in Real-Time Cyber-Physical Systems | Code | 1
Scientific Image Restoration Anywhere | Code | 1
Search for Efficient Large Language Models | Code | 1
ARB-LLM: Alternating Refined Binarizations for Large Language Models | Code | 1
Matching-oriented Product Quantization For Ad-hoc Retrieval | Code | 1
Arch-Net: Model Distillation for Architecture Agnostic Model Deployment | Code | 1
Seizure Detection and Prediction by Parallel Memristive Convolutional Neural Networks | Code | 1
Differentiable JPEG: The Devil is in the Details | Code | 1
Deep Transferring Quantization | Code | 1
Deep PeNSieve: A deep learning framework based on the posit number system | Code | 1
Deep Geometry Post-Processing for Decompressed Point Clouds | Code | 1
Deep Learning-Enabled One-Bit DoA Estimation | Code | 1
DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization | Code | 1
With Shared Microexponents, A Little Shifting Goes a Long Way | Code | 1
Sharpness-aware Quantization for Deep Neural Networks | Code | 1
Differentiable Model Compression via Pseudo Quantization Noise | Code | 1
Dataset Quantization with Active Learning based Adaptive Sampling | Code | 1
Compact representations of convolutional neural networks via weight pruning and quantization | Code | 1
A Refined Analysis of Massive Activations in LLMs | Code | 1
Data-Free Network Quantization With Adversarial Knowledge Distillation | Code | 1
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation | Code | 1
A holistic approach to polyphonic music transcription with neural networks | Code | 1
Data-Free Quantization Through Weight Equalization and Bias Correction | Code | 1
DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization | Code | 1
D^2-DPM: Dual Denoising for Quantized Diffusion Probabilistic Models | Code | 1
SparseDNN: Fast Sparse Deep Learning Inference on CPUs | Code | 1
Sparse Fine-tuning for Inference Acceleration of Large Language Models | Code | 1
CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation | Code | 1
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing | Code | 1
DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks | Code | 1
Mixed Precision DNNs: All you need is a good parametrization | Code | 1
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction | Code | 1
Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket | Code | 1
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization | Code | 1
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs | Code | 1
Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures | | 0
A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays | | 0
AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model | | 0
A Tiny CNN Architecture for Medical Face Mask Detection for Resource-Constrained Endpoints | | 0
A Gridless Compressive Sensing Based Channel Estimation for Millimeter Wave MIMO OFDM Systems with One-Bit Quantization | | 0
Achieving Robustness in Blind Modulo Analog-to-Digital Conversion | | 0
Athena: Efficient Block-Wise Post-Training Quantization for Large Language Models Using Second-Order Matrix Derivative Information | | 0
ATHEENA: A Toolflow for Hardware Early-Exit Network Automation | | 0
A Targeted Acceleration and Compression Framework for Low bit Neural Networks | | 0
A Greedy Bit-flip Training Algorithm for Binarized Knowledge Graph Embeddings | | 0
Page 18 of 99

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FQ-ViT (ViT-L) | Top-1 Accuracy (%) | 85.03 | | Unverified
2 | FQ-ViT (ViT-B) | Top-1 Accuracy (%) | 83.31 | | Unverified
3 | FQ-ViT (Swin-B) | Top-1 Accuracy (%) | 82.97 | | Unverified
4 | FQ-ViT (Swin-S) | Top-1 Accuracy (%) | 82.71 | | Unverified
5 | FQ-ViT (DeiT-B) | Top-1 Accuracy (%) | 81.2 | | Unverified
6 | FQ-ViT (Swin-T) | Top-1 Accuracy (%) | 80.51 | | Unverified
7 | FQ-ViT (DeiT-S) | Top-1 Accuracy (%) | 79.17 | | Unverified
8 | Xception W8A8 | Top-1 Accuracy (%) | 78.97 | | Unverified
9 | ADLIK-MO-ResNet50-W4A4 | Top-1 Accuracy (%) | 77.88 | | Unverified
10 | ADLIK-MO-ResNet50-W3A4 | Top-1 Accuracy (%) | 77.34 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_3 | MAP | 160,327.04 | | Unverified
2 | DTQ | MAP | 0.79 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | OutEffHop-Bert_base | Perplexity | 6.3 | | Unverified
2 | OutEffHop-Bert_base | Perplexity | 6.21 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | Accuracy | 98.13 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | Accuracy | 92.92 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSD ResNet50 V1 FPN 640x640 | MAP | 34.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | TAR @ FAR=1e-4 | 95.13 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | TAR @ FAR=1e-4 | 96.38 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_5 | All | 84,809,664 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | Accuracy | 99.8 | | Unverified