SOTAVerified

Quantization

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
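To make the float-to-fixed-point mapping concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. This is an illustrative example only, not the scheme from the cited paper or from any paper listed below; the function names and the choice of a symmetric [-127, 127] range are assumptions for the sketch.

```python
def quantize_int8(xs):
    """Symmetric per-tensor int8 quantization (illustrative sketch).

    Chooses a scale so the largest |x| maps to 127, then rounds each
    value to the nearest integer step and clamps to [-127, 127].
    """
    scale = max(abs(v) for v in xs) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in xs]
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]

xs = [0.5, -1.2, 3.1, 0.0]
q, scale = quantize_int8(xs)
xs_hat = dequantize_int8(q, scale)
# Each reconstructed value is within one quantization step (= scale) of the original.
```

Real training-time schemes (including the adaptive-precision approach in the source paper) typically add per-channel scales, zero-points for asymmetric ranges, and gradient handling, but the round-and-rescale core is the same.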

Papers

Showing 1901–1925 of 4925 papers

| Title | Status | Hype |
|---|---|---|
| BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks | | 0 |
| Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study | | 0 |
| Received Power Maximization Using Nonuniform Discrete Phase Shifts for RISs With a Limited Phase Range | | 0 |
| HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization | | 0 |
| FLoCoRA: Federated learning compression with low-rank adaptation | Code | 0 |
| Predicting Probabilities of Error to Combine Quantization and Early Exiting: QuEE | | 0 |
| xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics | Code | 0 |
| Q-SNNs: Quantized Spiking Neural Networks | | 0 |
| High-Fidelity Facial Albedo Estimation via Texture Quantization | | 0 |
| SDQ: Sparse Decomposed Quantization for LLM Inference | | 0 |
| Attention-aware Post-training Quantization without Backpropagation | | 0 |
| MSE Minimization in RIS-Aided MU-MIMO with Discrete Phase Shifts and Fronthaul Quantization | | 0 |
| Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates | | 0 |
| Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization | | 0 |
| Deep-Learning-Based Channel Estimation for Distributed MIMO with 1-bit Radio-Over-Fiber Fronthaul | | 0 |
| Promoting Data and Model Privacy in Federated Learning through Quantized LoRA | | 0 |
| Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization | | 0 |
| An Analysis on Quantizing Diffusion Transformers | | 0 |
| Optimization of Armv9 architecture general large language model inference performance based on Llama.cpp | Code | 0 |
| How Should We Extract Discrete Audio Tokens from Self-Supervised Models? | | 0 |
| Memory Faults in Activation-sparse Quantized Deep Neural Networks: Analysis and Mitigation using Sharpness-aware Training | | 0 |
| GEB-1.3B: Open Lightweight Large Language Model | | 0 |
| Precipitation Nowcasting Using Physics Informed Discriminator Generative Models | | 0 |
| Optimizing Byte-level Representation for End-to-end ASR | | 0 |
| One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model | | 0 |
Page 77 of 197

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | FQ-ViT (ViT-L) | Top-1 Accuracy (%) | 85.03 | — | Unverified |
| 2 | FQ-ViT (ViT-B) | Top-1 Accuracy (%) | 83.31 | — | Unverified |
| 3 | FQ-ViT (Swin-B) | Top-1 Accuracy (%) | 82.97 | — | Unverified |
| 4 | FQ-ViT (Swin-S) | Top-1 Accuracy (%) | 82.71 | — | Unverified |
| 5 | FQ-ViT (DeiT-B) | Top-1 Accuracy (%) | 81.2 | — | Unverified |
| 6 | FQ-ViT (Swin-T) | Top-1 Accuracy (%) | 80.51 | — | Unverified |
| 7 | FQ-ViT (DeiT-S) | Top-1 Accuracy (%) | 79.17 | — | Unverified |
| 8 | Xception W8A8 | Top-1 Accuracy (%) | 78.97 | — | Unverified |
| 9 | ADLIK-MO-ResNet50-W4A4 | Top-1 Accuracy (%) | 77.88 | — | Unverified |
| 10 | ADLIK-MO-ResNet50-W3A4 | Top-1 Accuracy (%) | 77.34 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 3DCNN_VIVA_3 | MAP | 160,327.04 | — | Unverified |
| 2 | DTQ | MAP | 0.79 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OutEffHop-Bert_base | Perplexity | 6.3 | — | Unverified |
| 2 | OutEffHop-Bert_base | Perplexity | 6.21 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | — | Accuracy | 98.13 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | — | Accuracy | 92.92 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SSD ResNet50 V1 FPN 640x640 | MAP | 34.3 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | — | TAR @ FAR=1e-4 | 95.13 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | — | TAR @ FAR=1e-4 | 96.38 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 3DCNN_VIVA_5 | All | 84,809,664 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | — | Accuracy | 99.8 | — | Unverified |