SOTAVerified

Quantization

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
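The float-to-fixed-point mapping described above can be sketched in a few lines of plain Python. This is a generic uniform affine quantization example, not the scheme of any specific paper listed below; the function names and the example weights are illustrative.

```python
def quantize(values, num_bits=8):
    """Uniform affine quantization: map floats to signed fixed-point ints.

    Returns the quantized values plus the (scale, zero_point) pair needed
    to map them back to the floating-point domain.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1  # e.g. -128..127 for int8
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against a constant tensor
    zero_point = round(qmin - lo / scale)     # integer that represents float 0.0
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map fixed-point ints back to approximate floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.75, -0.1, 0.0, 0.42, 1.3]   # toy float32-style weights
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
```

The round-trip error per value is bounded by half the quantization step (`scale / 2`), which is the accuracy/cost trade-off that the low-bit methods below try to shrink.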

Papers

Showing 1251–1300 of 4925 papers

Title | Status | Hype
xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics | Code | 0
SDQ: Sparse Decomposed Quantization for LLM Inference | | 0
High-Fidelity Facial Albedo Estimation via Texture Quantization | | 0
Q-SNNs: Quantized Spiking Neural Networks | | 0
Attention-aware Post-training Quantization without Backpropagation | | 0
Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization Levels and Rank Values through Differentiable Bayesian Gates | | 0
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models | Code | 1
MSE Minimization in RIS-Aided MU-MIMO with Discrete Phase Shifts and Fronthaul Quantization | | 0
Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization | | 0
ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking | Code | 1
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% | Code | 2
QTIP: Quantization with Trellises and Incoherence Processing | Code | 1
Autoregressive Image Generation without Vector Quantization | Code | 5
Deep-Learning-Based Channel Estimation for Distributed MIMO with 1-bit Radio-Over-Fiber Fronthaul | | 0
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization | | 0
Promoting Data and Model Privacy in Federated Learning through Quantized LoRA | | 0
An Analysis on Quantizing Diffusion Transformers | | 0
Optimization of Armv9 architecture general large language model inference performance based on Llama.cpp | Code | 0
Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox | Code | 1
Memory Faults in Activation-sparse Quantized Deep Neural Networks: Analysis and Mitigation using Sharpness-aware Training | | 0
How Should We Extract Discrete Audio Tokens from Self-Supervised Models? | | 0
Optimizing Byte-level Representation for End-to-end ASR | | 0
One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model | | 0
QQQ: Quality Quattuor-Bit Quantization for Large Language Models | Code | 2
Precipitation Nowcasting Using Physics Informed Discriminator Generative Models | | 0
GEB-1.3B: Open Lightweight Large Language Model | | 0
Human-level molecular optimization driven by mol-gene evolution | | 0
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis | | 0
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Code | 2
Q-S5: Towards Quantized State Space Models | Code | 0
MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction | | 0
OpenVLA: An Open-Source Vision-Language-Action Model | Code | 9
ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models | | 0
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases | | 0
Compressive Beam Alignment for Indoor Millimeter-Wave Systems | | 0
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization | | 0
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment | | 0
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1
FoldToken2: Learning compact, invariant and generative protein structure language | | 0
Image and Video Tokenization with Binary Spherical Quantization | Code | 3
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text | | 0
TernaryLLM: Ternarized Large Language Model | | 0
The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs | | 0
2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution | Code | 1
Topological Analysis for Detecting Anomalies (TADA) in Time Series | | 0
Low-Rank Quantization-Aware Training for LLMs | Code | 2
Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks | | 0
Efficient Neural Compression with Inference-time Decoding | | 0
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization | | 0
From Analog to Digital: Multi-Order Digital Joint Coding-Modulation for Semantic Communication | Code | 1
Page 26 of 99

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FQ-ViT (ViT-L) | Top-1 Accuracy (%) | 85.03 | | Unverified
2 | FQ-ViT (ViT-B) | Top-1 Accuracy (%) | 83.31 | | Unverified
3 | FQ-ViT (Swin-B) | Top-1 Accuracy (%) | 82.97 | | Unverified
4 | FQ-ViT (Swin-S) | Top-1 Accuracy (%) | 82.71 | | Unverified
5 | FQ-ViT (DeiT-B) | Top-1 Accuracy (%) | 81.2 | | Unverified
6 | FQ-ViT (Swin-T) | Top-1 Accuracy (%) | 80.51 | | Unverified
7 | FQ-ViT (DeiT-S) | Top-1 Accuracy (%) | 79.17 | | Unverified
8 | Xception W8A8 | Top-1 Accuracy (%) | 78.97 | | Unverified
9 | ADLIK-MO-ResNet50-W4A4 | Top-1 Accuracy (%) | 77.88 | | Unverified
10 | ADLIK-MO-ResNet50-W3A4 | Top-1 Accuracy (%) | 77.34 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_3 | MAP | 160,327.04 | | Unverified
2 | DTQ | MAP | 0.79 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | OutEffHop-Bert_base | Perplexity | 6.3 | | Unverified
2 | OutEffHop-Bert_base | Perplexity | 6.21 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | Accuracy | 98.13 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | Accuracy | 92.92 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSD ResNet50 V1 FPN 640x640 | MAP | 34.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | TAR @ FAR=1e-4 | 95.13 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | TAR @ FAR=1e-4 | 96.38 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_5 | All | 84,809,664 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | Accuracy | 99.8 | | Unverified