SOTAVerified

Quantization

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
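To make the float32-to-int8 replacement concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in NumPy. This is a generic illustration, not the adaptive-precision scheme of the cited paper; the function names `quantize_int8` and `dequantize` are our own.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map float32 values to int8.

    The scale maps the largest absolute value in x to 127, so every
    quantized value fits in the signed 8-bit range [-127, 127].
    """
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from int8 codes."""
    return q.astype(np.float32) * scale

# Round-trip example: the reconstruction error per element is
# bounded by half the quantization step (scale / 2).
x = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
max_err = float(np.max(np.abs(x - x_hat)))
```

Storing `q` (1 byte/element) plus one float scale in place of `x` (4 bytes/element) is the basic memory saving; speedups additionally require integer kernels on the target hardware.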

Papers

Showing 951–1000 of 4925 papers

Title | Status | Hype
Enabling On-Device Medical AI Assistants via Input-Driven Saliency Adaptation | - | 0
Towards AI-Native Fronthaul: Neural Compression for NextG Cloud RAN | - | 0
Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition | - | 0
EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model | Code | 0
BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning | - | 0
TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering | - | 0
FPTQuant: Function-Preserving Transforms for LLM Quantization | - | 0
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion | - | 0
Massive MIMO with 1-Bit DACs: Data Detection for Quantized Linear Precoding with Dithering | - | 0
PCDVQ: Enhancing Vector Quantization for Large Language Models via Polar Coordinate Decoupling | - | 0
Kernel k-Medoids as General Vector Quantization | - | 0
Nonlinear Sparse Bayesian Learning Methods with Application to Massive MIMO Channel Estimation with Hardware Impairments | - | 0
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing | - | 0
STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization | Code | 0
Quantized Dissipative Uncertain Model for Fractional T_S Fuzzy systems with Time_Varying Delays Under Networked Control System | - | 0
Enhancing Convergence, Privacy and Fairness for Wireless Personalized Federated Learning: Quantization-Assisted Min-Max Fair Scheduling | - | 0
MUC-G4: Minimal Unsat Core-Guided Incremental Verification for Deep Neural Network Compression | - | 0
Parameter Efficient Fine Tuning Llama 3.1 for Answering Arabic Legal Questions: A Case Study on Jordanian Laws | Code | 0
Flexible Mixed Precision Quantization for Learned Image Compression | Code | 0
Quantitative Error Feedback for Quantization Noise Reduction of Filtering over Graphs | - | 0
Structured Pruning and Quantization for Learned Image Compression | Code | 0
Enhancing Speech Emotion Recognition with Graph-Based Multimodal Fusion and Prosodic Features for the Speech Emotion Recognition in Naturalistic Conditions Challenge at Interspeech 2025 | - | 0
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer | - | 0
Quantization-based Bounds on the Wasserstein Metric | - | 0
Power-of-Two (PoT) Weights in Large Language Models (LLMs) | - | 0
LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text | Code | 0
Edge Computing for Physics-Driven AI in Computational MRI: A Feasibility Study | - | 0
Running Conventional Automatic Speech Recognition on Memristor Hardware: A Simulated Approach | - | 0
LittleBit: Ultra Low-Bit Quantization via Latent Factorization | - | 0
MuLoCo: Muon is a practical inner optimizer for DiLoCo | - | 0
Efficient Quantum Approximate kNN Algorithm via Granular-Ball Computing | - | 0
Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation | Code | 0
Revisiting Uncertainty Estimation and Calibration of Large Language Models | - | 0
Highly Efficient and Effective LLMs with Multi-Boolean Architectures | - | 0
Climate Finance Bench | Code | 0
On the Interplay of Privacy, Persuasion and Quantization | - | 0
Does quantization affect models' performance on long-context tasks? | Code | 0
Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation | - | 0
LPCM: Learning-based Predictive Coding for LiDAR Point Cloud Compression | - | 0
CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge | - | 0
BrainStratify: Coarse-to-Fine Disentanglement of Intracranial Neural Dynamics | - | 0
Optimizing edge AI models on HPC systems with the edge in the loop | Code | 0
Efficient Speech Translation through Model Compression and Knowledge Distillation | Code | 0
Communication-Efficient Multi-Device Inference Acceleration for Transformer Models | Code | 0
FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization | - | 0
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning | Code | 0
Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees | Code | 0
Distinctive Feature Codec: Adaptive Segmentation for Efficient Speech Representation | - | 0
Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing | - | 0
Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding | Code | 0
Page 20 of 99

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | FQ-ViT (ViT-L) | Top-1 Accuracy (%) | 85.03 | - | Unverified
2 | FQ-ViT (ViT-B) | Top-1 Accuracy (%) | 83.31 | - | Unverified
3 | FQ-ViT (Swin-B) | Top-1 Accuracy (%) | 82.97 | - | Unverified
4 | FQ-ViT (Swin-S) | Top-1 Accuracy (%) | 82.71 | - | Unverified
5 | FQ-ViT (DeiT-B) | Top-1 Accuracy (%) | 81.2 | - | Unverified
6 | FQ-ViT (Swin-T) | Top-1 Accuracy (%) | 80.51 | - | Unverified
7 | FQ-ViT (DeiT-S) | Top-1 Accuracy (%) | 79.17 | - | Unverified
8 | Xception W8A8 | Top-1 Accuracy (%) | 78.97 | - | Unverified
9 | ADLIK-MO-ResNet50-W4A4 | Top-1 Accuracy (%) | 77.88 | - | Unverified
10 | ADLIK-MO-ResNet50-W3A4 | Top-1 Accuracy (%) | 77.34 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_3 | MAP | 160,327.04 | - | Unverified
2 | DTQ | MAP | 0.79 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | OutEffHop-Bert_base | Perplexity | 6.3 | - | Unverified
2 | OutEffHop-Bert_base | Perplexity | 6.21 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | Accuracy | 98.13 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | Accuracy | 92.92 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SSD ResNet50 V1 FPN 640x640 | MAP | 34.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | TAR @ FAR=1e-4 | 95.13 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | TAR @ FAR=1e-4 | 96.38 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | 3DCNN_VIVA_5 | All | 84,809,664 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | Accuracy | 99.8 | - | Unverified