SOTAVerified

Quantization

Quantization is a technique for reducing the computation cost of neural network training by replacing high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
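
As a rough illustration of the float-to-fixed-point replacement described above, the sketch below applies symmetric per-tensor int8 quantization in plain Python. The function names (`quantize_int8`, `dequantize_int8`), the symmetric scheme, and the sample values are assumptions chosen for illustration; they are not taken from the cited paper, which may use a different scheme.

```python
# Illustrative sketch only: symmetric per-tensor int8 quantization.
# Names and scheme are hypothetical, not the cited paper's method.

def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] plus one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to a scale of 1.0 when every value is zero (avoids dividing by 0).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)      # codes -> [50, -127, 0, 100]
restored = dequantize_int8(codes, scale)   # each value is off by at most scale / 2
```

The small value 0.003 collapses to 0 here, which shows the precision lost in exchange for cheaper integer arithmetic.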

Papers

Showing 101-150 of 4925 papers

| Title | Status | Hype |
|---|---|---|
| FlatQuant: Flatness Matters for LLM Quantization | Code | 3 |
| IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact | Code | 3 |
| Scaling Transformers for Low-Bitrate High-Quality Speech Coding | Code | 3 |
| Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Code | 3 |
| FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design | Code | 3 |
| APOLLO: SGD-like Memory, AdamW-level Performance | Code | 3 |
| RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Code | 3 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3 |
| EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | Code | 3 |
| Pushing the Limits of Large Language Model Quantization via the Linearity Theorem | Code | 3 |
| DPLM-2: A Multimodal Diffusion Protein Language Model | Code | 3 |
| Ditto: Quantization-aware Secure Inference of Transformers upon MPC | Code | 3 |
| PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | Code | 3 |
| Data Generation for Hardware-Friendly Post-Training Quantization | Code | 3 |
| OneBit: Towards Extremely Low-bit Large Language Models | Code | 3 |
| MotionGPT: Human Motion as a Foreign Language | Code | 3 |
| MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Code | 3 |
| 8-bit Optimizers via Block-wise Quantization | Code | 3 |
| A Survey on Large Language Model Acceleration based on KV Cache Management | Code | 3 |
| Addressing Representation Collapse in Vector Quantized Models with One Linear Layer | Code | 3 |
| Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields | Code | 3 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Code | 3 |
| NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Code | 3 |
| MAUVE Scores for Generative Models: Theory and Practice | Code | 2 |
| Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding | Code | 2 |
| MBQ: Modality-Balanced Quantization for Large Vision-Language Models | Code | 2 |
| CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | Code | 2 |
| MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Code | 2 |
| Accurate LoRA-Finetuning Quantization of LLMs via Information Retention | Code | 2 |
| MAexp: A Generic Platform for RL-based Multi-Agent Exploration | Code | 2 |
| Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation | Code | 2 |
| 4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Code | 2 |
| MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training | Code | 2 |
| LoQT: Low-Rank Adapters for Quantized Pretraining | Code | 2 |
| LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models | Code | 2 |
| LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search | Code | 2 |
| Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search | Code | 2 |
| LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS | Code | 2 |
| LLM-FP4: 4-Bit Floating-Point Quantized Transformers | Code | 2 |
| LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection | Code | 2 |
| Low-Rank Quantization-Aware Training for LLMs | Code | 2 |
| MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Code | 2 |
| BMInf: An Efficient Toolkit for Big Model Inference and Tuning | Code | 2 |
| Bolt: Accelerated Data Mining with Fast Vector Compression | Code | 2 |
| BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation | Code | 2 |
| KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches | Code | 2 |
| BitNet: Scaling 1-bit Transformers for Large Language Models | Code | 2 |
| Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization | Code | 2 |
| I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference | Code | 2 |
| BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains | Code | 2 |

Page 3 of 99

Benchmark Results

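| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | FQ-ViT (ViT-L) | Top-1 Accuracy (%) | 85.03 | | Unverified |
| 2 | FQ-ViT (ViT-B) | Top-1 Accuracy (%) | 83.31 | | Unverified |
| 3 | FQ-ViT (Swin-B) | Top-1 Accuracy (%) | 82.97 | | Unverified |
| 4 | FQ-ViT (Swin-S) | Top-1 Accuracy (%) | 82.71 | | Unverified |
| 5 | FQ-ViT (DeiT-B) | Top-1 Accuracy (%) | 81.2 | | Unverified |
| 6 | FQ-ViT (Swin-T) | Top-1 Accuracy (%) | 80.51 | | Unverified |
| 7 | FQ-ViT (DeiT-S) | Top-1 Accuracy (%) | 79.17 | | Unverified |
| 8 | Xception W8A8 | Top-1 Accuracy (%) | 78.97 | | Unverified |
| 9 | ADLIK-MO-ResNet50-W4A4 | Top-1 Accuracy (%) | 77.88 | | Unverified |
| 10 | ADLIK-MO-ResNet50-W3A4 | Top-1 Accuracy (%) | 77.34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 3DCNN_VIVA_3 | MAP | 160,327.04 | | Unverified |
| 2 | DTQ | MAP | 0.79 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OutEffHop-Bert_base | Perplexity | 6.3 | | Unverified |
| 2 | OutEffHop-Bert_base | Perplexity | 6.21 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | | Accuracy | 98.13 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | | Accuracy | 92.92 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SSD ResNet50 V1 FPN 640x640 | MAP | 34.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | | TAR @ FAR=1e-4 | 95.13 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | | TAR @ FAR=1e-4 | 96.38 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 3DCNN_VIVA_5 | All | 84,809,664 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | | Accuracy | 99.8 | | Unverified |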