SOTAVerified

Quantization

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
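
To make the float-to-fixed-point replacement concrete, here is a minimal sketch in plain Python. It assumes a symmetric scheme with a single shared scale derived from the largest magnitude in the input. That choice, and the helper names `quantize_int8` and `dequantize_int8`, are illustrative assumptions and not the method of the cited paper.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to int8 codes using one shared scale.

    Assumption: symmetric quantization, with the scale chosen so that the
    largest magnitude in `values` maps to 127.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to a scale of 1.0 for an empty or all-zero input
    # so that the division below is always defined.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = []
    for v in values:
        q = round(v / scale)
        # Clamp to the symmetric int8 range [-127, 127].
        codes.append(max(-127, min(127, q)))
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(codes, scale, restored)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded for the cheaper integer representation. Small values such as `0.003` collapse to zero under this coarse per-tensor scale, one reason practical schemes often use finer-grained or adaptive scaling.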

Papers

Showing 151-200 of 4925 papers

| Title | Status | Hype |
| --- | --- | --- |
| On-Device Training Under 256KB Memory | Code | 2 |
| OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting | Code | 2 |
| Preventing Local Pitfalls in Vector Quantization via Optimal Transport | Code | 2 |
| Neural Network Compression Framework for fast model inference | Code | 2 |
| Compact 3D Gaussian Representation for Radiance Field | Code | 2 |
| NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks | Code | 2 |
| ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers | Code | 2 |
| MobileQuant: Mobile-friendly Quantization for On-device Language Models | Code | 2 |
| Model-Preserving Adaptive Rounding | Code | 2 |
| CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | Code | 2 |
| CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization | Code | 2 |
| MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension | Code | 2 |
| OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models | Code | 2 |
| PTQ4SAM: Post-Training Quantization for Segment Anything | Code | 2 |
| QuIP: 2-Bit Quantization of Large Language Models With Guarantees | Code | 2 |
| MAUVE Scores for Generative Models: Theory and Practice | Code | 2 |
| MBQ: Modality-Balanced Quantization for Large Vision-Language Models | Code | 2 |
| Low-Rank Quantization-Aware Training for LLMs | Code | 2 |
| LoQT: Low-Rank Adapters for Quantized Pretraining | Code | 2 |
| MAexp: A Generic Platform for RL-based Multi-Agent Exploration | Code | 2 |
| MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Code | 2 |
| BMInf: An Efficient Toolkit for Big Model Inference and Tuning | Code | 2 |
| BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation | Code | 2 |
| LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS | Code | 2 |
| BitNet: Scaling 1-bit Transformers for Large Language Models | Code | 2 |
| LeanVec: Searching vectors faster by making them fit | Code | 2 |
| Bolt: Accelerated Data Mining with Fast Vector Compression | Code | 2 |
| LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection | Code | 2 |
| MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training | Code | 2 |
| BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains | Code | 2 |
| Binary Neural Networks: A Survey | Code | 2 |
| LLM-FP4: 4-Bit Floating-Point Quantized Transformers | Code | 2 |
| KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches | Code | 2 |
| LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models | Code | 2 |
| LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search | Code | 2 |
| Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search | Code | 2 |
| Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation | Code | 2 |
| Binarized Neural Machine Translation | Code | 2 |
| INT-FlashAttention: Enabling Flash Attention for INT8 Quantization | Code | 2 |
| Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding | Code | 2 |
| I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference | Code | 2 |
| Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization | Code | 2 |
| AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution | Code | 2 |
| Compressing Large Language Models using Low Rank and Low Precision Decomposition | Code | 2 |
| Imp: Highly Capable Large Multimodal Models for Mobile Devices | Code | 2 |
| BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache | Code | 2 |
| Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey | Code | 2 |
| BHViT: Binarized Hybrid Vision Transformer | Code | 2 |
| I-BERT: Integer-only BERT Quantization | Code | 2 |
| MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Code | 2 |
Page 4 of 99

Benchmark Results

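| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | FQ-ViT (ViT-L) | Top-1 Accuracy (%) | 85.03 | | Unverified |
| 2 | FQ-ViT (ViT-B) | Top-1 Accuracy (%) | 83.31 | | Unverified |
| 3 | FQ-ViT (Swin-B) | Top-1 Accuracy (%) | 82.97 | | Unverified |
| 4 | FQ-ViT (Swin-S) | Top-1 Accuracy (%) | 82.71 | | Unverified |
| 5 | FQ-ViT (DeiT-B) | Top-1 Accuracy (%) | 81.2 | | Unverified |
| 6 | FQ-ViT (Swin-T) | Top-1 Accuracy (%) | 80.51 | | Unverified |
| 7 | FQ-ViT (DeiT-S) | Top-1 Accuracy (%) | 79.17 | | Unverified |
| 8 | Xception W8A8 | Top-1 Accuracy (%) | 78.97 | | Unverified |
| 9 | ADLIK-MO-ResNet50-W4A4 | Top-1 Accuracy (%) | 77.88 | | Unverified |
| 10 | ADLIK-MO-ResNet50-W3A4 | Top-1 Accuracy (%) | 77.34 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | 3DCNN_VIVA_3 | MAP | 160,327.04 | | Unverified |
| 2 | DTQ | MAP | 0.79 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | OutEffHop-Bert_base | Perplexity | 6.3 | | Unverified |
| 2 | OutEffHop-Bert_base | Perplexity | 6.21 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | | Accuracy | 98.13 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | | Accuracy | 92.92 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SSD ResNet50 V1 FPN 640x640 | MAP | 34.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | | TAR @ FAR=1e-4 | 95.13 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | | TAR @ FAR=1e-4 | 96.38 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | 3DCNN_VIVA_5 | All | 84,809,664 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | | Accuracy | 99.8 | | Unverified |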