Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis | Feb 18, 2025 | Benchmarking Mamba | Code Available | 0
Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer | Feb 17, 2025 | GPU Quantization | Unverified | 0
Fate: Fast Edge Inference of Mixture-of-Experts Models via Cross-Layer Gate | Feb 17, 2025 | GPU Mixture-of-Experts | Code Available | 0
Towards Reasoning Ability of Small Language Models | Feb 17, 2025 | Quantization | Unverified | 0
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models? | Feb 17, 2025 | Quantization | Unverified | 0
On the Logic Elements Associated with Round-Off Errors and Gaussian Blur in Image Registration: A Simple Case of Commingling | Feb 17, 2025 | Image Registration Quantization | Unverified | 0
Towards Efficient Pre-training: Exploring FP4 Precision in Large Language Models | Feb 17, 2025 | Quantization | Unverified | 0
On Quantizing Neural Representation for Variable-Rate Video Coding | Feb 17, 2025 | Quantization | Code Available | 0
Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View | Feb 16, 2025 | Language Modeling Language Modelling | Code Available | 0
CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs | Feb 15, 2025 | Computational Efficiency GPU | Code Available | 1
Weighted quantization using MMD: From mean field to mean shift via gradient flows | Feb 14, 2025 | Clustering Quantization | Code Available | 0
Towards Watermarking of Open-Source LLMs | Feb 14, 2025 | Quantization | Unverified | 0
Low-Complexity On-Grid Channel Estimation for Partially-Connected Hybrid XL-MIMO | Feb 14, 2025 | Quantization | Unverified | 0
CISSIR: Beam Codebooks with Self-Interference Reduction Guarantees for Integrated Sensing and Communication Beyond 5G | Feb 14, 2025 | Integrated sensing and communication ISAC | Code Available | 1
EmbBERT-Q: Breaking Memory Barriers in Embedded NLP | Feb 14, 2025 | Mamba Quantization | Code Available | 0
NestQuant: Nested Lattice Quantization for Matrix Products and LLMs | Feb 13, 2025 | Quantization | Unverified | 0
RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models | Feb 13, 2025 | Quantization | Unverified | 0
SQ-GAN: Semantic Image Communications Using Masked Vector Quantization | Feb 13, 2025 | Image Compression Quantization | Code Available | 1
Exploiting Non-uniform Quantization for Enhanced ILC in Wideband Digital Pre-distortion | Feb 12, 2025 | Quantization | Unverified | 0
Compression of Site-Specific Deep Neural Networks for Massive MIMO Precoding | Feb 12, 2025 | Neural Architecture Search Neural Network Compression | Unverified | 0
Contextual Compression Encoding for Large Language Models: A Novel Framework for Multi-Layered Parameter Space Pruning | Feb 12, 2025 | Computational Efficiency Quantization | Unverified | 0
Scalable Thermodynamic Second-order Optimization | Feb 12, 2025 | Quantization | Unverified | 0
LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits | Feb 12, 2025 | parameter-efficient fine-tuning Quantization | Unverified | 0
Loss Landscape Analysis for Reliable Quantized ML Models for Scientific Sensing | Feb 12, 2025 | Quantization | Code Available | 0
Vision-Language Models for Edge Networks: A Comprehensive Survey | Feb 11, 2025 | Autonomous Vehicles Image Captioning | Unverified | 0
Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators | Feb 11, 2025 | Quantization | Code Available | 0
MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures | Feb 11, 2025 | Quantization | Unverified | 0
HDCompression: Hybrid-Diffusion Image Compression for Ultra-Low Bitrates | Feb 11, 2025 | Image Compression Image Reconstruction | Unverified | 0
Conditional Distribution Quantization in Machine Learning | Feb 11, 2025 | Quantization Uncertainty Quantification | Unverified | 0
Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation | Feb 10, 2025 | Blood pressure estimation EEG | Unverified | 0
Matryoshka Quantization | Feb 10, 2025 | Quantization | Unverified | 0
Demystifying Singular Defects in Large Language Models | Feb 10, 2025 | Quantization | Unverified | 0
GraNNite: Enabling High-Performance Execution of Graph Neural Networks on Resource-Constrained Neural Processing Units | Feb 10, 2025 | Event-based vision Quantization | Code Available | 0
Gradient Based Method for the Fusion of Lattice Quantizers | Feb 9, 2025 | Quantization | Unverified | 0
Physics-Conditioned Diffusion Models for Lattice Gauge Theory | Feb 8, 2025 | Quantization | Code Available | 0
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Feb 8, 2025 | Decoder Language Modeling | Code Available | 11
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation | Feb 7, 2025 | Image Generation Quantization | Unverified | 0
AIQViT: Architecture-Informed Post-Training Quantization for Vision Transformers | Feb 7, 2025 | image-classification Image Classification | Unverified | 0
Scalable and consistent embedding of probability measures into Hilbert spaces via measure quantization | Feb 7, 2025 | Quantization | Unverified | 0
Efficient Evaluation of Quantization-Effects in Neural Codecs | Feb 7, 2025 | Decoder Quantization | Unverified | 0
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Feb 7, 2025 | GPU Quantization | Code Available | 2
A Performance Analysis of You Only Look Once Models for Deployment on Constrained Computational Edge Devices in Drone Applications | Feb 6, 2025 | NVIDIA Jetson Orin Nano object-detection | Unverified | 0
Exploring Model Invariance with Discrete Search for Ultra-Low-Bit Quantization | Feb 6, 2025 | Quantization | Unverified | 0
KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Feb 6, 2025 | Mathematical Reasoning Quantization | Code Available | 0
TQ-DiT: Efficient Time-Aware Quantization for Diffusion Transformers | Feb 6, 2025 | Computational Efficiency Quantization | Unverified | 0
Asymptotic Analysis of One-bit Quantized Box-Constrained Precoding in Large-Scale Multi-User Systems | Feb 5, 2025 | Quantization | Unverified | 0
SensorChat: Answering Qualitative and Quantitative Questions during Long-Term Multimodal Sensor Interactions | Feb 5, 2025 | Quantization Question Answering | Unverified | 0
HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference | Feb 5, 2025 | Language Modeling Language Modelling | Unverified | 0
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization | Feb 4, 2025 | Quantization | Code Available | 3
Unlocking Efficient Large Inference Models: One-Bit Unrolling Tips the Scales | Feb 4, 2025 | Language Modeling Language Modelling | Unverified | 0