NeUQI: Near-Optimal Uniform Quantization Parameter Initialization | May 23, 2025 | Quantization | Code Available
NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache | May 23, 2025 | Language Modeling Language Modelling | Unverified
Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? | May 23, 2025 | Medical Question Answering Quantization | Unverified
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM | May 23, 2025 | Quantization | Unverified
Beyond Discreteness: Finite-Sample Analysis of Straight-Through Estimator for Quantization | May 23, 2025 | compressed sensing Quantization | Unverified
NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics | May 22, 2025 | Quantization | Unverified
DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection | May 22, 2025 | Quantization Safety Alignment | Code Available
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design | May 22, 2025 | GPU Image Generation | Code Available
Is Quantum Optimization Ready? An Effort Towards Neural Network Compression using Adiabatic Quantum Computing | May 22, 2025 | Model Compression Neural Network Compression | Unverified
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information | May 21, 2025 | Language Modeling Language Modelling | Unverified
Harnessing Large Language Models Locally: Empirical Results and Implications for AI PC | May 21, 2025 | CPU Quantization | Code Available
InTreeger: An End-to-End Framework for Integer-Only Decision Tree Inference | May 21, 2025 | Edge-computing Quantization | Unverified
Is (Selective) Round-To-Nearest Quantization All You Need? | May 21, 2025 | All Quantization | Unverified
Rate-Distortion Optimization with Non-Reference Metrics for UGC Compression | May 21, 2025 | Quantization | Unverified
EfficientLLM: Efficiency in Large Language Models | May 20, 2025 | Mixture-of-Experts Quantization | Unverified
Layer-wise Quantization for Quantized Optimistic Dual Averaging | May 20, 2025 | Quantization | Unverified
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference | May 20, 2025 | Quantization speech-recognition | Unverified
Through a Compressed Lens: Investigating the Impact of Quantization on LLM Explainability and Interpretability | May 20, 2025 | counterfactual Memorization | Unverified
Deep Unfolding with Kernel-based Quantization in MIMO Detection | May 19, 2025 | Density Estimation Edge-computing | Unverified
QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding | May 19, 2025 | Quantization Spoken Language Understanding | Code Available
An Overview of Arithmetic Adaptations for Inference of Convolutional Neural Networks on Re-configurable Hardware | May 19, 2025 | Quantization | Code Available
UniHM: Universal Human Motion Generation with Object Interactions in Indoor Scenes | May 19, 2025 | Human-Object Interaction Detection Motion Generation | Unverified
GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization | May 19, 2025 | Computational Efficiency Image Compression | Unverified
A3: an Analytical Low-Rank Approximation Framework for Attention | May 19, 2025 | Quantization | Unverified
Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs | May 19, 2025 | Quantization Sensitivity | Unverified
KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache | May 18, 2025 | Quantization | Unverified
Hyperbolic Residual Quantization: Discrete Representations for Data with Latent Hierarchies | May 18, 2025 | Inductive Bias Knowledge Graphs | Unverified
CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design | May 18, 2025 | GPU Language Modeling | Unverified
PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement | May 18, 2025 | Quantization Video Enhancement | Code Available
FedHQ: Hybrid Runtime Quantization for Federated Learning | May 17, 2025 | Federated Learning Quantization | Unverified
QVGen: Pushing the Limit of Quantized Video Generative Models | May 16, 2025 | Quantization | Unverified
MARRS: Masked Autoregressive Unit-based Reaction Synthesis | May 16, 2025 | Motion Generation Quantization | Unverified
Gaussian Weight Sampling for Scalable, Efficient and Stable Pseudo-Quantization Training | May 16, 2025 | GPU Quantization | Unverified
Benchmarking CFAR and CNN-based Peak Detection Algorithms in ISAC under Hardware Impairments | May 16, 2025 | Benchmarking Integrated sensing and communication | Unverified
Addition is almost all you need: Compressing neural networks with double binary factorization | May 16, 2025 | All Binarization | Code Available
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization | May 16, 2025 | Quantization Text Generation | Unverified
Formal Uncertainty Propagation for Stochastic Dynamical Systems with Additive Noise | May 16, 2025 | Quantization Stochastic Optimization | Unverified
VQ-Logits: Compressing the Output Bottleneck of Large Language Models via Vector Quantized Logits | May 15, 2025 | Language Modeling Language Modelling | Unverified
TransPL: VQ-Code Transition Matrices for Pseudo-Labeling of Time Series Unsupervised Domain Adaptation | May 15, 2025 | Domain Adaptation Pseudo Label | Code Available
A probabilistic framework for dynamic quantization | May 15, 2025 | Quantization | Unverified
Efficient Mixed Precision Quantization in Graph Neural Networks | May 14, 2025 | Graph Classification Node Classification | Code Available
Zero-shot Quantization: A Comprehensive Survey | May 14, 2025 | Quantization Survey | Unverified
Multi-Layer Hierarchical Federated Learning with Quantization | May 13, 2025 | Federated Learning Quantization | Unverified
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference | May 13, 2025 | Quantization | Unverified
Private LoRA Fine-tuning of Open-Source LLMs with Homomorphic Encryption | May 12, 2025 | GPU Knowledge Base Question Answering | Unverified
Semantic Retention and Extreme Compression in LLMs: Can We Have Both? | May 12, 2025 | Language Modeling Language Modelling | Unverified
Cognitive Non-Coherent Jamming Techniques for Frequency Selective Attacks | May 12, 2025 | Quantization | Unverified
Efficient ANN-SNN Conversion with Error Compensation Learning | May 12, 2025 | Quantization | Unverified
QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads | May 12, 2025 | Quantization | Unverified
An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits | May 12, 2025 | All Knowledge Distillation | Unverified