Title | Date | Tasks | Code status
Private LoRA Fine-tuning of Open-Source LLMs with Homomorphic Encryption | May 12, 2025 | GPU, Knowledge Base Question Answering | Unverified
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations | May 10, 2025 | Language Modeling, Language Modelling | Unverified
Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference | May 9, 2025 | CPU, GPU | Unverified
Turbo-ICL: In-Context Learning-Based Turbo Equalization | May 9, 2025 | Decoder, Diversity | Unverified
LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization | May 9, 2025 | Protein Folding, Protein Structure Prediction | Unverified
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model | May 8, 2025 | Computational Efficiency, Instance Segmentation | Unverified
ReactDance: Progressive-Granular Representation for Long-Term Coherent Reactive Dance Generation | May 8, 2025 | Quantization | Unverified
Low-bit Model Quantization for Deep Neural Networks: A Survey | May 8, 2025 | Quantization | Code Available
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities | May 8, 2025 | Fairness, Quantization | Code Available
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning | May 8, 2025 | Quantization | Unverified
On-Device LLM for Context-Aware Wi-Fi Roaming | May 7, 2025 | Language Modeling, Language Modelling | Code Available
3D Gaussian Splatting Data Compression with Mixture of Priors | May 6, 2025 | 3DGS, Data Compression | Unverified
Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation | May 6, 2025 | Disease Prediction, Quantization | Unverified
PROM: Prioritize Reduction of Multiplications Over Lower Bit-Widths for Efficient CNNs | May 6, 2025 | Quantization | Unverified
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques | May 5, 2025 | Knowledge Distillation, Mixture-of-Experts | Unverified
RobSurv: Vector Quantization-Based Multi-Modal Learning for Robust Cancer Survival Prediction | May 5, 2025 | Prognosis, Quantization | Unverified
Rapid yet accurate Tile-circuit and device modeling for Analog In-Memory Computing | May 5, 2025 | Quantization | Unverified
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization | May 5, 2025 | GPU, Quantization | Code Available
End-to-end fully-binarized network design: from Generic Learned Thermometer to Block Pruning | May 5, 2025 | Knowledge Distillation, Quantization | Unverified
Radio: Rate-Distortion Optimization for Large Language Model Compression | May 5, 2025 | Language Modeling, Language Modelling | Unverified
EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices | May 5, 2025 | 4k, Language Modeling | Unverified
Bielik 11B v2 Technical Report | May 5, 2025 | Language Modeling, Language Modelling | Unverified
NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities | May 5, 2025 | Benchmarking, Quantization | Code Available
Quantizing Diffusion Models from a Sampling-Aware Perspective | May 4, 2025 | Denoising, Noise Estimation | Unverified
PASCAL: Precise and Efficient ANN-SNN Conversion using Spike Accumulation and Adaptive Layerwise Activation | May 3, 2025 | Quantization | Unverified
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | May 2, 2025 | GSM8K, Quantization | Unverified
LMDepth: Lightweight Mamba-based Monocular Depth Estimation for Real-World Deployment | May 2, 2025 | Autonomous Driving, Computational Efficiency | Unverified
Efficient Vision-based Vehicle Speed Estimation | May 2, 2025 | Quantization, vehicle detection | Unverified
Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free | May 2, 2025 | Quantization | Unverified
Aggregating empirical evidence from data strategy studies: a case on model quantization | May 1, 2025 | GPU, Quantization | Unverified
Optimizing Deep Neural Networks using Safety-Guided Self Compression | May 1, 2025 | Language Modeling, Language Modelling | Code Available
Generative QoE Modeling: A Lightweight Approach for Telecom Networks | Apr 30, 2025 | Computational Efficiency, Quantization | Unverified
Optimization of embeddings storage for RAG systems using quantization and dimensionality reduction techniques | Apr 30, 2025 | Dimensionality Reduction, MTEB Benchmark | Unverified
Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models | Apr 30, 2025 | Quantization | Unverified
Clustering-Based Evolutionary Federated Multiobjective Optimization and Learning | Apr 29, 2025 | Clustering, Diversity | Unverified
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech | Apr 29, 2025 | Quantization | Unverified
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs | Apr 28, 2025 | Quantization | Unverified
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate | Apr 28, 2025 | Quantization | Unverified
Partition Map-Based Fast Block Partitioning for VVC Inter Coding | Apr 25, 2025 | Quantization | Code Available
Pushing the boundary on Natural Language Inference | Apr 25, 2025 | Fact Checking, Information Retrieval | Unverified
On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration | Apr 24, 2025 | CPU, Model Compression | Unverified
Fast Autoregressive Models for Continuous Latent Generation | Apr 24, 2025 | Denoising, Image Generation | Unverified
Precision Neural Network Quantization via Learnable Adaptive Modules | Apr 24, 2025 | Computational Efficiency, Quantization | Unverified
Distributed Optimization with Efficient Communication, Event-Triggered Solution Enhancement, and Operation Stopping | Apr 23, 2025 | Distributed Optimization, Quantization | Unverified
TeLLMe: An Energy-Efficient Ternary LLM Accelerator for Prefilling and Decoding on Edge FPGAs | Apr 22, 2025 | Quantization | Unverified
Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis | Apr 22, 2025 | GPU, Quantization | Unverified
A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings | Apr 22, 2025 | Computational Efficiency, GPU | Code Available
Compute-Optimal LLMs Provably Generalize Better With Scale | Apr 21, 2025 | Generalization Bounds, Quantization | Unverified
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models | Apr 21, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified
Efficient Implicit Neural Compression of Point Clouds via Learnable Activation in Latent Space | Apr 20, 2025 | Attribute, Decoder | Unverified