Turbo-ICL: In-Context Learning-Based Turbo Equalization | May 9, 2025 | Decoder Diversity | — Unverified | 0
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | May 9, 2025 | Mixture-of-Experts Quantization | Code Available | 1
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities | May 8, 2025 | Fairness Quantization | Code Available | 0
Low-bit Model Quantization for Deep Neural Networks: A Survey | May 8, 2025 | Quantization | Code Available | 0
ReactDance: Progressive-Granular Representation for Long-Term Coherent Reactive Dance Generation | May 8, 2025 | Quantization | — Unverified | 0
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model | May 8, 2025 | Computational Efficiency Instance Segmentation | — Unverified | 0
Diffusion Model Quantization: A Review | May 8, 2025 | model Quantization | Code Available | 2
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation | May 8, 2025 | Quantization | Code Available | 3
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning | May 8, 2025 | Quantization | — Unverified | 0
RGB-Event Fusion with Self-Attention for Collision Prediction | May 7, 2025 | Benchmarking Computational Efficiency | Code Available | 1
On-Device LLM for Context-Aware Wi-Fi Roaming | May 7, 2025 | Language Modeling Language Modelling | Code Available | 0
3D Gaussian Splatting Data Compression with Mixture of Priors | May 6, 2025 | 3DGS Data Compression | — Unverified | 0
PROM: Prioritize Reduction of Multiplications Over Lower Bit-Widths for Efficient CNNs | May 6, 2025 | Quantization | — Unverified | 0
Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation | May 6, 2025 | Disease Prediction Quantization | — Unverified | 0
Rapid yet accurate Tile-circuit and device modeling for Analog In-Memory Computing | May 5, 2025 | Quantization | — Unverified | 0
End-to-end fully-binarized network design: from Generic Learned Thermometer to Block Pruning | May 5, 2025 | Knowledge Distillation Quantization | — Unverified | 0
Radio: Rate-Distortion Optimization for Large Language Model Compression | May 5, 2025 | Language Modeling Language Modelling | — Unverified | 0
EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices | May 5, 2025 | 4k Language Modeling | — Unverified | 0
Bielik 11B v2 Technical Report | May 5, 2025 | Language Modeling Language Modelling | — Unverified | 0
RobSurv: Vector Quantization-Based Multi-Modal Learning for Robust Cancer Survival Prediction | May 5, 2025 | Prognosis Quantization | — Unverified | 0
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques | May 5, 2025 | Knowledge Distillation Mixture-of-Experts | — Unverified | 0
Quantitative Analysis of Performance Drop in DeepSeek Model Quantization | May 5, 2025 | GPU Quantization | Code Available | 0
NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities | May 5, 2025 | Benchmarking Quantization | Code Available | 0
An Empirical Study of Qwen3 Quantization | May 4, 2025 | Natural Language Understanding Quantization | Code Available | 2
Quantizing Diffusion Models from a Sampling-Aware Perspective | May 4, 2025 | Denoising Noise Estimation | — Unverified | 0
PASCAL: Precise and Efficient ANN-SNN Conversion using Spike Accumulation and Adaptive Layerwise Activation | May 3, 2025 | Quantization | — Unverified | 0
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | May 2, 2025 | GSM8K Quantization | — Unverified | 0
Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free | May 2, 2025 | Quantization | — Unverified | 0
LMDepth: Lightweight Mamba-based Monocular Depth Estimation for Real-World Deployment | May 2, 2025 | Autonomous Driving Computational Efficiency | — Unverified | 0
Efficient Vision-based Vehicle Speed Estimation | May 2, 2025 | Quantization vehicle detection | — Unverified | 0
Aggregating empirical evidence from data strategy studies: a case on model quantization | May 1, 2025 | GPU Quantization | — Unverified | 0
Optimizing Deep Neural Networks using Safety-Guided Self Compression | May 1, 2025 | Language Modeling Language Modelling | Code Available | 0
Fast and Low-Cost Genomic Foundation Models via Outlier Removal | May 1, 2025 | Adversarial Attack Adversarial Robustness | Code Available | 1
Generative QoE Modeling: A Lightweight Approach for Telecom Networks | Apr 30, 2025 | Computational Efficiency Quantization | — Unverified | 0
Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models | Apr 30, 2025 | Quantization | — Unverified | 0
Optimization of embeddings storage for RAG systems using quantization and dimensionality reduction techniques | Apr 30, 2025 | Dimensionality Reduction MTEB Benchmark | — Unverified | 0
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax | Apr 29, 2025 | Quantization | Code Available | 2
Clustering-Based Evolutionary Federated Multiobjective Optimization and Learning | Apr 29, 2025 | Clustering Diversity | — Unverified | 0
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech | Apr 29, 2025 | Quantization | — Unverified | 0
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate | Apr 28, 2025 | Quantization | — Unverified | 0
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs | Apr 28, 2025 | Quantization | — Unverified | 0
Partition Map-Based Fast Block Partitioning for VVC Inter Coding | Apr 25, 2025 | Quantization | Code Available | 0
Pushing the boundary on Natural Language Inference | Apr 25, 2025 | Fact Checking Information Retrieval | — Unverified | 0
Fast Autoregressive Models for Continuous Latent Generation | Apr 24, 2025 | Denoising Image Generation | — Unverified | 0
Precision Neural Network Quantization via Learnable Adaptive Modules | Apr 24, 2025 | Computational Efficiency Quantization | — Unverified | 0
On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration | Apr 24, 2025 | CPU Model Compression | — Unverified | 0
Distributed Optimization with Efficient Communication, Event-Triggered Solution Enhancement, and Operation Stopping | Apr 23, 2025 | Distributed Optimization Quantization | — Unverified | 0
Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis | Apr 22, 2025 | GPU Quantization | — Unverified | 0
TeLLMe: An Energy-Efficient Ternary LLM Accelerator for Prefilling and Decoding on Edge FPGAs | Apr 22, 2025 | Quantization | — Unverified | 0
A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings | Apr 22, 2025 | Computational Efficiency GPU | Code Available | 0