Edge Computing for Physics-Driven AI in Computational MRI: A Feasibility Study | May 30, 2025 | Computational Efficiency Edge-computing | Unverified 0
LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text | May 30, 2025 | Quantization | Code Available 0
Running Conventional Automatic Speech Recognition on Memristor Hardware: A Simulated Approach | May 30, 2025 | Automatic Speech Recognition Quantization | Unverified 0
Revisiting Uncertainty Estimation and Calibration of Large Language Models | May 29, 2025 | Mixture-of-Experts MMLU | Unverified 0
Model-Preserving Adaptive Rounding | May 29, 2025 | model Quantization | Code Available 2
Merge-Friendly Post-Training Quantization for Multi-Target Domain Adaptation | May 29, 2025 | Domain Adaptation Multi-target Domain Adaptation | Code Available 0
MuLoCo: Muon is a practical inner optimizer for DiLoCo | May 29, 2025 | Decoder Quantization | Unverified 0
Efficient Quantum Approximate kNN Algorithm via Granular-Ball Computing | May 29, 2025 | Quantization | Unverified 0
On the Interplay of Privacy, Persuasion and Quantization | May 28, 2025 | Decision Making Decoder | Unverified 0
Highly Efficient and Effective LLMs with Multi-Boolean Architectures | May 28, 2025 | Binarization Quantization | Unverified 0
Climate Finance Bench | May 28, 2025 | Logical Reasoning Quantization | Code Available 0
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design | May 28, 2025 | GPU Quantization | Code Available 1
SageAttention2++: A More Efficient Implementation of SageAttention2 | May 27, 2025 | Quantization Video Generation | Code Available 7
FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching | May 26, 2025 | Quantization Speech Enhancement | Code Available 2
BrainStratify: Coarse-to-Fine Disentanglement of Intracranial Neural Dynamics | May 26, 2025 | Brain Computer Interface Disentanglement | Unverified 0
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression | May 26, 2025 | Language Modeling Language Modelling | Code Available 1
LPCM: Learning-based Predictive Coding for LiDAR Point Cloud Compression | May 26, 2025 | Quantization | Unverified 0
Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation | May 26, 2025 | Model Compression Quantization | Unverified 0
CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge | May 26, 2025 | Activity Recognition Quantization | Unverified 0
Efficient Speech Translation through Model Compression and Knowledge Distillation | May 26, 2025 | Knowledge Distillation Model Compression | Code Available 0
TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization | May 26, 2025 | CPU GPU | Code Available 1
Does quantization affect models' performance on long-context tasks? | May 26, 2025 | Quantization | Code Available 0
Optimizing edge AI models on HPC systems with the edge in the loop | May 26, 2025 | Hardware Aware Neural Architecture Search Knowledge Distillation | Code Available 0
Communication-Efficient Multi-Device Inference Acceleration for Transformer Models | May 25, 2025 | Quantization | Code Available 0
FP4 All the Way: Fully Quantized Training of LLMs | May 25, 2025 | All Quantization | Code Available 1
FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization | May 25, 2025 | Computational Efficiency CPU | Unverified 0
Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing | May 24, 2025 | Model Compression Quantization | Unverified 0
Mind the Gap: A Practical Attack on GGUF Quantization | May 24, 2025 | Code Generation Quantization | Code Available 1
Distinctive Feature Codec: Adaptive Segmentation for Efficient Speech Representation | May 24, 2025 | Quantization Representation Learning | Unverified 0
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning | May 24, 2025 | Computational Efficiency MMLU | Code Available 0
Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees | May 24, 2025 | Quantization | Code Available 0
Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding | May 24, 2025 | Quantization | Code Available 0
DVD-Quant: Data-free Video Diffusion Transformers Quantization | May 24, 2025 | Data Free Quantization Quantization | Code Available 1
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs | May 24, 2025 | Quantization | Code Available 1
NeUQI: Near-Optimal Uniform Quantization Parameter Initialization | May 23, 2025 | Quantization | Code Available 0
Beyond Discreteness: Finite-Sample Analysis of Straight-Through Estimator for Quantization | May 23, 2025 | compressed sensing Quantization | Unverified 0
NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache | May 23, 2025 | Language Modeling Language Modelling | Unverified 0
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM | May 23, 2025 | Quantization | Unverified 0
Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need? | May 23, 2025 | Medical Question Answering Quantization | Unverified 0
UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information | May 23, 2025 | Large Language Model Quantization | Code Available 1
FPQVAR: Floating Point Quantization for Visual Autoregressive Model with FPGA Hardware Co-design | May 22, 2025 | GPU Image Generation | Code Available 0
NQKV: A KV Cache Quantization Scheme Based on Normal Distribution Characteristics | May 22, 2025 | Quantization | Unverified 0
DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection | May 22, 2025 | Quantization Safety Alignment | Code Available 0
Is Quantum Optimization Ready? An Effort Towards Neural Network Compression using Adiabatic Quantum Computing | May 22, 2025 | Model Compression Neural Network Compression | Unverified 0
Is (Selective) Round-To-Nearest Quantization All You Need? | May 21, 2025 | All Quantization | Unverified 0
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information | May 21, 2025 | Language Modeling Language Modelling | Unverified 0
Harnessing Large Language Models Locally: Empirical Results and Implications for AI PC | May 21, 2025 | CPU Quantization | Code Available 0
Rate-Distortion Optimization with Non-Reference Metrics for UGC Compression | May 21, 2025 | Quantization | Unverified 0
InTreeger: An End-to-End Framework for Integer-Only Decision Tree Inference | May 21, 2025 | Edge-computing Quantization | Unverified 0
Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis | May 20, 2025 | GPU parameter-efficient fine-tuning | Code Available 1