Timestep-Aware Correction for Quantized Diffusion Models (Jul 4, 2024) | Tags: Attribute, Noise Estimation
Low-latency machine learning FPGA accelerator for multi-qubit-state discrimination (Jul 4, 2024) | Unverified | Tags: Quantization
OSPC: Artificial VLM Features for Hateful Meme Detection (Jul 3, 2024) | Unverified | Tags: Computational Efficiency, Feature Engineering
Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation (Jul 3, 2024) | Unverified | Tags: Knowledge Distillation, Quantization
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations (Jul 3, 2024) | Unverified | Tags: Automatic Speech Recognition (ASR)
Fisher-aware Quantization for DETR Detectors with Critical-category Objectives (Jul 3, 2024) | Unverified | Tags: Object Detection
How Does Quantization Affect Multilingual LLMs? (Jul 3, 2024) | Unverified | Tags: Mathematical Reasoning, Quantization
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment (Jul 3, 2024) | Unverified | Tags: Chatbot, Computational Efficiency
ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers (Jul 3, 2024) | Unverified | Tags: Attribute, Image Classification
Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization (Jul 3, 2024) | Unverified | Tags: Anomaly Detection, CPU
GPTQT: Quantize Large Language Models Twice to Push the Efficiency (Jul 3, 2024) | Unverified | Tags: Quantization
SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic (Jul 3, 2024) | Unverified | Tags: Quantization
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices (Jul 2, 2024) | Unverified | Tags: GPU, Quantization
PQCache: Product Quantization-based KVCache for Long Context LLM Inference (Jul 1, 2024) | Code Available (1) | Tags: GPU, Quantization
Linear and Nonlinear MMSE Estimation in One-Bit Quantized Systems under a Gaussian Mixture Prior (Jul 1, 2024) | Unverified | Tags: Quantization
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches (Jul 1, 2024) | Unverified | Tags: Book Summarization, Quantization
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks (Jul 1, 2024) | Code Available (2) | Tags: Quantization
Exploring FPGA designs for MX and beyond (Jul 1, 2024) | Code Available (0) | Tags: Efficient Neural Network, Quantization
Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression (Jul 1, 2024) | Unverified | Tags: Quantization
NeuroNAS: Enhancing Efficiency of Neuromorphic In-Memory Computing for Intelligent Mobile Agents through Hardware-Aware Spiking Neural Architecture Search (Jun 30, 2024) | Unverified | Tags: Neural Architecture Search, Quantization
Toward a Diffusion-Based Generalist for Dense Vision Tasks (Jun 29, 2024) | Unverified | Tags: Conditional Image Generation, Image Generation
Rateless Stochastic Coding for Delay-Constrained Semantic Communication (Jun 28, 2024) | Unverified | Tags: Decoder, Perceptual Distance
Deep Fusion Model for Brain Tumor Classification Using Fine-Grained Gradient Preservation (Jun 28, 2024) | Unverified | Tags: Brain Tumor Classification, Classification
LLMEasyQuant: Scalable Quantization for Parallel and Distributed LLM Inference (Jun 28, 2024) | Unverified | Tags: GPU, Quantization
Reliable edge machine learning hardware for scientific applications (Jun 27, 2024) | Code Available (1) | Tags: Quantization, Scientific Discovery
Fronthaul Quantization-Aware MU-MIMO Precoding for Sum Rate Maximization (Jun 27, 2024) | Unverified | Tags: Quantization
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models (Jun 27, 2024) | Unverified | Tags: Quantization
Efficient course recommendations with T5-based ranking and summarization (Jun 27, 2024) | Unverified | Tags: In-Context Learning, Quantization
MCNC: Manifold Constrained Network Compression (Jun 27, 2024) | Code Available (0) | Tags: Model Compression, Quantization
A Quantization-based Technique for Privacy Preserving Distributed Learning (Jun 26, 2024) | Unverified | Tags: Privacy Preserving, Quantization
FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization (Jun 26, 2024) | Unverified | Tags: Federated Learning, Quantization
Differential error feedback for communication-efficient decentralized learning (Jun 26, 2024) | Unverified | Tags: Quantization
ViT-1.58b: Mobile Vision Transformers in the 1-bit Era (Jun 26, 2024) | Unverified | Tags: Image Classification
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge (Jun 25, 2024) | Code Available (1) | Tags: Computational Efficiency, CPU
Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels (Jun 25, 2024) | Code Available (4) | Tags: Language Modelling, Large Language Model
CDQuant: Greedy Coordinate Descent for Accurate LLM Quantization (Jun 25, 2024) | Code Available (0) | Tags: Quantization
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers (Jun 25, 2024) | Unverified | Tags: Image Generation, Model Compression
BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks (Jun 24, 2024) | Code Available (2) | Tags: Quantization
Approximate DCT and Quantization Techniques for Energy-Constrained Image Sensors (Jun 24, 2024) | Unverified | Tags: Quantization
Leveraging Knowledge Distillation for Lightweight Skin Cancer Classification: Balancing Accuracy and Computational Efficiency (Jun 24, 2024) | Unverified | Tags: Cancer Classification, Computational Efficiency
Reducing the Memory Footprint of 3D Gaussian Splatting (Jun 24, 2024) | Unverified | Tags: Novel View Synthesis, Quantization
Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other (Jun 24, 2024) | Unverified | Tags: Quantization
ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models (Jun 24, 2024) | Unverified | Tags: Quantization
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging (Jun 24, 2024) | Code Available (1) | Tags: MMLU, Model Compression
Received Power Maximization Using Nonuniform Discrete Phase Shifts for RISs With a Limited Phase Range (Jun 23, 2024) | Code Available (1) | Tags: 2k, Quantization
Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study (Jun 23, 2024) | Unverified | Tags: NeRF, Quantization
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting (Jun 22, 2024) | Unverified | Tags: Language Modelling
HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization (Jun 21, 2024) | Code Available (2) | Tags: Quantization
FLoCoRA: Federated learning compression with low-rank adaptation (Jun 20, 2024) | Unverified | Tags: Federated Learning, Model Compression
Predicting Probabilities of Error to Combine Quantization and Early Exiting: QuEE (Jun 20, 2024) | Code Available (0) | Tags: Quantization