FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference | Apr 19, 2025 | Large Language Model Quantization | Unverified
Lightweight Road Environment Segmentation using Vector Quantization | Apr 19, 2025 | Autonomous Driving Image Segmentation | Unverified
Gradual Binary Search and Dimension Expansion: A general method for activation quantization in LLMs | Apr 18, 2025 | Quantization | Unverified
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs | Apr 18, 2025 | Knowledge Distillation Model Compression | Unverified
The Binary and Ternary Quantization Can Improve Feature Discrimination | Apr 18, 2025 | Classification Quantization | Unverified
ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs | Apr 17, 2025 | Model Compression Quantization | Code Available
FedX: Adaptive Model Decomposition and Quantization for IoT Federated Learning | Apr 17, 2025 | Federated Learning Quantization | Unverified
D^2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving | Apr 17, 2025 | Mixture-of-Experts Model Compression | Unverified
GT-SVQ: A Linear-Time Graph Transformer for Node Classification Using Spiking Vector Quantization | Apr 16, 2025 | Graph Learning Graph Representation Learning | Code Available
Résumé abstractif à partir d'une transcription audio | Apr 16, 2025 | Quantization | Unverified
ESC-MVQ: End-to-End Semantic Communication With Multi-Codebook Vector Quantization | Apr 16, 2025 | Decoder Quantization | Unverified
Neural Network Emulation of the Classical Limit in Quantum Systems via Learned Observable Mappings | Apr 15, 2025 | Philosophy Quantization | Unverified
GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM | Apr 15, 2025 | Quantization Reading Comprehension | Unverified
CSPLADE: Learned Sparse Retrieval with Causal Language Models | Apr 15, 2025 | Information Retrieval Quantization | Unverified
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization | Apr 13, 2025 | Quantization | Unverified
Simultaneous Input and State Estimation under Output Quantization: A Gaussian Mixture approach | Apr 13, 2025 | Fault Detection Quantization | Unverified
Asymptotic stabilization under homomorphic encryption: A re-encryption free method | Apr 12, 2025 | Quantization | Unverified
Deploying Large AI Models on Resource-Limited Devices with Split Federated Learning | Apr 12, 2025 | Federated Learning Quantization | Unverified
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | Apr 11, 2025 | GPU Language Modeling | Unverified
MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization | Apr 11, 2025 | Image Generation Quantization | Unverified
Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion | Apr 11, 2025 | Image Generation Quantization | Unverified
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer | Apr 11, 2025 | Motion Synthesis Quantization | Unverified
APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design | Apr 10, 2025 | Model Compression Quantization | Code Available
PoGO: A Scalable Proof of Useful Work via Quantized Gradient Descent and Merkle Proofs | Apr 10, 2025 | GPU Quantization | Unverified
CHIME: A Compressive Framework for Holistic Interest Modeling | Apr 9, 2025 | Contrastive Learning Quantization | Unverified
BBQRec: Behavior-Bind Quantization for Multi-Modal Sequential Recommendation | Apr 9, 2025 | Quantization Recommendation Systems | Unverified
Achieving binary weight and activation for LLMs using Post-Training Quantization | Apr 7, 2025 | Quantization | Unverified
Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models | Apr 7, 2025 | Adversarial Robustness Diversity | Unverified
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design | Apr 7, 2025 | Quantization | Unverified
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs | Apr 7, 2025 | Benchmarking Fairness | Code Available
Bridging the Gap between Continuous and Informative Discrete Representations by Random Product Quantization | Apr 7, 2025 | Quantization Self-Supervised Learning | Unverified
Balancing Robustness and Efficiency in Embedded DNNs Through Activation Function Selection | Apr 7, 2025 | Autonomous Driving Decoder | Unverified
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters | Apr 7, 2025 | CPU GPU | Code Available
Skin Color Measurement from Dermatoscopic Images: An Evaluation on a Synthetic Dataset | Apr 6, 2025 | Quantization | Unverified
Autoregressive High-Order Finite Difference Modulo Imaging: High-Dynamic Range for Computer Vision Applications | Apr 5, 2025 | Autonomous Driving Image Reconstruction | Unverified
Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions | Apr 4, 2025 | Language Modeling Language Modelling | Unverified
Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats | Apr 4, 2025 | Cloud Detection Quantization | Unverified
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Apr 4, 2025 | Benchmarking GSM8K | Unverified
Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization | Apr 3, 2025 | 3DGS 3D Reconstruction | Code Available
HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement | Apr 3, 2025 | Image Enhancement Low-Light Image Enhancement | Unverified
Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression | Apr 3, 2025 | Image Compression Quantization | Unverified
Moment Quantization for Video Temporal Grounding | Apr 3, 2025 | Quantization Video Understanding | Unverified
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks | Apr 2, 2025 | Benchmarking Language Modeling | Unverified
LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi | Apr 2, 2025 | Computational Efficiency Quantization | Unverified
QSViT: A Methodology for Quantizing Spiking Vision Transformers | Apr 1, 2025 | Quantization | Unverified
Model Hemorrhage and the Robustness Limits of Large Language Models | Mar 31, 2025 | Quantization | Unverified
Style Quantization for Data-Efficient GAN Training | Mar 31, 2025 | Navigate Quantization | Unverified
SQuat: Subspace-orthogonal KV Cache Quantization | Mar 31, 2025 | Quantization | Unverified
Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference | Mar 30, 2025 | GPU Quantization | Unverified
NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations | Mar 29, 2025 | 3DGS NeRF | Unverified