Compute-Optimal LLMs Provably Generalize Better With Scale Apr 21, 2025 Generalization Bounds Quantization — Unverified 0
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models Apr 21, 2025 Automatic Speech Recognition Automatic Speech Recognition (ASR) — Unverified 0
Efficient Implicit Neural Compression of Point Clouds via Learnable Activation in Latent Space Apr 20, 2025 Attribute Decoder — Unverified 0
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models Apr 20, 2025 Quantization Code Code Available 1
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference Apr 19, 2025 Large Language Model Quantization — Unverified 0
Lightweight Road Environment Segmentation using Vector Quantization Apr 19, 2025 Autonomous Driving Image Segmentation — Unverified 0
Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs Apr 18, 2025 Quantization — Unverified 0
The Binary and Ternary Quantization Can Improve Feature Discrimination Apr 18, 2025 Classification Quantization — Unverified 0
From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs Apr 18, 2025 Knowledge Distillation Model Compression — Unverified 0
D^2MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving Apr 17, 2025 Mixture-of-Experts Model Compression — Unverified 0
ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs Apr 17, 2025 Model Compression Quantization Code Code Available 0
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model Apr 17, 2025 Code Generation CPU Code Code Available 7
FedX: Adaptive Model Decomposition and Quantization for IoT Federated Learning Apr 17, 2025 Federated Learning Quantization — Unverified 0
Hierarchical Vector Quantized Graph Autoencoder with Annealing-Based Code Selection Apr 17, 2025 Link Prediction Node Classification Code Code Available 1
GT-SVQ: A Linear-Time Graph Transformer for Node Classification Using Spiking Vector Quantization Apr 16, 2025 Graph Learning Graph Representation Learning Code Code Available 0
Résumé abstractif à partir d'une transcription audio Apr 16, 2025 Quantization — Unverified 0
ESC-MVQ: End-to-End Semantic Communication With Multi-Codebook Vector Quantization Apr 16, 2025 Decoder Quantization — Unverified 0
Enhancing Autonomous Driving Systems with On-Board Deployed Large Language Models Apr 15, 2025 Autonomous Driving Computational Efficiency Code Code Available 2
GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM Apr 15, 2025 Quantization Reading Comprehension — Unverified 0
CSPLADE: Learned Sparse Retrieval with Causal Language Models Apr 15, 2025 Information Retrieval Quantization — Unverified 0
Neural Network Emulation of the Classical Limit in Quantum Systems via Learned Observable Mappings Apr 15, 2025 Philosophy Quantization — Unverified 0
Simultaneous Input and State Estimation under Output Quantization: A Gaussian Mixture approach Apr 13, 2025 Fault Detection Quantization — Unverified 0
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization Apr 13, 2025 Quantization — Unverified 0
Deploying Large AI Models on Resource-Limited Devices with Split Federated Learning Apr 12, 2025 Federated Learning Quantization — Unverified 0
Asymptotic stabilization under homomorphic encryption: A re-encryption free method Apr 12, 2025 Quantization — Unverified 0
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer Apr 11, 2025 Motion Synthesis Quantization — Unverified 0
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting Apr 11, 2025 GPU Language Modeling — Unverified 0
Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion Apr 11, 2025 Image Generation Quantization — Unverified 0
MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization Apr 11, 2025 Image Generation Quantization — Unverified 0
APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design Apr 10, 2025 Model Compression Quantization Code Code Available 0
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression Apr 10, 2025 Math MMLU Code Code Available 1
PoGO: A Scalable Proof of Useful Work via Quantized Gradient Descent and Merkle Proofs Apr 10, 2025 GPU Quantization — Unverified 0
CHIME: A Compressive Framework for Holistic Interest Modeling Apr 9, 2025 Contrastive Learning Quantization — Unverified 0
BBQRec: Behavior-Bind Quantization for Multi-Modal Sequential Recommendation Apr 9, 2025 Quantization Recommendation Systems — Unverified 0
Achieving binary weight and activation for LLMs using Post-Training Quantization Apr 7, 2025 Quantization — Unverified 0
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design Apr 7, 2025 Quantization — Unverified 0
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models Apr 7, 2025 Math Quantization Code Code Available 2
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Apr 7, 2025 CPU GPU Code Code Available 0
Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models Apr 7, 2025 Adversarial Robustness Diversity — Unverified 0
Bridging the Gap between Continuous and Informative Discrete Representations by Random Product Quantization Apr 7, 2025 Quantization Self-Supervised Learning — Unverified 0
Balancing Robustness and Efficiency in Embedded DNNs Through Activation Function Selection Apr 7, 2025 Autonomous Driving Decoder — Unverified 0
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs Apr 7, 2025 Benchmarking Fairness Code Code Available 0
Skin Color Measurement from Dermatoscopic Images: An Evaluation on a Synthetic Dataset Apr 6, 2025 Quantization — Unverified 0
Autoregressive High-Order Finite Difference Modulo Imaging: High-Dynamic Range for Computer Vision Applications Apr 5, 2025 Autonomous Driving Image Reconstruction — Unverified 0
Efficient FPGA-accelerated Convolutional Neural Networks for Cloud Detection on CubeSats Apr 4, 2025 Cloud Detection Quantization — Unverified 0
Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions Apr 4, 2025 Language Modeling Language Modelling — Unverified 0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency Apr 4, 2025 Benchmarking GSM8K — Unverified 0
Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization Apr 3, 2025 3DGS 3D Reconstruction Code Code Available 0
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators Apr 3, 2025 Mixture-of-Experts Quantization Code Code Available 1
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers Apr 3, 2025 Quantization Code Code Available 1