Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models (Jan 16, 2024). Tags: GPU, Quantization. Code available.
Scaling Transformers for Low-Bitrate High-Quality Speech Coding (Nov 29, 2024). Tags: Quantization. Code available.
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (Feb 6, 2024). Tags: Binarization, GPU. Code available.
Fast Matrix Multiplications for Lookup Table-Quantized LLMs (Jul 15, 2024). Tags: Quantization. Code available.
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer (Nov 4, 2024). Tags: Quantization, Representation Learning. Code available.
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models (Jul 10, 2024). Tags: GPU, Quantization. Code available.
BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec (Sep 9, 2024). Tags: Quantization. Code available.
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation (Jan 9, 2024). Tags: GPU, Math. Code available.
DPLM-2: A Multimodal Diffusion Protein Language Model (Oct 17, 2024). Tags: Language Modeling, Language Modelling. Code available.
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models (Apr 3, 2024). Tags: GSM8K, Quantization. Code available.
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (Nov 26, 2024). Tags: GPU, Language Modeling. Code available.
Data Generation for Hardware-Friendly Post-Training Quantization (Oct 29, 2024). Tags: Data Augmentation, GPU. Code available.
CV-VAE: A Compatible Video VAE for Latent Generative Video Models (May 30, 2024). Tags: Quantization. Code available.
Behavior Generation with Latent Actions (Mar 5, 2024). Tags: Autonomous Driving, Decision Making. Code available.
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization (Jan 2, 2025). Tags: Contrastive Learning, Key Detection. Code available.
MotionGPT: Human Motion as a Foreign Language (Jun 26, 2023). Tags: Language Modeling, Language Modelling. Code available.
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models (Mar 5, 2024). Tags: Quantization, Speech Synthesis. Code available.
8-bit Optimizers via Block-wise Quantization (Oct 6, 2021). Tags: Language Modeling, Language Modelling. Code available.
FlatQuant: Flatness Matters for LLM Quantization (Oct 12, 2024). Tags: Quantization. Code available.
Compact 3D Scene Representation via Self-Organizing Gaussian Grids (Dec 19, 2023). Tags: 3DGS. Code available.
Highly Compressed Tokenizer Can Generate Without Training (Jun 9, 2025). Tags: Image Generation, Quantization. Code available.
OneBit: Towards Extremely Low-bit Large Language Models (Feb 17, 2024). Tags: Quantization. Code available.
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation (Dec 4, 2024). Tags: Image Generation, Image Reconstruction. Code available.
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving (Oct 29, 2023). Tags: GPU, Quantization. Code available.
MAUVE Scores for Generative Models: Theory and Practice (Dec 30, 2022). Tags: Quantization. Code available.
MBQ: Modality-Balanced Quantization for Large Vision-Language Models (Dec 27, 2024). Tags: GPU, Quantization. Code available.
Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding (Feb 3, 2025). Tags: Quantization. Code available.
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More (Oct 8, 2024). Tags: Mixture-of-Experts, Quantization. Code available.
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention (Feb 8, 2024). Tags: MMLU, Quantization. Code available.
MAexp: A Generic Platform for RL-based Multi-Agent Exploration (Apr 19, 2024). Tags: Diversity, Multi-agent Reinforcement Learning. Code available.
4-bit Conformer with Native Quantization Aware Training for Speech Recognition (Mar 29, 2022). Tags: Automatic Speech Recognition, Automatic Speech Recognition (ASR). Code available.
CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification (Mar 14, 2024). Tags: Classification, Crowd Counting. Code available.
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training (May 31, 2023). Tags: Language Modelling, Quantization. Code available.
LoQT: Low-Rank Adapters for Quantized Pretraining (May 26, 2024). Tags: GPU, Language Modeling. Code available.
LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search (Oct 24, 2024). Tags: Clustering, GPU. Code available.
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation (Oct 2, 2024). Tags: Image Generation, Quantization. Code available.
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models (Oct 12, 2023). Tags: Natural Language Understanding, Quantization. Code available.
Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search (Jan 16, 2025). Tags: Quantization. Code available.
LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS (Nov 28, 2023). Tags: Knowledge Distillation, NeRF. Code available.
LLM-FP4: 4-Bit Floating-Point Quantized Transformers (Oct 25, 2023). Tags: Common Sense Reasoning, Quantization. Code available.
Low-Rank Quantization-Aware Training for LLMs (Jun 10, 2024). Tags: GPU, parameter-efficient fine-tuning. Code available.
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization (Jul 14, 2025). Tags: 2k, Image Generation. Code available.
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches (Jul 1, 2024). Tags: Book summarization, Quantization. Code available.
LeanVec: Searching vectors faster by making them fit (Dec 26, 2023). Tags: Cross-Modal Retrieval, Dimensionality Reduction. Code available.
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization (Mar 19, 2024). Tags: Quantization. Code available.
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization (Sep 25, 2024). Tags: GPU, Quantization. Code available.
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference (Jul 4, 2022). Tags: Quantization. Code available.
Bolt: Accelerated Data Mining with Fast Vector Compression (Jun 30, 2017). Tags: Quantization. Code available.
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs (Feb 16, 2024). Tags: Quantization. Code available.
BMInf: An Efficient Toolkit for Big Model Inference and Tuning (May 1, 2022). Tags: CPU, GPU. Code available.