Qwen2 Technical Report (Jul 15, 2024). Tags: Arithmetic Reasoning, GSM8K. Code available: 13
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System (Feb 8, 2025). Tags: Decoder, Language Modeling. Code available: 11
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models (Dec 13, 2024). Tags: In-Context Learning, Quantization. Code available: 11
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning (Aug 10, 2024). Tags: Hallucination, Optical Character Recognition. Code available: 11
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (Jul 11, 2024). Tags: GPU, Quantization. Code available: 11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens (Jul 7, 2024). Tags: Language Modelling, Large Language Model. Code available: 11
OpenVLA: An Open-Source Vision-Language-Action Model (Jun 13, 2024). Tags: Imitation Learning, Language Modelling. Code available: 9
SageAttention2++: A More Efficient Implementation of SageAttention2 (May 27, 2025). Tags: Quantization, Video Generation. Code available: 7
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model (Apr 17, 2025). Tags: Code Generation, CPU. Code available: 7
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization (Nov 17, 2024). Tags: Image Generation, Quantization. Code available: 7
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation (Oct 10, 2024). Tags: 4k, Image Animation. Code available: 7
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration (Oct 3, 2024). Tags: Image Generation, Quantization. Code available: 7
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration (Apr 24, 2024). Tags: Management, Prompt Engineering. Code available: 7
Chronos: Learning the Language of Time Series (Mar 12, 2024). Tags: Gaussian Processes, Language Modeling. Code available: 7
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations (Jan 3, 2024). Tags: Diversity, Quantization. Code available: 7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (Oct 31, 2022). Tags: GPU, Language Modelling. Code available: 7
SqueezeLLM: Dense-and-Sparse Quantization (Jun 13, 2023). Tags: GPU, Quantization. Code available: 6
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (Jun 1, 2023). Tags: Autonomous Driving, Cloud Computing. Code available: 6
QLoRA: Efficient Finetuning of Quantized LLMs (May 23, 2023). Tags: Chatbot, GPU. Code available: 6
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (Nov 18, 2022). Tags: Quantization. Code available: 6
GLM-130B: An Open Bilingual Pre-trained Model (Oct 5, 2022). Tags: Language Modeling, Language Modelling. Code available: 6
Quantized Training of Gradient Boosting Decision Trees (Jul 20, 2022). Tags: Quantization. Code available: 6
SCBench: A KV Cache-Centric Analysis of Long-Context Methods (Dec 13, 2024). Tags: Mamba, Quantization. Code available: 5
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale (Aug 22, 2024). Tags: Chatbot, Instruction Following. Code available: 5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models (Aug 21, 2024). Tags: GPU, Quantization. Code available: 5
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients (Jul 11, 2024). Tags: Quantization. Code available: 5
Autoregressive Image Generation without Vector Quantization (Jun 17, 2024). Tags: Image Generation, Quantization. Code available: 5
SpinQuant: LLM quantization with learned rotations (May 26, 2024). Tags: Quantization. Code available: 5
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression (May 23, 2024). Tags: Quantization. Code available: 5
SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks (Apr 15, 2024). Tags: Quantization. Code available: 5
Extreme Compression of Large Language Models via Additive Quantization (Jan 11, 2024). Tags: CPU, GPU. Code available: 5
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving (Oct 11, 2023). Tags: Language Modeling, Language Modelling. Code available: 5
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (Sep 26, 2023). Tags: Quantization. Code available: 5
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation (Sep 19, 2022). Tags: Decoder, Image Generation. Code available: 5
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications (Sep 7, 2022). Tags: GPU, Object Detection. Code available: 5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Aug 15, 2022). Tags: GPU, Language Modelling. Code available: 5
BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster (Apr 3, 2022). Tags: AutoML, Distributed Computing. Code available: 5
Scaling Law for Quantization-Aware Training (May 20, 2025). Tags: Quantization. Code available: 4
UniTok: A Unified Tokenizer for Visual Generation and Understanding (Feb 27, 2025). Tags: Quantization. Code available: 4
Autoregressive Video Generation without Vector Quantization (Dec 18, 2024). Tags: Image Generation, Prediction. Code available: 4
Taming Scalable Visual Tokenizer for Autoregressive Image Generation (Dec 3, 2024). Tags: Image Generation, Image Reconstruction. Code available: 4
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models (Nov 7, 2024). Tags: GPU, Quantization. Code available: 4
BitNet a4.8: 4-bit Activations for 1-bit LLMs (Nov 7, 2024). Tags: Quantization. Code available: 4
SNAC: Multi-Scale Neural Audio Codec (Oct 18, 2024). Tags: Audio Compression, Audio Generation. Code available: 4
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads (Oct 14, 2024). Tags: GPU, Quantization. Code available: 4
Restructuring Vector Quantization with the Rotation Trick (Oct 8, 2024). Tags: Quantization. Code available: 4
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models (Sep 25, 2024). Tags: Quantization. Code available: 4
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge (Jun 25, 2024). Tags: Computational Efficiency, CPU. Code available: 4
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit (May 9, 2024). Tags: Benchmarking, Computational Efficiency. Code available: 4
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving (May 7, 2024). Tags: GPU, Language Modelling. Code available: 4