Qwen2 Technical Report | Jul 15, 2024 | Tags: Arithmetic Reasoning, GSM8K | Code available: 13
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System | Feb 8, 2025 | Tags: Decoder, Language Modeling | Code available: 11
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Dec 13, 2024 | Tags: In-Context Learning, Quantization | Code available: 11
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning | Aug 10, 2024 | Tags: Hallucination, Optical Character Recognition | Code available: 11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Jul 7, 2024 | Tags: Language Modelling, Large Language Model | Code available: 11
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jul 11, 2024 | Tags: GPU, Quantization | Code available: 11
OpenVLA: An Open-Source Vision-Language-Action Model | Jun 13, 2024 | Tags: Imitation Learning, Language Modelling | Code available: 9
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | Nov 17, 2024 | Tags: Image Generation, Quantization | Code available: 7
Chronos: Learning the Language of Time Series | Mar 12, 2024 | Tags: Gaussian Processes, Language Modeling | Code available: 7
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | Oct 3, 2024 | Tags: Image Generation, Quantization | Code available: 7
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations | Jan 3, 2024 | Tags: Diversity, Quantization | Code available: 7
SageAttention2++: A More Efficient Implementation of SageAttention2 | May 27, 2025 | Tags: Quantization, Video Generation | Code available: 7
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Oct 10, 2024 | Tags: 4k, Image Animation | Code available: 7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Oct 31, 2022 | Tags: GPU, Language Modelling | Code available: 7
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Apr 17, 2025 | Tags: Code Generation, CPU | Code available: 7
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration | Apr 24, 2024 | Tags: Management, Prompt Engineering | Code available: 7
Quantized Training of Gradient Boosting Decision Trees | Jul 20, 2022 | Tags: Quantization | Code available: 6
QLoRA: Efficient Finetuning of Quantized LLMs | May 23, 2023 | Tags: Chatbot, GPU | Code available: 6
GLM-130B: An Open Bilingual Pre-trained Model | Oct 5, 2022 | Tags: Language Modeling, Language Modelling | Code available: 6
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration | Jun 1, 2023 | Tags: Autonomous Driving, Cloud Computing | Code available: 6
SqueezeLLM: Dense-and-Sparse Quantization | Jun 13, 2023 | Tags: GPU, Quantization | Code available: 6
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | Nov 18, 2022 | Tags: Quantization | Code available: 6
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving | Oct 11, 2023 | Tags: Language Modeling, Language Modelling | Code available: 5
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients | Jul 11, 2024 | Tags: Quantization | Code available: 5
SpinQuant: LLM quantization with learned rotations | May 26, 2024 | Tags: Quantization | Code available: 5
Extreme Compression of Large Language Models via Additive Quantization | Jan 11, 2024 | Tags: CPU, GPU | Code available: 5
BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster | Apr 3, 2022 | Tags: AutoML, Distributed Computing | Code available: 5
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation | Sep 19, 2022 | Tags: Decoder, Image Generation | Code available: 5
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | May 23, 2024 | Tags: Quantization | Code available: 5
SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks | Apr 15, 2024 | Tags: Quantization | Code available: 5
Autoregressive Image Generation without Vector Quantization | Jun 17, 2024 | Tags: Image Generation, Quantization | Code available: 5
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models | Sep 26, 2023 | Tags: Quantization | Code available: 5
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Aug 22, 2024 | Tags: Chatbot, Instruction Following | Code available: 5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Aug 21, 2024 | Tags: GPU, Quantization | Code available: 5
SCBench: A KV Cache-Centric Analysis of Long-Context Methods | Dec 13, 2024 | Tags: Mamba, Quantization | Code available: 5
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications | Sep 7, 2022 | Tags: GPU, Object Detection | Code available: 5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | Tags: GPU, Language Modelling | Code available: 5
Restructuring Vector Quantization with the Rotation Trick | Oct 8, 2024 | Tags: Quantization | Code available: 4
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks | Feb 6, 2024 | Tags: Quantization | Code available: 4
FP8 Formats for Deep Learning | Sep 12, 2022 | Tags: Deep Learning, Quantization | Code available: 4
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | May 7, 2024 | Tags: GPU, Language Modelling | Code available: 4
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs | Sep 11, 2023 | Tags: Quantization | Code available: 4
Efficient Post-training Quantization with FP8 Formats | Sep 26, 2023 | Tags: image-classification, Image Classification | Code available: 4
Polysemous codes | Sep 7, 2016 | Tags: Quantization | Code available: 4
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation | Feb 16, 2024 | Tags: Knowledge Distillation, Quantization | Code available: 4
Fast Inference of Mixture-of-Experts Language Models with Offloading | Dec 28, 2023 | Tags: Mixture-of-Experts, Quantization | Code available: 4
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | Jan 2, 2023 | Tags: Common Sense Reasoning, Language Modelling | Code available: 4
Billion-scale similarity search with GPUs | Feb 28, 2017 | Tags: GPU, Image Similarity Search | Code available: 4
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Oct 14, 2024 | Tags: GPU, Quantization | Code available: 4
LLM Inference Unveiled: Survey and Roofline Model Insights | Feb 26, 2024 | Tags: Knowledge Distillation, Language Modelling | Code available: 4