Qwen2 Technical Report (Jul 15, 2024). Tags: Arithmetic Reasoning, GSM8K. Code available (135).
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning (Aug 10, 2024). Tags: Hallucination, Optical Character Recognition. Code available (115).
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models (Dec 13, 2024). Tags: In-Context Learning, Quantization. Code available (115).
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens (Jul 7, 2024). Tags: Language Modelling, Large Language Model. Code available (115).
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision (Jul 11, 2024). Tags: GPU, Quantization. Code available (115).
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System (Feb 8, 2025). Tags: Decoder, Language Modeling. Code available (115).
OpenVLA: An Open-Source Vision-Language-Action Model (Jun 13, 2024). Tags: Imitation Learning, Language Modelling. Code available (95).
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration (Oct 3, 2024). Tags: Image Generation, Quantization. Code available (75).
Chronos: Learning the Language of Time Series (Mar 12, 2024). Tags: Gaussian Processes, Language Modeling. Code available (75).
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations (Jan 3, 2024). Tags: Diversity, Quantization. Code available (75).
SageAttention2++: A More Efficient Implementation of SageAttention2 (May 27, 2025). Tags: Quantization, Video Generation. Code available (75).
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization (Nov 17, 2024). Tags: Image Generation, Quantization. Code available (75).
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration (Apr 24, 2024). Tags: Management, Prompt Engineering. Code available (75).
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation (Oct 10, 2024). Tags: 4k, Image Animation. Code available (75).
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model (Apr 17, 2025). Tags: Code Generation, CPU. Code available (75).
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (Oct 31, 2022). Tags: GPU, Language Modelling. Code available (75).
Quantized Training of Gradient Boosting Decision Trees (Jul 20, 2022). Tags: Quantization. Code available (65).
QLoRA: Efficient Finetuning of Quantized LLMs (May 23, 2023). Tags: Chatbot, GPU. Code available (65).
GLM-130B: An Open Bilingual Pre-trained Model (Oct 5, 2022). Tags: Language Modeling, Language Modelling. Code available (65).
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (Jun 1, 2023). Tags: Autonomous Driving, Cloud Computing. Code available (65).
SqueezeLLM: Dense-and-Sparse Quantization (Jun 13, 2023). Tags: GPU, Quantization. Code available (65).
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (Nov 18, 2022). Tags: Quantization. Code available (65).
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving (Oct 11, 2023). Tags: Language Modeling, Language Modelling. Code available (55).
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients (Jul 11, 2024). Tags: Quantization. Code available (55).
SQUAT: Stateful Quantization-Aware Training in Recurrent Spiking Neural Networks (Apr 15, 2024). Tags: Quantization. Code available (55).
Extreme Compression of Large Language Models via Additive Quantization (Jan 11, 2024). Tags: CPU, GPU. Code available (55).
BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster (Apr 3, 2022). Tags: AutoML, Distributed Computing. Code available (55).
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression (May 23, 2024). Tags: Quantization. Code available (55).
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation (Sep 19, 2022). Tags: Decoder, Image Generation. Code available (55).
SpinQuant: LLM quantization with learned rotations (May 26, 2024). Tags: Quantization. Code available (55).
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Aug 15, 2022). Tags: GPU, Language Modelling. Code available (55).
Autoregressive Image Generation without Vector Quantization (Jun 17, 2024). Tags: Image Generation, Quantization. Code available (55).
SCBench: A KV Cache-Centric Analysis of Long-Context Methods (Dec 13, 2024). Tags: Mamba, Quantization. Code available (55).
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale (Aug 22, 2024). Tags: Chatbot, Instruction Following. Code available (55).
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (Sep 26, 2023). Tags: Quantization. Code available (55).
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications (Sep 7, 2022). Tags: GPU, Object Detection. Code available (55).
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models (Aug 21, 2024). Tags: GPU, Quantization. Code available (55).
Restructuring Vector Quantization with the Rotation Trick (Oct 8, 2024). Tags: Quantization. Code available (45).
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks (Feb 6, 2024). Tags: Quantization. Code available (45).
FP8 Formats for Deep Learning (Sep 12, 2022). Tags: Deep Learning, Quantization. Code available (45).
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving (May 7, 2024). Tags: GPU, Language Modelling. Code available (45).
Fast Inference of Mixture-of-Experts Language Models with Offloading (Dec 28, 2023). Tags: Mixture-of-Experts, Quantization. Code available (45).
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs (Sep 11, 2023). Tags: Quantization. Code available (45).
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation (Feb 16, 2024). Tags: Knowledge Distillation, Quantization. Code available (45).
Billion-scale similarity search with GPUs (Feb 28, 2017). Tags: GPU, Image Similarity Search. Code available (45).
Polysemous codes (Sep 7, 2016). Tags: Quantization. Code available (45).
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot (Jan 2, 2023). Tags: Common Sense Reasoning, Language Modelling. Code available (45).
BitNet a4.8: 4-bit Activations for 1-bit LLMs (Nov 7, 2024). Tags: Quantization. Code available (45).
Link and code: Fast indexing with graphs and compact regression codes (Apr 26, 2018). Tags: Image Similarity Search, Quantization. Code available (45).
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO (Nov 8, 2023). Tags: Quantization, Text Generation. Code available (45).