| XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Nov 22, 2024 | GPU | CodeCode Available | 5 |
| FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Oct 16, 2024 | Audio GenerationGPU | CodeCode Available | 5 |
| KBLaM: Knowledge Base augmented Language Model | Oct 14, 2024 | 8kGPU | CodeCode Available | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Aug 21, 2024 | GPUQuantization | CodeCode Available | 5 |
| Fast On-device LLM Inference with NPUs | Jul 8, 2024 | CPUGPU | CodeCode Available | 5 |
| FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Jun 11, 2024 | GPU | CodeCode Available | 5 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Jun 1, 2024 | Audio GenerationAudio Synthesis | CodeCode Available | 5 |
| FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning | Feb 29, 2024 | GPULanguage Modeling | CodeCode Available | 5 |
| Extreme Compression of Large Language Models via Additive Quantization | Jan 11, 2024 | CPUGPU | CodeCode Available | 5 |
| PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Dec 16, 2023 | CPUGPU | CodeCode Available | 5 |
| LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Nov 8, 2023 | 8kGPU | CodeCode Available | 5 |
| ReLoRA: High-Rank Training Through Low-Rank Updates | Jul 11, 2023 | GPU | CodeCode Available | 5 |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Jun 25, 2023 | CPUDecoder | CodeCode Available | 5 |
| FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Mar 13, 2023 | CPUGPU | CodeCode Available | 5 |
| EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design | Feb 1, 2023 | GPUobject-detection | CodeCode Available | 5 |
| YOLOv6 v3.0: A Full-Scale Reloading | Jan 13, 2023 | GPUObject Detection | CodeCode Available | 5 |
| Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments | Jan 10, 2023 | GPUImitation Learning | CodeCode Available | 5 |
| Point-E: A System for Generating 3D Point Clouds from Complex Prompts | Dec 16, 2022 | Generating 3D Point CloudsGPU | CodeCode Available | 5 |
| Deep Lake: a Lakehouse for Deep Learning | Sep 22, 2022 | Decision MakingDeep Learning | CodeCode Available | 5 |
| YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications | Sep 7, 2022 | GPUObject Detection | CodeCode Available | 5 |
| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | GPULanguage Modelling | CodeCode Available | 5 |
| TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Jul 5, 2022 | AutoMLBayesian Inference | CodeCode Available | 5 |
| Multi-head Temporal Latent Attention | May 19, 2025 | GPUspeech-recognition | CodeCode Available | 4 |
| Accelerating Visual-Policy Learning through Parallel Differentiable Simulation | May 15, 2025 | GPU | CodeCode Available | 4 |
| OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | May 12, 2025 | GPUPrivacy Preserving | CodeCode Available | 4 |
| Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Apr 15, 2025 | GPUInference Optimization | CodeCode Available | 4 |
| 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Apr 15, 2025 | CPUGPU | CodeCode Available | 4 |
| LettuceDetect: A Hallucination Detection Framework for RAG Applications | Feb 24, 2025 | 8kGPU | CodeCode Available | 4 |
| Building reliable sim driving agents by scaling self-play | Feb 20, 2025 | Autonomous VehiclesBenchmarking | CodeCode Available | 4 |
| KernelBench: Can LLMs Write Efficient GPU Kernels? | Feb 14, 2025 | GPU | CodeCode Available | 4 |
| LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Jan 7, 2025 | GPUVisual Question Answering (VQA) | CodeCode Available | 4 |
| TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Dec 30, 2024 | Audio GenerationGPU | CodeCode Available | 4 |
| SocialED: A Python Library for Social Event Detection | Dec 18, 2024 | CPUEvent Detection | CodeCode Available | 4 |
| SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Nov 7, 2024 | GPUQuantization | CodeCode Available | 4 |
| DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Oct 14, 2024 | GPUQuantization | CodeCode Available | 4 |
| MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Oct 9, 2024 | GPUMixture-of-Experts | CodeCode Available | 4 |
| Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding | Sep 22, 2024 | Anomaly DetectionGPU | CodeCode Available | 4 |
| EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Aug 21, 2024 | 3D Instance SegmentationGPU | CodeCode Available | 4 |
| Deep Patch Visual SLAM | Aug 3, 2024 | GPUVisual Odometry | CodeCode Available | 4 |
| GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS | Aug 2, 2024 | GPUNavigate | CodeCode Available | 4 |
| NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals | Jul 18, 2024 | Experimental DesignGPU | CodeCode Available | 4 |
| fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Jul 1, 2024 | GPUPoint cloud reconstruction | CodeCode Available | 4 |
| On Scaling Up 3D Gaussian Splatting Training | Jun 26, 2024 | 3DGS3D Reconstruction | CodeCode Available | 4 |
| Mamba YOLO: A Simple Baseline for Object Detection with State Space Model | Jun 9, 2024 | GPUMamba | CodeCode Available | 4 |
| Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation | Jun 4, 2024 | Face SwappingGPU | CodeCode Available | 4 |
| Looking Backward: Streaming Video-to-Video Translation with Feature Banks | May 24, 2024 | GPUTranslation | CodeCode Available | 4 |
| Vidur: A Large-Scale Simulation Framework For LLM Inference | May 8, 2024 | CPUGPU | CodeCode Available | 4 |
| QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | May 7, 2024 | GPULanguage Modelling | CodeCode Available | 4 |
| Mamba-FETrack: Frame-Event Tracking via State Space Model | Apr 28, 2024 | GPUMamba | CodeCode Available | 4 |
| JetMoE: Reaching Llama2 Performance with 0.1M Dollars | Apr 11, 2024 | GPUMixture-of-Experts | CodeCode Available | 4 |