| LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Jan 7, 2025 | GPU, Visual Question Answering (VQA) | Code Available | 4 |
| TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Dec 30, 2024 | Audio Generation, GPU | Code Available | 4 |
| SocialED: A Python Library for Social Event Detection | Dec 18, 2024 | CPU, Event Detection | Code Available | 4 |
| SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Nov 7, 2024 | GPU, Quantization | Code Available | 4 |
| DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Oct 14, 2024 | GPU, Quantization | Code Available | 4 |
| MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Oct 9, 2024 | GPU, Mixture-of-Experts | Code Available | 4 |
| Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding | Sep 22, 2024 | Anomaly Detection, GPU | Code Available | 4 |
| EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Aug 21, 2024 | 3D Instance Segmentation, GPU | Code Available | 4 |
| Deep Patch Visual SLAM | Aug 3, 2024 | GPU, Visual Odometry | Code Available | 4 |
| GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS | Aug 2, 2024 | GPU, Navigate | Code Available | 4 |