| Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Jul 16, 2025 | GPU | CodeCode Available | 3 |
| EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | Jul 10, 2024 | GPUQuantization | CodeCode Available | 3 |
| EvoTorch: Scalable Evolutionary Computation in Python | Feb 24, 2023 | GPUreinforcement-learning | CodeCode Available | 3 |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPUMultiple-choice | CodeCode Available | 3 |
| OctFusion: Octree-based Diffusion Models for 3D Shape Generation | Aug 27, 2024 | 3D Generation3D Shape Generation | CodeCode Available | 3 |
| ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Aug 16, 2024 | GPUModel Compression | CodeCode Available | 3 |
| Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Jun 23, 2025 | Domain AdaptationGPU | CodeCode Available | 3 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuray | Feb 7, 2025 | 4kGeneral Knowledge | CodeCode Available | 3 |
| LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture | Sep 4, 2024 | GPUMamba | CodeCode Available | 3 |
| APOLLO: SGD-like Memory, AdamW-level Performance | Dec 6, 2024 | GPUQuantization | CodeCode Available | 3 |