| Title | Date | Tags | Code | Score |
|---|---|---|---|---|
| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | GPU, Language Modelling | Code Available | 5 |
| TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Jul 5, 2022 | AutoML, Bayesian Inference | Code Available | 5 |
| Multi-head Temporal Latent Attention | May 19, 2025 | GPU, Speech Recognition | Code Available | 4 |
| Accelerating Visual-Policy Learning through Parallel Differentiable Simulation | May 15, 2025 | GPU | Code Available | 4 |
| OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | May 12, 2025 | GPU, Privacy Preserving | Code Available | 4 |
| Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Apr 15, 2025 | GPU, Inference Optimization | Code Available | 4 |
| 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Apr 15, 2025 | CPU, GPU | Code Available | 4 |
| LettuceDetect: A Hallucination Detection Framework for RAG Applications | Feb 24, 2025 | 8k, GPU | Code Available | 4 |
| Building reliable sim driving agents by scaling self-play | Feb 20, 2025 | Autonomous Vehicles, Benchmarking | Code Available | 4 |
| KernelBench: Can LLMs Write Efficient GPU Kernels? | Feb 14, 2025 | GPU | Code Available | 4 |