| xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | Nov 4, 2024 | GPU | Code Available | 7 |
| "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization | Nov 4, 2024 | GPU, Large Language Model | —Unverified | 0 |
| RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | Nov 4, 2024 | Answer Generation, GPU | Code Available | 2 |
| Stochastic Communication Avoidance for Recommendation Systems | Nov 3, 2024 | Federated Learning, GPU | —Unverified | 0 |
| CRONOS: Enhancing Deep Learning with Scalable GPU Accelerated Convex Neural Networks | Nov 2, 2024 | GPU | —Unverified | 0 |
| Fast and Memory-Efficient Video Diffusion Using Streamlined Inference | Nov 2, 2024 | GPU, Video Generation | Code Available | 1 |
| NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference | Nov 2, 2024 | Code Generation, CPU | Code Available | 0 |
| Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models | Nov 2, 2024 | GPU | —Unverified | 0 |
| Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference | Nov 1, 2024 | Decision Making, Gaussian Processes | —Unverified | 0 |
| HopTrack: A Real-time Multi-Object Tracking System for Embedded Devices | Nov 1, 2024 | Autonomous Driving, GPU | Code Available | 0 |