| TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | Apr 18, 2024 | GPU | Code Available | 3 |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU, Multiple-choice | Code Available | 3 |
| Allo: A Programming Model for Composable Accelerator Design | Apr 7, 2024 | GPU, High-Level Synthesis | Code Available | 3 |
| BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Apr 3, 2024 | GPU, Math | Code Available | 3 |
| Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration | Apr 2, 2024 | Computational Efficiency, GPU | Code Available | 3 |
| GPU-accelerated Evolutionary Multiobjective Optimization Using Tensorized RVEA | Apr 1, 2024 | GPU, Multiobjective Optimization | Code Available | 3 |
| 94% on CIFAR-10 in 3.29 Seconds on a Single GPU | Mar 30, 2024 | GPU | Code Available | 3 |
| The Unreasonable Ineffectiveness of the Deeper Layers | Mar 26, 2024 | GPU, Quantization | Code Available | 3 |
| GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting | Mar 13, 2024 | GPU, Quantization | Code Available | 3 |
| Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | Mar 4, 2024 | GPU, Scheduling | Code Available | 3 |