| LookupFFN: Making Transformers Compute-lite for CPU inference | Mar 12, 2024 | CPU, GPU | Code Available | 1 |
| UniSparse: An Intermediate Language for General Sparse Format Customization | Mar 9, 2024 | Attribute, Code Generation | Code Available | 1 |
| LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization | Mar 2, 2024 | GPU, Quantization | Code Available | 1 |
| Efficient Lifelong Model Evaluation in an Era of Rapid Progress | Feb 29, 2024 | Benchmarking, GPU | Code Available | 1 |
| DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation | Feb 27, 2024 | GPU, parameter-efficient fine-tuning | Code Available | 1 |
| Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control | Feb 27, 2024 | GPU, Image Retrieval | Code Available | 1 |
| PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures | Feb 26, 2024 | CPU, GPU | Code Available | 1 |
| Mechanistic Neural Networks for Scientific Machine Learning | Feb 20, 2024 | Equation Discovery, GPU | Code Available | 1 |
| BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation | Feb 18, 2024 | GPU, Question Answering | Code Available | 1 |
| Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment | Feb 15, 2024 | GPU, Reinforcement Learning (RL) | Code Available | 1 |