| CoSense3D: an Agent-based Efficient Learning Framework for Collective Perception | Apr 29, 2024 | Data VisualizationDecision Making | CodeCode Available | 1 |
| LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report | Apr 29, 2024 | GPUparameter-efficient fine-tuning | CodeCode Available | 1 |
| CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method | Apr 23, 2024 | DenoisingGPU | CodeCode Available | 1 |
| Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity | Apr 22, 2024 | GPULanguage Modeling | CodeCode Available | 1 |
| Evaluating Retrieval Quality in Retrieval-Augmented Generation | Apr 21, 2024 | GPULanguage Modeling | CodeCode Available | 1 |
| LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs | Apr 16, 2024 | DecoderGPU | CodeCode Available | 1 |
| Interpolating neural network: A novel unification of machine learning and interpolation theory | Apr 16, 2024 | GPUPhysical Simulations | CodeCode Available | 1 |
| CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models | Apr 12, 2024 | GPU | CodeCode Available | 1 |
| Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding | Apr 10, 2024 | GPULanguage Modeling | CodeCode Available | 1 |
| LIPT: Latency-aware Image Processing Transformer | Apr 9, 2024 | DenoisingGPU | CodeCode Available | 1 |