| Pruner: A Speculative Exploration Mechanism to Accelerate Tensor Program Tuning | Feb 4, 2024 | GPU, Transfer Learning | Code Available | 1 |
| Scalable and Efficient Temporal Graph Representation Learning via Forward Recent Sampling | Feb 3, 2024 | GPU, Graph Representation Learning | Code Available | 0 |
| Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks | Feb 3, 2024 | GPU, Molecular Property Prediction | Code Available | 1 |
| InferCept: Efficient Intercept Support for Augmented Large Language Model Inference | Feb 2, 2024 | GPU, Language Modeling | Code Available | 1 |
| PRIME: Protect Your Videos From Malicious Editing | Feb 2, 2024 | GPU | Code Available | 0 |
| Faster Inference of Integer SWIN Transformer by Removing the GELU Activation | Feb 2, 2024 | GPU, image-classification | Unverified | 0 |
| Enriched Physics-informed Neural Networks for Dynamic Poisson-Nernst-Planck Systems | Feb 1, 2024 | GPU | Unverified | 0 |
| An Accurate and Low-Parameter Machine Learning Architecture for Next Location Prediction | Feb 1, 2024 | GPU, Prediction | Unverified | 0 |
| Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | Feb 1, 2024 | Computational Efficiency, GPU | Code Available | 3 |
| KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Jan 31, 2024 | GPU, Quantization | Code Available | 3 |
| Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers | Jan 31, 2024 | GPU, Weather Forecasting | Unverified | 0 |
| Paramanu: A Family of Novel Efficient Generative Foundation Language Models for Indian Languages | Jan 31, 2024 | GPU, Reading Comprehension | Unverified | 0 |
| SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget | Jan 30, 2024 | GPU, Model Compression | Unverified | 0 |
| GPU Cluster Scheduling for Network-Sensitive Deep Learning | Jan 29, 2024 | Deep Learning, GPU | Unverified | 0 |
| SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design | Jan 29, 2024 | CPU, GPU | Code Available | 2 |
| M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Jan 29, 2024 | GPU, zero-shot-classification | Code Available | 0 |
| Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing | Jan 29, 2024 | GPU, Representation Learning | Code Available | 2 |
| HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy | Jan 26, 2024 | GPU, parameter-efficient fine-tuning | Code Available | 1 |
| The Case for Co-Designing Model Architectures with Hardware | Jan 25, 2024 | Deep Learning, GPU | Unverified | 0 |
| FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design | Jan 25, 2024 | GPU, Quantization | Code Available | 3 |
| MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | Jan 25, 2024 | GPU, model | Code Available | 3 |
| ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | Jan 25, 2024 | GPU, Scheduling | Code Available | 4 |
| CNN architecture extraction on edge GPU | Jan 24, 2024 | GPU, image-classification | Unverified | 0 |
| Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 | Jan 24, 2024 | GPU, In-Context Learning | Unverified | 0 |
| InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction | Jan 23, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 1 |