| Pruner: A Speculative Exploration Mechanism to Accelerate Tensor Program Tuning | Feb 4, 2024 | GPU, Transfer Learning | Code Available | 1 |
| Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks | Feb 3, 2024 | GPU, Molecular Property Prediction | Code Available | 1 |
| Scalable and Efficient Temporal Graph Representation Learning via Forward Recent Sampling | Feb 3, 2024 | GPU, Graph Representation Learning | Code Available | 0 |
| InferCept: Efficient Intercept Support for Augmented Large Language Model Inference | Feb 2, 2024 | GPU, Language Modeling | Code Available | 1 |
| PRIME: Protect Your Videos From Malicious Editing | Feb 2, 2024 | GPU | Code Available | 0 |
| Faster Inference of Integer SWIN Transformer by Removing the GELU Activation | Feb 2, 2024 | GPU, image-classification | Unverified | 0 |
| Enriched Physics-informed Neural Networks for Dynamic Poisson-Nernst-Planck Systems | Feb 1, 2024 | GPU | Unverified | 0 |
| Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | Feb 1, 2024 | Computational Efficiency, GPU | Code Available | 3 |
| An Accurate and Low-Parameter Machine Learning Architecture for Next Location Prediction | Feb 1, 2024 | GPU, Prediction | Unverified | 0 |
| KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Jan 31, 2024 | GPU, Quantization | Code Available | 3 |
| Paramanu: A Family of Novel Efficient Generative Foundation Language Models for Indian Languages | Jan 31, 2024 | GPU, Reading Comprehension | Unverified | 0 |
| Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers | Jan 31, 2024 | GPU, Weather Forecasting | Unverified | 0 |
| SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget | Jan 30, 2024 | GPU, Model Compression | Unverified | 0 |
| GPU Cluster Scheduling for Network-Sensitive Deep Learning | Jan 29, 2024 | Deep Learning, GPU | Unverified | 0 |
| SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design | Jan 29, 2024 | CPU, GPU | Code Available | 2 |
| M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Jan 29, 2024 | GPU, zero-shot-classification | Code Available | 0 |
| Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing | Jan 29, 2024 | GPU, Representation Learning | Code Available | 2 |
| HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy | Jan 26, 2024 | GPU, parameter-efficient fine-tuning | Code Available | 1 |
| The Case for Co-Designing Model Architectures with Hardware | Jan 25, 2024 | Deep Learning, GPU | Unverified | 0 |
| MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | Jan 25, 2024 | GPU, model | Code Available | 3 |
| ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | Jan 25, 2024 | GPU, Scheduling | Code Available | 4 |
| FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design | Jan 25, 2024 | GPU, Quantization | Code Available | 3 |
| CNN architecture extraction on edge GPU | Jan 24, 2024 | GPU, image-classification | Unverified | 0 |
| Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 | Jan 24, 2024 | GPU, In-Context Learning | Unverified | 0 |
| InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction | Jan 23, 2024 | 3D Semantic Occupancy Prediction, Autonomous Driving | Code Available | 1 |
| Edge-Enabled Real-time Railway Track Segmentation | Jan 21, 2024 | GPU, Quantization | Unverified | 0 |
| immrax: A Parallelizable and Differentiable Toolbox for Interval Analysis and Mixed Monotone Reachability in JAX | Jan 21, 2024 | Computational Efficiency, GPU | Code Available | 1 |
| A Lightweight FPGA-based IDS-ECU Architecture for Automotive CAN | Jan 19, 2024 | GPU, Intrusion Detection | Unverified | 0 |
| Enhancing Scalability in Recommender Systems through Lottery Ticket Hypothesis and Knowledge Distillation-based Neural Network Pruning | Jan 19, 2024 | GPU, Knowledge Distillation | Unverified | 0 |
| Exact analytical algorithm for solvent accessible surface area and derivatives in implicit solvent molecular simulations on GPUs | Jan 19, 2024 | CPU, GPU | Unverified | 0 |
| Towards providing reliable job completion time predictions using PCS | Jan 18, 2024 | Fairness, GPU | Code Available | 0 |
| Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices | Jan 17, 2024 | Dynamic neural networks, GPU | Code Available | 1 |
| PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Jan 17, 2024 | GPU, Incremental Learning | Code Available | 4 |
| Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Jan 17, 2024 | GPU, Image Classification | Code Available | 2 |
| LoMA: Lossless Compressed Memory Attention | Jan 16, 2024 | GPU | Unverified | 0 |
| Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | Jan 16, 2024 | GPU, Mixture-of-Experts | Code Available | 1 |
| Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Jan 16, 2024 | GPU, Quantization | Code Available | 3 |
| TP-Aware Dequantization | Jan 15, 2024 | GPU, Quantization | Unverified | 0 |
| Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search | Jan 14, 2024 | GPU, image-classification | Code Available | 0 |
| Beyond Traditional Approaches: Multi-Task Network for Breast Ultrasound Diagnosis | Jan 14, 2024 | Anomaly Classification, Cancer Classification | Code Available | 0 |
| Parameter-Efficient Detoxification with Contrastive Decoding | Jan 13, 2024 | Attribute, GPU | Unverified | 0 |
| E^2-LLM: Efficient and Extreme Length Extension of Large Language Models | Jan 13, 2024 | 4k, GPU | Unverified | 0 |
| Efficient Parallel Algorithms for Inpainting-Based Representations of 4K Images -- Part I: Homogeneous Diffusion Inpainting | Jan 12, 2024 | 4k, GPU | Unverified | 0 |
| Efficient Parallel Data Optimization for Homogeneous Diffusion Inpainting of 4K Images | Jan 12, 2024 | 4k, GPU | Unverified | 0 |
| Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction | Jan 12, 2024 | Bandwidth Extension, CPU | Code Available | 2 |
| Extreme Compression of Large Language Models via Additive Quantization | Jan 11, 2024 | CPU, GPU | Code Available | 5 |
| PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU | Jan 11, 2024 | Clustering, GPU | Unverified | 0 |
| MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring | Jan 11, 2024 | Data Compression, GPU | Unverified | 0 |
| Towards Safe Load Balancing based on Control Barrier Functions and Deep Reinforcement Learning | Jan 10, 2024 | Decision Making, Deep Reinforcement Learning | Unverified | 0 |
| PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Jan 10, 2024 | GPU, Image Generation | Code Available | 7 |