| Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation | Aug 7, 2024 | GPU, Question Answering | Unverified | 0 |
| Quantum Annealing based Power Grid Partitioning for Parallel Simulation | Aug 7, 2024 | CPU, GPU | Unverified | 0 |
| PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training | Aug 7, 2024 | GPU, Mamba | Unverified | 0 |
| L3iTC at the FinLLM Challenge Task: Quantization for Financial Text Classification & Summarization | Aug 6, 2024 | GPU, Quantization | Unverified | 0 |
| A Real-Time Adaptive Multi-Stream GPU System for Online Approximate Nearest Neighborhood Search | Aug 6, 2024 | Blocking, GPU | Unverified | 0 |
| SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving | Aug 5, 2024 | GPU | Unverified | 0 |
| VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking | Aug 5, 2024 | 3D Single Object Tracking, GPU | Unverified | 0 |
| PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance | Aug 4, 2024 | GPU, Image Generation | Unverified | 0 |
| FT K-means: A High-Performance K-means on GPU with Fault Tolerance | Aug 2, 2024 | Code Generation, GPU | Code Available | 0 |
| The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines | Aug 2, 2024 | GPU, Hyperparameter Optimization | Unverified | 0 |
| Data-Driven Traffic Simulation for an Intersection in a Metropolis | Aug 1, 2024 | GPU, Trajectory Forecasting | Unverified | 0 |
| Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research | Aug 1, 2024 | CPU, GPU | Unverified | 0 |
| Towards Scalable GPU-Accelerated SNN Training via Temporal Fusion | Aug 1, 2024 | GPU, Navigate | Code Available | 0 |
| Finch: Prompt-guided Key-Value Cache Compression | Jul 31, 2024 | GPU, Language Modeling | Unverified | 0 |
| ThinK: Thinner Key Cache by Query-Driven Pruning | Jul 30, 2024 | GPU, Quantization | Unverified | 0 |
| NeuroSEM: A hybrid framework for simulating multiphysics problems by coupling PINNs and spectral elements | Jul 30, 2024 | CPU, GPU | Code Available | 0 |
| GPU-based data processing for speeding-up correlation plenoptic imaging | Jul 30, 2024 | GPU | Unverified | 0 |
| Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs | Jul 30, 2024 | GPU | Unverified | 0 |
| ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development | Jul 29, 2024 | GPU | Unverified | 0 |
| Graphite: A Graph-based Extreme Multi-Label Short Text Classifier for Keyphrase Recommendation | Jul 29, 2024 | GPU, text-classification | Unverified | 0 |
| Simply Trainable Nearest Neighbour Machine Translation with GPU Inference | Jul 29, 2024 | Domain Adaptation, GPU | Unverified | 0 |
| SAPG: Split and Aggregate Policy Gradients | Jul 29, 2024 | Decision Making, GPU | Unverified | 0 |
| Mini-batch Coresets for Memory-efficient Training of Large Language Models | Jul 28, 2024 | GPU, Network Pruning | Unverified | 0 |
| WindsorML: High-Fidelity Computational Fluid Dynamics Dataset For Automotive Aerodynamics | Jul 27, 2024 | GPU | Unverified | 0 |
| NARVis: Neural Accelerated Rendering for Real-Time Scientific Point Cloud Visualization | Jul 26, 2024 | GPU | Unverified | 0 |
| Textile Anomaly Detection: Evaluation of the State-of-the-Art for Automated Quality Inspection of Carpet | Jul 26, 2024 | Anomaly Detection, CPU | Unverified | 0 |
| HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline | Jul 25, 2024 | GPU | Unverified | 0 |
| Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption | Jul 25, 2024 | GPU | Code Available | 0 |
| SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention | Jul 23, 2024 | Code Generation, GPU | Unverified | 0 |
| A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism | Jul 22, 2024 | GPU, Neural Architecture Search | Unverified | 0 |
| Automated Road Safety: Enhancing Sign and Surface Damage Detection with AI | Jul 22, 2024 | Cloud Computing, GPU | Unverified | 0 |
| LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme | Jul 21, 2024 | CPU, Fraud Detection | Unverified | 0 |
| MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM | Jul 21, 2024 | Few-Shot Learning, GPU | Unverified | 0 |
| GreenStableYolo: Optimizing Inference Time and Image Quality of Text-to-Image Generation | Jul 20, 2024 | GPU, Image Generation | Code Available | 0 |
| Neural topology optimization: the good, the bad, and the ugly | Jul 19, 2024 | GPU, Misconceptions | Unverified | 0 |
| Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference | Jul 19, 2024 | GPU, Language Modeling | Unverified | 0 |
| Mixture of Experts with Mixture of Precisions for Tuning Quality of Service | Jul 19, 2024 | CPU, GPU | Unverified | 0 |
| LiNR: Model Based Neural Retrieval on GPUs at LinkedIn | Jul 18, 2024 | Attribute, GPU | Unverified | 0 |
| RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models | Jul 17, 2024 | GPU, Nutrition | Unverified | 0 |
| SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization | Jul 17, 2024 | GPU, Quantization | Unverified | 0 |
| ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks | Jul 17, 2024 | CPU, GPU | Unverified | 0 |
| Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors | Jul 16, 2024 | GPU, Neural Network Compression | Unverified | 0 |
| Characterizing and Understanding HGNN Training on GPUs | Jul 16, 2024 | GPU, Recommendation Systems | Unverified | 0 |
| MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training | Jul 16, 2024 | CPU, GPU | Unverified | 0 |
| Learning Multi-view Anomaly Detection | Jul 16, 2024 | Anomaly Detection, GPU | Unverified | 0 |
| PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer | Jul 16, 2024 | 2D Object Detection, Computational Efficiency | Unverified | 0 |
| MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models | Jul 16, 2024 | GPU, Multiple-choice | Unverified | 0 |
| SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation | Jul 15, 2024 | GPU, Reinforcement Learning (RL) | Unverified | 0 |
| Differentiable Neural-Integrated Meshfree Method for Forward and Inverse Modeling of Finite Strain Hyperelasticity | Jul 15, 2024 | GPU, Physics-informed machine learning | Code Available | 0 |
| NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis | Jul 15, 2024 | GPU, NeRF | Unverified | 0 |