| Sparfels: Fast Reconstruction from Sparse Unposed Imagery | May 4, 2025 | GPU | Unverified | 0 |
| Feature Optimization for Time Series Forecasting via Novel Randomized Uphill Climbing | May 2, 2025 | GPU, Multivariate Time Series Forecasting | Unverified | 0 |
| Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation | May 2, 2025 | GPU | Unverified | 0 |
| Aggregating empirical evidence from data strategy studies: a case on model quantization | May 1, 2025 | GPU, Quantization | Unverified | 0 |
| Efficient On-Chip Implementation of 4D Radar-Based 3D Object Detection on Hailo-8L | May 1, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| GPU Performance Portability needs Autotuning | Apr 30, 2025 | GPU | Code Available | 2 |
| Sionna RT: Technical Report | Apr 30, 2025 | GPU | Unverified | 0 |
| Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning | Apr 29, 2025 | CPU, GPU | Unverified | 0 |
| TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models | Apr 29, 2025 | Benchmarking, Dataset Generation | Code Available | 0 |
| STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction | Apr 28, 2025 | GPU | Code Available | 2 |
| Mesh-Learner: Texturing Mesh with Spherical Harmonics | Apr 28, 2025 | 3D Reconstruction, CPU | Code Available | 1 |
| Efficient Domain-adaptive Continual Pretraining for the Process Industry in the German Language | Apr 28, 2025 | Continual Pretraining, GPU | Unverified | 0 |
| semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage | Apr 28, 2025 | GPU, Large Language Model | Unverified | 0 |
| Taming the Titans: A Survey of Efficient LLM Inference Serving | Apr 28, 2025 | GPU, Miscellaneous | Code Available | 1 |
| FlashOverlap: A Lightweight Design for Efficiently Overlapping Communication and Computation | Apr 28, 2025 | GPU | Unverified | 0 |
| Accelerating Mixture-of-Experts Training with Adaptive Expert Replication | Apr 28, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| NSFlow: An End-to-End FPGA Framework with Scalable Dataflow Architecture for Neuro-Symbolic AI | Apr 27, 2025 | GPU | Unverified | 0 |
| Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Apr 26, 2025 | Benchmarking, GPU | Code Available | 0 |
| GPU accelerated program synthesis: Enumerate semantics, not syntax! | Apr 26, 2025 | CPU, GPU | Unverified | 0 |
| The Big Send-off: High Performance Collectives on GPU-based Supercomputers | Apr 25, 2025 | GPU, Language Modeling | Unverified | 0 |
| CaRL: Learning Scalable Planning Policies with Simple Rewards | Apr 24, 2025 | Autonomous Driving, CARLA longest6 | Code Available | 2 |
| L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference | Apr 24, 2025 | GPU | Unverified | 0 |
| Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification | Apr 23, 2025 | Emotion Classification, GPU | Unverified | 0 |
| Fried Parameter Estimation from Single Wavefront Sensor Image with Artificial Neural Networks | Apr 23, 2025 | GPU, parameter estimation | Unverified | 0 |
| Democracy of AI Numerical Weather Models: An Example of Global Forecasting with FourCastNetv2 Made by a University Research Lab Using GPU | Apr 23, 2025 | GPU, Weather Forecasting | Unverified | 0 |
| MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Apr 22, 2025 | GPU | Code Available | 5 |
| Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis | Apr 22, 2025 | GPU, Quantization | Unverified | 0 |
| Scalable APT Malware Classification via Parallel Feature Extraction and GPU-Accelerated Learning | Apr 22, 2025 | GPU, Malware Classification | Unverified | 0 |
| A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings | Apr 22, 2025 | Computational Efficiency, GPU | Code Available | 0 |
| Splitwiser: Efficient LM inference with constrained resources | Apr 21, 2025 | GPU, Scheduling | Code Available | 0 |
| LithOS: An Operating System for Efficient Machine Learning on GPUs | Apr 21, 2025 | Blocking, GPU | Unverified | 0 |
| Distribution-aware Dataset Distillation for Efficient Image Restoration | Apr 21, 2025 | 4k, Dataset Distillation | Unverified | 0 |
| Robust and Real-time Surface Normal Estimation from Stereo Disparities using Affine Transformations | Apr 21, 2025 | GPU, Surface Normal Estimation | Unverified | 0 |
| Beyond Terabit/s Integrated Neuromorphic Photonic Processor for DSP-Free Optical Interconnects | Apr 21, 2025 | GPU | Unverified | 0 |
| SG-Reg: Generalizable and Efficient Scene Graph Registration | Apr 20, 2025 | GPU | Code Available | 2 |
| AlphaZero-Edu: Making AlphaZero Accessible to Everyone | Apr 20, 2025 | GPU | Code Available | 0 |
| HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing | Apr 18, 2025 | GPU | Unverified | 0 |
| Quantum Walks-Based Adaptive Distribution Generation with Efficient CUDA-Q Acceleration | Apr 18, 2025 | GPU | Unverified | 0 |
| Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction | Apr 18, 2025 | 3D Object Detection, GPU | Code Available | 1 |
| NNTile: a machine learning framework capable of training extremely large GPT language models on a single node | Apr 17, 2025 | CPU, GPU | Unverified | 0 |
| Mask Image Watermarking | Apr 17, 2025 | Computational Efficiency, Decoder | Code Available | 1 |
| Second-order Optimization of Gaussian Splats with Importance Sampling | Apr 17, 2025 | 3DGS, GPU | Unverified | 0 |
| ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior | Apr 17, 2025 | 3DGS, GPU | Unverified | 0 |
| Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving | Apr 17, 2025 | GPU | Unverified | 0 |
| Data-efficient LLM Fine-tuning for Code Generation | Apr 17, 2025 | Code Generation, GPU | Code Available | 1 |
| Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation | Apr 17, 2025 | GPU, Object Recognition | Code Available | 2 |
| BitNet b1.58 2B4T Technical Report | Apr 16, 2025 | Computational Efficiency, CPU | Unverified | 0 |
| Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification | Apr 16, 2025 | GPU | Unverified | 0 |
| MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models | Apr 16, 2025 | GPU | Unverified | 0 |
| Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Apr 16, 2025 | CPU, GPU | Unverified | 0 |