| MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Apr 22, 2025 | GPU | Code Available | 5 |
| Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis | Apr 22, 2025 | GPU, Quantization | Unverified | 0 |
| Scalable APT Malware Classification via Parallel Feature Extraction and GPU-Accelerated Learning | Apr 22, 2025 | GPU, Malware Classification | Unverified | 0 |
| A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings | Apr 22, 2025 | Computational Efficiency, GPU | Code Available | 0 |
| Splitwiser: Efficient LM inference with constrained resources | Apr 21, 2025 | GPU, Scheduling | Code Available | 0 |
| LithOS: An Operating System for Efficient Machine Learning on GPUs | Apr 21, 2025 | Blocking, GPU | Unverified | 0 |
| Distribution-aware Dataset Distillation for Efficient Image Restoration | Apr 21, 2025 | 4k, Dataset Distillation | Unverified | 0 |
| Robust and Real-time Surface Normal Estimation from Stereo Disparities using Affine Transformations | Apr 21, 2025 | GPU, Surface Normal Estimation | Unverified | 0 |
| Beyond Terabit/s Integrated Neuromorphic Photonic Processor for DSP-Free Optical Interconnects | Apr 21, 2025 | GPU | Unverified | 0 |
| SG-Reg: Generalizable and Efficient Scene Graph Registration | Apr 20, 2025 | GPU | Code Available | 2 |
| AlphaZero-Edu: Making AlphaZero Accessible to Everyone | Apr 20, 2025 | GPU | Code Available | 0 |
| HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing | Apr 18, 2025 | GPU | Unverified | 0 |
| Quantum Walks-Based Adaptive Distribution Generation with Efficient CUDA-Q Acceleration | Apr 18, 2025 | GPU | Unverified | 0 |
| Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction | Apr 18, 2025 | 3D Object Detection, GPU | Code Available | 1 |
| NNTile: a machine learning framework capable of training extremely large GPT language models on a single node | Apr 17, 2025 | CPU, GPU | Unverified | 0 |
| Mask Image Watermarking | Apr 17, 2025 | Computational Efficiency, Decoder | Code Available | 1 |
| Second-order Optimization of Gaussian Splats with Importance Sampling | Apr 17, 2025 | 3DGS, GPU | Unverified | 0 |
| ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior | Apr 17, 2025 | 3DGS, GPU | Unverified | 0 |
| Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving | Apr 17, 2025 | GPU | Unverified | 0 |
| Data-efficient LLM Fine-tuning for Code Generation | Apr 17, 2025 | Code Generation, GPU | Code Available | 1 |
| Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation | Apr 17, 2025 | GPU, Object Recognition | Code Available | 2 |
| BitNet b1.58 2B4T Technical Report | Apr 16, 2025 | Computational Efficiency, CPU | Unverified | 0 |
| Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification | Apr 16, 2025 | GPU | Unverified | 0 |
| MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models | Apr 16, 2025 | GPU | Unverified | 0 |
| Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Apr 16, 2025 | CPU, GPU | Unverified | 0 |