| Paper | Date | Tasks | Code | Stars |
|---|---|---|---|---|
| GPT Carry-On: Training Foundation Model for Customization Could Be Simple, Scalable and Affordable | Apr 10, 2025 | GPU, Math | Unverified | 0 |
| A Comparison of Deep Learning Methods for Cell Detection in Digital Cytology | Apr 9, 2025 | Cell Detection, Computational Efficiency | Code available | 0 |
| CRYSIM: Prediction of Symmetric Structures of Large Crystals with GPU-based Ising Machines | Apr 9, 2025 | Bayesian Optimization, GPU | Code available | 0 |
| Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching | Apr 8, 2025 | GPU, Scheduling | Unverified | 0 |
| GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Apr 8, 2025 | Computational Efficiency, CPU | Code available | 3 |
| Nonuniform-Tensor-Parallelism: Mitigating GPU failure impact for Scaled-up LLM Training | Apr 8, 2025 | GPU | Unverified | 0 |
| HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Apr 8, 2025 | CPU, GPU | Code available | 2 |
| HRMedSeg: Unlocking High-resolution Medical Image segmentation via Memory-efficient Attention Modeling | Apr 8, 2025 | Decoder, GPU | Code available | 1 |
| SmolVLM: Redefining small and efficient multimodal models | Apr 7, 2025 | GPU | Unverified | 0 |
| PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters | Apr 7, 2025 | CPU, GPU | Code available | 0 |
| Leveraging State Space Models in Long Range Genomics | Apr 7, 2025 | Benchmarking, GPU | Unverified | 0 |
| Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors | Apr 7, 2025 | GPU | Code available | 2 |
| Scaling Graph Neural Networks for Particle Track Reconstruction | Apr 7, 2025 | Edge Classification, GPU | Code available | 1 |
| Stereo-LiDAR Fusion by Semi-Global Matching With Discrete Disparity-Matching Cost and Semidensification | Apr 7, 2025 | Depth Estimation, GPU | Code available | 0 |
| Multimodal Cinematic Video Synthesis Using Text-to-Image and Audio Generation Models | Apr 6, 2025 | Audio Generation, GPU | Unverified | 0 |
| SLOs-Serve: Optimized Serving of Multi-SLO LLMs | Apr 5, 2025 | Chatbot, GPU | Unverified | 0 |
| DeepOHeat-v1: Efficient Operator Learning for Fast and Trustworthy Thermal Simulation and Optimization in 3D-IC Design | Apr 4, 2025 | GPU, Kolmogorov-Arnold Networks | Code available | 0 |
| Meta-DAN: towards an efficient prediction strategy for page-level handwritten text recognition | Apr 4, 2025 | GPU, Handwritten Text Recognition | Code available | 1 |
| HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs | Apr 4, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis | Apr 4, 2025 | CPU, GPU | Unverified | 0 |
| Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation | Apr 3, 2025 | Computational Efficiency, GPU | Code available | 2 |
| MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | Apr 3, 2025 | CPU, GPU | Unverified | 0 |
| Incorporating the ChEES Criterion into Sequential Monte Carlo Samplers | Apr 3, 2025 | Bayesian Inference, GPU | Unverified | 0 |
| GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration | Apr 3, 2025 | GPU, Quantization | Code available | 2 |
| A Truncated Newton Method for Optimal Transport | Apr 2, 2025 | GPU | Code available | 0 |