| CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Jul 18, 2025 | Code Generation, GPU | Unverified | 0 |
| Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models | Jul 17, 2025 | Denoising, GPU | Unverified | 0 |
| DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model | Jul 17, 2025 | GPU, Monocular Visual Odometry | Unverified | 0 |
| Kevin: Multi-Turn RL for Generating CUDA Kernels | Jul 16, 2025 | GPU, Reinforcement Learning (RL) | Unverified | 0 |
| FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale | Jul 16, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Jul 16, 2025 | GPU | Code Available | 3 |
| Relative Entropy Pathwise Policy Optimization | Jul 15, 2025 | GPU | Code Available | 1 |
| Lightweight Model for Poultry Disease Detection from Fecal Images Using Multi-Color Space Feature Optimization and Machine Learning | Jul 14, 2025 | Computational Efficiency, Dimensionality Reduction | Unverified | 0 |
| DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation | Jul 14, 2025 | Decoder, GPU | Code Available | 0 |
| Scaling Attention to Very Long Sequences in Linear Time with Wavelet-Enhanced Random Spectral Attention (WERSA) | Jul 11, 2025 | GPU | Code Available | 0 |
| HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation | Jul 10, 2025 | GPU, Image Segmentation | Code Available | 0 |
| Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning | Jul 9, 2025 | GPU, Multi-agent Reinforcement Learning | Unverified | 0 |
| From large-eddy simulations to deep learning: A U-net model for fast urban canopy flow predictions | Jul 9, 2025 | GPU, L2 Regularization | Code Available | 0 |
| AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs | Jul 8, 2025 | GPU, reinforcement-learning | Code Available | 2 |
| Diffusion Dataset Condensation: Training Your Diffusion Model Faster with Less Data | Jul 8, 2025 | Dataset Condensation, GPU | Unverified | 0 |
| Real-Time Graph-based Point Cloud Networks on FPGAs via Stall-Free Deep Pipelining | Jul 7, 2025 | GPU | Code Available | 0 |
| any4: Learned 4-bit Numeric Representation for LLMs | Jul 7, 2025 | GPU, GSM8K | Code Available | 2 |
| LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Jul 5, 2025 | Benchmarking, GPU | Code Available | 1 |
| MathOptAI.jl: Embed trained machine learning predictors into JuMP models | Jul 3, 2025 | CPU, Gaussian Processes | Code Available | 2 |
| SketchColour: Channel Concat Guided DiT-based Sketch-to-Colour Pipeline for 2D Animation | Jul 2, 2025 | GPU | Unverified | 0 |
| LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs | Jul 2, 2025 | CPU, GPU | Unverified | 0 |
| FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation | Jun 30, 2025 | Computational Efficiency, Dataset Distillation | Code Available | 1 |
| MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation | Jun 29, 2025 | GPU, Optical Flow Estimation | Code Available | 2 |
| VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions | Jun 29, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation | Jun 26, 2025 | GPU, Image Generation | Unverified | 0 |
| Omniwise: Predicting GPU Kernels Performance with LLMs | Jun 25, 2025 | GPU | Unverified | 0 |
| GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization | Jun 25, 2025 | GPU | Unverified | 0 |
| Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking | Jun 25, 2025 | GPU, Visual Tracking | Code Available | 1 |
| Fast ground penetrating radar dual-parameter full waveform inversion method accelerated by hybrid compilation of CUDA kernel function and PyTorch | Jun 25, 2025 | Computational Efficiency, GPR | Code Available | 1 |
| DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs | Jun 25, 2025 | GPU | Unverified | 0 |
| Scaling Speculative Decoding with Lookahead Reasoning | Jun 24, 2025 | GPU, GSM8K | Code Available | 0 |
| MegaFold: System-Level Optimizations for Accelerating Protein Structure Prediction Models | Jun 24, 2025 | GPU, Protein Folding | Code Available | 2 |
| Virtual Memory for 3D Gaussian Splatting | Jun 24, 2025 | GPU, Novel View Synthesis | Unverified | 0 |
| PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | Jun 24, 2025 | Benchmarking, Drug Discovery | Code Available | 2 |
| DIP: Unsupervised Dense In-Context Post-training of Visual Representations | Jun 23, 2025 | GPU, Meta-Learning | Code Available | 1 |
| Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Jun 23, 2025 | Domain Adaptation, GPU | Code Available | 3 |
| Let Your Video Listen to Your Music! | Jun 23, 2025 | GPU, Music Generation | Unverified | 0 |
| Survey of HPC in US Research Institutions | Jun 23, 2025 | Benchmarking, GPU | Unverified | 0 |
| 4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time | Jun 23, 2025 | 4D reconstruction, GPU | Unverified | 0 |
| CommVQ: Commutative Vector Quantization for KV Cache Compression | Jun 23, 2025 | GPU, GSM8K | Code Available | 1 |
| TDACloud: Point Cloud Recognition Using Topological Data Analysis | Jun 23, 2025 | Autonomous Driving, GPU | Unverified | 0 |
| Lightweight RGB-T Tracking with Mobile Vision Transformers | Jun 23, 2025 | GPU, Object Tracking | Unverified | 0 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 |
| ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation | Jun 22, 2025 | GPU, Image Generation | Code Available | 3 |
| Collaborative Texture Filtering | Jun 21, 2025 | GPU | Unverified | 0 |
| ConsumerBench: Benchmarking Generative AI Applications on End-User Devices | Jun 21, 2025 | Benchmarking, CPU | Code Available | 1 |
| VeriLocc: End-to-End Cross-Architecture Register Allocation via LLM | Jun 20, 2025 | GPU | Unverified | 0 |
| Beyond Blur: A Fluid Perspective on Generative Diffusion Models | Jun 20, 2025 | Diversity, GPU | Unverified | 0 |
| Speeding up Local Optimization in Vehicle Routing with Tensor-based GPU Acceleration | Jun 20, 2025 | Attribute, Computational Efficiency | Unverified | 0 |
| TrainVerify: Equivalence-Based Verification for Distributed LLM Training | Jun 19, 2025 | GPU | Unverified | 0 |