SOTAVerified

GPU Papers

Showing 1851–1900 of 5629 papers

| Title | Status | Hype |
| --- | --- | --- |
| Can Large Language Models Predict Parallel Code Performance? | | 0 |
| Anant-Net: Breaking the Curse of Dimensionality with Scalable and Interpretable Neural Surrogate for High-Dimensional PDEs | | 0 |
| Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving | | 0 |
| Quantitative Analysis of Performance Drop in DeepSeek Model Quantization | Code | 0 |
| RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference | | 0 |
| QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach | | 0 |
| A UNet Model for Accelerated Preprocessing of CRISM Hyperspectral Data for Mineral Identification on Mars | | 0 |
| Sparfels: Fast Reconstruction from Sparse Unposed Imagery | | 0 |
| Feature Optimization for Time Series Forecasting via Novel Randomized Uphill Climbing | | 0 |
| Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation | | 0 |
| Efficient On-Chip Implementation of 4D Radar-Based 3D Object Detection on Hailo-8L | | 0 |
| Aggregating empirical evidence from data strategy studies: a case on model quantization | | 0 |
| Sionna RT: Technical Report | | 0 |
| TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models | Code | 0 |
| Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning | | 0 |
| Efficient Domain-adaptive Continual Pretraining for the Process Industry in the German Language | | 0 |
| semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage | | 0 |
| Accelerating Mixture-of-Experts Training with Adaptive Expert Replication | | 0 |
| FlashOverlap: A Lightweight Design for Efficiently Overlapping Communication and Computation | | 0 |
| NSFlow: An End-to-End FPGA Framework with Scalable Dataflow Architecture for Neuro-Symbolic AI | | 0 |
| Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Code | 0 |
| GPU accelerated program synthesis: Enumerate semantics, not syntax! | | 0 |
| The Big Send-off: High Performance Collectives on GPU-based Supercomputers | | 0 |
| L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference | | 0 |
| Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification | | 0 |
| Fried Parameter Estimation from Single Wavefront Sensor Image with Artificial Neural Networks | | 0 |
| Democracy of AI Numerical Weather Models: An Example of Global Forecasting with FourCastNetv2 Made by a University Research Lab Using GPU | | 0 |
| Scalable APT Malware Classification via Parallel Feature Extraction and GPU-Accelerated Learning | | 0 |
| A LoRA-Based Approach to Fine-Tuning LLMs for Educational Guidance in Resource-Constrained Settings | Code | 0 |
| Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis | | 0 |
| Splitwiser: Efficient LM inference with constrained resources | Code | 0 |
| Beyond Terabit/s Integrated Neuromorphic Photonic Processor for DSP-Free Optical Interconnects | | 0 |
| Distribution-aware Dataset Distillation for Efficient Image Restoration | | 0 |
| LithOS: An Operating System for Efficient Machine Learning on GPUs | | 0 |
| Robust and Real-time Surface Normal Estimation from Stereo Disparities using Affine Transformations | | 0 |
| AlphaZero-Edu: Making AlphaZero Accessible to Everyone | Code | 0 |
| HPU: High-Bandwidth Processing Unit for Scalable, Cost-effective LLM Inference via GPU Co-processing | | 0 |
| Quantum Walks-Based Adaptive Distribution Generation with Efficient CUDA-Q Acceleration | | 0 |
| ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior | | 0 |
| NNTile: a machine learning framework capable of training extremely large GPT language models on a single node | | 0 |
| Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving | | 0 |
| Second-order Optimization of Gaussian Splats with Importance Sampling | | 0 |
| MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models | | 0 |
| Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | | 0 |
| Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification | | 0 |
| BitNet b1.58 2B4T Technical Report | | 0 |
| Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading | Code | 0 |
| PatrolVision: Automated License Plate Recognition in the wild | | 0 |
| Bringing together invertible UNets with invertible attention modules for memory-efficient diffusion models | | 0 |
| ConvShareViT: Enhancing Vision Transformers with Convolutional Attention Mechanisms for Free-Space Optical Accelerators | | 0 |
Page 38 of 113