SOTAVerified

GPU

Papers

Showing 1851-1900 of 5629 papers

Title | Status | Hype
Pruner: A Speculative Exploration Mechanism to Accelerate Tensor Program Tuning | Code | 1
Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks | Code | 1
Scalable and Efficient Temporal Graph Representation Learning via Forward Recent Sampling | Code | 0
InferCept: Efficient Intercept Support for Augmented Large Language Model Inference | Code | 1
PRIME: Protect Your Videos From Malicious Editing | Code | 0
Faster Inference of Integer SWIN Transformer by Removing the GELU Activation | - | 0
Enriched Physics-informed Neural Networks for Dynamic Poisson-Nernst-Planck Systems | - | 0
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | Code | 3
An Accurate and Low-Parameter Machine Learning Architecture for Next Location Prediction | - | 0
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Code | 3
Paramanu: A Family of Novel Efficient Generative Foundation Language Models for Indian Languages | - | 0
Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers | - | 0
SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget | - | 0
GPU Cluster Scheduling for Network-Sensitive Deep Learning | - | 0
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design | Code | 2
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Code | 0
Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing | Code | 2
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy | Code | 1
The Case for Co-Designing Model Architectures with Hardware | - | 0
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | Code | 3
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | Code | 4
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design | Code | 3
CNN architecture extraction on edge GPU | - | 0
Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 | - | 0
InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction | Code | 1
Edge-Enabled Real-time Railway Track Segmentation | - | 0
immrax: A Parallelizable and Differentiable Toolbox for Interval Analysis and Mixed Monotone Reachability in JAX | Code | 1
A Lightweight FPGA-based IDS-ECU Architecture for Automotive CAN | - | 0
Enhancing Scalability in Recommender Systems through Lottery Ticket Hypothesis and Knowledge Distillation-based Neural Network Pruning | - | 0
Exact analytical algorithm for solvent accessible surface area and derivatives in implicit solvent molecular simulations on GPUs | - | 0
Towards providing reliable job completion time predictions using PCS | Code | 0
Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices | Code | 1
PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Code | 4
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Code | 2
LoMA: Lossless Compressed Memory Attention | - | 0
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | Code | 1
Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Code | 3
TP-Aware Dequantization | - | 0
Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search | Code | 0
Beyond Traditional Approaches: Multi-Task Network for Breast Ultrasound Diagnosis | Code | 0
Parameter-Efficient Detoxification with Contrastive Decoding | - | 0
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models | - | 0
Efficient Parallel Algorithms for Inpainting-Based Representations of 4K Images -- Part I: Homogeneous Diffusion Inpainting | - | 0
Efficient Parallel Data Optimization for Homogeneous Diffusion Inpainting of 4K Images | - | 0
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction | Code | 2
Extreme Compression of Large Language Models via Additive Quantization | Code | 5
PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU | - | 0
MGARD: A multigrid framework for high-performance, error-controlled data compression and refactoring | - | 0
Towards Safe Load Balancing based on Control Barrier Functions and Deep Reinforcement Learning | - | 0
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Code | 7
Page 38 of 113

No leaderboard results yet.