SOTAVerified

GPU

Papers

Showing 2251–2300 of 5629 papers

| Title | Status | Hype |
| --- | --- | --- |
| Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference | | 0 |
| GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments | | 0 |
| SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization | | 0 |
| Assessing and Learning Alignment of Unimodal Vision and Language Models | | 0 |
| CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning | | 0 |
| Unifying KV Cache Compression for Large Language Models with LeanKV | | 0 |
| Diffusion-VLA: Generalizable and Interpretable Robot Foundation Model via Self-Generated Reasoning | | 0 |
| FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness | | 0 |
| Can't Slow me Down: Learning Robust and Hardware-Adaptive Object Detectors against Latency Attacks for Edge Devices | | 0 |
| SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection | Code | 0 |
| MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | | 0 |
| Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification | | 0 |
| Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control | | 0 |
| Improving feature interactions at Pinterest under industry constraints | | 0 |
| SPILDL: A Scalable and Parallel Inductive Learner in Description Logic | | 0 |
| HT-HEDL: High-Throughput Hypothesis Evaluation in Description Logic | | 0 |
| BlendPCR: Seamless and Efficient Rendering of Dynamic Point Clouds captured by Multiple RGB-D Cameras | Code | 0 |
| PAL -- Parallel active learning for machine-learned potentials | Code | 0 |
| Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing | | 0 |
| Open source Differentiable ODE Solving Infrastructure | | 0 |
| BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching | | 0 |
| A Simple Sparse Matrix Vector Multiplication Approach to Padded Convolution | Code | 0 |
| Puzzle: Distillation-Based NAS for Inference-Optimized LLMs | | 0 |
| An Integrated Artificial Intelligence Operating System for Advanced Low-Altitude Aviation Applications | | 0 |
| PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers | | 0 |
| Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads | | 0 |
| Differentiable Topology Estimating from Curvatures for 3D Shapes | | 0 |
| Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach | | 0 |
| Towards Chunk-Wise Generation for Long Videos | | 0 |
| A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs | Code | 0 |
| k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning | | 0 |
| KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation | Code | 0 |
| Automatic Skull Reconstruction by Deep Learnable Symmetry Enforcement | | 0 |
| A High Energy-Efficiency Multi-core Neuromorphic Architecture for Deep SNN Training | | 0 |
| Knowledge-aware Evolutionary Graph Neural Architecture Search | Code | 0 |
| A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference | | 0 |
| SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE | | 0 |
| Plastic Arbor: a modern simulation framework for synaptic plasticity – from single synapses to networks of morphological neurons | Code | 0 |
| MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking | | 0 |
| Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format | | 0 |
| Enabling Efficient Serverless Inference Serving for LLM (Large Language Model) in the Cloud | | 0 |
| Reassessing Layer Pruning in LLMs: New Insights and Methods | Code | 0 |
| Multi-scale Cascaded Large-Model for Whole-body ROI Segmentation | Code | 0 |
| Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers | | 0 |
| Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting | | 0 |
| Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction | | 0 |
| Deep operator network models for predicting post-burn contraction | | 0 |
| Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training | | 0 |
| FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting | | 0 |
| Faster Multi-GPU Training with PPLL: A Pipeline Parallelism Framework Leveraging Local Learning | | 0 |
Page 46 of 113
