SOTAVerified

GPU

Papers

Showing 101-150 of 5629 papers

| Title | Status | Hype |
| --- | --- | --- |
| Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis | | 0 |
| Generalizable, real-time neural decoding with hybrid state-space models | | 0 |
| Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos | Code | 2 |
| Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | | 0 |
| MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Code | 9 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3 |
| High-Speed Ultra-Energy-Efficient Memristor-Based Massive MIMO SIC Detector Circuit with Hybrid Analog-Digital Computing Architecture | | 0 |
| Similarity-based fuzzy clustering scientific articles: potentials and challenges from mathematical and computational perspectives | | 0 |
| FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices | | 0 |
| Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency | | 0 |
| VTGaussian-SLAM: RGBD SLAM for Large Scale Scenes with Splatting View-Tied 3D Gaussians | | 0 |
| Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem | | 0 |
| SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Code | 11 |
| COALESCE: Economic and Security Dynamics of Skill-Based Task Outsourcing Among Team of Autonomous LLM Agents | | 0 |
| Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization | | 0 |
| Recipes for Pre-training LLMs with MXFP8 | | 0 |
| Pushing the Limits of Beam Search Decoding for Transducer-based ASR models | | 0 |
| NUC-Net: Non-uniform Cylindrical Partition Network for Efficient LiDAR Semantic Segmentation | Code | 0 |
| AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7 |
| TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks | | 0 |
| LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Trainin | | 0 |
| LoLA: Low-Rank Linear Attention With Sparse Caching | | 0 |
| CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection | | 0 |
| Accelerating AllReduce with a Persistent Straggler | Code | 1 |
| Holistic Large-Scale Scene Reconstruction via Mixed Gaussian Splatting | Code | 1 |
| LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering | | 0 |
| ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS | Code | 2 |
| LUMION: Fast Fault Recovery for ML Jobs Using Programmable Optical Fabrics | | 0 |
| Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design | Code | 1 |
| Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule | | 0 |
| SHTOcc: Effective 3D Occupancy Prediction with Sparse Head and Tail Voxels | Code | 0 |
| Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape | Code | 0 |
| NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding | | 0 |
| FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control | | 0 |
| CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs | | 0 |
| Minute-Long Videos with Dual Parallelisms | Code | 1 |
| STACI: Spatio-Temporal Aleatoric Conformal Inference | | 0 |
| Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks | | 0 |
| Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits | | 0 |
| InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling | | 0 |
| SwarmThinkers: Learning Physically Consistent Atomic KMC Transitions at Scale | | 0 |
| APE: A Data-Centric Benchmark for Efficient LLM Adaptation in Text Summarization | Code | 0 |
| TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization | Code | 1 |
| FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets | Code | 0 |
| Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling | | 0 |
| eACGM: Non-instrumented Performance Tracing and Anomaly Detection towards Machine Learning Systems | | 0 |
| TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis | | 0 |
| Triangle Splatting for Real-Time Radiance Field Rendering | | 0 |
| Advancing Video Self-Supervised Learning via Image Foundation Models | Code | 0 |
| FastMamba: A High-Speed and Efficient Mamba Accelerator on FPGA with Accurate Quantization | | 0 |
