SOTAVerified

GPU Papers

Showing 150 of 5629 papers

Title | Status | Hype
DeepSeek-V3 Technical Report | Code | 16
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Code | 11
WebLLM: A High-Performance In-Browser LLM Inference Engine | Code | 11
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Code | 11
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Code | 11
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Code | 9
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Code | 9
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Code | 9
LTX-Video: Realtime Video Latent Diffusion | Code | 9
Liger Kernel: Efficient Triton Kernels for LLM Training | Code | 9
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Code | 9
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | Code | 9
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | Code | 9
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Code | 9
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Code | 9
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Code | 9
Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble | Code | 9
DETRs Beat YOLOs on Real-time Object Detection | Code | 8
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7
Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization | Code | 7
YOLOv12: Attention-Centric Real-Time Object Detectors | Code | 7
EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Code | 7
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7
Revisiting PCA for time series reduction in temporal dimension | Code | 7
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Code | 7
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | Code | 7
ThunderKittens: Simple, Fast, and Adorable AI Kernels | Code | 7
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement | Code | 7
Pyramidal Flow Matching for Efficient Video Generative Modeling | Code | 7
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI | Code | 7
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Code | 7
Scalable MatMul-free Language Modeling | Code | 7
Mirage: A Multi-Level Superoptimizer for Tensor Programs | Code | 7
Labeling supervised fine-tuning data with the scaling law | Code | 7
Fast Timing-Conditioned Latent Audio Diffusion | Code | 7
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Code | 7
Elixir: Train a Large Language Model on a Small GPU Cluster | Code | 7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Code | 7
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | Code | 7
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | Code | 6
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Code | 6
SqueezeLLM: Dense-and-Sparse Quantization | Code | 6
QLoRA: Efficient Finetuning of Quantized LLMs | Code | 6
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Code | 6
Group-in-Group Policy Optimization for LLM Agent Training | Code | 5
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Code | 5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5
Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Code | 5
DEIM: DETR with Improved Matching for Fast Convergence | Code | 5
Page 1 of 113
