SOTAVerified

GPU

Papers

Showing 1-50 of 5629 papers

Title | Status | Hype
DeepSeek-V3 Technical Report | Code | 16
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Code | 11
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Code | 11
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Code | 11
WebLLM: A High-Performance In-Browser LLM Inference Engine | Code | 11
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Code | 9
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Code | 9
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | Code | 9
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Code | 9
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | Code | 9
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Code | 9
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Code | 9
Liger Kernel: Efficient Triton Kernels for LLM Training | Code | 9
LTX-Video: Realtime Video Latent Diffusion | Code | 9
Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble | Code | 9
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Code | 9
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Code | 9
DETRs Beat YOLOs on Real-time Object Detection | Code | 8
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Code | 7
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement | Code | 7
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | Code | 7
YOLOv12: Attention-Centric Real-Time Object Detectors | Code | 7
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | Code | 7
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7
Fast Timing-Conditioned Latent Audio Diffusion | Code | 7
ThunderKittens: Simple, Fast, and Adorable AI Kernels | Code | 7
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Code | 7
Mirage: A Multi-Level Superoptimizer for Tensor Programs | Code | 7
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI | Code | 7
Scalable MatMul-free Language Modeling | Code | 7
EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Code | 7
Revisiting PCA for time series reduction in temporal dimension | Code | 7
Elixir: Train a Large Language Model on a Small GPU Cluster | Code | 7
Pyramidal Flow Matching for Efficient Video Generative Modeling | Code | 7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Code | 7
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7
Labeling supervised fine-tuning data with the scaling law | Code | 7
Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization | Code | 7
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Code | 7
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Code | 6
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | Code | 6
SqueezeLLM: Dense-and-Sparse Quantization | Code | 6
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Code | 6
QLoRA: Efficient Finetuning of Quantized LLMs | Code | 6
AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Code | 5
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Code | 5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5
DEIM: DETR with Improved Matching for Fast Convergence | Code | 5
Deep Lake: a Lakehouse for Deep Learning | Code | 5

No leaderboard results yet.