SOTAVerified

GPU Papers

Showing 150 of 5629 papers

Title | Status | Hype
DeepSeek-V3 Technical Report | Code | 16
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Code | 11
WebLLM: A High-Performance In-Browser LLM Inference Engine | Code | 11
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Code | 11
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Code | 11
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Code | 9
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Code | 9
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Code | 9
LTX-Video: Realtime Video Latent Diffusion | Code | 9
Liger Kernel: Efficient Triton Kernels for LLM Training | Code | 9
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Code | 9
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | Code | 9
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | Code | 9
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Code | 9
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Code | 9
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Code | 9
Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble | Code | 9
DETRs Beat YOLOs on Real-time Object Detection | Code | 8
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7
Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization | Code | 7
YOLOv12: Attention-Centric Real-Time Object Detectors | Code | 7
EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Code | 7
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7
Revisiting PCA for time series reduction in temporal dimension | Code | 7
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Code | 7
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | Code | 7
ThunderKittens: Simple, Fast, and Adorable AI Kernels | Code | 7
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement | Code | 7
Pyramidal Flow Matching for Efficient Video Generative Modeling | Code | 7
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI | Code | 7
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Code | 7
Scalable MatMul-free Language Modeling | Code | 7
Mirage: A Multi-Level Superoptimizer for Tensor Programs | Code | 7
Labeling supervised fine-tuning data with the scaling law | Code | 7
Fast Timing-Conditioned Latent Audio Diffusion | Code | 7
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Code | 7
Elixir: Train a Large Language Model on a Small GPU Cluster | Code | 7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Code | 7
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | Code | 7
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | Code | 6
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Code | 6
SqueezeLLM: Dense-and-Sparse Quantization | Code | 6
QLoRA: Efficient Finetuning of Quantized LLMs | Code | 6
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Code | 6
Group-in-Group Policy Optimization for LLM Agent Training | Code | 5
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Code | 5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5
Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Code | 5
DEIM: DETR with Improved Matching for Fast Convergence | Code | 5
Page 1 of 113
