GPU

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–100 of 5629 papers

Title	Date	Tasks	Status	Hype
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention	Apr 22, 2025	GPU	CodeCode Available	5
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning	Feb 29, 2024	GPULanguage Modeling	CodeCode Available	5
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation	Oct 16, 2024	Audio GenerationGPU	CodeCode Available	5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts	Feb 27, 2025	Computational EfficiencyGPU	CodeCode Available	5
LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models	Nov 8, 2023	8kGPU	CodeCode Available	5
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications	Jun 25, 2023	CPUDecoder	CodeCode Available	5
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications	Sep 7, 2022	GPUObject Detection	CodeCode Available	5
YOLOv6 v3.0: A Full-Scale Reloading	Jan 13, 2023	GPUObject Detection	CodeCode Available	5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Aug 21, 2024	GPUQuantization	CodeCode Available	5
Extreme Compression of Large Language Models via Additive Quantization	Jan 11, 2024	CPUGPU	CodeCode Available	5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale	Aug 15, 2022	GPULanguage Modelling	CodeCode Available	5
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models	Nov 22, 2024	GPU	CodeCode Available	5
ReLoRA: High-Rank Training Through Low-Rank Updates	Jul 11, 2023	GPU	CodeCode Available	5
Group-in-Group Policy Optimization for LLM Agent Training	May 16, 2025	GPUMathematical Reasoning	CodeCode Available	5
Fast On-device LLM Inference with NPUs	Jul 8, 2024	CPUGPU	CodeCode Available	5
EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design	Feb 1, 2023	GPUobject-detection	CodeCode Available	5
KBLaM: Knowledge Base augmented Language Model	Oct 14, 2024	8kGPU	CodeCode Available	5
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU	Mar 13, 2023	CPUGPU	CodeCode Available	5
DEIM: DETR with Improved Matching for Fast Convergence	Dec 5, 2024	Data AugmentationGPU	CodeCode Available	5
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU	Dec 16, 2023	CPUGPU	CodeCode Available	5
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second	Jul 5, 2022	AutoMLBayesian Inference	CodeCode Available	5
Representing Long Volumetric Video with Temporal Gaussian Hierarchy	Dec 12, 2024	GPU	CodeCode Available	5
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis	Sep 30, 2023	GPU	CodeCode Available	4
GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles	Oct 27, 2020	BIG-bench Machine LearningCPU	CodeCode Available	4
Deep Patch Visual SLAM	Aug 3, 2024	GPUVisual Odometry	CodeCode Available	4
PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency	Jan 17, 2024	GPUIncremental Learning	CodeCode Available	4
Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees	Sep 18, 2023	GPUImputation	CodeCode Available	4
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS	Aug 2, 2024	GPUNavigate	CodeCode Available	4
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints	Apr 15, 2025	GPUInference Optimization	CodeCode Available	4
On Scaling Up 3D Gaussian Splatting Training	Jun 26, 2024	3DGS3D Reconstruction	CodeCode Available	4
Otter: A Multi-Modal Model with In-Context Instruction Tuning	May 5, 2023	GPUIn-Context Learning	CodeCode Available	4
PLAID: An Efficient Engine for Late Interaction Retrieval	May 19, 2022	CPUGPU	CodeCode Available	4
Multi-head Temporal Latent Attention	May 19, 2025	GPUspeech-recognition	CodeCode Available	4
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion	Jan 27, 2023	GPUImage Generation	CodeCode Available	4
CoTracker: It is Better to Track Together	Jul 14, 2023	GPUmotion prediction	CodeCode Available	4
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence	Jul 1, 2024	GPUPoint cloud reconstruction	CodeCode Available	4
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals	Jul 18, 2024	Experimental DesignGPU	CodeCode Available	4
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts	Oct 9, 2024	GPUMixture-of-Experts	CodeCode Available	4
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation	Jun 4, 2024	Face SwappingGPU	CodeCode Available	4
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training	Oct 28, 2021	Deep LearningGPU	CodeCode Available	4
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit	May 12, 2025	GPUPrivacy Preserving	CodeCode Available	4
Mamba-FETrack: Frame-Event Tracking via State Space Model	Apr 28, 2024	GPUMamba	CodeCode Available	4
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model	Jun 9, 2024	GPUMamba	CodeCode Available	4
FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training	Mar 3, 2023	Federated LearningGPU	CodeCode Available	4
FFCV: Accelerating Training by Removing Data Bottlenecks	Jun 21, 2023	CPUGPU	CodeCode Available	4
Looking Backward: Streaming Video-to-Video Translation with Feature Banks	May 24, 2024	GPUTranslation	CodeCode Available	4
Building reliable sim driving agents by scaling self-play	Feb 20, 2025	Autonomous VehiclesBenchmarking	CodeCode Available	4
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float	Apr 15, 2025	CPUGPU	CodeCode Available	4
fastai: A Layered API for Deep Learning	Feb 11, 2020	Deep LearningGPU	CodeCode Available	4
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token	Jan 7, 2025	GPUVisual Question Answering (VQA)	CodeCode Available	4

Show:10 25 50

← PrevPage 2 of 113Next →

No leaderboard results yet.