GPU

Papers


Showing 151–200 of 5629 papers

Title	Date	Tasks	Status	Hype	Score
Retentive Network: A Successor to Transformer for Large Language Models	Jul 17, 2023	GPU, Language Modeling	Code Available	3	5
Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services	Apr 25, 2024	GPU	Code Available	3	5
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models	Feb 10, 2024	CPU, GPU	Code Available	3	5
Fine-Tuning Language Models with Just Forward Passes	May 27, 2023	GPU, In-Context Learning	Code Available	3	5
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs	Feb 6, 2024	Binarization, GPU	Code Available	3	5
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization	Mar 24, 2023	3D Hand Pose Estimation, GPU	Code Available	3	5
Fast Sampling of Diffusion Models with Exponential Integrator	Apr 29, 2022	GPU	Code Available	3	5
PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting	Dec 16, 2024	3D Reconstruction, 4k	Code Available	3	5
FastMap: Revisiting Dense and Scalable Structure from Motion	May 7, 2025	GPU	Code Available	3	5
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates	Sep 27, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	3	5
NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU	May 12, 2024	CPU, Deep Learning	Code Available	3	5
OctFusion: Octree-based Diffusion Models for 3D Shape Generation	Aug 27, 2024	3D Generation, 3D Shape Generation	Code Available	3	5
Allo: A Programming Model for Composable Accelerator Design	Apr 7, 2024	GPU, High-Level Synthesis	Code Available	3	5
nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources	Sep 5, 2023	Decoder, GPU	Code Available	3	5
Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes	Aug 29, 2017	BIG-bench Machine Learning, CPU	Code Available	3	5
ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models	Aug 16, 2024	GPU, Model Compression	Code Available	3	5
Modular Duality in Deep Learning	Oct 28, 2024	Deep Learning, GPU	Code Available	3	5
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache	Jan 25, 2024	GPU, model	Code Available	3	5
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network	Nov 24, 2024	GPU, Mamba	Code Available	3	5
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices	Dec 28, 2023	AutoML, CPU	Code Available	3	5
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion	May 30, 2024	Denoising, GPU	Code Available	3	5
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models	Jul 10, 2024	GPU, Quantization	Code Available	3	5
MetaDE: Evolving Differential Evolution by Differential Evolution	Feb 13, 2025	Computational Efficiency, GPU	Code Available	3	5
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts	Apr 22, 2024	Common Sense Reasoning, GPU	Code Available	3	5
Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models	Jun 23, 2025	Domain Adaptation, GPU	Code Available	3	5
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts	Nov 29, 2022	GPU, Mixture-of-Experts	Code Available	3	5
A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models	Oct 17, 2022	CPU, GPU	Code Available	3	5
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models	Apr 3, 2024	GPU, Math	Code Available	3	5
Merlin: A Vision Language Foundation Model for 3D Computed Tomography	Jun 10, 2024	3D Semantic Segmentation, Computed Tomography (CT)	Code Available	3	5
mlpack 3: a fast, flexible machine learning library	Jun 18, 2018	Benchmarking, BIG-bench Machine Learning	Code Available	3	5
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding	Apr 8, 2024	GPU, Multiple-choice	Code Available	3	5
M+: Extending MemoryLLM with Scalable Long-Term Memory	Feb 1, 2025	16k, GPU	Code Available	3	5
94% on CIFAR-10 in 3.29 Seconds on a Single GPU	Mar 30, 2024	GPU	Code Available	3	5
MagicPIG: LSH Sampling for Efficient LLM Generation	Oct 21, 2024	CPU, GPU	Code Available	3	5
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy	Feb 7, 2025	4k, General Knowledge	Code Available	3	5
Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence	Feb 12, 2020	BIG-bench Machine Learning, GPU	Code Available	3	5
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale	Aug 10, 2024	GPU, Language Modelling	Code Available	3	5
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture	Sep 4, 2024	GPU, Mamba	Code Available	3	5
LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training	Mar 3, 2025	3DGS, GPU	Code Available	3	5
EscherNet: A Generative Model for Scalable View Synthesis	Feb 6, 2024	3D Reconstruction, GPU	Code Available	3	5
LinFusion: 1 GPU, 1 Minute, 16K Image	Sep 3, 2024	16k, Causal Inference	Code Available	3	5
MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications	Apr 11, 2025	GPU	Code Available	3	5
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management	Oct 1, 2024	GPU, Language Modeling	Code Available	3	5
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization	Jan 31, 2024	GPU, Quantization	Code Available	3	5
ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters	May 4, 2022	GPU, Imitation Learning	Code Available	3	5
Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing	Nov 22, 2024	Computational Efficiency, CPU	Code Available	3	5
Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences	Jun 16, 2025	Document Summarization, GPU	Code Available	3	5
Data Generation for Hardware-Friendly Post-Training Quantization	Oct 29, 2024	Data Augmentation, GPU	Code Available	3	5
Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI	Jul 16, 2025	GPU	Code Available	3	5
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation	Sep 12, 2023	GPU, Image Generation	Code Available	3	5

