SOTAVerified

GPU

Papers

Showing 151–200 of 5629 papers

| Title | Status | Hype |
|---|---|---|
| Retentive Network: A Successor to Transformer for Large Language Models | Code | 3 |
| Fine-Tuning Language Models with Just Forward Passes | Code | 3 |
| Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Code | 3 |
| Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes | Code | 3 |
| FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | Code | 3 |
| Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Code | 3 |
| Fast Sampling of Diffusion Models with Exponential Integrator | Code | 3 |
| Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates | Code | 3 |
| OctFusion: Octree-based Diffusion Models for 3D Shape Generation | Code | 3 |
| Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing | Code | 3 |
| MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications | Code | 3 |
| NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU | Code | 3 |
| BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Code | 3 |
| FastMap: Revisiting Dense and Scalable Structure from Motion | Code | 3 |
| PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting | Code | 3 |
| ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Code | 3 |
| Modular Duality in Deep Learning | Code | 3 |
| MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | Code | 3 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Code | 3 |
| EscherNet: A Generative Model for Scalable View Synthesis | Code | 3 |
| EvoTorch: Scalable Evolutionary Computation in Python | Code | 3 |
| MobileMamba: Lightweight Multi-Receptive Visual Mamba Network | Code | 3 |
| ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters | Code | 3 |
| Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences | Code | 3 |
| M+: Extending MemoryLLM with Scalable Long-Term Memory | Code | 3 |
| EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | Code | 3 |
| Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Code | 3 |
| MetaDE: Evolving Differential Evolution by Differential Evolution | Code | 3 |
| Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Code | 3 |
| 94% on CIFAR-10 in 3.29 Seconds on a Single GPU | Code | 3 |
| Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence | Code | 3 |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Code | 3 |
| APOLLO: SGD-like Memory, AdamW-level Performance | Code | 3 |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Code | 3 |
| MegaBlocks: Efficient Sparse Training with Mixture-of-Experts | Code | 3 |
| mlpack 3: a fast, flexible machine learning library | Code | 3 |
| LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture | Code | 3 |
| MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Code | 3 |
| LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Code | 3 |
| LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training | Code | 3 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Code | 3 |
| Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Code | 3 |
| MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | Code | 3 |
| LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Code | 3 |
| Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services | Code | 3 |
| KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Code | 3 |
| InstanSeg: an embedding-based instance segmentation algorithm optimized for accurate, efficient and portable cell segmentation | Code | 3 |
| nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources | Code | 3 |
| CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation | Code | 3 |
| Data Generation for Hardware-Friendly Post-Training Quantization | Code | 3 |
Page 4 of 113

No leaderboard results yet.