SOTAVerified

GPU

Papers

Showing 551–600 of 5629 papers

Title | Status | Hype
MetaDE: Evolving Differential Evolution by Differential Evolution | Code | 3
On LLM-generated Logic Programs and their Inference Execution Methods | | 0
Latents of latents to delineate pixels: hybrid Matryoshka autoencoder-to-U-Net pairing for segmenting large medical images in GPU-poor and low-data regimes | | 0
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Code | 0
E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization | | 0
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU | | 0
Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2 | | 0
Inference-time sparse attention with asymmetric indexing | | 0
High-Throughput SAT Sampling | Code | 0
Numerical Schemes for Signature Kernels | Code | 0
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers | | 0
Bag of Tricks for Inference-time Computation of LLM Reasoning | Code | 1
Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving | | 0
Memory Analysis on the Training Course of DeepSeek Models | | 0
Small Language Model Makes an Effective Long Text Extractor | Code | 1
Memory Is Not the Bottleneck: Cost-Efficient Continual Learning via Weight Space Consolidation | | 0
Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs | Code | 0
Accelerating Outlier-robust Rotation Estimation by Stereographic Projection | | 0
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | | 0
MERGE^3: Efficient Evolutionary Merging on Consumer-grade GPUs | Code | 1
Crypto Miner Attack: GPU Remote Code Execution Attacks | | 0
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Code | 0
Saving 77% of the Parameters in Large Language Models Technical Report | Code | 2
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Code | 3
fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | | 0
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Code | 2
WaferLLM: Large Language Model Inference at Wafer Scale | Code | 2
InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers | | 0
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache | | 0
Kozax: Flexible and Scalable Genetic Programming in JAX | Code | 1
SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond | Code | 1
Fast Sampling of Cosmological Initial Conditions with Gaussian Neural Posterior Estimation | | 0
Robust Autonomy Emerges from Self-Play | | 0
Unrealized Expectations: Comparing AI Methods vs Classical Algorithms for Maximum Independent Set | | 0
Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries | Code | 3
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | | 0
Brief analysis of DeepSeek R1 and its implications for Generative AI | | 0
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models | | 0
Ilargi: a GPU Compatible Factorized ML Model Training Framework | | 0
Comparative Analysis of FPGA and GPU Performance for Machine Learning-Based Track Reconstruction at LHCb | Code | 0
Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity | | 0
ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving | | 0
Recursive generalized type-2 fuzzy radial basis function neural networks for joint position estimation and adaptive EMG-based impedance control of lower limb exoskeletons | Code | 0
M+: Extending MemoryLLM with Scalable Long-Term Memory | Code | 3
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | | 0
Work-Efficient Parallel Non-Maximum Suppression Kernels | Code | 1
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques | Code | 0
TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs | | 0
LLM-based Affective Text Generation Quality Based on Different Quantization Values | | 0
Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models | | 0
Page 12 of 113

No leaderboard results yet.