| Title | Date | Tags | Code | Count |
| --- | --- | --- | --- | --- |
| MetaDE: Evolving Differential Evolution by Differential Evolution | Feb 13, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| On LLM-generated Logic Programs and their Inference Execution Methods | Feb 13, 2025 | GPU | Unverified | 0 |
| Latents of latents to delineate pixels: hybrid Matryoshka autoencoder-to-U-Net pairing for segmenting large medical images in GPU-poor and low-data regimes | Feb 13, 2025 | Decoder, GPU | Unverified | 0 |
| CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Feb 13, 2025 | 8k, GPU | Code Available | 0 |
| E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization | Feb 13, 2025 | Computational Efficiency, Denoising | Unverified | 0 |
| InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU | Feb 13, 2025 | GPU, Language Modeling | Unverified | 0 |
| Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2 | Feb 12, 2025 | GPU | Unverified | 0 |
| Inference-time sparse attention with asymmetric indexing | Feb 12, 2025 | GPU | Unverified | 0 |
| High-Throughput SAT Sampling | Feb 12, 2025 | GPU, valid | Code Available | 0 |
| Numerical Schemes for Signature Kernels | Feb 12, 2025 | GPU | Code Available | 0 |
| Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers | Feb 12, 2025 | Blocking, GPU | Unverified | 0 |
| Bag of Tricks for Inference-time Computation of LLM Reasoning | Feb 11, 2025 | GPU | Code Available | 1 |
| Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving | Feb 11, 2025 | Autonomous Driving, Computational Efficiency | Unverified | 0 |
| Memory Analysis on the Training Course of DeepSeek Models | Feb 11, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| Small Language Model Makes an Effective Long Text Extractor | Feb 11, 2025 | GPU, Language Modeling | Code Available | 1 |
| Memory Is Not the Bottleneck: Cost-Efficient Continual Learning via Weight Space Consolidation | Feb 11, 2025 | Class Incremental Learning | Unverified | 0 |
| Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs | Feb 10, 2025 | GPU | Code Available | 0 |
| Accelerating Outlier-robust Rotation Estimation by Stereographic Projection | Feb 10, 2025 | GPU | Unverified | 0 |
| MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | Feb 10, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| MERGE^3: Efficient Evolutionary Merging on Consumer-grade GPUs | Feb 9, 2025 | GPU | Code Available | 1 |
| Crypto Miner Attack: GPU Remote Code Execution Attacks | Feb 9, 2025 | CPU, GPU | Unverified | 0 |
| Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Feb 9, 2025 | CPU, GPU | Code Available | 0 |
| Saving 77% of the Parameters in Large Language Models Technical Report | Feb 9, 2025 | GPU, Text Generation | Code Available | 2 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Feb 7, 2025 | 4k, General Knowledge | Code Available | 3 |
| fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | Feb 7, 2025 | CPU, GPU | Unverified | 0 |
| QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Feb 7, 2025 | GPU, Quantization | Code Available | 2 |
| WaferLLM: Large Language Model Inference at Wafer Scale | Feb 6, 2025 | GPU, Language Modeling | Code Available | 2 |
| InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers | Feb 6, 2025 | GPU, Large Language Model | Unverified | 0 |
| QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache | Feb 5, 2025 | GPU | Unverified | 0 |
| Kozax: Flexible and Scalable Genetic Programming in JAX | Feb 5, 2025 | GPU | Code Available | 1 |
| SyMANTIC: An Efficient Symbolic Regression Method for Interpretable and Parsimonious Model Discovery in Science and Beyond | Feb 5, 2025 | feature selection, GPU | Code Available | 1 |
| Fast Sampling of Cosmological Initial Conditions with Gaussian Neural Posterior Estimation | Feb 5, 2025 | GPU | Unverified | 0 |
| Robust Autonomy Emerges from Self-Play | Feb 5, 2025 | Autonomous Driving, GPU | Unverified | 0 |
| Unrealized Expectations: Comparing AI Methods vs Classical Algorithms for Maximum Independent Set | Feb 5, 2025 | Combinatorial Optimization, CPU | Unverified | 0 |
| Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries | Feb 4, 2025 | GPU | Code Available | 3 |
| EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization | Feb 4, 2025 | GPU, Large Language Model | Unverified | 0 |
| Brief analysis of DeepSeek R1 and its implications for Generative AI | Feb 4, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models | Feb 4, 2025 | GPU, Video Understanding | Unverified | 0 |
| Ilargi: a GPU Compatible Factorized ML Model Training Framework | Feb 4, 2025 | Computational Efficiency, CPU | Unverified | 0 |
| Comparative Analysis of FPGA and GPU Performance for Machine Learning-Based Track Reconstruction at LHCb | Feb 4, 2025 | GPU, Graph Neural Network | Code Available | 0 |
| Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity | Feb 3, 2025 | Audio Denoising, Denoising | Unverified | 0 |
| ModServe: Scalable and Resource-Efficient Large Multimodal Model Serving | Feb 2, 2025 | Decoder, GPU | Unverified | 0 |
| Recursive generalized type-2 fuzzy radial basis function neural networks for joint position estimation and adaptive EMG-based impedance control of lower limb exoskeletons | Feb 1, 2025 | Electromyography (EMG), GPU | Code Available | 0 |
| M+: Extending MemoryLLM with Scalable Long-Term Memory | Feb 1, 2025 | 16k, GPU | Code Available | 3 |
| ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | Feb 1, 2025 | GPU, GSM8K | Unverified | 0 |
| Work-Efficient Parallel Non-Maximum Suppression Kernels | Feb 1, 2025 | GPU, object-detection | Code Available | 1 |
| Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques | Jan 31, 2025 | GPU | Code Available | 0 |
| TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs | Jan 31, 2025 | GPU | Unverified | 0 |
| LLM-based Affective Text Generation Quality Based on Different Quantization Values | Jan 31, 2025 | GPU, Quantization | Unverified | 0 |
| Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models | Jan 31, 2025 | GPU, Model Compression | Unverified | 0 |