| Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling | Feb 20, 2025 | Decoder, GPU | Code Available | 0 |
| Towards Efficient Automatic Self-Pruning of Large Language Models | Feb 20, 2025 | GPU | Unverified | 0 |
| Distributed U-net model and Image Segmentation for Lung Cancer Detection | Feb 20, 2025 | CPU, Federated Learning | Unverified | 0 |
| Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective | Feb 20, 2025 | CPU, GPU | Unverified | 0 |
| ParallelComp: Parallel Long-Context Compressor for Length Extrapolation | Feb 20, 2025 | 4k, 8k | Unverified | 0 |
| FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference | Feb 19, 2025 | GPU | Unverified | 0 |
| Learning conformational ensembles of proteins based on backbone geometry | Feb 19, 2025 | GPU | Unverified | 0 |
| Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference | Feb 19, 2025 | GPU, Retrieval | Unverified | 0 |
| GPU-Friendly Laplacian Texture Blending | Feb 19, 2025 | GPU | Unverified | 0 |
| LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation | Feb 19, 2025 | GPU, parameter-efficient fine-tuning | Unverified | 0 |
| SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | Feb 19, 2025 | GPU, Logical Reasoning | Unverified | 0 |
| RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | Feb 19, 2025 | GPU | Unverified | 0 |
| MEX: Memory-efficient Approach to Referring Multi-Object Tracking | Feb 19, 2025 | Autonomous Driving, GPU | Unverified | 0 |
| Astra: Efficient and Money-saving Automatic Parallel Strategies Search on Heterogeneous GPUs | Feb 19, 2025 | GPU | Unverified | 0 |
| An Experimental Study of SOTA LiDAR Segmentation Models | Feb 18, 2025 | GPU, Motion Compensation | Unverified | 0 |
| BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference | Feb 18, 2025 | GPU, Language Modeling | Unverified | 0 |
| SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings | Feb 18, 2025 | GPU, Safety Alignment | Code Available | 0 |
| GPU Memory Usage Optimization for Backward Propagation in Deep Network Training | Feb 18, 2025 | GPU | Unverified | 0 |
| Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer | Feb 17, 2025 | GPU, Quantization | Unverified | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | Feb 17, 2025 | Benchmarking, Code Summarization | Unverified | 0 |
| Real-time Neural Rendering of LiDAR Point Clouds | Feb 17, 2025 | GPU, Neural Rendering | Unverified | 0 |
| Fate: Fast Edge Inference of Mixture-of-Experts Models via Cross-Layer Gate | Feb 17, 2025 | GPU, Mixture-of-Experts | Code Available | 0 |
| Massively Scaling Explicit Policy-conditioned Value Functions | Feb 17, 2025 | continuous-control, Continuous Control | Unverified | 0 |
| GPU-accelerated Multi-relational Parallel Graph Retrieval for Web-scale Recommendations | Feb 17, 2025 | GPU, Metric Learning | Unverified | 0 |
| TPCap: Unlocking Zero-Shot Image Captioning with Trigger-Augmented and Multi-Modal Purification Modules | Feb 16, 2025 | GPU, Image Captioning | Unverified | 0 |
| JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Feb 16, 2025 | Benchmarking, GPU | Code Available | 0 |
| An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law | Feb 14, 2025 | Feature Compression, GPU | Unverified | 0 |
| E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization | Feb 13, 2025 | Computational Efficiency, Denoising | Unverified | 0 |
| On LLM-generated Logic Programs and their Inference Execution Methods | Feb 13, 2025 | GPU | Unverified | 0 |
| Latents of latents to delineate pixels: hybrid Matryoshka autoencoder-to-U-Net pairing for segmenting large medical images in GPU-poor and low-data regimes | Feb 13, 2025 | Decoder, GPU | Unverified | 0 |
| Efficient solution validation of constraint satisfaction problems on neuromorphic hardware: the case of Sudoku puzzles | Feb 13, 2025 | GPU | Code Available | 0 |
| InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU | Feb 13, 2025 | GPU, Language Modeling | Unverified | 0 |
| CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | Feb 13, 2025 | 8k, GPU | Code Available | 0 |
| High-Throughput SAT Sampling | Feb 12, 2025 | GPU, valid | Code Available | 0 |
| Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2 | Feb 12, 2025 | GPU | Unverified | 0 |
| Numerical Schemes for Signature Kernels | Feb 12, 2025 | GPU | Code Available | 0 |
| Inference-time sparse attention with asymmetric indexing | Feb 12, 2025 | GPU | Unverified | 0 |
| Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers | Feb 12, 2025 | Blocking, GPU | Unverified | 0 |
| Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving | Feb 11, 2025 | Autonomous Driving, Computational Efficiency | Unverified | 0 |
| Memory Analysis on the Training Course of DeepSeek Models | Feb 11, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| Memory Is Not the Bottleneck: Cost-Efficient Continual Learning via Weight Space Consolidation | Feb 11, 2025 | class-incremental learning, Class Incremental Learning | Unverified | 0 |
| Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs | Feb 10, 2025 | GPU | Code Available | 0 |
| MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | Feb 10, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| Accelerating Outlier-robust Rotation Estimation by Stereographic Projection | Feb 10, 2025 | GPU | Unverified | 0 |
| Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Feb 9, 2025 | CPU, GPU | Code Available | 0 |
| Crypto Miner Attack: GPU Remote Code Execution Attacks | Feb 9, 2025 | CPU, GPU | Unverified | 0 |
| fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | Feb 7, 2025 | CPU, GPU | Unverified | 0 |
| InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers | Feb 6, 2025 | GPU, Large Language Model | Unverified | 0 |
| Unrealized Expectations: Comparing AI Methods vs Classical Algorithms for Maximum Independent Set | Feb 5, 2025 | Combinatorial Optimization, CPU | Unverified | 0 |
| Fast Sampling of Cosmological Initial Conditions with Gaussian Neural Posterior Estimation | Feb 5, 2025 | GPU | Unverified | 0 |