| A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism | Jul 22, 2024 | GPU, Neural Architecture Search | Unverified | 0 |
| Automated Road Safety: Enhancing Sign and Surface Damage Detection with AI | Jul 22, 2024 | Cloud Computing, GPU | Unverified | 0 |
| vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jul 22, 2024 | CPU, GPU | Code Available | 3 |
| MedSAGa: Few-shot Memory Efficient Medical Image Segmentation using Gradient Low-Rank Projection in SAM | Jul 21, 2024 | Few-Shot Learning, GPU | Unverified | 0 |
| LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme | Jul 21, 2024 | CPU, Fraud Detection | Unverified | 0 |
| GreenStableYolo: Optimizing Inference Time and Image Quality of Text-to-Image Generation | Jul 20, 2024 | GPU, Image Generation | Code Available | 0 |
| Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference | Jul 19, 2024 | GPU, Language Modeling | Unverified | 0 |
| Mixture of Experts with Mixture of Precisions for Tuning Quality of Service | Jul 19, 2024 | CPU, GPU | Unverified | 0 |
| Neural topology optimization: the good, the bad, and the ugly | Jul 19, 2024 | GPU, Misconceptions | Unverified | 0 |
| NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals | Jul 18, 2024 | Experimental Design, GPU | Code Available | 4 |
| Forecasting GPU Performance for Deep Learning Training and Inference | Jul 18, 2024 | Deep Learning, GPU | Code Available | 2 |
| Attention in SRAM on Tenstorrent Grayskull | Jul 18, 2024 | CPU, GPU | Code Available | 1 |
| WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration | Jul 18, 2024 | GPU, Image Registration | Code Available | 1 |
| LiNR: Model Based Neural Retrieval on GPUs at LinkedIn | Jul 18, 2024 | Attribute, GPU | Unverified | 0 |
| Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark | Jul 18, 2024 | GPU, Image Retrieval | Code Available | 1 |
| SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization | Jul 17, 2024 | GPU, Quantization | Unverified | 0 |
| FastSAM-3DSlicer: A 3D-Slicer Extension for 3D Volumetric Segment Anything Model with Uncertainty Quantification | Jul 17, 2024 | CPU, Domain Adaptation | Code Available | 1 |
| Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale | Jul 17, 2024 | GPU, LAMBADA | Code Available | 2 |
| RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models | Jul 17, 2024 | GPU, Nutrition | Unverified | 0 |
| ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks | Jul 17, 2024 | CPU, GPU | Unverified | 0 |
| MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training | Jul 16, 2024 | CPU, GPU | Unverified | 0 |
| PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer | Jul 16, 2024 | 2D Object Detection, Computational Efficiency | Unverified | 0 |
| Learning Multi-view Anomaly Detection | Jul 16, 2024 | Anomaly Detection, GPU | Unverified | 0 |
| MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models | Jul 16, 2024 | GPU, Multiple-choice | Unverified | 0 |
| Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors | Jul 16, 2024 | GPU, Neural Network Compression | Unverified | 0 |
| Characterizing and Understanding HGNN Training on GPUs | Jul 16, 2024 | GPU, Recommendation Systems | Unverified | 0 |
| Differentiable Voxelization and Mesh Morphing | Jul 15, 2024 | GPU | Code Available | 2 |
| Differentiable Neural-Integrated Meshfree Method for Forward and Inverse Modeling of Finite Strain Hyperelasticity | Jul 15, 2024 | GPU, Physics-informed machine learning | Code Available | 0 |
| Separable Operator Networks | Jul 15, 2024 | Benchmarking, GPU | Code Available | 1 |
| From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients | Jul 15, 2024 | GPU | Code Available | 2 |
| NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis | Jul 15, 2024 | GPU, NeRF | Unverified | 0 |
| SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation | Jul 15, 2024 | GPU, Reinforcement Learning (RL) | Unverified | 0 |
| LeanQuant: Accurate Large Language Model Quantization with Loss-Error-Aware Grid | Jul 14, 2024 | GPU, Language Modeling | Unverified | 0 |
| Disrupting Diffusion-based Inpainters with Semantic Digression | Jul 14, 2024 | GPU, Misinformation | Unverified | 0 |
| LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation | Jul 13, 2024 | GPU | Code Available | 1 |
| Enhancing Training Efficiency Using Packing with Flash Attention | Jul 12, 2024 | GPU | Unverified | 0 |
| Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators | Jul 12, 2024 | Code Generation, GPU | Unverified | 0 |
| FedMedICL: Towards Holistic Evaluation of Distribution Shifts in Federated Medical Imaging | Jul 11, 2024 | Diversity, Federated Learning | Code Available | 1 |
| FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jul 11, 2024 | GPU, Quantization | Code Available | 12 |
| Gradient Boosting Reinforcement Learning | Jul 11, 2024 | GPU, reinforcement-learning | Code Available | 2 |
| EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | Jul 10, 2024 | GPU, Quantization | Code Available | 3 |
| Analyzing Machine Learning Performance in a Hybrid Quantum Computing and HPC Environment | Jul 10, 2024 | CPU, GPU | Unverified | 0 |
| INSIGHT: Universal Neural Simulator for Analog Circuits Harnessing Autoregressive Transformers | Jul 10, 2024 | GPU | Unverified | 0 |
| MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis | Jul 10, 2024 | GPU, Image Generation | Code Available | 2 |
| Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search | Jul 10, 2024 | Few-Shot Learning, GPU | Code Available | 0 |
| Inference Performance Optimization for Large Language Models on CPUs | Jul 10, 2024 | CPU, GPU | Code Available | 3 |
| HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation | Jul 10, 2024 | GPU, Semantic Segmentation | Code Available | 0 |
| Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction | Jul 10, 2024 | Decoder, GPU | Unverified | 0 |
| 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes | Jul 9, 2024 | GPU | Unverified | 0 |
| Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task | Jul 9, 2024 | GPU, Text-to-Video Generation | Code Available | 0 |