| FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification | Nov 17, 2023 | Cloud Computing, GPU | Unverified | 0 |
| EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction | Nov 16, 2023 | 3D Reconstruction, GPU | Unverified | 0 |
| JaxMARL: Multi-Agent RL Environments and Algorithms in JAX | Nov 16, 2023 | CPU, GPU | Code Available | 2 |
| Fast multiplication by two's complement addition of numbers represented as a set of polynomial radix 2 indexes, stored as an integer list for massively parallel computation | Nov 16, 2023 | CPU, GPU | Unverified | 0 |
| DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model | Nov 15, 2023 | 3D Generation, Denoising | Unverified | 0 |
| 4K-Resolution Photo Exposure Correction at 125 FPS with ~8K Parameters | Nov 15, 2023 | 4k, 8k | Code Available | 1 |
| Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster | Nov 14, 2023 | GPU, Position | Code Available | 2 |
| A GPU-Accelerated Moving-Horizon Algorithm for Training Deep Classification Trees on Large Datasets | Nov 12, 2023 | GPU | Unverified | 0 |
| InfMLLM: A Unified Framework for Visual-Language Tasks | Nov 12, 2023 | GPU, Image Captioning | Code Available | 1 |
| PerceptionGPT: Effectively Fusing Visual Perception into LLM | Nov 11, 2023 | GPU | Unverified | 0 |
| LCM-LoRA: A Universal Stable-Diffusion Acceleration Module | Nov 9, 2023 | GPU, Image Generation | Code Available | 4 |
| GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition | Nov 8, 2023 | CPU, Decoder | Code Available | 1 |
| Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs | Nov 8, 2023 | GPU | Unverified | 0 |
| LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Nov 8, 2023 | 8k, GPU | Code Available | 5 |
| A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction | Nov 8, 2023 | Benchmarking, Click-Through Rate Prediction | Code Available | 0 |
| DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining | Nov 8, 2023 | GPU, MRPC | Unverified | 0 |
| Input Reconstruction Attack against Vertical Federated Large Language Models | Nov 7, 2023 | Federated Learning, GPU | Unverified | 0 |
| Estimator-Coupled Reinforcement Learning for Robust Purely Tactile In-Hand Manipulation | Nov 7, 2023 | GPU, reinforcement-learning | Unverified | 0 |
| Prompt Cache: Modular Attention Reuse for Low-Latency Inference | Nov 7, 2023 | CPU, GPU | Code Available | 1 |
| Black-Box Prompt Optimization: Aligning Large Language Models without Model Training | Nov 7, 2023 | GPU | Code Available | 2 |
| Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models | Nov 7, 2023 | GPU, Quantization | Unverified | 0 |
| Distributed Matrix-Based Sampling for Graph Neural Network Training | Nov 6, 2023 | GPU, Graph Neural Network | Unverified | 0 |
| Weight-Sharing Regularization | Nov 6, 2023 | GPU | Code Available | 0 |
| S-LoRA: Serving Thousands of Concurrent LoRA Adapters | Nov 6, 2023 | GPU, parameter-efficient fine-tuning | Code Available | 3 |
| VR-NeRF: High-Fidelity Virtualized Walkable Spaces | Nov 5, 2023 | 2k, GPU | Code Available | 1 |
| Ultra-Long Sequence Distributed Transformer | Nov 4, 2023 | GPU | Unverified | 0 |
| Augmentation is AUtO-Net: Augmentation-Driven Contrastive Multiview Learning for Medical Image Segmentation | Nov 2, 2023 | GPU, Image Segmentation | Unverified | 0 |
| Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU | Nov 1, 2023 | GPU | Unverified | 0 |
| Zero Coordinate Shift: Whetted Automatic Differentiation for Physics-informed Operator Learning | Nov 1, 2023 | GPU, Operator learning | Code Available | 0 |
| A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain | Oct 31, 2023 | Benchmarking, Diagnostic | Unverified | 0 |
| LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B | Oct 31, 2023 | GPU, Red Teaming | Unverified | 0 |
| In Search of Lost Online Test-time Adaptation: A Survey | Oct 31, 2023 | Benchmarking, GPU | Code Available | 1 |
| StairNet: Visual Recognition of Stairs for Human-Robot Locomotion | Oct 31, 2023 | CPU, Deep Learning | Unverified | 0 |
| Network Contention-Aware Cluster Scheduling with Reinforcement Learning | Oct 31, 2023 | GPU, reinforcement-learning | Code Available | 1 |
| DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | Oct 30, 2023 | Denoising, GPU | Code Available | 1 |
| Learning to love diligent trolls: Accounting for rater effects in the dialogue safety task | Oct 30, 2023 | Automated Essay Scoring, Chatbot | Code Available | 0 |
| FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound | Oct 30, 2023 | GPU, Pose Estimation | Unverified | 0 |
| Prediction of Effective Elastic Moduli of Rocks using Graph Neural Networks | Oct 30, 2023 | GPU | Code Available | 1 |
| PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices | Oct 30, 2023 | Bayesian Optimization, CPU | Unverified | 0 |
| Bespoke Solvers for Generative Flow Models | Oct 29, 2023 | GPU | Unverified | 0 |
| SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | Oct 29, 2023 | GPU, Mixture-of-Experts | Code Available | 1 |
| Atom: Low-bit Quantization for Efficient and Accurate LLM Serving | Oct 29, 2023 | GPU, Quantization | Code Available | 2 |
| The Synergy of Speculative Decoding and Batching in Serving Large Language Models | Oct 28, 2023 | GPU, Text Generation | Unverified | 0 |
| Punica: Multi-Tenant LoRA Serving | Oct 28, 2023 | GPU | Code Available | 3 |
| OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression | Oct 27, 2023 | Benchmarking, GPU | Code Available | 0 |
| FP8-LM: Training FP8 Large Language Models | Oct 27, 2023 | GPU | Code Available | 2 |
| LLMSTEP: LLM proofstep suggestions in Lean | Oct 27, 2023 | CPU, GPU | Code Available | 1 |
| Real-Time Neural Materials using Block-Compressed Features | Oct 26, 2023 | Decoder, GPU | Unverified | 0 |
| PockEngine: Sparse and Efficient Fine-tuning in a Pocket | Oct 26, 2023 | CPU, GPU | Unverified | 0 |
| TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs | Oct 25, 2023 | Autonomous Driving, GPU | Code Available | 3 |