| A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library | Dec 19, 2023 | GPU | Code Available | 2 |
| Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving | Dec 19, 2023 | Autonomous Driving, GPU | Code Available | 1 |
| GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis | Dec 18, 2023 | GPU, Inductive Bias | Unverified | 0 |
| Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning | Dec 18, 2023 | Domain Adaptation, GPU | Unverified | 0 |
| PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Dec 16, 2023 | CPU, GPU | Code Available | 5 |
| Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs | Dec 16, 2023 | GPU, Scheduling | Code Available | 1 |
| RetailKLIP : Finetuning OpenCLIP backbone using metric learning on a single GPU for Zero-shot retail product image classification | Dec 16, 2023 | GPU, image-classification | Unverified | 0 |
| FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline | Dec 15, 2023 | GPU, Knowledge Distillation | Unverified | 0 |
| Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models | Dec 15, 2023 | Benchmarking, Code Summarization | Code Available | 1 |
| Data-Efficient Multimodal Fusion on a Single GPU | Dec 15, 2023 | GPU, Image Retrieval | Code Available | 1 |
| LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data | Dec 15, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A parallelized cellular Potts model that enables simulations at tissue scale | Dec 14, 2023 | GPU | Code Available | 0 |
| MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training | Dec 14, 2023 | GPU | Code Available | 1 |
| A Sparse Cross Attention-based Graph Convolution Network with Auxiliary Information Awareness for Traffic Flow Prediction | Dec 14, 2023 | Attribute, Decoder | Unverified | 0 |
| Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning | Dec 14, 2023 | GPU, parameter-efficient fine-tuning | Unverified | 0 |
| Dataset Distillation via Adversarial Prediction Matching | Dec 14, 2023 | Dataset Distillation, GPU | Code Available | 0 |
| Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models | Dec 13, 2023 | GPU | Unverified | 0 |
| Contractive error feedback for gradient compression | Dec 13, 2023 | Federated Learning, GPU | Unverified | 0 |
| EZ-CLIP: Efficient Zeroshot Video Action Recognition | Dec 13, 2023 | Action Recognition, GPU | Code Available | 1 |
| CBQ: Cross-Block Quantization for Large Language Models | Dec 13, 2023 | GPU, Quantization | Unverified | 0 |
| Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI | Dec 13, 2023 | Diversity, GPU | Code Available | 1 |
| DTL: Disentangled Transfer Learning for Visual Recognition | Dec 13, 2023 | GPU, Transfer Learning | Code Available | 1 |
| Memory-Efficient Reversible Spiking Neural Networks | Dec 13, 2023 | GPU | Code Available | 1 |
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | Dec 12, 2023 | CPU, GPU | Unverified | 0 |
| Neural Video Fields Editing | Dec 12, 2023 | GPU, Video Editing | Unverified | 0 |
| XC-NAS: A New Cellular Encoding Approach for Neural Architecture Search of Multi-path Convolutional Neural Networks | Dec 12, 2023 | GPU, Neural Architecture Search | Unverified | 0 |
| Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models | Dec 12, 2023 | GPU, Model Compression | Code Available | 1 |
| GateNet: A novel Neural Network Architecture for Automated Flow Cytometry Gating | Dec 12, 2023 | GPU | Code Available | 1 |
| FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems | Dec 12, 2023 | GPU | Code Available | 0 |
| Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection | Dec 12, 2023 | Anomaly Detection, GPU | Unverified | 0 |
| RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation | Dec 12, 2023 | GPU, Multi-Person Pose Estimation | Unverified | 0 |
| PatchMorph: A Stochastic Deep Learning Approach for Unsupervised 3D Brain Image Registration with Small Patches | Dec 12, 2023 | GPU, Image Registration | Unverified | 0 |
| DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers | Dec 11, 2023 | Descriptive, GPU | Code Available | 0 |
| BACTrack: Building Appearance Collection for Aerial Tracking | Dec 11, 2023 | GPU, Template Matching | Unverified | 0 |
| Compound Text-Guided Prompt Tuning via Image-Adaptive Cues | Dec 11, 2023 | Domain Generalization, GPU | Code Available | 1 |
| Stateful Large Language Model Serving with Pensieve | Dec 9, 2023 | CPU, GPU | Unverified | 0 |
| PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching | Dec 9, 2023 | GPU, In-Context Learning | Unverified | 0 |
| PixLore: A Dataset-driven Approach to Rich Image Captioning | Dec 8, 2023 | GPU, Image Captioning | Code Available | 0 |
| Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections | Dec 8, 2023 | Deep Learning, GPU | Code Available | 1 |
| DARLEI: Deep Accelerated Reinforcement Learning with Evolutionary Intelligence | Dec 8, 2023 | CPU, Diversity | Unverified | 0 |
| Approximate Caching for Efficiently Serving Diffusion Models | Dec 7, 2023 | Denoising, GPU | Unverified | 0 |
| PerSival: Neural-network-based visualisation for pervasive continuum-mechanical simulations in musculoskeletal biomechanics | Dec 7, 2023 | CPU, GPU | Unverified | 0 |
| SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM | Dec 6, 2023 | GPU, Quantization | Code Available | 1 |
| MMM: Generative Masked Motion Model | Dec 6, 2023 | GPU, model | Code Available | 1 |
| On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm | Dec 6, 2023 | Dataset Distillation, Diversity | Code Available | 1 |
| Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment | Dec 6, 2023 | GPU, Scheduling | Unverified | 0 |
| A Hardware Evaluation Framework for Large Language Model Inference | Dec 5, 2023 | GPU, Language Modeling | Unverified | 0 |
| FlexModel: A Framework for Interpretability of Distributed Large Language Models | Dec 5, 2023 | Distributed Computing, GPU | Code Available | 1 |
| Learning to Holistically Detect Bridges from Large-Size VHR Remote Sensing Imagery | Dec 5, 2023 | GPU, object-detection | Unverified | 0 |
| DIPR: Efficient Point Cloud Registration via Dynamic Iteration | Dec 5, 2023 | GPU, Point Cloud Registration | Unverified | 0 |