| LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism | Apr 15, 2024 | GPU | Code Available | 2 |
| Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic | Apr 10, 2024 | GPU | Code Available | 2 |
| OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting | Apr 4, 2024 | GPU | Code Available | 2 |
| Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures | Apr 3, 2024 | CPU, GPU | Code Available | 2 |
| Accelerating Transformer Pre-training with 2:4 Sparsity | Apr 2, 2024 | GPU | Code Available | 2 |
| Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration | Apr 2, 2024 | All, Decoder | Code Available | 2 |
| Efficient Modulation for Vision Networks | Mar 29, 2024 | GPU | Code Available | 2 |
| Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction | Mar 27, 2024 | 3D Generation, 3DGS | Code Available | 2 |
| Fully-fused Multi-Layer Perceptrons on Intel Data Center GPUs | Mar 26, 2024 | GPU, Image Compression | Code Available | 2 |
| Efficient Video Object Segmentation via Modulated Cross-Attention Memory | Mar 26, 2024 | GPU, Object | Code Available | 2 |
| Invertible Diffusion Models for Compressed Sensing | Mar 25, 2024 | compressed sensing, GPU | Code Available | 2 |
| YOLOv5-6D: Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | Mar 22, 2024 | 6D Pose Estimation using RGB, GPU | Code Available | 2 |
| Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation | Mar 12, 2024 | Cross-Modal Retrieval, GPU | Code Available | 2 |
| Characterization of Large Language Model Development in the Datacenter | Mar 12, 2024 | GPU, Language Modeling | Code Available | 2 |
| Scalable Spatiotemporal Prediction with Bayesian Neural Fields | Mar 12, 2024 | Bayesian Inference, Demand Forecasting | Code Available | 2 |
| Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System | Mar 11, 2024 | GPU, Language Modeling | Code Available | 2 |
| Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance | Mar 8, 2024 | GPU, parameter-efficient fine-tuning | Code Available | 2 |
| RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction | Mar 8, 2024 | Audio Generation, Computational Efficiency | Code Available | 2 |
| Birbal: An efficient 7B instruct-model fine-tuned with curated datasets | Mar 4, 2024 | GPU | Code Available | 2 |
| MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection | Mar 4, 2024 | GPU, Mamba | Code Available | 2 |
| Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis | Mar 3, 2024 | 3D Parameter-Efficient Fine-Tuning for Classification, GPU | Code Available | 2 |
| WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis | Feb 29, 2024 | Diversity, GPU | Code Available | 2 |
| DEYO: DETR with YOLO for End-to-End Object Detection | Feb 26, 2024 | Decoder, GPU | Code Available | 2 |
| Fast Adversarial Attacks on Language Models In One GPU Minute | Feb 23, 2024 | Adversarial Attack, Computational Efficiency | Code Available | 2 |
| Me LLaMA: Foundation Large Language Models for Medical Applications | Feb 20, 2024 | Few-Shot Learning, GPU | Code Available | 2 |
| QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference | Feb 15, 2024 | GPU, Quantization | Code Available | 2 |
| On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference | Feb 9, 2024 | GPU, Language Modeling | Code Available | 2 |
| λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space | Feb 7, 2024 | Concept Alignment, GPU | Code Available | 2 |
| 4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes | Feb 5, 2024 | GPU, Novel View Synthesis | Code Available | 2 |
| Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing | Jan 29, 2024 | GPU, Representation Learning | Code Available | 2 |
| SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design | Jan 29, 2024 | CPU, GPU | Code Available | 2 |
| Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | Jan 17, 2024 | GPU, Image Classification | Code Available | 2 |
| Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction | Jan 12, 2024 | Bandwidth Extension, CPU | Code Available | 2 |
| Low-resource finetuning of foundation models beats state-of-the-art in histopathology | Jan 9, 2024 | GPU, Self-Supervised Learning | Code Available | 2 |
| WidthFormer: Toward Efficient Transformer-based BEV View Transformation | Jan 8, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | Jan 5, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 2 |
| CoMoSVC: Consistency Model-based Singing Voice Conversion | Jan 3, 2024 | GPU, model | Code Available | 2 |
| MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining | Dec 29, 2023 | GPU, Language Modeling | Code Available | 2 |
| Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis | Dec 28, 2023 | 8k, Feature Splatting | Code Available | 2 |
| Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference | Dec 23, 2023 | GPU, High-Level Synthesis | Code Available | 2 |
| A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library | Dec 19, 2023 | GPU | Code Available | 2 |
| XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX | Dec 19, 2023 | Diversity, GPU | Code Available | 2 |
| mLoRA: Fine-Tuning LoRA Adapters via Highly-Efficient Pipeline Parallelism in Multiple GPUs | Dec 5, 2023 | GPU, Large Language Model | Code Available | 2 |
| CoLLiE: Collaborative Training of Large Language Models in an Efficient Way | Dec 1, 2023 | GPU, parameter-efficient fine-tuning | Code Available | 2 |
| XLB: A differentiable massively parallel lattice Boltzmann library in Python | Nov 27, 2023 | CPU, GPU | Code Available | 2 |
| Learning to Fly in Seconds | Nov 22, 2023 | GPU, Reinforcement Learning (RL) | Code Available | 2 |
| Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model | Nov 22, 2023 | Denoising, GPU | Code Available | 2 |
| JaxMARL: Multi-Agent RL Environments and Algorithms in JAX | Nov 16, 2023 | CPU, GPU | Code Available | 2 |
| Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster | Nov 14, 2023 | GPU, Position | Code Available | 2 |
| Black-Box Prompt Optimization: Aligning Large Language Models without Model Training | Nov 7, 2023 | GPU | Code Available | 2 |