| Scaling Down Text Encoders of Text-to-Image Diffusion Models | Mar 25, 2025 | GPU, Image Generation | Code Available | 2 |
| BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache | Mar 24, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping | Mar 21, 2025 | GPU, Motion Estimation | Code Available | 2 |
| DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding | Mar 20, 2025 | GPU | Code Available | 2 |
| Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels | Mar 18, 2025 | GPU, Language Modeling | Code Available | 2 |
| MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | Mar 17, 2025 | GPU, Language Modeling | Code Available | 2 |
| RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds | Mar 16, 2025 | GPU | Code Available | 2 |
| OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models | Mar 11, 2025 | GPU, Mamba | Code Available | 2 |
| LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Mar 11, 2025 | GPU, Image Generation | Code Available | 2 |
| X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation | Mar 8, 2025 | GPU, Image Generation | Code Available | 2 |
| Real-time Spatial-temporal Traversability Assessment via Feature-based Sparse Gaussian Process | Mar 6, 2025 | Autonomous Navigation, Computational Efficiency | Code Available | 2 |
| DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models | Mar 4, 2025 | Diversity, GPU | Code Available | 2 |
| Streaming Video Question-Answering with In-context Video KV-Cache Retrieval | Mar 1, 2025 | GPU, Question Answering | Code Available | 2 |
| KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Feb 21, 2025 | Audio Generation, FAD | Code Available | 2 |
| TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators | Feb 20, 2025 | Benchmarking, Code Generation | Code Available | 2 |
| Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models | Feb 19, 2025 | GPU, Quantization | Code Available | 2 |
| HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Feb 18, 2025 | Computational Efficiency, CPU | Code Available | 2 |
| Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation | Feb 18, 2025 | Decoder, GPU | Code Available | 2 |
| Saving 77% of the Parameters in Large Language Models Technical Report | Feb 9, 2025 | GPU, Text Generation | Code Available | 2 |
| QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Feb 7, 2025 | GPU, Quantization | Code Available | 2 |
| WaferLLM: Large Language Model Inference at Wafer Scale | Feb 6, 2025 | GPU, Language Modeling | Code Available | 2 |
| An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks | Jan 23, 2025 | GPU | Code Available | 2 |
| Recurrent Diffusion for Large-Scale Parameter Generation | Jan 20, 2025 | GPU | Code Available | 2 |
| A User's Guide to KSig: GPU-Accelerated Computation of the Signature Kernel | Jan 13, 2025 | GPU | Code Available | 2 |
| Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution | Jan 12, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios | Jan 10, 2025 | Aerial Scene Classification, CPU | Code Available | 2 |
| MBQ: Modality-Balanced Quantization for Large Vision-Language Models | Dec 27, 2024 | GPU, Quantization | Code Available | 2 |
| ArchesWeather & ArchesWeatherGen: a deterministic and generative model for efficient ML weather forecasting | Dec 17, 2024 | GPU, Weather Forecasting | Code Available | 2 |
| FlashRNN: Optimizing Traditional RNNs on Modern Hardware | Dec 10, 2024 | GPU, Logical Reasoning | Code Available | 2 |
| ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks | Dec 9, 2024 | GPU, Imitation Learning | Code Available | 2 |
| Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction | Dec 6, 2024 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 2 |
| Playable Game Generation | Dec 1, 2024 | GPU, Image Generation | Code Available | 2 |
| Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification | Dec 1, 2024 | GPU, Visual Question Answering | Code Available | 2 |
| Real-Time Metric-Semantic Mapping for Autonomous Navigation in Outdoor Environments | Nov 30, 2024 | Autonomous Navigation, GPU | Code Available | 2 |
| Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators | Nov 27, 2024 | GPU | Code Available | 2 |
| Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Nov 26, 2024 | GPU, Image Generation | Code Available | 2 |
| GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Nov 19, 2024 | 3D Object Detection, Autonomous Driving | Code Available | 2 |
| AEROMamba: An efficient architecture for audio super-resolution using generative adversarial networks and state space models | Nov 11, 2024 | Audio Super-Resolution, GPU | Code Available | 2 |
| Brain Tumour Removing and Missing Modality Generation using 3D WDM | Nov 7, 2024 | GPU, Prediction | Code Available | 2 |
| Real-Time Polygonal Semantic Mapping for Humanoid Robot Stair Climbing | Nov 4, 2024 | Computational Efficiency, GPU | Code Available | 2 |
| DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | Nov 4, 2024 | GPU, Robot Manipulation | Code Available | 2 |
| RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | Nov 4, 2024 | Answer Generation, GPU | Code Available | 2 |
| The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains | Oct 31, 2024 | GPU, Philosophy | Code Available | 2 |
| Very fast Bayesian Additive Regression Trees on GPU | Oct 30, 2024 | CPU, GPU | Code Available | 2 |
| $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources | Oct 30, 2024 | GPU | Code Available | 2 |
| LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search | Oct 24, 2024 | Clustering, GPU | Code Available | 2 |
| Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step | Oct 19, 2024 | Conditional Image Generation, GPU | Code Available | 2 |
| nvTorchCam: An Open-source Library for Camera-Agnostic Differentiable Geometric Vision | Oct 15, 2024 | Deep Learning, GPU | Code Available | 2 |
| GS^3: Efficient Relighting with Triple Gaussian Splatting | Oct 15, 2024 | GPU | Code Available | 2 |
| Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Sep 25, 2024 | GPU, Token Reduction | Code Available | 2 |