| Scaling Down Text Encoders of Text-to-Image Diffusion Models | Mar 25, 2025 | GPUImage Generation | CodeCode Available | 2 |
| BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache | Mar 24, 2025 | Computational EfficiencyGPU | CodeCode Available | 2 |
| Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping | Mar 21, 2025 | GPUMotion Estimation | CodeCode Available | 2 |
| DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding | Mar 20, 2025 | GPU | CodeCode Available | 2 |
| Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels | Mar 18, 2025 | GPULanguage Modeling | CodeCode Available | 2 |
| MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | Mar 17, 2025 | GPULanguage Modeling | CodeCode Available | 2 |
| RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds | Mar 16, 2025 | GPU | CodeCode Available | 2 |
| LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Mar 11, 2025 | GPUImage Generation | CodeCode Available | 2 |
| OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models | Mar 11, 2025 | GPUMamba | CodeCode Available | 2 |
| X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation | Mar 8, 2025 | GPUImage Generation | CodeCode Available | 2 |
| Real-time Spatial-temporal Traversability Assessment via Feature-based Sparse Gaussian Process | Mar 6, 2025 | Autonomous NavigationComputational Efficiency | CodeCode Available | 2 |
| DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models | Mar 4, 2025 | DiversityGPU | CodeCode Available | 2 |
| Streaming Video Question-Answering with In-context Video KV-Cache Retrieval | Mar 1, 2025 | GPUQuestion Answering | CodeCode Available | 2 |
| KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Feb 21, 2025 | Audio GenerationFAD | CodeCode Available | 2 |
| TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators | Feb 20, 2025 | BenchmarkingCode Generation | CodeCode Available | 2 |
| Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models | Feb 19, 2025 | GPUQuantization | CodeCode Available | 2 |
| Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation | Feb 18, 2025 | DecoderGPU | CodeCode Available | 2 |
| HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Feb 18, 2025 | Computational EfficiencyCPU | CodeCode Available | 2 |
| Saving 77% of the Parameters in Large Language Models Technical Report | Feb 9, 2025 | GPUText Generation | CodeCode Available | 2 |
| QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | Feb 7, 2025 | GPUQuantization | CodeCode Available | 2 |
| WaferLLM: Large Language Model Inference at Wafer Scale | Feb 6, 2025 | GPULanguage Modeling | CodeCode Available | 2 |
| An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks | Jan 23, 2025 | GPU | CodeCode Available | 2 |
| Recurrent Diffusion for Large-Scale Parameter Generation | Jan 20, 2025 | GPU | CodeCode Available | 2 |
| A User's Guide to KSig: GPU-Accelerated Computation of the Signature Kernel | Jan 13, 2025 | GPU | CodeCode Available | 2 |
| Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution | Jan 12, 2025 | Computational EfficiencyGPU | CodeCode Available | 2 |