| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | Feb 17, 2025 | Benchmarking, Code Summarization | Unverified | 0 |
| Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer | Feb 17, 2025 | GPU, Quantization | Unverified | 0 |
| Massively Scaling Explicit Policy-conditioned Value Functions | Feb 17, 2025 | Continuous Control | Unverified | 0 |
| Fate: Fast Edge Inference of Mixture-of-Experts Models via Cross-Layer Gate | Feb 17, 2025 | GPU, Mixture-of-Experts | Code Available | 0 |
| JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Feb 16, 2025 | Benchmarking, GPU | Code Available | 0 |
| TPCap: Unlocking Zero-Shot Image Captioning with Trigger-Augmented and Multi-Modal Purification Modules | Feb 16, 2025 | GPU, Image Captioning | Unverified | 0 |
| An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law | Feb 14, 2025 | Feature Compression, GPU | Unverified | 0 |
| InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU | Feb 13, 2025 | GPU, Language Modeling | Unverified | 0 |
| E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization | Feb 13, 2025 | Computational Efficiency, Denoising | Unverified | 0 |
| Efficient solution validation of constraint satisfaction problems on neuromorphic hardware: the case of Sudoku puzzles | Feb 13, 2025 | GPU | Code Available | 0 |