| DVFS-Aware DNN Inference on GPUs: Latency Modeling and Performance Analysis | Feb 10, 2025 | CPU, Inference Optimization | Unverified | 0 |
| Hellinger-Kantorovich Gradient Flows: Global Exponential Decay of Entropy Functionals | Jan 28, 2025 | Inference Optimization | Unverified | 0 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Dec 18, 2024 | Computational Efficiency, Distributed Computing | Code Available | 3 |
| FluidML: Fast and Memory Efficient Inference Optimization | Nov 14, 2024 | Autonomous Vehicles, Inference Optimization | Unverified | 0 |
| A Temporal Linear Network for Time Series Forecasting | Oct 28, 2024 | Computational Efficiency, Inference Optimization | Code Available | 0 |
| LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models | Oct 17, 2024 | Inference Optimization, Network Pruning | Code Available | 0 |
| EdgeRL: Reinforcement Learning-driven Deep Learning Model Inference Optimization at Edge | Oct 16, 2024 | Deep Learning, Inference Optimization | Unverified | 0 |
| CycleBNN: Cyclic Precision Training in Binary Neural Networks | Sep 28, 2024 | Inference Optimization | Code Available | 2 |
| Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning | Sep 2, 2024 | Inference Optimization, Language Modeling | Unverified | 0 |
| The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities | Aug 23, 2024 | Computational Efficiency, Inference Optimization | Unverified | 0 |