| FluidML: Fast and Memory Efficient Inference Optimization | Nov 14, 2024 | Autonomous Vehicles, Inference Optimization | Unverified | 0 |
| A Temporal Linear Network for Time Series Forecasting | Oct 28, 2024 | Computational Efficiency, Inference Optimization | Code Available | 0 |
| LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models | Oct 17, 2024 | Inference Optimization, Network Pruning | Code Available | 0 |
| EdgeRL: Reinforcement Learning-driven Deep Learning Model Inference Optimization at Edge | Oct 16, 2024 | Deep Learning, Inference Optimization | Unverified | 0 |
| Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning | Sep 2, 2024 | Inference Optimization, Language Modeling | Unverified | 0 |
| The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities | Aug 23, 2024 | Computational Efficiency, Inference Optimization | Unverified | 0 |
| An approach to optimize inference of the DIART speaker diarization pipeline | Aug 5, 2024 | Inference Optimization, Knowledge Distillation | Unverified | 0 |
| LLaSA: Large Language and E-Commerce Shopping Assistant | Aug 4, 2024 | Inference Optimization, Specificity | Code Available | 0 |
| Patched MOA: optimizing inference for diverse software development tasks | Jul 26, 2024 | Inference Optimization | Code Available | 0 |
| Inference Optimization of Foundation Models on AI Accelerators | Jul 12, 2024 | Inference Optimization, Model Compression | Unverified | 0 |