Speed and Conversational Large Language Models: Not All Is About Tokens per Second
2025-02-23
Javier Conde, Miguel González, Pedro Reviriego, Zhen Gao, Shanshan Liu, Fabrizio Lombardi
Abstract
The speed of open-weights large language models (LLMs) running on GPUs, and its dependence on the task at hand, is studied to present a comparative analysis of the speed of the most popular open LLMs.
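Since the paper's central metric is generation speed, a minimal sketch of how tokens per second is typically measured may be useful. The `generate` callable and the toy model below are hypothetical stand-ins, not the authors' benchmark code; any real measurement would wrap an actual LLM inference call.

```python
import time

def tokens_per_second(generate, prompt):
    """Time a single end-to-end generation and return aggregate tokens/s.

    `generate` is a hypothetical stand-in for any LLM call that takes a
    prompt and returns the list of generated tokens.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Toy stand-in "model": sleeps briefly per token to mimic decode latency.
def toy_generate(prompt):
    out = []
    for i in range(50):
        time.sleep(0.001)  # pretend each decode step takes ~1 ms
        out.append(f"tok{i}")
    return out

rate = tokens_per_second(toy_generate, "Hello")
print(f"{rate:.1f} tokens/s")
```

Note that a single aggregate figure like this hides the distinction the title alludes to: prefill (prompt processing) and decode (token-by-token generation) have very different speeds, so task characteristics such as prompt length and output length matter, not just raw tokens per second.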