
Speed and Conversational Large Language Models: Not All Is About Tokens per Second

2025-02-23

Javier Conde, Miguel González, Pedro Reviriego, Zhen Gao, Shanshan Liu, Fabrizio Lombardi


Abstract

We study the speed of open-weights large language models (LLMs) when run on GPUs and how it depends on the task at hand, presenting a comparative analysis of the speed of the most popular open LLMs.
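As the title suggests, raw tokens per second does not fully capture conversational speed: a user also perceives the delay before the first token appears. A minimal sketch (illustrative only, not from the paper; the function name and metric names are assumptions) that separates time-to-first-token from overall throughput, given per-token arrival timestamps:

```python
def generation_metrics(request_time, token_times):
    """Compute latency and throughput metrics from per-token timestamps.

    request_time: wall-clock time (seconds) the request was sent
    token_times: wall-clock time (seconds) each output token arrived
    """
    ttft = token_times[0] - request_time        # time to first token
    total = token_times[-1] - request_time      # total generation time
    tps = len(token_times) / total              # overall tokens per second
    return {"ttft_s": ttft, "tokens_per_second": tps}

# Example: 5 tokens, the first arriving after 0.5 s, then one every 0.1 s.
m = generation_metrics(0.0, [0.5, 0.6, 0.7, 0.8, 0.9])
```

Two runs with the same tokens-per-second figure can feel very different in a chat setting if their time-to-first-token differs, which is why both metrics matter when comparing models across tasks.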
