SOTAVerified

Benchmarking

Papers

Showing 3051–3060 of 5548 papers

Title | Status | Hype
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking | Code | 1
Benchmarking LLM powered Chatbots: Methods and Metrics | | 0
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK | Code | 1
RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System? | | 0
XFlow: Benchmarking Flow Behaviors over Graphs | Code | 1
Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP) | | 0
Precise Benchmarking of Explainable AI Attribution Methods | Code | 0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval | Code | 0
RobustMQ: Benchmarking Robustness of Quantized Models | | 0
A Survey of Spanish Clinical Language Models | | 0
Page 306 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified