Benchmarking

Papers

Showing 3051–3060 of 5548 papers

Title	Date	Tasks	Status	Hype
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking	Aug 9, 2023	Benchmarking, Few-Shot Learning	Code Available	1
Benchmarking LLM powered Chatbots: Methods and Metrics	Aug 8, 2023	Benchmarking, Chatbot	Unverified	0
Application-Oriented Benchmarking of Quantum Generative Learning Using QUARK	Aug 8, 2023	Benchmarking, GPU	Code Available	1
RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System?	Aug 8, 2023	Benchmarking, Collaborative Filtering	Unverified	0
XFlow: Benchmarking Flow Behaviors over Graphs	Aug 7, 2023	Benchmarking	Code Available	1
Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP)	Aug 6, 2023	Benchmarking, Image Segmentation	Unverified	0
Precise Benchmarking of Explainable AI Attribution Methods	Aug 6, 2023	Benchmarking, image-classification	Code Available	0
ChatGPT for GTFS: Benchmarking LLMs on GTFS Understanding and Retrieval	Aug 4, 2023	Benchmarking, Information Retrieval	Code Available	0
RobustMQ: Benchmarking Robustness of Quantized Models	Aug 4, 2023	Adversarial Robustness, Benchmarking	Unverified	0
A Survey of Spanish Clinical Language Models	Aug 4, 2023	Benchmarking, Survey	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified