SOTAVerified

Benchmarking

Papers

Showing 2431–2440 of 5548 papers

Title | Status | Hype
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving | - | 0
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | Code | 2
Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation | - | 0
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering | - | 0
SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis | Code | 2
Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground | - | 0
Classification of the Fashion-MNIST Dataset on a Quantum Computer | - | 0
Model Lakes | - | 0
Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified