SOTAVerified

Benchmarking

Papers

Showing 971-980 of 5548 papers

Title | Status | Hype
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Code | 1
Deep learning model solves change point detection for multiple change types | Code | 1
Fast hyperboloid decision tree algorithms | Code | 1
FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods | Code | 1
Working Memory Capacity of ChatGPT: An Empirical Study | Code | 1
Benchmarking Natural Language Understanding Services for building Conversational Agents | Code | 1
DependEval: Benchmarking LLMs for Repository Dependency Understanding | Code | 1
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 | Code | 1
Benchmarking Neural Network Generalization for Grammar Induction | Code | 1
Evaluation of large language models for discovery of gene set function | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified