SOTAVerified

Benchmarking

Papers

Showing 2581–2590 of 5548 papers

Title | Status | Hype
SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration | | 0
On the Loss of Context-awareness in General Instruction Fine-tuning | Code | 0
Imagining and building wise machines: The centrality of AI metacognition | | 0
Benchmarking XAI Explanations with Human-Aligned Evaluations | | 0
SinaTools: Open Source Toolkit for Arabic Natural Language Processing | | 0
Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models | | 0
FEET: A Framework for Evaluating Embedding Techniques | Code | 0
Artificial Intelligence for Microbiology and Microbiome Research | | 0
Benchmarking Bias in Large Language Models during Role-Playing | | 0
Cityscape-Adverse: Benchmarking Robustness of Semantic Segmentation with Realistic Scene Modifications via Diffusion-Based Image Editing | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified