SOTAVerified

Benchmarking

Papers

Showing 991-1000 of 5548 papers

Title | Status | Hype
xai_evals : A Framework for Evaluating Post-Hoc Local Explanation Methods | | 0
LadderMIL: Multiple Instance Learning with Coarse-to-Fine Self-Distillation | | 0
Dynamic benchmarking framework for LLM-based conversational data capture | | 0
Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation | Code | 4
Evalita-LLM: Benchmarking Large Language Models on Italian | | 0
Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models | | 0
A comparison of translation performance between DeepL and Supertext | Code | 0
No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets | Code | 0
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities | | 0
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified