SOTAVerified

Benchmarking

Papers

Showing 1051–1060 of 5548 papers

Title | Status | Hype
FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding | Code | 1
FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation | Code | 1
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code | Code | 1
GNNs as Predictors of Agentic Workflow Performances | Code | 1
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging | Code | 1
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | Code | 1
FineSurE: Fine-grained Summarization Evaluation using LLMs | Code | 1
ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning | Code | 1
FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1
Benchmarking: Past, Present and Future | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified