SOTAVerified

Benchmarking

Papers

Showing 2981–2990 of 5548 papers

Title | Status | Hype
MIRAI: Evaluating LLM Agents for Event Forecasting | | 0
Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy | | 0
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | | 0
Commute Graph Neural Networks | | 0
PerSEval: Assessing Personalization in Text Summarizers | | 0
Benchmarking M6 Competitors: An Analysis of Financial Metrics and Discussion of Incentives | | 0
Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges | | 0
Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI | | 0
Quantum-tunnelling deep neural network for optical illusion recognition | | 0
XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis | | 0
Page 299 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified