SOTAVerified

Benchmarking

Papers

Showing 4751-4775 of 5548 papers

Title | Status | Hype
Global Prediction of COVID-19 Variant Emergence Using Dynamics-Informed Graph Neural Networks | Code | 0
GiantHunter: Accurate detection of giant virus in metagenomic data using reinforcement-learning and Monte Carlo tree search | Code | 0
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties | Code | 0
Safe Trajectory Generation for Complex Urban Environments Using Spatio-temporal Semantic Corridor | Code | 0
Natural Image Noise Dataset | Code | 0
Benchmarking Suite for Synthetic Aperture Radar Imagery Anomaly Detection (SARIAD) Algorithms | Code | 0
SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration | Code | 0
Geological Inference from Textual Data using Word Embeddings | Code | 0
Flexible Generation of Preference Data for Recommendation Analysis | Code | 0
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels | Code | 0
MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages | Code | 0
The LOCATA Challenge: Acoustic Source Localization and Tracking | Code | 0
Generative Models for Fast Simulation of Cherenkov Detectors at the Electron-Ion Collider | Code | 0
A Meta-Analysis of the Anomaly Detection Problem | Code | 0
On the Measure of Intelligence | Code | 0
Generalization and Regularization in DQN | Code | 0
Automatic Resolution of Domain Name Disputes | Code | 0
Mind the XAI Gap: A Human-Centered LLM Framework for Democratizing Explainable AI | Code | 0
Automatic benchmarking of large multimodal models via iterative experiment programming | Code | 0
GenderBench: Evaluation Suite for Gender Biases in LLMs | Code | 0
MineRL: A Large-Scale Dataset of Minecraft Demonstrations | Code | 0
GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Code | 0
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations | Code | 0
Mining-Gym: A Configurable RL Benchmarking Environment for Truck Dispatch Scheduling | Code | 0
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified