SOTAVerified

Benchmarking

Papers

Showing 3101–3125 of 5548 papers

Title | Status | Hype
Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume | | 0
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance | | 0
Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | | 0
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | | 0
Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation | | 0
Extensible Logging and Empirical Attainment Function for IOHexperimenter | | 0
Extraction of clinical information from the non-invasive fetal electrocardiogram | | 0
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis | | 0
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | | 0
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | | 0
Face Detection on Surveillance Images | | 0
Face Morphing Attack Generation & Detection: A Comprehensive Survey | | 0
FACT: Learning Governing Abstractions Behind Integer Sequences | | 0
FactLens: Benchmarking Fine-Grained Fact Verification | | 0
Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations | | 0
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | | 0
FAIRification of MLC data | | 0
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs | | 0
Fairness-Aware Graph Neural Networks: A Survey | | 0
Fairness Index Measures to Evaluate Bias in Biometric Recognition | | 0
FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections | | 0
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | | 0
Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension | | 0
Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | | 0
FarsBase-KBP: A Knowledge Base Population System for the Persian Knowledge Graph | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified