SOTAVerified

Benchmarking

Papers

Showing 3101-3150 of 5548 papers

Title | Status | Hype
Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume | | 0
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance | | 0
Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | | 0
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | | 0
Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation | | 0
Extensible Logging and Empirical Attainment Function for IOHexperimenter | | 0
Extraction of clinical information from the non-invasive fetal electrocardiogram | | 0
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis | | 0
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | | 0
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | | 0
Face Detection on Surveillance Images | | 0
Face Morphing Attack Generation & Detection: A Comprehensive Survey | | 0
FACT: Learning Governing Abstractions Behind Integer Sequences | | 0
FactLens: Benchmarking Fine-Grained Fact Verification | | 0
Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations | | 0
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | | 0
FAIRification of MLC data | | 0
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs | | 0
Fairness-Aware Graph Neural Networks: A Survey | | 0
Fairness Index Measures to Evaluate Bias in Biometric Recognition | | 0
FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections | | 0
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning | | 0
Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension | | 0
Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension | | 0
FarsBase-KBP: A Knowledge Base Population System for the Persian Knowledge Graph | | 0
Fast, approximate kinetics of RNA folding | | 0
FastDraft: How to Train Your Draft | | 0
Fast Empirical Scenarios | | 0
FastEnsemble: Benchmarking and Accelerating Ensemble-based Uncertainty Estimation for Image-to-Image Translation | | 0
Fast Labeling and Transcription with the Speechalyzer Toolkit | | 0
Fast Training of Deep Networks with One-Class CNNs | | 0
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | | 0
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration | | 0
Feasibility of BERT Embeddings For Domain-Specific Knowledge Mining | | 0
Feature-based Evolutionary Diversity Optimization of Discriminating Instances for Chance-constrained Optimization Problems | | 0
Feature Encodings for Gradient Boosting with Automunge | | 0
Featuremetric benchmarking: Quantum computer benchmarks based on circuit features | | 0
Feature Selection and Classification of Hyperspectral Images With Support Vector Machines | | 0
Feature selection in linear SVMs via a hard cardinality constraint: a scalable SDP decomposition approach | | 0
FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation | | 0
FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation | | 0
FedAD-Bench: A Unified Benchmark for Federated Unsupervised Anomaly Detection in Tabular Data | | 0
Federated Deconfounding and Debiasing Learning for Out-of-Distribution Generalization | | 0
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning | | 0
FedEval: A Holistic Evaluation Framework for Federated Learning | | 0
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization | | 0
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks | | 0
FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning | | 0
FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models | | 0
FERA 2017 - Addressing Head Pose in the Third Facial Expression Recognition and Analysis Challenge | | 0
Page 63 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified