Benchmarking

Papers

Showing 3101–3125 of 5548 papers

Title	Date	Tasks	Status
Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume	Mar 8, 2024	Adversarial Robustness, Benchmarking	Unverified
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance	Jun 18, 2024	Benchmarking	Unverified
Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion	May 28, 2024	Benchmarking, Emotion Recognition	Unverified
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	Benchmarking, Math	Unverified
Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation	Jun 24, 2020	Benchmarking, Data Augmentation	Unverified
Extensible Logging and Empirical Attainment Function for IOHexperimenter	Sep 28, 2021	Benchmarking	Unverified
Extraction of clinical information from the non-invasive fetal electrocardiogram	May 27, 2016	Benchmarking, Heart Rate Variability	Unverified
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis	Aug 22, 2024	Benchmarking	Unverified
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content	Mar 13, 2025	Benchmarking, Image Generation	Unverified
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning	Apr 19, 2024	Benchmarking, counterfactual	Unverified
Face Detection on Surveillance Images	Oct 22, 2019	Benchmarking, Face Detection	Unverified
Face Morphing Attack Generation & Detection: A Comprehensive Survey	Nov 3, 2020	Benchmarking, Face Recognition	Unverified
FACT: Learning Governing Abstractions Behind Integer Sequences	Sep 20, 2022	Benchmarking	Unverified
FactLens: Benchmarking Fine-Grained Fact Verification	Nov 8, 2024	Benchmarking, Fact Verification	Unverified
Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations	Dec 23, 2024	Benchmarking, Question Answering	Unverified
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System	May 3, 2024	Benchmarking, Collaborative Filtering	Unverified
FAIRification of MLC data	Nov 23, 2022	Benchmarking, Management	Unverified
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs	Oct 25, 2024	Benchmarking, Fairness	Unverified
Fairness-Aware Graph Neural Networks: A Survey	Jul 8, 2023	Benchmarking, Fairness	Unverified
Fairness Index Measures to Evaluate Bias in Biometric Recognition	Jun 19, 2023	Benchmarking, Fairness	Unverified
FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections	Nov 27, 2023	Articles, Benchmarking	Unverified
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning	May 12, 2025	16k, Benchmarking	Unverified
Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension	Nov 16, 2021	Benchmarking, Question Answering	Unverified
Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension	May 1, 2022	Benchmarking, Question Answering	Unverified
FarsBase-KBP: A Knowledge Base Population System for the Persian Knowledge Graph	May 4, 2020	Benchmarking, Entity Linking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified