Benchmarking

Papers

Showing 3101–3150 of 5548 papers

Title	Date	Tasks	Status
Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume	Mar 8, 2024	Adversarial Robustness, Benchmarking	Unverified
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance	Jun 18, 2024	Benchmarking	Unverified
Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion	May 28, 2024	Benchmarking, Emotion Recognition	Unverified
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	Benchmarking, Math	Unverified
Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation	Jun 24, 2020	Benchmarking, Data Augmentation	Unverified
Extensible Logging and Empirical Attainment Function for IOHexperimenter	Sep 28, 2021	Benchmarking	Unverified
Extraction of clinical information from the non-invasive fetal electrocardiogram	May 27, 2016	Benchmarking, Heart Rate Variability	Unverified
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis	Aug 22, 2024	Benchmarking	Unverified
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content	Mar 13, 2025	Benchmarking, Image Generation	Unverified
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning	Apr 19, 2024	Benchmarking, counterfactual	Unverified
Face Detection on Surveillance Images	Oct 22, 2019	Benchmarking, Face Detection	Unverified
Face Morphing Attack Generation & Detection: A Comprehensive Survey	Nov 3, 2020	Benchmarking, Face Recognition	Unverified
FACT: Learning Governing Abstractions Behind Integer Sequences	Sep 20, 2022	Benchmarking	Unverified
FactLens: Benchmarking Fine-Grained Fact Verification	Nov 8, 2024	Benchmarking, Fact Verification	Unverified
Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations	Dec 23, 2024	Benchmarking, Question Answering	Unverified
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System	May 3, 2024	Benchmarking, Collaborative Filtering	Unverified
FAIRification of MLC data	Nov 23, 2022	Benchmarking, Management	Unverified
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs	Oct 25, 2024	Benchmarking, Fairness	Unverified
Fairness-Aware Graph Neural Networks: A Survey	Jul 8, 2023	Benchmarking, Fairness	Unverified
Fairness Index Measures to Evaluate Bias in Biometric Recognition	Jun 19, 2023	Benchmarking, Fairness	Unverified
FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections	Nov 27, 2023	Articles, Benchmarking	Unverified
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning	May 12, 2025	16k, Benchmarking	Unverified
Fantastic Questions and Where to Find Them: FairytaleQA--An Authentic Dataset for Narrative Comprehension	Nov 16, 2021	Benchmarking, Question Answering	Unverified
Fantastic Questions and Where to Find Them: FairytaleQA – An Authentic Dataset for Narrative Comprehension	May 1, 2022	Benchmarking, Question Answering	Unverified
FarsBase-KBP: A Knowledge Base Population System for the Persian Knowledge Graph	May 4, 2020	Benchmarking, Entity Linking	Unverified
Fast, approximate kinetics of RNA folding	Jan 19, 2015	Benchmarking	Unverified
FastDraft: How to Train Your Draft	Nov 17, 2024	Benchmarking, Code Completion	Unverified
Fast Empirical Scenarios	Jul 8, 2023	Benchmarking, Decision Making	Unverified
FastEnsemble: Benchmarking and Accelerating Ensemble-based Uncertainty Estimation for Image-to-Image Translation	Sep 29, 2021	Benchmarking, Image Generation	Unverified
Fast Labeling and Transcription with the Speechalyzer Toolkit	May 1, 2012	Audio Classification, Benchmarking	Unverified
Fast Training of Deep Networks with One-Class CNNs	Jun 28, 2020	Benchmarking, Classification	Unverified
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Mar 19, 2025	Benchmarking, Multiple-choice	Unverified
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration	Dec 17, 2024	Benchmarking, Face Generation	Unverified
Feasibility of BERT Embeddings For Domain-Specific Knowledge Mining	Jan 16, 2022	Benchmarking, Language Modelling	Unverified
Feature-based Evolutionary Diversity Optimization of Discriminating Instances for Chance-constrained Optimization Problems	Jan 24, 2025	Benchmarking, Diversity	Unverified
Feature Encodings for Gradient Boosting with Automunge	Sep 25, 2022	Benchmarking, Binarization	Unverified
Featuremetric benchmarking: Quantum computer benchmarks based on circuit features	Apr 17, 2025	Benchmarking	Unverified
Feature Selection and Classification of Hyperspectral Images With Support Vector Machines	Oct 15, 2007	Benchmarking, Classification	Unverified
Feature selection in linear SVMs via a hard cardinality constraint: a scalable SDP decomposition approach	Apr 15, 2024	Benchmarking, feature selection	Unverified
FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation	Feb 19, 2024	Benchmarking, Chatbot	Unverified
FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation	Jun 26, 2025	Attribute, Benchmarking	Unverified
FedAD-Bench: A Unified Benchmark for Federated Unsupervised Anomaly Detection in Tabular Data	Aug 8, 2024	Anomaly Detection, Benchmarking	Unverified
Federated Deconfounding and Debiasing Learning for Out-of-Distribution Generalization	May 8, 2025	Attribute, Benchmarking	Unverified
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning	Sep 1, 2023	Benchmarking, Federated Learning	Unverified
FedEval: A Holistic Evaluation Framework for Federated Learning	Nov 19, 2020	Benchmarking, Federated Learning	Unverified
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization	Jun 8, 2022	Benchmarking, Federated Learning	Unverified
FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks	Jan 16, 2022	Benchmarking, Federated Learning	Unverified
FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning	Oct 11, 2023	Benchmarking, Diversity	Unverified
FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models	Jun 11, 2025	Benchmarking, Federated Learning	Unverified
FERA 2017 - Addressing Head Pose in the Third Facial Expression Recognition and Analysis Challenge	Feb 14, 2017	Benchmarking, Facial Action Unit Detection	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified