Benchmarking

Papers

Title	Date	Tasks	Status
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content	Mar 13, 2025	Benchmarking, Image Generation	Unverified
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis	Aug 22, 2024	Benchmarking	Unverified
Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization	May 23, 2022	Benchmarking, Deep Reinforcement Learning	Unverified
Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow	Feb 14, 2025	Benchmarking	Unverified
Generalized Conflict-directed Search for Optimal Ordering Problems	Mar 31, 2021	Benchmarking, Scheduling	Unverified
A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression	Jun 25, 2025	Benchmarking, Management	Unverified
General Scales Unlock AI Evaluation with Explanatory and Predictive Power	Mar 9, 2025	Benchmarking, Specificity	Unverified
Extraction of clinical information from the non-invasive fetal electrocardiogram	May 27, 2016	Benchmarking, Heart Rate Variability	Unverified
Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey	Jun 5, 2020	Benchmarking, Experimental Design	Unverified
Extensible Logging and Empirical Attainment Function for IOHexperimenter	Sep 28, 2021	Benchmarking	Unverified
Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation	Jun 24, 2020	Benchmarking, Data Augmentation	Unverified
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design	Apr 14, 2025	Benchmarking, Language Modeling	Unverified
Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking	Nov 6, 2024	Benchmarking	Unverified
Generation of Large District Heating System Models Using Open-Source Data and Tools: An Exemplary Workflow	Dec 18, 2024	Benchmarking	Unverified
Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?	May 27, 2020	Benchmarking, Fraud Detection	Unverified
Generative Adversarial Networks with Limited Data: A Survey and Benchmarking	Apr 7, 2025	Benchmarking, Image Generation	Unverified
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors	Jun 29, 2023	Benchmarking	Unverified
A Survey of Parameters Associated with the Quality of Benchmarks in NLP	Oct 14, 2022	Benchmarking	Unverified
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	Benchmarking, Math	Unverified
Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition	Mar 24, 2025	Benchmarking, Food Recognition	Unverified
Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion	May 28, 2024	Benchmarking, Emotion Recognition	Unverified
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance	Jun 18, 2024	Benchmarking	Unverified
Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding	Jun 9, 2025	Benchmarking, Video Compression	Unverified
Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models	Feb 4, 2025	Benchmarking, Decision Making	Unverified
AI Idea Bench 2025: AI Research Idea Generation Benchmark	Apr 19, 2025	Benchmarking, scientific discovery	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified