SOTAVerified

Benchmarking

Papers

Showing 2726-2750 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | | 0 |
| Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis | | 0 |
| Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization | | 0 |
| Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow | | 0 |
| Generalized Conflict-directed Search for Optimal Ordering Problems | | 0 |
| A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression | | 0 |
| General Scales Unlock AI Evaluation with Explanatory and Predictive Power | | 0 |
| Extraction of clinical information from the non-invasive fetal electrocardiogram | | 0 |
| Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey | | 0 |
| Extensible Logging and Empirical Attainment Function for IOHexperimenter | | 0 |
| Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation | | 0 |
| Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design | | 0 |
| Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking | | 0 |
| Generation of Large District Heating System Models Using Open-Source Data and Tools: An Exemplary Workflow | | 0 |
| Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins? | | 0 |
| Generative Adversarial Networks with Limited Data: A Survey and Benchmarking | | 0 |
| Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors | | 0 |
| A Survey of Parameters Associated with the Quality of Benchmarks in NLP | | 0 |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | | 0 |
| Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition | | 0 |
| Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | | 0 |
| Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance | | 0 |
| Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding | | 0 |
| Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models | | 0 |
| AI Idea Bench 2025: AI Research Idea Generation Benchmark | | 0 |
Page 110 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |