SOTAVerified

Benchmarking

Papers

Showing 2701-2750 of 5548 papers

Title | Status | Hype
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images | | 0
Galvatron: An Automatic Distributed System for Efficient Foundation Model Training | | 0
FAIRification of MLC data | | 0
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking | | 0
GANmut: Generating and Modifying Facial Expressions | | 0
GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR | | 0
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | | 0
GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics | | 0
A Survey of Spanish Clinical Language Models | | 0
AI Matrix - Synthetic Benchmarks for DNN | | 0
Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations | | 0
Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks | | 0
Identifying patterns and recommendations of and for sustainable open data initiatives: a benchmarking-driven analysis of open government data initiatives among European countries | | 0
FactLens: Benchmarking Fine-Grained Fact Verification | | 0
FACT: Learning Governing Abstractions Behind Integer Sequences | | 0
Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy | | 0
Face Morphing Attack Generation & Detection: A Comprehensive Survey | | 0
A Unified Taylor Framework for Revisiting Attribution Methods | | 0
Face Detection on Surveillance Images | | 0
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing | | 0
GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | | 0
A Survey of Small Language Models | | 0
Identifying the Context Shift between Test Benchmarks and Production Data | | 0
Exploring the Decentraland Economy: Multifaceted Parcel Attributes, Key Insights, and Benchmarking | | 0
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | | 0
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content | | 0
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis | | 0
Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization | | 0
Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow | | 0
Generalized Conflict-directed Search for Optimal Ordering Problems | | 0
A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression | | 0
General Scales Unlock AI Evaluation with Explanatory and Predictive Power | | 0
Extraction of clinical information from the non-invasive fetal electrocardiogram | | 0
Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey | | 0
Extensible Logging and Empirical Attainment Function for IOHexperimenter | | 0
Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation | | 0
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design | | 0
Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking | | 0
Generation of Large District Heating System Models Using Open-Source Data and Tools: An Exemplary Workflow | | 0
Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins? | | 0
Generative Adversarial Networks with Limited Data: A Survey and Benchmarking | | 0
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors | | 0
A Survey of Parameters Associated with the Quality of Benchmarks in NLP | | 0
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | | 0
Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition | | 0
Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | | 0
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance | | 0
Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding | | 0
Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models | | 0
AI Idea Bench 2025: AI Research Idea Generation Benchmark | | 0
Page 55 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified