Benchmarking

Papers

Showing 2701–2750 of 5548 papers

Title	Date	Tasks	Status
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images	Dec 12, 2023	Benchmarking, Retrieval	Unverified
Galvatron: An Automatic Distributed System for Efficient Foundation Model Training	Apr 30, 2025	Benchmarking	Unverified
FAIRification of MLC data	Nov 23, 2022	Benchmarking, Management	Unverified
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking	Sep 5, 2023	Benchmarking, Knowledge Distillation	Unverified
GANmut: Generating and Modifying Facial Expressions	Jun 16, 2024	Benchmarking, Diversity	Unverified
GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR	Apr 15, 2025	Benchmarking	Unverified
A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System	May 3, 2024	Benchmarking, Collaborative Filtering	Unverified
GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics	Mar 27, 2025	Benchmarking, Natural Language Queries	Unverified
A Survey of Spanish Clinical Language Models	Aug 4, 2023	Benchmarking, Survey	Unverified
AI Matrix - Synthetic Benchmarks for DNN	Nov 27, 2018	Benchmarking, CPU	Unverified
Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations	Dec 23, 2024	Benchmarking, Question Answering	Unverified
Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks	May 24, 2024	Benchmarking, Decoder	Unverified
Identifying patterns and recommendations of and for sustainable open data initiatives: a benchmarking-driven analysis of open government data initiatives among European countries	Dec 1, 2023	Benchmarking	Unverified
FactLens: Benchmarking Fine-Grained Fact Verification	Nov 8, 2024	Benchmarking, Fact Verification	Unverified
FACT: Learning Governing Abstractions Behind Integer Sequences	Sep 20, 2022	Benchmarking	Unverified
Benchmarking Pretrained Attention-based Models for Real-Time Recognition in Robot-Assisted Esophagectomy	Dec 4, 2024	Anatomy, Benchmarking	Unverified
Face Morphing Attack Generation & Detection: A Comprehensive Survey	Nov 3, 2020	Benchmarking, Face Recognition	Unverified
A Unified Taylor Framework for Revisiting Attribution Methods	Aug 21, 2020	Benchmarking, Decision Making	Unverified
Face Detection on Surveillance Images	Oct 22, 2019	Benchmarking, Face Detection	Unverified
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing	Jun 30, 2024	Benchmarking, counterfactual	Unverified
GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases	May 25, 2024	Benchmarking, Hallucination	Unverified
A Survey of Small Language Models	Oct 25, 2024	Benchmarking, Model Compression	Unverified
Identifying the Context Shift between Test Benchmarks and Production Data	Jul 3, 2022	Benchmarking, BIG-bench Machine Learning	Unverified
Exploring the Decentraland Economy: Multifaceted Parcel Attributes, Key Insights, and Benchmarking	Apr 11, 2024	Attribute, Benchmarking	Unverified
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning	Apr 19, 2024	Benchmarking, counterfactual	Unverified
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content	Mar 13, 2025	Benchmarking, Image Generation	Unverified
Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis	Aug 22, 2024	Benchmarking	Unverified
Generalization, Mayhems and Limits in Recurrent Proximal Policy Optimization	May 23, 2022	Benchmarking, Deep Reinforcement Learning	Unverified
Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow	Feb 14, 2025	Benchmarking	Unverified
Generalized Conflict-directed Search for Optimal Ordering Problems	Mar 31, 2021	Benchmarking, Scheduling	Unverified
A Survey of Predictive Maintenance Methods: An Analysis of Prognostics via Classification and Regression	Jun 25, 2025	Benchmarking, Management	Unverified
General Scales Unlock AI Evaluation with Explanatory and Predictive Power	Mar 9, 2025	Benchmarking, Specificity	Unverified
Extraction of clinical information from the non-invasive fetal electrocardiogram	May 27, 2016	Benchmarking, Heart Rate Variability	Unverified
Generating Artificial Outliers in the Absence of Genuine Ones -- a Survey	Jun 5, 2020	Benchmarking, Experimental Design	Unverified
Extensible Logging and Empirical Attainment Function for IOHexperimenter	Sep 28, 2021	Benchmarking	Unverified
Extended Labeled Faces in-the-Wild (ELFW): Augmenting Classes for Face Segmentation	Jun 24, 2020	Benchmarking, Data Augmentation	Unverified
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design	Apr 14, 2025	Benchmarking, Language Modeling	Unverified
Generating Synthetic Electronic Health Record (EHR) Data: A Review with Benchmarking	Nov 6, 2024	Benchmarking	Unverified
Generation of Large District Heating System Models Using Open-Source Data and Tools: An Exemplary Workflow	Dec 18, 2024	Benchmarking	Unverified
Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?	May 27, 2020	Benchmarking, Fraud Detection	Unverified
Generative Adversarial Networks with Limited Data: A Survey and Benchmarking	Apr 7, 2025	Benchmarking, Image Generation	Unverified
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors	Jun 29, 2023	Benchmarking	Unverified
A Survey of Parameters Associated with the Quality of Benchmarks in NLP	Oct 14, 2022	Benchmarking	Unverified
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	Benchmarking, Math	Unverified
Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition	Mar 24, 2025	Benchmarking, Food Recognition	Unverified
Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion	May 28, 2024	Benchmarking, Emotion Recognition	Unverified
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance	Jun 18, 2024	Benchmarking	Unverified
Generative Models at the Frontier of Compression: A Survey on Generative Face Video Coding	Jun 9, 2025	Benchmarking, Video Compression	Unverified
Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models	Feb 4, 2025	Benchmarking, Decision Making	Unverified
AI Idea Bench 2025: AI Research Idea Generation Benchmark	Apr 19, 2025	Benchmarking, scientific discovery	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified