Benchmarking

Papers


Showing 2101–2125 of 5548 papers

Title	Date	Tasks	Status
Evolutionary Multimodal Optimization: A Short Survey	Aug 3, 2015	Benchmarking, Diversity	Unverified
Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams	Apr 4, 2025	Benchmarking, Management	Unverified
Can AI Freelancers Compete? Benchmarking Earnings, Reliability, and Task Success at Scale	May 16, 2025	Benchmarking, TAG	Unverified
Benchmarking air-conditioning energy performance of residential rooms based on regression and clustering techniques	Aug 22, 2019	Benchmarking, Clustering	Unverified
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol	Mar 7, 2025	Benchmarking, Bug fixing	Unverified
Benchmarking Model Predictive Control Algorithms in Building Optimization Testing Framework (BOPTEST)	Jan 31, 2023	Benchmarking, Model Predictive Control	Unverified
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches	Oct 12, 2023	Benchmarking, Colorization	Unverified
Evolving Evolutionary Algorithms using Linear Genetic Programming	Aug 21, 2021	Benchmarking, Evolutionary Algorithms	Unverified
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation	May 30, 2025	Benchmarking, Machine Translation	Unverified
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography	Apr 14, 2025	Benchmarking, Visual Reasoning	Unverified
Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability	Jul 9, 2024	Benchmarking, Decoder	Unverified
CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing	Jan 9, 2025	Benchmarking, Chatbot	Unverified
Call for Action: towards the next generation of symbolic regression benchmark	May 6, 2025	Benchmarking, Diversity	Unverified
Benchmarking Agility and Reconfigurability in Satellite Systems for Tropical Cyclone Monitoring	Nov 27, 2024	Benchmarking, Earth Observation	Unverified
A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations	May 20, 2025	Benchmarking	Unverified
Benchmarking Aggression Identification in Social Media	Aug 1, 2018	Aggression Identification, Benchmarking	Unverified
Calibrating chemical multisensory devices for real world applications: An in-depth comparison of quantitative Machine Learning approaches	Aug 30, 2017	Benchmarking	Unverified
Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline	Aug 6, 2024	Benchmarking	Unverified
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift	Jul 12, 2025	Benchmarking, Transfer Learning	Unverified
Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations	Jul 1, 2022	Benchmarking, Combinatorial Optimization	Unverified
Ev-Layout: A Large-scale Event-based Multi-modal Dataset for Indoor Layout Estimation and Tracking	Mar 11, 2025	Benchmarking	Unverified
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified
Evolving Hard Maximum Cut Instances for Quantum Approximate Optimization Algorithms	Jan 30, 2025	Benchmarking, Combinatorial Optimization	Unverified
Exact lattice-based stochastic cell culture simulation algorithms incorporating spontaneous and contact-dependent reactions	Aug 9, 2022	Benchmarking, Cultural Vocal Bursts Intensity Prediction	Unverified
Explainable AI using expressive Boolean formulas	Jun 6, 2023	Benchmarking, Explainable Artificial Intelligence (XAI)	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified