SOTAVerified

Benchmarking

Papers

Showing 776-800 of 5548 papers

Title | Status | Hype
A Comprehensive Overview of Large Language Models | Code | 1
Examining the Effects of Degree Distribution and Homophily in Graph Learning Models | Code | 1
Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization | Code | 1
Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling | Code | 1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Code | 1
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Code | 1
Exploring Large Language Models for Classical Philology | Code | 1
CIDEr: Consensus-based Image Description Evaluation | Code | 1
AirSim Drone Racing Lab | Code | 1
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Code | 1
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints | Code | 1
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension | Code | 1
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms | Code | 1
A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1
featsel: A framework for benchmarking of feature selection algorithms and cost functions | Code | 1
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations | Code | 1
Benchmarking Adversarial Patch Against Aerial Detection | Code | 1
Benchmarking Data Science Agents | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling | Code | 1
Benchmarking Adversarial Robustness on Image Classification | Code | 1
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Code | 1
FineSurE: Fine-grained Summarization Evaluation using LLMs | Code | 1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | Code | 1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified