SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 431–440 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning	Nov 29, 2024	BenchmarkingDeepFake Detection	CodeCode Available	1	5
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models	Nov 29, 2021	BenchmarkingPhysical Simulations	CodeCode Available	1	5
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning	Feb 20, 2024	Atomic number classificationBenchmarking	CodeCode Available	1	5
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	CodeCode Available	1	5
Benchmarking Data Science Agents	Feb 27, 2024	BenchmarkingCode Generation	CodeCode Available	1	5
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph	Nov 15, 2023	Benchmarking	CodeCode Available	1	5
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness	Jul 13, 2020	Benchmarking	CodeCode Available	1	5
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action RecognitionAttribute	CodeCode Available	1	5
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials	Nov 29, 2022	Benchmarking	CodeCode Available	1	5
AD-LLM: Benchmarking Large Language Models for Anomaly Detection	Dec 15, 2024	Anomaly DetectionBenchmarking	CodeCode Available	1	5

Show:10 25 50

← PrevPage 44 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified