SOTAVerified

Benchmarking

Papers

Showing 451–460 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics	Jun 3, 2024	Audio Classification, Benchmarking	Code Available	1	5
AD-LLM: Benchmarking Large Language Models for Anomaly Detection	Dec 15, 2024	Anomaly Detection, Benchmarking	Code Available	1	5
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models	Mar 15, 2024	Benchmarking, Drug Discovery	Code Available	1	5
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials	Nov 29, 2022	Benchmarking	Code Available	1	5
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph	Nov 15, 2023	Benchmarking	Code Available	1	5
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action Recognition, Attribute	Code Available	1	5
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning	Feb 20, 2024	Atomic number classification, Benchmarking	Code Available	1	5
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization	Apr 6, 2025	Benchmarking, Combinatorial Optimization	Code Available	1	5
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness	Jul 13, 2020	Benchmarking	Code Available	1	5
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	Code Available	1	5

Page 46 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified