SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 4591–4600 of 5548 papers

Title	Date	Tasks	Status	Hype
Identifying and Benchmarking Natural Out-of-Context Prediction Problems	Oct 25, 2021	Benchmarking	CodeCode Available	0
Analysis \| OPEN \| Published: 17 June 2019 Multitask learning and benchmarking with clinical time series data	Jun 17, 2019	BenchmarkingBIG-bench Machine Learning	CodeCode Available	0
IdeaBench: Benchmarking Large Language Models for Research Idea Generation	Oct 31, 2024	Benchmarkingscientific discovery	CodeCode Available	0
IceBench: A Benchmark for Deep Learning based Sea Ice Type Classification	Mar 22, 2025	BenchmarkingClassification	CodeCode Available	0
BioFors: A Large Biomedical Image Forensics Dataset	Aug 30, 2021	BenchmarkingImage Forensics	CodeCode Available	0
Benchmarking Attribution Methods with Relative Feature Importance	Jul 23, 2019	BenchmarkingFeature Importance	CodeCode Available	0
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs	Feb 25, 2024	BenchmarkingChatbot	CodeCode Available	0
Hyperspectral Image Dataset for Benchmarking on Salient Object Detection	Jun 29, 2018	BenchmarkingObject	CodeCode Available	0
Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning	Jan 1, 2020	Benchmarkingreinforcement-learning	CodeCode Available	0
Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition	Sep 2, 2018	Age-Invariant Face RecognitionBenchmarking	CodeCode Available	0

Show:10 25 50

← PrevPage 460 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified