Benchmarking

Papers

Showing 1301–1310 of 5548 papers

Title	Date	Tasks	Status	Hype
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification	Jun 18, 2023	Benchmarking, Retrieval	Code Available	1
Contemporary Symbolic Regression Methods and their Relative Performance	Jul 29, 2021	Benchmarking, parameter estimation	Code Available	1
COVID-19 event extraction from Twitter via extractive question answering with continuous prompts	Mar 19, 2023	Benchmarking, Event Extraction	Code Available	1
Benchmarking Robustness of Machine Reading Comprehension Models	Apr 29, 2020	Benchmarking, Machine Reading Comprehension	Code Available	1
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets	Apr 11, 2022	Action Triplet Recognition, Benchmarking	Code Available	1
Benchmarking Robustness to Adversarial Image Obfuscations	Jan 30, 2023	Benchmarking	Code Available	1
Benchmarks for Deep Off-Policy Evaluation	Mar 30, 2021	Benchmarking, continuous-control	Code Available	1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark	Sep 16, 2020	Benchmarking, Knowledge Graph Completion	Code Available	1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	Code Available	1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified