Benchmarking

Papers

Showing 1351–1360 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets	Dec 10, 2021	Benchmarking	Code Available	1	5
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios	Oct 31, 2024	Benchmarking, LLM-generated Text Detection	Code Available	1	5
IMGTB: A Framework for Machine-Generated Text Detection Benchmarking	Nov 21, 2023	Benchmarking, Text Detection	Code Available	1	5
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification	Jul 6, 2023	Benchmarking, Domain Adaptation	Code Available	1	5
A Unified Taxonomy and Multimodal Dataset for Events in Invasion Games	Aug 25, 2021	Benchmarking, Video Classification	Code Available	1	5
Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?	Sep 29, 2023	Benchmarking, Knowledge Graph Completion	Code Available	1	5
Benchmarking the Spectrum of Agent Capabilities	Sep 14, 2021	Benchmarking	Code Available	1	5
DFGC 2021: A DeepFake Game Competition	Jun 2, 2021	Benchmarking, DeepFake Detection	Code Available	1	5
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning	May 30, 2024	Autonomous Driving, Benchmarking	Code Available	1	5
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing	Mar 30, 2023	Attribute, Benchmarking	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified