Benchmarking

Papers

Showing 1301–1310 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet	Feb 23, 2023	Benchmarking, Knowledge Distillation	Code Available	1	5
CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version)	Nov 19, 2022	Benchmarking, C++ code	Code Available	1	5
Mukayese: Turkish NLP Strikes Back	Mar 2, 2022	Benchmarking, Language Modeling	Code Available	1	5
Benchmarking Robustness of Machine Reading Comprehension Models	Apr 29, 2020	Benchmarking, Machine Reading Comprehension	Code Available	1	5
Benchmarking Robustness of Text-Image Composed Retrieval	Nov 24, 2023	Attribute, Benchmarking	Code Available	1	5
Benchmarking Robustness to Adversarial Image Obfuscations	Jan 30, 2023	Benchmarking	Code Available	1	5
Benchmarking the Spectrum of Agent Capabilities	Sep 14, 2021	Benchmarking	Code Available	1	5
Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset	Aug 12, 2024	Benchmarking	Code Available	1	5
Illuminating Darkness: Enhancing Real-world Low-light Scenes with Smartphone Images	Mar 10, 2025	4k, Benchmarking	Code Available	1	5
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction	Apr 24, 2024	Attribute, Attribute Value Extraction	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified