Benchmarking

Papers

Showing 3351–3360 of 5548 papers

Title	Date	Tasks	Status	Hype
UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite	Apr 18, 2023	Benchmarking, Instance Segmentation	Unverified	0
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images	Apr 17, 2023	3D Pose Estimation, Benchmarking	Unverified	0
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis	Apr 17, 2023	Benchmarking, Drift Detection	Code Available	0
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy	Apr 14, 2023	Benchmarking	Unverified	0
Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation	Apr 11, 2023	Benchmarking, Conversational Recommendation	Unverified	0
Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection	Apr 11, 2023	Adversarial Attack, Adversarial Robustness	Unverified	0
OpenAGI: When LLM Meets Domain Experts	Apr 10, 2023	Benchmarking, Natural Language Queries	Code Available	4
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems	Apr 10, 2023	Benchmarking	Code Available	1
Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence	Apr 10, 2023	Benchmarking, speech-recognition	Code Available	0
On Evaluation of Bangla Word Analogies	Apr 10, 2023	Benchmarking, Word Embeddings	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified