Benchmarking

Papers

Showing 2571–2580 of 5548 papers

Title	Date	Tasks	Status	Hype
Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception	Jan 24, 2024	Benchmarking	Code Available	1
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval	Jan 24, 2024	Benchmarking, Image Captioning	Code Available	1
Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding	Jan 24, 2024	Benchmarking, Language Modeling	Unverified	0
Benchmarking the Fairness of Image Upsampling Methods	Jan 24, 2024	Benchmarking, Diversity	Code Available	0
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents	Jan 24, 2024	Benchmarking	Code Available	3
LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method	Jan 23, 2024	Benchmarking, Fairness	Code Available	0
Benchmarking LLMs via Uncertainty Quantification	Jan 23, 2024	Benchmarking, Uncertainty Quantification	Code Available	3
What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition	Jan 23, 2024	Benchmarking	Code Available	0
Deep Neural Network Benchmarks for Selective Classification	Jan 23, 2024	Benchmarking, Classification	Code Available	0
Subgroup analysis methods for time-to-event outcomes in heterogeneous randomized controlled trials	Jan 22, 2024	Benchmarking, Synthetic Data Generation	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified