SOTAVerified

Benchmarking

Papers

Showing 3381–3390 of 5548 papers

Title	Date	Tasks	Status	Hype
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA	Jan 29, 2024	Benchmarking, Image Comprehension	Unverified	0
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models	Jan 28, 2024	Benchmarking, Code Generation	Code Available	0
Benchmarking with MIMIC-IV, an irregular, spare clinical time series dataset	Jan 27, 2024	Benchmarking, Time Series	Unverified	0
SAM-based instance segmentation models for the automation of structural damage detection	Jan 27, 2024	Benchmarking, Instance Segmentation	Unverified	0
Biological Valuation Map of Flanders: A Sentinel-2 Imagery Analysis	Jan 26, 2024	Benchmarking, Semantic Segmentation	Unverified	0
Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs	Jan 26, 2024	Benchmarking, Knowledge Graphs	Unverified	0
Automated legal reasoning with discretion to act using s(LAW)	Jan 25, 2024	Benchmarking, Legal Reasoning	Unverified	0
TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images	Jan 25, 2024	Benchmarking, Segmentation	Unverified	0
Large Malaysian Language Model Based on Mistral for Enhanced Local Language Understanding	Jan 24, 2024	Benchmarking, Language Modeling	Unverified	0
Benchmarking the Fairness of Image Upsampling Methods	Jan 24, 2024	Benchmarking, Diversity	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified