SOTAVerified

Benchmarking

Papers

Showing 2561–2570 of 5548 papers

Title	Date	Tasks	Status	Hype
Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets	Jan 29, 2024	Benchmarking, Machine Translation	Code Available	1
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA	Jan 29, 2024	Benchmarking, Image Comprehension	Unverified	0
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models	Jan 28, 2024	Benchmarking, Code Generation	Code Available	0
SAM-based instance segmentation models for the automation of structural damage detection	Jan 27, 2024	Benchmarking, Instance Segmentation	Unverified	0
Benchmarking with MIMIC-IV, an irregular, spare clinical time series dataset	Jan 27, 2024	Benchmarking, Time Series	Unverified	0
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries	Jan 27, 2024	Benchmarking, RAG	Code Available	3
Biological Valuation Map of Flanders: A Sentinel-2 Imagery Analysis	Jan 26, 2024	Benchmarking, Semantic Segmentation	Unverified	0
Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs	Jan 26, 2024	Benchmarking, Knowledge Graphs	Unverified	0
Automated legal reasoning with discretion to act using s(LAW)	Jan 25, 2024	Benchmarking, Legal Reasoning	Unverified	0
TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images	Jan 25, 2024	Benchmarking, Segmentation	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified