SOTAVerified

Benchmarking

Papers

Showing 741–750 of 5548 papers

Title	Date	Tasks	Status	Hype
Experimental Validation of Ultrasound Beamforming with End-to-End Deep Learning for Single Plane Wave Imaging	Apr 22, 2024	Benchmarking	Code Available	1
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models	Apr 22, 2024	Benchmarking, World Knowledge	Code Available	1
REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking	Apr 19, 2024	Benchmarking, Coreference Resolution	Code Available	1
How to Benchmark Vision Foundation Models for Semantic Segmentation?	Apr 18, 2024	Benchmarking, Decoder	Code Available	1
Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data	Apr 16, 2024	Benchmarking, Face Recognition	Code Available	1
Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations	Apr 15, 2024	Benchmarking, Bias Detection	Code Available	1
A Review and Efficient Implementation of Scene Graph Generation Metrics	Apr 15, 2024	Benchmarking, Graph Generation	Code Available	1
MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems	Apr 15, 2024	Benchmarking, Code Generation	Code Available	1
nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation	Apr 15, 2024	Benchmarking, Image Segmentation	Code Available	1
RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion	Apr 14, 2024	Benchmarking, Data Augmentation	Code Available	1

Page 75 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified