SOTAVerified

Benchmarking

Papers

Showing 2621–2630 of 5548 papers

Title	Date	Tasks	Status	Hype
SEED-Bench: Benchmarking Multimodal Large Language Models	Jan 1, 2024	Benchmarking, Image Generation	Code Available	3
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark	Jan 1, 2024	Age Estimation, Benchmarking	Code Available	2
FISBe: A Real-World Benchmark Dataset for Instance Segmentation of Long-Range Thin Filamentous Structures	Jan 1, 2024	Benchmarking, Instance Segmentation	Unverified	0
Sheared Backpropagation for Fine-tuning Foundation Models	Jan 1, 2024	Benchmarking	Unverified	0
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models	Jan 1, 2024	Benchmarking	Code Available	1
Temporal Validity Change Prediction	Jan 1, 2024	Benchmarking, Prediction	Unverified	0
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions	Jan 1, 2024	Benchmarking, Instruction Following	Code Available	1
Benchmarking Hebbian learning rules for associative memory	Dec 30, 2023	Benchmarking	Unverified	0
Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models	Dec 30, 2023	Benchmarking, image-classification	Unverified	0
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA	Dec 29, 2023	Anatomy, Benchmarking	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified