SOTAVerified

Benchmarking

Papers

Showing 821–830 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA	Dec 29, 2023	Anatomy, Benchmarking	Code Available	1
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond	Dec 25, 2023	Animal Pose Estimation, Benchmarking	Code Available	1
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models	Dec 21, 2023	Benchmarking	Code Available	1
RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation	Dec 21, 2023	Benchmarking, Product Recommendation	Code Available	1
FiFAR: A Fraud Detection Dataset for Learning to Defer	Dec 20, 2023	Benchmarking, Decision Making	Code Available	1
TAO-Amodal: A Benchmark for Tracking Any Object Amodally	Dec 19, 2023	Amodal Tracking, Autonomous Driving	Code Available	1
How to Train Neural Field Representations: A Comprehensive Study and Benchmark	Dec 16, 2023	Benchmarking	Code Available	1
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models	Dec 15, 2023	Benchmarking, Code Summarization	Code Available	1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation	Dec 12, 2023	Anomaly Detection, Autonomous Driving	Code Available	1
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning	Dec 11, 2023	Benchmarking, Human-Object Interaction Detection	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified