Benchmarking

Papers

Showing 2111–2120 of 5548 papers

Title	Date	Tasks	Status	Hype
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography	Apr 14, 2025	Benchmarking, Visual Reasoning	Unverified	0
Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability	Jul 9, 2024	Benchmarking, Decoder	Unverified	0
CallNavi, A Challenge and Empirical Study on LLM Function Calling and Routing	Jan 9, 2025	Benchmarking, Chatbot	Unverified	0
Call for Action: towards the next generation of symbolic regression benchmark	May 6, 2025	Benchmarking, Diversity	Unverified	0
Benchmarking Agility and Reconfigurability in Satellite Systems for Tropical Cyclone Monitoring	Nov 27, 2024	Benchmarking, Earth Observation	Unverified	0
A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations	May 20, 2025	Benchmarking	Unverified	0
Benchmarking Aggression Identification in Social Media	Aug 1, 2018	Aggression Identification, Benchmarking	Unverified	0
Calibrating chemical multisensory devices for real world applications: An in-depth comparison of quantitative Machine Learning approaches	Aug 30, 2017	Benchmarking	Unverified	0
Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline	Aug 6, 2024	Benchmarking	Unverified	0
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift	Jul 12, 2025	Benchmarking, Transfer Learning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified