SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 351–360 of 5548 papers

Title	Date	Tasks	Status	Hype
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use	May 20, 2025	Benchmarking	—Unverified	0
DECASTE: Unveiling Caste Stereotypes in Large Language Models through Multi-Dimensional Bias Analysis	May 20, 2025	BenchmarkingFairness	—Unverified	0
Explaining Unreliable Perception in Automated Driving: A Fuzzy-based Monitoring Approach	May 20, 2025	Benchmarking	—Unverified	0
LLM-based Evaluation Policy Extraction for Ecological Modeling	May 20, 2025	BenchmarkingLarge Language Model	—Unverified	0
SurvUnc: A Meta-Model Based Uncertainty Quantification Framework for Survival Analysis	May 20, 2025	BenchmarkingModel Optimization	CodeCode Available	0
TransBench: Benchmarking Machine Translation for Industrial-Scale Applications	May 20, 2025	BenchmarkingMachine Translation	—Unverified	0
OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking	May 20, 2025	Benchmarking	CodeCode Available	3
A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations	May 20, 2025	Benchmarking	—Unverified	0
Benchmarking data encoding methods in Quantum Machine Learning	May 20, 2025	BenchmarkingQuantum Machine Learning	—Unverified	0
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations	May 20, 2025	Benchmarking	—Unverified	0

Show:10 25 50

← PrevPage 36 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified