SOTAVerified

Benchmarking

Papers

Showing 231–240 of 5548 papers

Title | Status | Hype
EasyTPP: Towards Open Benchmarking Temporal Point Processes | Code | 2
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model | Code | 2
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation | Code | 2
Benchmarking Complex Instruction-Following with Multiple Constraints Composition | Code | 2
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents | Code | 2
Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception | Code | 2
EffiBench: Benchmarking the Efficiency of Automatically Generated Code | Code | 2
A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations | Code | 2
EvalGIM: A Library for Evaluating Generative Image Models | Code | 2
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance | Code | 2
Page 24 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified