SOTAVerified

Benchmarking

Papers

Showing 481–490 of 5548 papers

Title | Status | Hype
Alpha Excel Benchmark | | 0
Benchmarking LLMs' Swarm intelligence | Code | 1
Call for Action: towards the next generation of symbolic regression benchmark | | 0
Multimodal Benchmarking and Recommendation of Text-to-Image Generation Models | Code | 0
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks | Code | 0
Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach | Code | 0
Completing Spatial Transcriptomics Data for Gene Expression Prediction Benchmarking | | 0
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | Code | 2
Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning | | 0
Page 49 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified