SOTAVerified

Benchmarking

Papers

Showing 1311–1320 of 5548 papers

Title | Status | Hype
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
Benchmarking saliency methods for chest X-ray interpretation | Code | 1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Code | 1
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
A Ladder of Causal Distances | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Code | 1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking | Code | 1
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified