SOTAVerified

Benchmarking

Papers

Showing 461–470 of 5548 papers

Title | Status | Hype
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1
Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Code | 1
CIDEr: Consensus-based Image Description Evaluation | Code | 1
Benchmarking Detection Transfer Learning with Vision Transformers | Code | 1
Benchmarking Deep Models for Salient Object Detection | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning | Code | 1
An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition | Code | 1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified