SOTAVerified

Benchmarking

Papers

Showing 951–975 of 5548 papers

Title | Status | Hype
IntelliGraphs: Datasets for Benchmarking Knowledge Graph Generation | Code | 1
A Comprehensive Overview of Large Language Models | Code | 1
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring | Code | 1
Benchmarking Algorithms for Federated Domain Generalization | Code | 1
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark | Code | 1
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification | Code | 1
Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection | Code | 1
SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Replicable Scenes | Code | 1
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs | Code | 1
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution | Code | 1
GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection | Code | 1
Challenges and Opportunities in Improving Worst-Group Generalization in Presence of Spurious Features | Code | 1
Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase | Code | 1
IMP-MARL: a Suite of Environments for Large-scale Infrastructure Management Planning via MARL | Code | 1
Geometric Deep Learning for Structure-Based Drug Design: A Survey | Code | 1
Beyond Normal: On the Evaluation of Mutual Information Estimators | Code | 1
causalAssembly: Generating Realistic Production Data for Benchmarking Causal Discovery | Code | 1
Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking | Code | 1
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Code | 1
OpenDataVal: a Unified Benchmark for Data Valuation | Code | 1
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning | Code | 1
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond | Code | 1
FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods | Code | 1
Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials | Code | 1
KoLA: Carefully Benchmarking World Knowledge of Large Language Models | Code | 1
Page 39 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified