SOTAVerified

Benchmarking

Papers

Showing 3011-3020 of 5548 papers

Title | Status | Hype
Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction | | 0
NeMig -- A Bilingual News Collection and Knowledge Graph about Migration | Code | 0
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning | | 0
Can humans help BERT gain "confidence"? | | 0
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering | Code | 1
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO | | 0
Benchmarking Multilabel Topic Classification in the Kyrgyz Language | Code | 0
Benchmarking the Generation of Fact Checking Explanations | Code | 1
Towards quantitative precision for ECG analysis: Leveraging state space models, self-supervision and patient metadata | Code | 1
Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified