SOTAVerified

Benchmarking

Papers

Showing 3626-3650 of 5548 papers

Title | Status | Hype
Benchmarking Graph Learning for Drug-Drug Interaction Prediction | | 0
A Dataset for Developing and Benchmarking Active Vision | | 0
Benchmarking GPUs on SVBRDF Extractor Model | | 0
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks | | 0
Benchmarking GPU and TPU Performance with Graph Neural Networks | | 0
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems | | 0
What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus | | 0
mlr3proba: An R Package for Machine Learning in Survival Analysis | | 0
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | | 0
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies | | 0
Benchmarking GNNs Using Lightning Network Data | | 0
A dataset for benchmarking vision-based localization at intersections | | 0
Benchmarking global optimization techniques for unmanned aerial vehicle path planning | | 0
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | | 0
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | | 0
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | | 0
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | | 0
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | | 0
MMInA: Benchmarking Multihop Multimodal Internet Agents | | 0
Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs) | | 0
Benchmarking General-Purpose In-Context Learning | | 0
MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | | 0
MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks | | 0
MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems | | 0
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | | 0
Page 146 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified