SOTAVerified

Benchmarking

Papers

Showing 3701-3725 of 5548 papers

Title | Status | Hype
MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception | | 0
Benchmarking five global optimization approaches for nano-optical shape optimization and parameter reconstruction | | 0
MS MARCO: Benchmarking Ranking Models in the Large-Data Regime | | 0
MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge | | 0
Towards Robust and Generalizable Gerchberg Saxton based Physics Inspired Neural Networks for Computer Generated Holography: A Sensitivity Analysis Framework | | 0
Benchmarking federated strategies in Peer-to-Peer Federated learning for biomedical data | | 0
MTG: A Benchmarking Suite for Multilingual Text Generation | | 0
Benchmarking Federated Machine Unlearning methods for Tabular Data | | 0
MTLens: Machine Translation Output Debugging | | 0
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark | | 0
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models | | 0
Benchmarking FedAvg and FedCurv for Image Classification Tasks | | 0
Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models | | 0
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | | 0
Mukayese: Turkish NLP Strikes Back | | 0
Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative | | 0
Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization | | 0
Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 | | 0
Multicalibration for Confidence Scoring in LLMs | | 0
Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking | | 0
Multi-channel deep convolutional neural networks for multi-classifying thyroid disease | | 0
Benchmarking Explanatory Models for Inertia Forecasting using Public Data of the Nordic Area | | 0
Multiclass Optimal Classification Trees with SVM-splits | | 0
Benchmarking Evolutionary Community Detection Algorithms in Dynamic Networks | | 0
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified