SOTAVerified

Benchmarking

Papers

Showing 3701–3750 of 5548 papers

Title | Status | Hype
MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception |  | 0
Benchmarking five global optimization approaches for nano-optical shape optimization and parameter reconstruction |  | 0
MS MARCO: Benchmarking Ranking Models in the Large-Data Regime |  | 0
MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge |  | 0
Towards Robust and Generalizable Gerchberg Saxton based Physics Inspired Neural Networks for Computer Generated Holography: A Sensitivity Analysis Framework |  | 0
Benchmarking federated strategies in Peer-to-Peer Federated learning for biomedical data |  | 0
MTG: A Benchmarking Suite for Multilingual Text Generation |  | 0
Benchmarking Federated Machine Unlearning methods for Tabular Data |  | 0
MTLens: Machine Translation Output Debugging |  | 0
MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark |  | 0
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models |  | 0
Benchmarking FedAvg and FedCurv for Image Classification Tasks |  | 0
Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models |  | 0
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA |  | 0
Mukayese: Turkish NLP Strikes Back |  | 0
Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative |  | 0
Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization |  | 0
Benchmarking Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2 |  | 0
Multicalibration for Confidence Scoring in LLMs |  | 0
Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking |  | 0
Multi-channel deep convolutional neural networks for multi-classifying thyroid disease |  | 0
Benchmarking Explanatory Models for Inertia Forecasting using Public Data of the Nordic Area |  | 0
Multiclass Optimal Classification Trees with SVM-splits |  | 0
Benchmarking Evolutionary Community Detection Algorithms in Dynamic Networks |  | 0
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models |  | 0
Benchmarking Model Predictive Control Algorithms in Building Optimization Testing Framework (BOPTEST) |  | 0
Multifactorial Cellular Genetic Algorithm (MFCGA): Algorithmic Design, Performance Comparison and Genetic Transferability Analysis |  | 0
Multi-Fidelity Methods for Optimization: A Survey |  | 0
Benchmarking Evolutionary Algorithms For Single Objective Real-valued Constrained Optimization - A Critical Review |  | 0
Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition |  | 0
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans |  | 0
Benchmarking Ethical and Safety Risks of Healthcare LLMs in China-Toward Systemic Governance under Healthy China 2030 |  | 0
Multi-input Multi-output Loewner Framework for Vibration-based Damage Detection on a Trainer Jet |  | 0
Benchmarking Estimators for Natural Experiments: A Novel Dataset and a Doubly Robust Algorithm |  | 0
Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations |  | 0
Benchmarking energy consumption and latency for neuromorphic computing in condensed matter and particle physics |  | 0
Multilingual European Language Models: Benchmarking Approaches and Challenges |  | 0
Multilingual Large Language Models Are Not (Yet) Code-Switchers |  | 0
Multilingual Protest News Detection - Shared Task 1, CASE 2021 |  | 0
Benchmarking Energy-Conserving Neural Networks for Learning Dynamics from Data |  | 0
Benchmarking Energy and Latency in TinyML: A Novel Method for Resource-Constrained AI |  | 0
MultiMed: Massively Multimodal and Multitask Medical Understanding |  | 0
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models |  | 0
A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations |  | 0
Towards Sentiment Analysis of Tobacco Products’ Usage in Social Media |  | 0
Multimodal Deep Learning for Scientific Imaging Interpretation |  | 0
Multimodal Deep Reinforcement Learning for Portfolio Optimization |  | 0
Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration |  | 0
Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms |  | 0
Benchmarking End-to-end Learning of MIMO Physical-Layer Communication |  | 0
Page 75 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified