SOTAVerified

Benchmarking

Papers

Showing 3451-3500 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Multifactorial Cellular Genetic Algorithm (MFCGA): Algorithmic Design, Performance Comparison and Genetic Transferability Analysis | | 0 |
| Multi-Fidelity Methods for Optimization: A Survey | | 0 |
| MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans | | 0 |
| Multi-input Multi-output Loewner Framework for Vibration-based Damage Detection on a Trainer Jet | | 0 |
| Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations | | 0 |
| Multilingual European Language Models: Benchmarking Approaches and Challenges | | 0 |
| Multilingual Large Language Models Are Not (Yet) Code-Switchers | | 0 |
| Multilingual Protest News Detection - Shared Task 1, CASE 2021 | | 0 |
| MultiMed: Massively Multimodal and Multitask Medical Understanding | | 0 |
| Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models | | 0 |
| Multimodal Deep Learning for Scientific Imaging Interpretation | | 0 |
| Multimodal Deep Reinforcement Learning for Portfolio Optimization | | 0 |
| Multi-Modal Explainable Medical AI Assistant for Trustworthy Human-AI Collaboration | | 0 |
| Multimodal Information Retrieval for Open World with Edit Distance Weak Supervision | | 0 |
| Multimodal or Text? Retrieval or BERT? Benchmarking Classifiers for the Shared Task on Hateful Memes | | 0 |
| Multi-Modal Three-Stream Network for Action Recognition | | 0 |
| MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation | | 0 |
| LadderMIL: Multiple Instance Learning with Coarse-to-Fine Self-Distillation | | 0 |
| MultiRobustBench: Benchmarking Robustness Against Multiple Attacks | | 0 |
| MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts | | 0 |
| MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing | | 0 |
| Non-linear Multitask Learning with Deep Gaussian Processes | | 0 |
| Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking | | 0 |
| Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? | | 0 |
| Multi-view deep learning based molecule design and structural optimization accelerates the SARS-CoV-2 inhibitor discovery | | 0 |
| MUPAX: Multidimensional Problem Agnostic eXplainable AI | | 0 |
| MVS^2: Deep Unsupervised Multi-view Stereo with Multi-View Symmetry | | 0 |
| My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks | | 0 |
| N^2: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion | | 0 |
| NABU - Multilingual Graph-based Neural RDF Verbalizer | | 0 |
| NAS-Bench-Zero: A Large Scale Dataset for Understanding Zero-Shot Neural Architecture Search | | 0 |
| NA-SODINN: a deep learning algorithm for exoplanet image detection based on residual noise regimes | | 0 |
| NativQA: Multilingual Culturally-Aligned Natural Query for LLMs | | 0 |
| Natural Disasters Detection in Social Media and Satellite imagery: a survey | | 0 |
| NATURAL PLAN: Benchmarking LLMs on Natural Language Planning | | 0 |
| Nature-Inspired Optimization Algorithms: Challenges and Open Problems | | 0 |
| NavBench: A Unified Robotics Benchmark for Reinforcement Learning-Based Autonomous Navigation | | 0 |
| Near-Term Quantum Computing Techniques: Variational Quantum Algorithms, Error Mitigation, Circuit Compilation, Benchmarking and Classical Simulation | | 0 |
| NeIn: Telling What You Don't Want | | 0 |
| NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods | | 0 |
| Hyperparameter optimization with REINFORCE and Transformers | | 0 |
| Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation | | 0 |
| Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems | | 0 |
| GIM: Gaussian Isolation Machines | | 0 |
| Neural Networks for Fast Optimisation in Model Predictive Control: A Review | | 0 |
| Neural Text Generation: Past, Present and Beyond | | 0 |
| Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network | | 0 |
| New Loss Functions for Fast Maximum Inner Product Search | | 0 |
| NEWS 2018 Whitepaper | | 0 |
| NEWTS: A Corpus for News Topic-Focused Summarization | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |