SOTAVerified

Benchmarking

Papers

Showing 3601–3650 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images | | 0 |
| Benchmarking high-fidelity pedestrian tracking systems for research, real-time monitoring and crowd control | | 0 |
| What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI | | 0 |
| Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images | | 0 |
| ADCB: An Alzheimer's disease benchmark for evaluating observational estimators of causal effects | | 0 |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | | 0 |
| MIRAI: Evaluating LLM Agents for Event Forecasting | | 0 |
| MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning? | | 0 |
| Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability | | 0 |
| Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models | | 0 |
| Benchmarking Hebbian learning rules for associative memory | | 0 |
| Mitigating severe over-parameterization in deep convolutional neural networks through forced feature abstraction and compression with an entropy-based heuristic | | 0 |
| Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices | | 0 |
| A Dataset Similarity Evaluation Framework for Wireless Communications and Sensing | | 0 |
| Benchmarking Harmonized Tariff Schedule Classification Models | | 0 |
| MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | | 0 |
| Towards Large-Scale Small Object Detection: Survey and Benchmarks | | 0 |
| MLAR: Multi-layer Large Language Model-based Robotic Process Automation Applicant Tracking | | 0 |
| Towards Long-Term predictions of Turbulence using Neural Operators | | 0 |
| Benchmarking Graph Neural Networks on Link Prediction | | 0 |
| MLHarness: A Scalable Benchmarking System for MLCommons | | 0 |
| Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs | | 0 |
| MLModelScope: A Distributed Platform for ML Model Evaluation and Benchmarking at Scale | | 0 |
| MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale | | 0 |
| A Dataset for Movie Description | | 0 |
| Benchmarking Graph Learning for Drug-Drug Interaction Prediction | | 0 |
| A Dataset for Developing and Benchmarking Active Vision | | 0 |
| Benchmarking GPUs on SVBRDF Extractor Model | | 0 |
| Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks | | 0 |
| Benchmarking GPU and TPU Performance with Graph Neural Networks | | 0 |
| MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems | | 0 |
| What if we had no Wikipedia? Domain-independent Term Extraction from a Large News Corpus | | 0 |
| mlr3proba: An R Package for Machine Learning in Survival Analysis | | 0 |
| ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets | | 0 |
| Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies | | 0 |
| Benchmarking GNNs Using Lightning Network Data | | 0 |
| A dataset for benchmarking vision-based localization at intersections | | 0 |
| Benchmarking global optimization techniques for unmanned aerial vehicle path planning | | 0 |
| MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding | | 0 |
| MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents | | 0 |
| Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | | 0 |
| MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | | 0 |
| MMInA: Benchmarking Multihop Multimodal Internet Agents | | 0 |
| Benchmarking Generative AI for Scoring Medical Student Interviews in Objective Structured Clinical Examinations (OSCEs) | | 0 |
| Benchmarking General-Purpose In-Context Learning | | 0 |
| MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | | 0 |
| MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks | | 0 |
| MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems | | 0 |
| MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |