SOTAVerified

Benchmarking

Papers

Showing 2751–2775 of 5548 papers

Title | Status | Hype
Benchmarking Pathology Feature Extractors for Whole Slide Image Classification | Code | 1
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0
Benchmarking Machine Learning Models for Quantum Error Correction | - | 0
Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization | - | 0
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach | - | 0
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction | Code | 1
Exponentially Faster Language Modelling | Code | 2
Domain Aligned CLIP for Few-shot Classification | - | 0
Social Bias Probing: Fairness Benchmarking for Language Models | - | 0
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph | Code | 1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1
Model Agnostic Explainable Selective Regression via Uncertainty Estimation | - | 0
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Code | 0
On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation | Code | 0
Benchmarking Individual Tree Mapping with Sub-meter Imagery | - | 0
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime | Code | 1
Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images | Code | 0
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | - | 0
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data | - | 0
The Disagreement Problem in Faithfulness Metrics | - | 0
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models | Code | 1
Flames: Benchmarking Value Alignment of LLMs in Chinese | Code | 1
Identification of vortex in unstructured mesh with graph neural networks | - | 0
Page 111 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified