SOTAVerified

Benchmarking

Papers

Showing 2761-2770 of 5548 papers

Title | Status | Hype
Model Agnostic Explainable Selective Regression via Uncertainty Estimation | - | 0
Domain Aligned CLIP for Few-shot Classification | - | 0
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Code | 0
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration | Code | 1
On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation | Code | 0
Benchmarking Individual Tree Mapping with Sub-meter Imagery | - | 0
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images | Code | 0
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | - | 0
The Disagreement Problem in Faithfulness Metrics | - | 0
Page 277 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified