SOTAVerified

Benchmarking

Papers

Showing 2751–2800 of 5548 papers

Title | Status | Hype
Benchmarking Pathology Feature Extractors for Whole Slide Image Classification | Code | 1
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0
Benchmarking Machine Learning Models for Quantum Error Correction | – | 0
Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization | – | 0
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach | – | 0
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction | Code | 1
Exponentially Faster Language Modelling | Code | 2
Domain Aligned CLIP for Few-shot Classification | – | 0
Social Bias Probing: Fairness Benchmarking for Language Models | – | 0
AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph | Code | 1
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization | Code | 1
Model Agnostic Explainable Selective Regression via Uncertainty Estimation | – | 0
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks | Code | 0
On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation | Code | 0
Benchmarking Individual Tree Mapping with Sub-meter Imagery | – | 0
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime | Code | 1
Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images | Code | 0
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | – | 0
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data | – | 0
The Disagreement Problem in Faithfulness Metrics | – | 0
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models | Code | 1
Flames: Benchmarking Value Alignment of LLMs in Chinese | Code | 1
Identification of vortex in unstructured mesh with graph neural networks | – | 0
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation | Code | 1
MultiIoT: Benchmarking Machine Learning for the Internet of Things | Code | 1
SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification | – | 0
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs | Code | 1
An efficiency analysis of Spanish airports | – | 0
The voraus-AD Dataset for Anomaly Detection in Robot Applications | Code | 1
Prompt Sketching for Large Language Models | – | 0
The PetShop Dataset -- Finding Causes of Performance Issues across Microservices | Code | 1
A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction | Code | 0
Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts | Code | 1
DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Code | 0
Benchmarking Geospatial Question Answering Engines using the Dataset GeoQuestions1089 | Code | 1
Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding | Code | 1
Benchmarking Deep Facial Expression Recognition: An Extensive Protocol with Balanced Dataset in the Wild | – | 0
Benchmarking Differential Evolution on a Quantum Simulator | – | 0
Exploitation-Guided Exploration for Semantic Embodied Navigation | – | 0
Digital Typhoon: Long-term Satellite Image Dataset for the Spatio-Temporal Modeling of Tropical Cyclones | Code | 1
JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds | Code | 1
Benchmarking a Benchmark: How Reliable is MS-COCO? | – | 0
Learning Disentangled Speech Representations | – | 0
NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications | Code | 1
LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion | Code | 3
FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation | Code | 1
Use of Deep Neural Networks for Uncertain Stress Functions with Extensions to Impact Mechanics | – | 0
Investigating Deep-Learning NLP for Automating the Extraction of Oncology Efficacy Endpoints from Scientific Literature | – | 0
Page 56 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | – | Unverified