SOTAVerified

Benchmarking

Papers

Showing 4101-4125 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | | 0 |
| The ACL RD-TEC: A Dataset for Benchmarking Terminology Extraction and Classification in Computational Linguistics | | 0 |
| The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking | | 0 |
| The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence | | 0 |
| Language Models as a Service: Overview of a New Paradigm and its Challenges | | 0 |
| The Benchmark Lottery | | 0 |
| The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI | | 0 |
| The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | | 0 |
| The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach | | 0 |
| The Curious Case of Integrator Reach Sets, Part I: Basic Theory | | 0 |
| The Design and Implementation of a Scalable DL Benchmarking Platform | | 0 |
| The Disagreement Problem in Faithfulness Metrics | | 0 |
| The DLV System for Knowledge Representation and Reasoning | | 0 |
| The Dota 2 Bot Competition | | 0 |
| The Effect of Domain and Diacritics in Yoruba–English Neural Machine Translation | | 0 |
| The EuroCity Persons Dataset: A Novel Benchmark for Object Detection | | 0 |
| The Evolutionary Computation Methods No One Should Use | | 0 |
| The Expressive Power of Word Embeddings | | 0 |
| The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models | | 0 |
| The FaceChannelS: Strike of the Sequences for the AffWild 2 Challenge | | 0 |
| The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input | | 0 |
| The Forchheim Image Database for Camera Identification in the Wild | | 0 |
| The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech | | 0 |
| The Impact of Genomic Variation on Function (IGVF) Consortium | | 0 |
| The iNaturalist Sounds Dataset | | 0 |
Page 165 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |