SOTAVerified

Benchmarking

Papers

Showing 5151-5175 of 5548 papers

Title | Status | Hype
On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation | Code | 0
Are Large Language Models Good at Utility Judgments? | Code | 0
Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Code | 0
Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability | Code | 0
VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks | Code | 0
Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI | Code | 0
Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions | Code | 0
DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions | Code | 0
OpenBioLink: A benchmarking framework for large-scale biomedical link prediction | Code | 0
DispaRisk: Auditing Fairness Through Usable Information | Code | 0
A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting | Code | 0
Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction | Code | 0
Large Scale Clustering with Variational EM for Gaussian Mixture Models | Code | 0
AI Sound Recognition on Asthma Medication Adherence: Evaluation With the RDA Benchmark Suite | Code | 0
Dialogue Quality and Emotion Annotations for Customer Support Conversations | Code | 0
STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking | Code | 0
OpenDenoising: an Extensible Benchmark for Building Comparative Studies of Image Denoisers | Code | 0
OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression | Code | 0
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework | Code | 0
Towards Biologically Plausible and Private Gene Expression Data Generation | Code | 0
DFEE: Interactive DataFlow Execution and Evaluation Kit | Code | 0
Towards causal benchmarking of bias in face analysis algorithms | Code | 0
SORCE: Small Object Retrieval in Complex Environments | Code | 0
Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings | Code | 0
Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks | Code | 0
Page 207 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified