SOTAVerified

Benchmarking

Papers

Showing 1701–1750 of 5548 papers

Title | Status | Hype
Benchmarking BioRelEx for Entity Tagging and Relation Extraction | | 0
A Deep Q-Learning Method for Downlink Power Allocation in Multi-Cell Networks | | 0
DiPCo -- Dinner Party Corpus | | 0
Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation | | 0
Benchmarking Biomedical Nested NER and Relation Extraction Models | | 0
Deep Patent Landscaping Model Using Transformer and Graph Embedding | | 0
A New Approach for Image Authentication Framework for Media Forensics Purpose | | 0
Benchmarking Bias in Large Language Models during Role-Playing | | 0
Abnormality-Driven Representation Learning for Radiology Imaging | | 0
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models | | 0
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning | | 0
An Evolutionary Algorithm For the Vehicle Routing Problem with Drones with Interceptions | | 0
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations | | 0
An evaluation framework for comparing causal inference models | | 0
Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation | | 0
DIG: A Turnkey Library for Diving into Graph Deep Learning Research | | 0
Benchmarking Azerbaijani Neural Machine Translation | | 0
Classification of the Fashion-MNIST Dataset on a Quantum Computer | | 0
Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models | | 0
Benchmarking AutoML Frameworks for Disease Prediction Using Medical Claims | | 0
Class-agnostic Object Detection | | 0
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives | | 0
A deep convolutional neural network model for rapid prediction of fluvial flood inundation | | 0
Diffusion-Driven Domain Adaptation for Generating 3D Molecules | | 0
DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation | | 0
Disability prediction in multiple sclerosis using performance outcome measures and demographic data | | 0
Discriminative Link Prediction using Local Links, Node Features and Community Structure | | 0
CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering | | 0
Benchmarking a wide range of optimisers for solving the Fermi-Hubbard model using the variational quantum eigensolver | | 0
Classification and Retrieval of Digital Pathology Scans: A New Dataset | | 0
A biologically-inspired multi-modal evaluation of molecular generative machine learning | | 0
Classifying neuromorphic data using a deep learning framework for image classification | | 0
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs | | 0
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | | 0
DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale | | 0
Diff5T: Benchmarking Human Brain Diffusion MRI with an Extensive 5.0 Tesla K-Space and Spatial Dataset | | 0
CityLearn v2: Energy-flexible, resilient, occupant-centric, and carbon-aware management of grid-interactive communities | | 0
Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks | | 0
Addressing the Real-world Class Imbalance Problem in Dermatology | | 0
CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry | | 0
Benchmarking Automated Review Response Generation for the Hospitality Domain | | 0
Benchmarking bias: Expanding clinical AI model card to incorporate bias reporting of social and non-social factors | | 0
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy | | 0
CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents | | 0
DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior | | 0
CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting | | 0
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications | | 0
CIMLA: Interpretable AI for inference of differential causal networks | | 0
CloudifierNet -- Deep Vision Models for Artificial Image Processing | | 0
CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis | | 0
Page 35 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified