SOTAVerified

Benchmarking

Papers

Showing 2576-2600 of 5548 papers

Title | Status | Hype
From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | Code | 0
Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks | Code | 0
Benchmarking Human and Automated Prompting in the Segment Anything Model | Code | 0
Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms | Code | 0
Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping | Code | 0
From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Code | 0
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Code | 0
Benchmarking HillVallEA for the GECCO 2019 Competition on Multimodal Optimization | Code | 0
Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark | Code | 0
Benchmarking Hierarchical Script Knowledge | Code | 0
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | Code | 0
Delta-Influence: Unlearning Poisons via Influence Functions | Code | 0
Forecasting time series with constraints | Code | 0
FHBench: Towards Efficient and Personalized Federated Learning for Multimodal Healthcare | Code | 0
Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming | Code | 0
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling | Code | 0
Aesthetic Image Captioning From Weakly-Labelled Photographs | Code | 0
Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty | Code | 0
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach | Code | 0
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters | Code | 0
Fluorescence Reference Target Quantitative Analysis Library | Code | 0
Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0 | Code | 0
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0
Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified