SOTAVerified

Benchmarking

Papers

Showing 2601–2650 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Fine-grained Hand Gesture Recognition in Multi-viewpoint Hand Hygiene | Code | 0 |
| Benchmarking Hierarchical Script Knowledge | Code | 0 |
| Delta-Influence: Unlearning Poisons via Influence Functions | Code | 0 |
| Aesthetic Image Captioning From Weakly-Labelled Photographs | Code | 0 |
| Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty | Code | 0 |
| Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma | Code | 0 |
| DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | Code | 0 |
| GenCeption: Evaluate Multimodal LLMs with Unlabeled Unimodal Data | Code | 0 |
| From raw affiliations to organization identifiers | Code | 0 |
| Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0 |
| From Variability to Stability: Advancing RecSys Benchmarking Practices | Code | 0 |
| Benchmarking Graph Representations and Graph Neural Networks for Multivariate Time Series Classification | Code | 0 |
| A projected nonlinear state-space model for forecasting time series signals | Code | 0 |
| From MNIST to ImageNet and Back: Benchmarking Continual Curriculum Learning | Code | 0 |
| From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories | Code | 0 |
| Deep Reinforcement Learning for General Video Game AI | Code | 0 |
| From Modern CNNs to Vision Transformers: Assessing the Performance, Robustness, and Classification Strategies of Deep Learning Models in Histopathology | Code | 0 |
| Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Code | 0 |
| From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | Code | 0 |
| FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | Code | 0 |
| 2017 Robotic Instrument Segmentation Challenge | Code | 0 |
| Okapi: Generalising Better by Making Statistical Matches Match | Code | 0 |
| FR-MRInet: A Deep Convolutional Encoder-Decoder for Brain Tumor Segmentation with Relu-RGB and Sliding-window | Code | 0 |
| DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Code | 0 |
| A predictive analytics approach for stroke prediction using machine learning and neural networks | Code | 0 |
| DeepOBS: A Deep Learning Optimizer Benchmark Suite | Code | 0 |
| Deep Neural Network Benchmarks for Selective Classification | Code | 0 |
| From Bytes to Borsch: Fine-Tuning Gemma and Mistral for the Ukrainian Language Representation | Code | 0 |
| fMRI-S4: learning short- and long-range dynamic fMRI dependencies using 1D Convolutions and State Space Models | Code | 0 |
| GenderBench: Evaluation Suite for Gender Biases in LLMs | Code | 0 |
| GPT4Graph: Can Large Language Models Understand Graph Structured Data ? An Empirical Evaluation and Benchmarking | Code | 0 |
| HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Code | 0 |
| KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0 |
| Deep Nets: What have they ever done for Vision? |  | 0 |
| Deeply Supervised Depth Map Super-Resolution as Novel View Synthesis |  | 0 |
| Deep Learning vs. Gradient Boosting: Benchmarking state-of-the-art machine learning algorithms for credit scoring |  | 0 |
| Benchmarking Graph Learning for Drug-Drug Interaction Prediction |  | 0 |
| Deep Learning of Intrinsically Motivated Options in the Arcade Learning Environment |  | 0 |
| Benchmarking GPUs on SVBRDF Extractor Model |  | 0 |
| Deep Learning Models for UAV-Assisted Bridge Inspection: A YOLO Benchmark Analysis |  | 0 |
| Deep Learning Logo Detection with Data Expansion by Synthesising Context |  | 0 |
| Benchmarking GPU and TPU Performance with Graph Neural Networks |  | 0 |
| A practical generalization metric for deep networks benchmarking |  | 0 |
| Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions |  | 0 |
| Optimal Design of Volt/VAR Control Rules of Inverters using Deep Learning |  | 0 |
| Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies |  | 0 |
| Deep learning for molecular design - a review of the state of the art |  | 0 |
| Deep learning for extracting protein-protein interactions from biomedical literature |  | 0 |
| Approaches for benchmarking single-cell gene regulatory network inference methods |  | 0 |
| Deep learning for action spotting in association football videos |  | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified |