SOTAVerified

Benchmarking

Papers

Showing 5201–5250 of 5548 papers

Title | Status | Hype
2017 Robotic Instrument Segmentation Challenge | Code | 0
AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias | Code | 0
Benchmarking Intersectional Biases in NLP | Code | 0
Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations | Code | 0
Towards Fair and Privacy-Preserving Federated Deep Models | Code | 0
SPDEBench: An Extensive Benchmark for Learning Regular and Singular Stochastic PDEs | Code | 0
Deep Neural Network Benchmarks for Selective Classification | Code | 0
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships | Code | 0
Arabic Speech Recognition by End-to-End, Modular Systems and Human | Code | 0
Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems | Code | 0
Deep Metric Learning Meets Deep Clustering: An Novel Unsupervised Approach for Feature Embedding | Code | 0
Deepened Graph Auto-Encoders Help Stabilize and Enhance Link Prediction | Code | 0
Oral Imaging for Malocclusion Issues Assessments: OMNI Dataset, Deep Learning Baselines and Benchmarking | Code | 0
Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning | Code | 0
ORCHID: A Chinese Debate Corpus for Target-Independent Stance Detection and Argumentative Dialogue Summarization | Code | 0
Benchmarking Human and Automated Prompting in the Segment Anything Model | Code | 0
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? | Code | 0
Deep Emotion Recognition in Textual Conversations: A Survey | Code | 0
Neural Style Transfer Improves 3D Cardiovascular MR Image Segmentation on Inconsistent Data | Code | 0
OSS-Bench: Benchmark Generator for Coding LLMs | Code | 0
DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network | Code | 0
deepCR: Cosmic Ray Rejection with Deep Learning | Code | 0
A quantum-classical reinforcement learning model to play Atari games | Code | 0
Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images | Code | 0
Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment | Code | 0
Out of Distribution Detection on ImageNet-O | Code | 0
Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping | Code | 0
Deep Affinity Network for Multiple Object Tracking | Code | 0
Benchmarking HillVallEA for the GECCO 2019 Competition on Multimodal Optimization | Code | 0
Benchmarking Hierarchical Script Knowledge | Code | 0
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Code | 0
Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Code | 0
Towards IID representation learning and its application on biomedical data | Code | 0
A projected nonlinear state-space model for forecasting time series signals | Code | 0
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation | Code | 0
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem | Code | 0
Dealing with missing data using attention and latent space regularization | Code | 0
DCR: Quantifying Data Contamination in LLMs Evaluation | Code | 0
DateLogicQA: Benchmarking Temporal Biases in Large Language Models | Code | 0
Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation | Code | 0
A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data | Code | 0
Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective | Code | 0
Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models | Code | 0
PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models | Code | 0
CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions | Code | 0
CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization | Code | 0
SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages | Code | 0
Partial Rankings of Optimizers | Code | 0
A predictive analytics approach for stroke prediction using machine learning and neural networks | Code | 0
Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified