SOTAVerified

Benchmarking

Papers

Showing 3751–3800 of 5548 papers

Title | Status | Hype
3DOS: Towards 3D Open Set Learning -- Benchmarking and Understanding Semantic Novelty Detection on Point Clouds | Code | 0
Panoptic Scene Graph Generation | Code | 2
Rethinking the Reference-based Distinctive Image Captioning | Code | 0
PieTrack: An MOT solution based on synthetic data training and self-supervised domain adaptation | | 0
Physiology-based simulation of the retinal vasculature enables annotation-free segmentation of OCT angiographs | Code | 1
Benchmarking tools for a priori identifiability analysis | Code | 0
Operation-Level Performance Benchmarking of Graph Neural Networks for Scientific Applications | Code | 0
Detecting beats in the photoplethysmogram: benchmarking open-source algorithms | Code | 1
ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization | Code | 1
Initial recommendations for performing, benchmarking, and reporting single-cell proteomics experiments | Code | 1
Benchmarking Transformers-based models on French Spoken Language Understanding tasks | | 0
Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification | Code | 0
The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by Isolating Task-Specific Subnetworks in Feedforward Neural Networks | Code | 0
Why do tree-based models still outperform deep learning on tabular data? | Code | 2
GOAL: Towards Benchmarking Few-Shot Sports Game Summarization | Code | 0
Benchmarking Omni-Vision Representation through the Lens of Visual Realms | Code | 1
Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey | | 0
Immunofluorescence Capillary Imaging Segmentation: Cases Study | Code | 0
Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification | Code | 0
Slot Filling for Extracting Reskilling and Upskilling Options from the Web | Code | 0
TASKOGRAPHY: Evaluating robot task planning over large 3D scene graphs | Code | 1
Graph Generative Model for Benchmarking Graph Neural Networks | Code | 1
A novel evaluation methodology for supervised Feature Ranking algorithms | Code | 0
Ensemble random forest filter: An alternative to the ensemble Kalman filter for inverse modeling | | 0
OVQA: A Clinically Generated Visual Question Answering Dataset | | 0
VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning | Code | 2
Benefits and Challenges of Dynamic Modelling of Cascading Failures in Power Systems | | 0
Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding | Code | 2
Identifying the Context Shift between Test Benchmarks and Production Data | | 0
Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk | Code | 1
Less Is More: A Comparison of Active Learning Strategies for 3D Medical Image Segmentation | Code | 1
HATE-ITA: New Baselines for Hate Speech Detection in Italian | Code | 0
SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features | | 0
Towards Toxic Positivity Detection | | 0
Benchmarking Intersectional Biases in NLP | Code | 0
Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | | 0
DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles | | 0
Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking | | 0
Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Code | 0
Local manifold learning and its link to domain-based physics knowledge | Code | 0
Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations | | 0
DFGC 2022: The Second DeepFake Game Competition | Code | 1
Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology | Code | 1
Computer-aided diagnosis and prediction in brain disorders | | 0
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations | Code | 0
Beyond neural scaling laws: beating power law scaling via data pruning | Code | 1
Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames | Code | 1
Toward an ImageNet Library of Functions for Global Optimization Benchmarking | | 0
Benchopt: Reproducible, efficient and collaborative optimization benchmarks | Code | 4
The DEBS 2022 Grand Challenge: Detecting Trading Trends in Financial Tick Data | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified