SOTAVerified

Benchmarking

Papers

Showing 4901–4950 of 5548 papers

Title | Status | Hype
Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks | Code | 0
Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection | Code | 0
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates | Code | 0
Benchmarking Positional Encodings for GNNs and Graph Transformers | Code | 0
Fast and accurate alignment of long bisulfite-seq reads | Code | 0
Benchmarking Popular Classification Models' Robustness to Random and Targeted Corruptions | Code | 0
False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims | Code | 0
Benchmarking Perturbation-based Saliency Maps for Explaining Atari Agents | Code | 0
Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains | Code | 0
Benchmarking person re-identification datasets and approaches for practical real-world implementations | Code | 0
FALCON: Feature-Label Constrained Graph Net Collapse for Memory Efficient GNNs | Code | 0
FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability | Code | 0
Authentic Emotion Mapping: Benchmarking Facial Expressions in Real News | Code | 0
Benchmarking performance of object detection under image distortions in an uncontrolled environment | Code | 0
GUNNEL: Guided Mixup Augmentation and Multi-View Fusion for Aquatic Animal Segmentation | Code | 0
Multimodal Benchmarking and Recommendation of Text-to-Image Generation Models | Code | 0
Segmenting France Across Four Centuries | Code | 0
Audio Explanation Synthesis with Generative Foundation Models | Code | 0
Benchmarking Tropical Cyclone Rapid Intensification with Satellite Images and Attention-based Deep Models | Code | 0
FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes | Code | 0
Can LLMs perform structured graph reasoning? | Code | 0
Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptation of Object Detectors | Code | 0
Exploring Model-based Planning with Policy Networks | Code | 0
Exploring Context Generalizability in Citywide Crowd Mobility Prediction: An Analytic Framework and Benchmark | Code | 0
Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test | Code | 0
Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation | Code | 0
Zero-shot generation of synthetic neurosurgical data with large language models | Code | 0
Benchmarking Pathology Foundation Models: Adaptation Strategies and Scenarios | Code | 0
Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks | Code | 0
Multiple Instance Learning: A Survey of Problem Characteristics and Applications | Code | 0
Self-Adjusting Weighted Expected Improvement for Bayesian Optimization | Code | 0
Multiple Light Source Dataset for Colour Research | Code | 0
Experimental Analysis of Large-scale Learnable Vector Storage Compression | Code | 0
Benchmarking Parameter Control Methods in Differential Evolution for Mixed-Integer Black-Box Optimization | Code | 0
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions | Code | 0
Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process | Code | 0
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks | Code | 0
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection | Code | 0
exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem | Code | 0
Benchmarking optimality of time series classification methods in distinguishing diffusions | Code | 0
ExEBench: Benchmarking Foundation Models on Extreme Earth Events | Code | 0
MULTITAT: Benchmarking Multilingual Table-and-Text Question Answering | Code | 0
Evolving Evolutionary Algorithms with Patterns | Code | 0
Semantic Hilbert Space for Text Representation Learning | Code | 0
A Continuous Information Gain Measure to Find the Most Discriminatory Problems for AI Benchmarking | Code | 0
Timage -- A Robust Time Series Classification Pipeline | Code | 0
AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection | Code | 0
EvoLearner: Learning Description Logics with Evolutionary Algorithms | Code | 0
Evidential Deep Learning for Uncertainty Quantification and Out-of-Distribution Detection in Jet Identification using Deep Neural Networks | Code | 0
Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data | Code | 0
Page 99 of 111

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified