SOTAVerified

Benchmarking

Papers

Showing 3101–3150 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | - | 0 |
| Time Sensitive Knowledge Editing through Efficient Finetuning | - | 0 |
| Statistical Multicriteria Benchmarking via the GSD-Front | - | 0 |
| A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection | - | 0 |
| Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation | - | 0 |
| Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs | - | 0 |
| Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check | - | 0 |
| Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance | Code | 0 |
| Analyzing the Feature Extractor Networks for Face Image Synthesis | Code | 0 |
| MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset | Code | 0 |
| ACCORD: Closing the Commonsense Measurability Gap | Code | 0 |
| TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability | Code | 0 |
| LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions | - | 0 |
| ELSA: Evaluating Localization of Social Activities in Urban Streets using Open-Vocabulary Detection | - | 0 |
| R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models | - | 0 |
| Scaffold Splits Overestimate Virtual Screening Performance | - | 0 |
| WebSuite: Systematically Evaluating Why Web Agents Fail | Code | 0 |
| On the project risk baseline: integrating aleatory uncertainty into project scheduling | - | 0 |
| Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images | - | 0 |
| CoSy: Evaluating Textual Explanations of Neurons | - | 0 |
| MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification | - | 0 |
| Categorization of 33 computational methods to detect spatially variable genes from spatially resolved transcriptomics data | - | 0 |
| Exploring Thermography Technology: A Comprehensive Facial Dataset for Face Detection, Recognition, and Emotion | - | 0 |
| Risk-Neutral Generative Networks | - | 0 |
| A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis | - | 0 |
| Benchmarking General-Purpose In-Context Learning | - | 0 |
| GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | - | 0 |
| BOLD: Boolean Logic Deep Learning | - | 0 |
| NuwaTS: a Foundation Model Mending Every Incomplete Time Series | - | 0 |
| MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model | - | 0 |
| Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum" | - | 0 |
| Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks | - | 0 |
| Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study | - | 0 |
| Full-stack evaluation of Machine Learning inference workloads for RISC-V systems | - | 0 |
| Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images | - | 0 |
| Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification | - | 0 |
| A Gap in Time: The Challenge of Processing Heterogeneous IoT Data in Digitalized Buildings | - | 0 |
| An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models | - | 0 |
| CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models | - | 0 |
| EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods | - | 0 |
| CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | - | 0 |
| DispaRisk: Auditing Fairness Through Usable Information | Code | 0 |
| EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models | - | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | - | 0 |
| SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge | - | 0 |
| BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions | - | 0 |
| An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0 |
| Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions | Code | 0 |
| A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text | - | 0 |
| SpeechVerse: A Large-scale Generalizable Audio Language Model | - | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified |