SOTAVerified

Benchmarking

Papers

Showing 2121–2130 of 5548 papers

Title | Status | Hype
Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices | | 0
BEADs: Bias Evaluation Across Domains | | 0
MLVU: Benchmarking Multi-task Long Video Understanding | Code | 3
TIDMAD: Time Series Dataset for Discovering Dark Matter with AI Denoising | Code | 1
Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation | | 0
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection | | 0
CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark | Code | 1
Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance | Code | 0
MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset | Code | 0
Page 213 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified