SOTAVerified

Benchmarking

Papers

Showing 1771-1780 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal | | 0 |
| WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Code | 1 |
| Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions | | 0 |
| Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond | Code | 1 |
| Segment Anything in Medical Images and Videos: Benchmark and Deployment | Code | 7 |
| Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline | | 0 |
| OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Code | 1 |
| MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities | | 0 |
| From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | | 0 |
| LMEMs for post-hoc analysis of HPO Benchmarking | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |