SOTAVerified

Benchmarking

Papers

Showing 2176-2200 of 5548 papers

Title | Status | Hype
An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models |  | 0
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | Code | 4
GCondenser: Benchmarking Graph Condensation | Code | 1
A Gap in Time: The Challenge of Processing Heterogeneous IoT Data in Digitalized Buildings |  | 0
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models |  | 0
Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding | Code | 1
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models |  | 0
EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods |  | 0
Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning | Code | 2
DispaRisk: Auditing Fairness Through Usable Information | Code | 0
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering | Code | 2
EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models |  | 0
From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT |  | 0
SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge |  | 0
BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions |  | 0
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Code | 3
A Robust Autoencoder Ensemble-Based Approach for Anomaly Detection in Text |  | 0
Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions | Code | 0
An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0
DocuMint: Docstring Generation for Python using Small Language Models | Code | 1
PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | Code | 2
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | Code | 1
SpeechVerse: A Large-scale Generalizable Audio Language Model |  | 0
UCCIX: Irish-eXcellence Large Language Model |  | 0
Divergent Creativity in Humans and Large Language Models | Code | 0
Page 88 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified