SOTAVerified

Benchmarking

Papers

Showing 4251-4275 of 5548 papers

Title | Status | Hype
Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet |  | 0
Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning |  | 0
A Comprehensive Benchmarking Platform for Deep Generative Models in Molecular Design |  | 0
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation |  | 0
Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis |  | 0
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking |  | 0
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding |  | 0
UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library |  | 0
Automatic detection of passable roads after floods in remote sensed and social media data |  | 0
PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice |  | 0
PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents |  | 0
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms |  | 0
Automated Structured Radiology Report Generation |  | 0
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration |  | 0
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation |  | 0
Automated Machine Learning on Big Data using Stochastic Algorithm Tuning |  | 0
Pulse Shape-Aided Multipath Delay Estimation for Fine-Grained WiFi Sensing |  | 0
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension |  | 0
Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models |  | 0
Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A |  | 0
Automated legal reasoning with discretion to act using s(LAW) |  | 0
Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models |  | 0
Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem |  | 0
Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN) |  | 0
PySTACHIO: Python Single-molecule TrAcking stoiCHiometry Intensity and simulatiOn, a flexible, extensible, beginner-friendly and optimized program for analysis of single-molecule microscopy |  | 0
Page 171 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified