SOTAVerified

Benchmarking

Papers

Showing 4251–4300 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet | | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | | 0 |
| A Comprehensive Benchmarking Platform for Deep Generative Models in Molecular Design | | 0 |
| ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation | | 0 |
| Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis | | 0 |
| Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking | | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | | 0 |
| UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library | | 0 |
| Automatic detection of passable roads after floods in remote sensed and social media data | | 0 |
| PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice | | 0 |
| PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | | 0 |
| Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms | | 0 |
| Automated Structured Radiology Report Generation | | 0 |
| Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration | | 0 |
| PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation | | 0 |
| Automated Machine Learning on Big Data using Stochastic Algorithm Tuning | | 0 |
| Pulse Shape-Aided Multipath Delay Estimation for Fine-Grained WiFi Sensing | | 0 |
| PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension | | 0 |
| Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models | | 0 |
| Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A | | 0 |
| Automated legal reasoning with discretion to act using s(LAW) | | 0 |
| Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models | | 0 |
| Automated detection of gibbon calls from passive acoustic monitoring data using convolutional neural networks in the "torch for R" ecosystem | | 0 |
| Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN) | | 0 |
| PySTACHIO: Python Single-molecule TrAcking stoiCHiometry Intensity and simulatiOn, a flexible, extensible, beginner-friendly and optimized program for analysis of single-molecule microscopy | | 0 |
| AutoLay: Benchmarking amodal layout estimation for autonomous driving | | 0 |
| Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case | | 0 |
| Python Random Graph Generator | | 0 |
| Q2SAR: A Quantum Multiple Kernel Learning Approach for Drug Discovery | | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | | 0 |
| AutoAI-TS: AutoAI for Time Series Forecasting | | 0 |
| QDA^2: A principled approach to automatically annotating charge stability diagrams | | 0 |
| A Universal Protocol to Benchmark Camera Calibration for Sports | | 0 |
| A Unified Taylor Framework for Revisiting Attribution Methods | | 0 |
| A Complementarity Analysis of the COCO Benchmark Problems and Artificially Generated Problems | | 0 |
| QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges | | 0 |
| A Comparison of Word Embeddings for English and Cross-Lingual Chinese Word Sense Disambiguation | | 0 |
| QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning | | 0 |
| QSAM-Net: Rain streak removal by quaternion neural network with self-attention module | | 0 |
| Decoding Intelligence: A Framework for Certifying Knowledge Comprehension in LLMs | | 0 |
| QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation | | 0 |
| Unbounded Bayesian Optimization via Regularization | | 0 |
| Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling | | 0 |
| Quality Assessment of Low Light Restored Images: A Subjective Study and an Unsupervised Model | | 0 |
| Quality Assured: Rethinking Annotation Strategies in Imaging AI | | 0 |
| Quality at the Tail of Machine Learning Inference | | 0 |
| Uncertainty estimation for Cross-dataset performance in Trajectory prediction | | 0 |
| A Unified Study of Machine Learning Explanation Evaluation Metrics | | 0 |
| QuantBench: Benchmarking AI Methods for Quantitative Investment | | 0 |
| Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |