SOTAVerified

Benchmarking

Papers

Showing 1931–1940 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark | Code | 0 |
| Fluorescence Reference Target Quantitative Analysis Library | Code | 0 |
| CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents | | 0 |
| Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations | | 0 |
| A Large-scale Class-level Benchmark Dataset for Code Generation with LLMs | | 0 |
| Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room | | 0 |
| Benchmarking machine learning models for predicting aerofoil performance | | 0 |
| Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3 | | 0 |
| Establishing Reliability Metrics for Reward Models in Large Language Models | | 0 |
| Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |