SOTAVerified

Benchmarking

Papers

Showing 3161–3170 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking Educational Program Repair | Code | 0 |
| Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking | | 0 |
| Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning | Code | 0 |
| UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images | | 0 |
| Performance Evaluation of Real-Time Object Detection for Electric Scooters | Code | 0 |
| ATG: Benchmarking Automated Theorem Generation for Generative Language Models | | 0 |
| Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models | Code | 0 |
| Systematic Review: Anomaly Detection in Connected and Autonomous Vehicles | | 0 |
| PhilHumans: Benchmarking Machine Learning for Personal Health | | 0 |
| A Normative Framework for Benchmarking Consumer Fairness in Large Language Model Recommender System | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |