SOTAVerified

Ethics

Papers

Showing 11–20 of 832 papers

| Title | Status | Hype |
|---|---|---|
| GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher | Code | 2 |
| Getting pwn'd by AI: Penetration Testing with Large Language Models | Code | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2 |
| Aligning AI With Shared Human Values | Code | 2 |
| XTRUST: On the Multilingual Trustworthiness of Large Language Models | Code | 1 |
| Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey | Code | 1 |
| Language Model Alignment in Multilingual Trolley Problems | Code | 1 |
| MoralBench: Moral Evaluation of LLMs | Code | 1 |
| MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models | Code | 1 |
| NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism | Code | 1 |

Benchmark Results

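| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RuGPT-3 Large | Accuracy | 68.6 | | Unverified |
| 2 | RuGPT-3 Medium | Accuracy | 68.3 | | Unverified |
| 3 | RuGPT-3 Small | Accuracy | 55.5 | | Unverified |
| 4 | Human benchmark | Accuracy | 52.9 | | Unverified |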
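
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Human benchmark | Accuracy | 67.6 | | Unverified |
| 2 | RuGPT-3 Small | Accuracy | 60.9 | | Unverified |
| 3 | RuGPT-3 Large | Accuracy | 44.9 | | Unverified |
| 4 | RuGPT-3 Medium | Accuracy | 44.1 | | Unverified |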