SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 41–50 of 399 papers

| Title | Status | Hype |
| --- | --- | --- |
| E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models | Code | 1 |
| Aligning Medical Images with General Knowledge from Large Language Models | Code | 1 |
| How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models | Code | 1 |
| DIAGen: Diverse Image Augmentation with Generative Models | Code | 1 |
| Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? | Code | 1 |
| Can Editing LLMs Inject Harm? | Code | 1 |
| ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models | Code | 1 |
| Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Code | 1 |
| To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models | Code | 1 |
| Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-Rank Decomposition | Code | 1 |
Page 5 of 40

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | — | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | — | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | — | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | — | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | — | Unverified |
| 6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | — | Unverified |
| 7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | — | Unverified |
| 8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | — | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | — | Unverified |
| 10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | — | Unverified |
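All of the results above are few-shot accuracy scores with k=5. As a rough illustration of what that metric means, here is a minimal sketch of a k-shot evaluation loop. The prompt template, the `model` callable, and the exact-match scoring are assumptions for illustration, not the BIG-bench harness itself.

```python
def build_prompt(shots, question, k=5):
    """Prepend k solved (question, answer) examples to the target question."""
    lines = [f"Q: {q}\nA: {a}" for q, a in shots[:k]]
    lines.append(f"Q: {question}\nA:")
    return "\n\n".join(lines)

def few_shot_accuracy(model, shots, eval_set, k=5):
    """Fraction of eval questions answered exactly right under k-shot prompting.

    `model` is any callable mapping a prompt string to an answer string
    (hypothetical stand-in for a real LLM API call).
    """
    correct = 0
    for question, answer in eval_set:
        prediction = model(build_prompt(shots, question, k=k))
        correct += prediction.strip() == answer
    return correct / len(eval_set)
```

For example, with a toy `model` that answers one of two questions correctly, `few_shot_accuracy` returns 0.5; a leaderboard entry of 84.2 corresponds to a value of 0.842 over the full evaluation set.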