SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 141–150 of 399 papers

Title | Status | Hype
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models | Code | 1
BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization | — | 0
Leveraging Large Language Models for enhanced personalised user experience in Smart Homes | — | 0
Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-Rank Decomposition | Code | 1
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA | — | 0
CityBench: Evaluating the Capabilities of Large Language Models for Urban Tasks | Code | 1
Exploring Safety-Utility Trade-Offs in Personalized Language Models | — | 0
Are Large Language Models a Good Replacement of Taxonomies? | Code | 0
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Code | 0
Avoiding Copyright Infringement via Large Language Model Unlearning | Code | 0
Page 15 of 40

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | — | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | — | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | — | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | — | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | — | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | — | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | — | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | — | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | — | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | — | Unverified