SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 221-230 of 399 papers

| Title | Status | Hype |
|---|---|---|
| Are Large Language Models a Good Replacement of Taxonomies? | Code | 0 |
| RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Code | 0 |
| Avoiding Copyright Infringement via Large Language Model Unlearning | Code | 0 |
| Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | | 0 |
| Learning from Natural Language Explanations for Generalizable Entity Matching | | 0 |
| Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers | | 0 |
| ContextFlow++: Generalist-Specialist Flow-based Generative Models with Mixed-Variable Context Encoding | Code | 0 |
| SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge | | 0 |
| MoST: Multi-modality Scene Tokenization for Motion Prediction | | 0 |
| Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified |
| 6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified |
| 7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified |
| 8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified |
| 10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified |