SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 171-180 of 399 papers

| Title | Status | Hype |
| Joey NMT: A Minimalist NMT Toolkit for Novices | Code | 0 |
| Learning to Understand Phrases by Embedding the Dictionary | Code | 0 |
| Evaluating Polish linguistic and cultural competency in large language models | | 0 |
| Evaluating Consistency and Reasoning Capabilities of Large Language Models | | 0 |
| Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | | 0 |
| Evaluating Company-specific Biases in Financial Sentiment Analysis using Large Language Models | | 0 |
| Are LLMs Good Cryptic Crossword Solvers? | | 0 |
| Enhancing Target-unspecific Tasks through a Features Matrix | | 0 |
| Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation | | 0 |
| Enhance Graph Alignment for Large Language Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified |
| 6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified |
| 7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified |
| 8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified |
| 10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified |