SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 151–175 of 399 papers

Title | Status | Hype
Leveraging Large Language Models for Automated Dialogue Analysis | Code | 0
Foundation X: Integrating Classification, Localization, and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis | Code | 0
Commonsense Knowledge in Word Associations and ConceptNet | Code | 0
Fed-CO2: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning | Code | 0
Learning to Learn Variational Semantic Memory | Code | 0
Learning to Understand Phrases by Embedding the Dictionary | Code | 0
Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling | Code | 0
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model | Code | 0
Knowledge graphs for empirical concept retrieval | Code | 0
A Comparison of Prompt Engineering Techniques for Task Planning and Execution in Service Robotics | Code | 0
Integrating Semantic Knowledge to Tackle Zero-shot Text Classification | Code | 0
Exploring Recommendation Capabilities of GPT-4V(ision): A Preliminary Case Study | Code | 0
Improving Personalized Search with Regularized Low-Rank Parameter Updates | Code | 0
Joey NMT: A Minimalist NMT Toolkit for Novices | Code | 0
Exploiting Adapters for Cross-lingual Low-resource Speech Recognition | Code | 0
GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction | Code | 0
ExplainCPE: A Free-text Explanation Benchmark of Chinese Pharmacist Examination | Code | 0
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models | Code | 0
Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph | Code | 0
Towards Difficulty-Agnostic Efficient Transfer Learning for Vision-Language Models | Code | 0
Pruning neural network models for gene regulatory dynamics using data and domain knowledge | Code | 0
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Code | 0
Evaluating Polish linguistic and cultural competency in large language models | | 0
Evaluating Consistency and Reasoning Capabilities of Large Language Models | | 0
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | | 0
Page 7 of 16

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified