SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 101-125 of 399 papers

Title | Status | Hype
Knowledge Graph Contrastive Learning for Recommendation | Code | 1
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks | Code | 1
RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection | Code | 1
Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do | Code | 1
Towards Task Sampler Learning for Meta-Learning | Code | 1
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts | Code | 1
TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora | Code | 1
A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection | Code | 1
PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization | Code | 0
Patching as Translation: the Data and the Metaphor | Code | 0
Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models | Code | 0
Are Large Language Models a Good Replacement of Taxonomies? | Code | 0
Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Code | 0
Efficient Transfer Learning for Video-language Foundation Models | Code | 0
Efficient Relation-aware Neighborhood Aggregation in Graph Neural Networks via Tensor Decomposition | Code | 0
Effective Skill Unlearning through Intervention and Abstention | Code | 0
MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Code | 0
Can ChatGPT Enable ITS? The Case of Mixed Traffic Control via Reinforcement Learning | Code | 0
Pruning neural network models for gene regulatory dynamics using data and domain knowledge | Code | 0
BnMMLU: Measuring Massive Multitask Language Understanding in Bengali | Code | 0
Leveraging Large Language Models for Automated Dialogue Analysis | Code | 0
Domain Generalization via Model-Agnostic Learning of Semantic Features | Code | 0
Learning to Understand Phrases by Embedding the Dictionary | Code | 0
Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge | Code | 0
Learning to Learn Variational Semantic Memory | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified