SOTAVerified

General Knowledge

This task aims to evaluate the ability of a model to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 51-100 of 399 papers

| Title | Status | Hype |
|---|---|---|
| GeoGalactica: A Scientific Large Language Model in Geoscience | Code | 1 |
| BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation | Code | 1 |
| Prompt Learning via Meta-Regularization | Code | 1 |
| Importance-based Neuron Allocation for Multilingual Neural Machine Translation | Code | 1 |
| RAG with Differential Privacy | Code | 1 |
| KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation | Code | 1 |
| KALA: Knowledge-Augmented Language Model Adaptation | Code | 1 |
| Pre-training and Diagnosing Knowledge Base Completion Models | Code | 1 |
| Can Editing LLMs Inject Harm? | Code | 1 |
| Knowledge Prompt-tuning for Sequential Recommendation | Code | 1 |
| Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese | Code | 1 |
| Knowledge Graph Contrastive Learning for Recommendation | Code | 1 |
| Aligning Medical Images with General Knowledge from Large Language Models | Code | 1 |
| Prediction and Control in Continual Reinforcement Learning | Code | 1 |
| PMET: Precise Model Editing in a Transformer | Code | 1 |
| Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation | Code | 1 |
| Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis | Code | 1 |
| Prompt-aligned Gradient for Prompt Tuning | Code | 1 |
| RDF2Vec: RDF Graph Embeddings and Their Applications | Code | 1 |
| OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models | Code | 1 |
| Overcoming Generic Knowledge Loss with Selective Parameter Update | Code | 1 |
| MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation | Code | 1 |
| ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models | Code | 1 |
| A General Knowledge Injection Framework for ICD Coding | Code | 1 |
| Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Code | 1 |
| MultiGPrompt for Multi-Task Pre-Training and Prompting on Graphs | Code | 1 |
| A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering | Code | 1 |
| EPVT: Environment-aware Prompt Vision Transformer for Domain Generalization in Skin Lesion Recognition | Code | 1 |
| Automated Phrase Mining from Massive Text Corpora | Code | 1 |
| PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation | Code | 1 |
| GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model | Code | 1 |
| DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Code | 1 |
| DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration | Code | 1 |
| Learning with Recoverable Forgetting | Code | 1 |
| CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection | Code | 1 |
| CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement | Code | 1 |
| Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation | Code | 1 |
| DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection | Code | 1 |
| BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models | Code | 1 |
| Towards Task Sampler Learning for Meta-Learning | Code | 1 |
| FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion | Code | 1 |
| Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-Rank Decomposition | Code | 1 |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1 |
| DIAGen: Diverse Image Augmentation with Generative Models | Code | 1 |
| Bert4XMR: Cross-Market Recommendation with Bidirectional Encoder Representations from Transformer | Code | 1 |
| Go From the General to the Particular: Multi-Domain Translation with Domain Transformation Networks | Code | 1 |
| Better Question-Answering Models on a Budget | Code | 1 |
| Health Index Estimation Through Integration of General Knowledge with Unsupervised Learning | Code | 1 |
| Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model | Code | 1 |
| CityBench: Evaluating the Capabilities of Large Language Models for Urban Tasks | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified |
| 6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified |
| 7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified |
| 8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified |
| 10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified |
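
The entries above report accuracy in a few-shot setting with k=5. As a rough illustration of what such a number could mean, the sketch below builds a k-shot prompt and scores exact-match accuracy as a percentage. It is an assumption-laden sketch, not the scoring procedure used by BIG-bench or by the listed papers: the prompt template, the `normalize` rule, and the names `build_few_shot_prompt` and `few_shot_accuracy` are all hypothetical, and `model` stands in for any callable that maps a prompt string to an answer string.

```python
from typing import Callable, Sequence, Tuple

# (question, gold answer) pair; hypothetical data shape for illustration.
Example = Tuple[str, str]


def build_few_shot_prompt(shots: Sequence[Example], question: str) -> str:
    """Concatenate the solved examples, then the unanswered question."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in shots]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def few_shot_accuracy(
    model: Callable[[str], str],
    shots: Sequence[Example],
    eval_set: Sequence[Example],
) -> float:
    """Return exact-match accuracy in percent over eval_set."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    correct = 0
    for question, gold in eval_set:
        prediction = model(build_few_shot_prompt(shots, question))
        if normalize(prediction) == normalize(gold):
            correct += 1
    return 100.0 * correct / len(eval_set)
```

With k=5, `shots` would hold five solved examples. Exact match is only one possible scoring rule; multiple-choice tasks are often scored by comparing option likelihoods instead, so figures computed this way are not directly comparable to the claimed values in the table.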