SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 151–175 of 399 papers

Title | Status | Hype
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | — | 0
Learning from Natural Language Explanations for Generalizable Entity Matching | — | 0
RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection | Code | 1
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Code | 1
F-LMM: Grounding Frozen Large Multimodal Models | Code | 2
Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers | — | 0
HYDRA: Model Factorization Framework for Black-Box LLM Personalization | Code | 1
ContextFlow++: Generalist-Specialist Flow-based Generative Models with Mixed-Variable Context Encoding | Code | 0
Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks | Code | 1
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection | Code | 1
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge | — | 0
Health Index Estimation Through Integration of General Knowledge with Unsupervised Learning | Code | 1
MoST: Multi-modality Scene Tokenization for Motion Prediction | — | 0
Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs | — | 0
Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation | — | 0
Evaluating Consistency and Reasoning Capabilities of Large Language Models | — | 0
Learning Electromagnetic Metamaterial Physics With ChatGPT | — | 0
When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering | — | 0
Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain | — | 0
Knowledge graphs for empirical concept retrieval | Code | 0
Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge | Code | 0
BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models | Code | 1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1
Prompt Learning via Meta-Regularization | Code | 1
Juru: Legal Brazilian Large Language Model from Reputable Sources | — | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | — | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | — | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | — | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | — | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | — | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | — | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | — | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | — | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | — | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | — | Unverified