General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Papers

Showing 151–200 of 399 papers

Title	Date	Tasks	Status	Hype
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming	Jun 14, 2024	Benchmarking, General Knowledge	Unverified	0
Learning from Natural Language Explanations for Generalizable Entity Matching	Jun 13, 2024	Binary Classification, Domain Generalization	Unverified	0
RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection	Jun 11, 2024	Anomaly Detection, Benchmarking	Code Available	1
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation	Jun 9, 2024	Common Sense Reasoning, Denoising	Code Available	1
F-LMM: Grounding Frozen Large Multimodal Models	Jun 9, 2024	General Knowledge, Instruction Following	Code Available	2
Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers	Jun 7, 2024	General Knowledge, Question Generation	Unverified	0
HYDRA: Model Factorization Framework for Black-Box LLM Personalization	Jun 5, 2024	General Knowledge	Code Available	1
ContextFlow++: Generalist-Specialist Flow-based Generative Models with Mixed-Variable Context Encoding	Jun 2, 2024	Anomaly Detection, Density Estimation	Code Available	0
Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks	Jun 1, 2024	General Knowledge, Hippocampus	Code Available	1
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection	May 26, 2024	General Knowledge	Code Available	1
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge	May 15, 2024	General Knowledge, Knowledge Graphs	Unverified	0
Health Index Estimation Through Integration of General Knowledge with Unsupervised Learning	May 8, 2024	General Knowledge	Code Available	1
MoST: Multi-modality Scene Tokenization for Motion Prediction	Apr 30, 2024	General Knowledge, motion prediction	Unverified	0
Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs	Apr 29, 2024	Diagnostic, General Knowledge	Unverified	0
Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation	Apr 28, 2024	Action Recognition, General Knowledge	Unverified	0
Evaluating Consistency and Reasoning Capabilities of Large Language Models	Apr 25, 2024	General Knowledge, Text Generation	Unverified	0
Learning Electromagnetic Metamaterial Physics With ChatGPT	Apr 23, 2024	General Knowledge	Unverified	0
When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering	Apr 19, 2024	General Knowledge	Unverified	0
Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain	Apr 12, 2024	Continual Pretraining, General Knowledge	Unverified	0
Knowledge graphs for empirical concept retrieval	Apr 10, 2024	General Knowledge, Knowledge Graphs	Code Available	0
Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge	Apr 8, 2024	General Knowledge, Safety Alignment	Code Available	0
BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models	Apr 5, 2024	Factual probe, General Knowledge	Code Available	1
Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT	Apr 3, 2024	Benchmarking, General Knowledge	Code Available	1
Prompt Learning via Meta-Regularization	Apr 1, 2024	Domain Generalization, General Knowledge	Code Available	1
Juru: Legal Brazilian Large Language Model from Reputable Sources	Mar 26, 2024	General Knowledge, Language Modeling	Unverified	0
Are LLMs Good Cryptic Crossword Solvers?	Mar 15, 2024	General Knowledge	Unverified	0
CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model	Mar 13, 2024	General Knowledge, Instruction Following	Code Available	2
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning	Mar 11, 2024	Domain Generalization, Federated Learning	Unverified	0
See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI	Mar 11, 2024	Brain Decoding, General Knowledge	Code Available	1
Deep Prompt Multi-task Network for Abuse Language Detection	Mar 8, 2024	Abusive Language, General Knowledge	Unverified	0
MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models	Mar 6, 2024	Ethics, General Knowledge	Code Available	1
K-Link: Knowledge-Link Graph from LLMs for Enhanced Representation Learning in Multivariate Time-Series Data	Mar 6, 2024	General Knowledge, graph construction	Unverified	0
Pruning neural network models for gene regulatory dynamics using data and domain knowledge	Mar 5, 2024	General Knowledge, Network Pruning	Code Available	0
Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation	Mar 4, 2024	Age And Gender Classification, Age and Gender Estimation	Code Available	3
Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese	Feb 27, 2024	General Knowledge, Question Answering	Code Available	1
Bootstrapping Cognitive Agents with a Large Language Model	Feb 25, 2024	General Knowledge, Language Modeling	Unverified	0
OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models	Feb 21, 2024	General Knowledge, Logical Reasoning	Code Available	1
Inductive Graph Alignment Prompt: Bridging the Gap between Graph Pre-training and Inductive Fine-tuning From Spectral Perspective	Feb 21, 2024	General Knowledge, Graph Classification	Unverified	0
CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge	Feb 12, 2024	General Knowledge, Multiple-choice	Code Available	2
Pre-training and Diagnosing Knowledge Base Completion Models	Jan 27, 2024	General Knowledge, Knowledge Base Completion	Code Available	1
GALA: Generating Animatable Layered Assets from a Single Scan	Jan 23, 2024	3D geometry, General Knowledge	Unverified	0
INCPrompt: Task-Aware incremental Prompting for Rehearsal-Free Class-incremental Learning	Jan 22, 2024	class-incremental learning, Class Incremental Learning	Unverified	0
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks	Jan 12, 2024	General Knowledge, In-Context Learning	Code Available	1
Generic Knowledge Boosted Pre-training For Remote Sensing Images	Jan 9, 2024	Change Detection, Deep Learning	Code Available	1
Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation	Jan 1, 2024	General Knowledge, Navigate	Code Available	2
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling	Jan 1, 2024	General Knowledge, Knowledge Distillation	Unverified	0
MMA: Multi-Modal Adapter for Vision-Language Models	Jan 1, 2024	Domain Generalization, General Knowledge	Code Available	2
GeoGalactica: A Scientific Large Language Model in Geoscience	Dec 31, 2023	Document Classification, General Knowledge	Code Available	1
Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection	Dec 23, 2023	Change Detection, General Knowledge	Code Available	1
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation	Dec 22, 2023	Conditional Image Generation, General Knowledge	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Chinchilla-70B (few-shot, k=5)	Accuracy	94.3	—	Unverified
2	Gopher-280B (few-shot, k=5)	Accuracy	93.9	—	Unverified
3	Chinchilla-70B (few-shot, k=5)	Accuracy	85.7	—	Unverified
4	Gopher-280B (few-shot, k=5)	Accuracy	84.8	—	Unverified
5	Gopher-280B (few-shot, k=5)	Accuracy	84.2	—	Unverified
6	Gopher-280B (few-shot, k=5)	Accuracy	84.1	—	Unverified
7	Gopher-280B (few-shot, k=5)	Accuracy	83.9	—	Unverified
8	Gopher-280B (few-shot, k=5)	Accuracy	83.3	—	Unverified
9	Gopher-280B (few-shot, k=5)	Accuracy	81.8	—	Unverified
10	Gopher-280B (few-shot, k=5)	Accuracy	81	—	Unverified
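
The table reports accuracy under a few-shot setting with k=5, but the page does not say how prompts were built or how answers were scored. As a rough illustration only, the sketch below shows one common way such a number could be produced: prefix each question with up to k solved examples, then score the model's answers by normalized exact match. The function names, the "Q:/A:" prompt layout, and the matching rule are all assumptions, not the procedure used for the listed results.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]], question: str, k: int = 5
) -> str:
    """Prefix the target question with up to k solved (question, answer) pairs.

    The "Q:/A:" layout is a hypothetical format chosen for illustration.
    """
    shots = [f"Q: {q}\nA: {a}" for q, a in examples[:k]]
    return "\n\n".join(shots + [f"Q: {question}\nA:"])


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions equal to the reference.

    Matching ignores case and surrounding whitespace (an assumed rule).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)
```

In use, each test question would be passed through `build_few_shot_prompt`, the model's completions collected into a list, and that list compared with the gold answers via `accuracy`. Real evaluations often use multiple-choice scoring or looser answer matching, so scores from this sketch would not be directly comparable to the claimed values above.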