SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 151-175 of 399 papers

| Title | Status | Hype |
|---|---|---|
| Differentially Private Distributed Learning for Language Modeling Tasks | | 0 |
| AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery | | 0 |
| KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration | | 0 |
| Knowledgebra: An Algebraic Learning Framework for Knowledge Graph | | 0 |
| Deep Prompt Multi-task Network for Abuse Language Detection | | 0 |
| Data structuring for the ontological modelling of wind energy systems | | 0 |
| Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | | 0 |
| An Adaptive Deep Learning Framework for Day-ahead Forecasting of Photovoltaic Power Generation | | 0 |
| DAML-ST5: Low Resource Style Transfer via Domain Adaptive Meta Learning | | 0 |
| All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era | | 0 |
| Adapter-based Approaches to Knowledge-enhanced Language Models -- A Survey | | 0 |
| Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI | | 0 |
| BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization | | 0 |
| Hierarchical Inductive Transfer for Continual Dialogue Learning | | 0 |
| Acquiring Knowledge from Pre-trained Model to Neural Machine Translation | | 0 |
| Hierarchical Inductive Transfer for Continual Dialogue Learning | | 0 |
| CoRA: Collaborative Information Perception by Large Language Model's Weights for Recommendation | | 0 |
| Controversy Rules - Discovering Regions Where Classifiers (Dis-)Agree Exceptionally | | 0 |
| Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era | | 0 |
| GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning | | 0 |
| GOT4Rec: Graph of Thoughts for Sequential Recommendation | | 0 |
| Autonomous Intelligent Software Development | | 0 |
| AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking | | 0 |
| How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization | | 0 |
| GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified |
| 6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified |
| 7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified |
| 8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified |
| 10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified |