SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 101-150 of 399 papers

Title | Status | Hype
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models | Code | 1
KALA: Knowledge-Augmented Language Model Adaptation | Code | 1
CC-Riddle: A Question Answering Dataset of Chinese Character Riddles | Code | 1
GeoGalactica: A Scientific Large Language Model in Geoscience | Code | 1
Knowledge Prompt-tuning for Sequential Recommendation | Code | 1
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? | Code | 1
Transformers as Soft Reasoners over Language | Code | 1
A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection | Code | 1
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving |  | 0
Are LLMs Good Cryptic Crossword Solvers? |  | 0
AcademicGPT: Empowering Academic Research |  | 0
Learning Electromagnetic Metamaterial Physics With ChatGPT |  | 0
Enhance Graph Alignment for Large Language Models |  | 0
Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization |  | 0
Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian |  | 0
Enabling Autonomic Microservice Management through Self-Learning Agents |  | 0
Applying SoftTriple Loss for Supervised Language Model Fine Tuning |  | 0
AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis |  | 0
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering |  | 0
Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation |  | 0
Enhancing Target-unspecific Tasks through a Features Matrix |  | 0
How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization |  | 0
Efficient illumination angle self-calibration in Fourier ptychography |  | 0
Evaluating Company-specific Biases in Financial Sentiment Analysis using Large Language Models |  | 0
Evaluating Consistency and Reasoning Capabilities of Large Language Models |  | 0
Evaluating Polish linguistic and cultural competency in large language models |  | 0
Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code |  | 0
Bootstrapping Cognitive Agents with a Large Language Model |  | 0
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning |  | 0
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge |  | 0
Dominance-based Rough Set Approach, basic ideas and main trends |  | 0
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation |  | 0
Domain Specific, Semi-Supervised Transfer Learning for Medical Imaging |  | 0
An Energy Ontology for Global City Indicators (ISO 37120) |  | 0
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data |  | 0
Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models |  | 0
BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer |  | 0
Dobby: A Conversational Service Robot Driven by GPT-4 |  | 0
DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning |  | 0
Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation |  | 0
Hierarchical Inductive Transfer for Continual Dialogue Learning |  | 0
GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning |  | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination |  | 0
Distributed Fine-tuning of Language Models on Private Data |  | 0
Analysis of Watson's Strategies for Playing Jeopardy! |  | 0
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA |  | 0
An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph |  | 0
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning |  | 0
Differentially Private Distributed Learning for Language Modeling Tasks |  | 0
AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 |  | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 |  | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 |  | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 |  | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 |  | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 |  | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 |  | Unverified