SOTAVerified

General Knowledge

This task aims to evaluate the ability of a model to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 201-225 of 399 papers

Title | Status | Hype
What Would You Ask When You First Saw a^2+b^2=c^2? Evaluating LLM on Curiosity-Driven Questioning |  | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination |  | 0
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving |  | 0
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data |  | 0
CoRA: Collaborative Information Perception by Large Language Model's Weights for Recommendation |  | 0
Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Code | 0
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning |  | 0
Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian |  | 0
Constructing Enhanced Mutual Information for Online Class-Incremental Learning |  | 0
An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph |  | 0
Quantized Prompt for Efficient Generalization of Vision-Language Models | Code | 0
All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era |  | 0
Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects |  | 0
Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian |  | 0
SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation |  | 0
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning |  | 0
BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization |  | 0
Leveraging Large Language Models for enhanced personalised user experience in Smart Homes |  | 0
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA |  | 0
Exploring Safety-Utility Trade-Offs in Personalized Language Models |  | 0
Are Large Language Models a Good Replacement of Taxonomies? | Code | 0
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Code | 0
Avoiding Copyright Infringement via Large Language Model Unlearning | Code | 0
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming |  | 0
Learning from Natural Language Explanations for Generalizable Entity Matching |  | 0
Page 9 of 16

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 |  | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 |  | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 |  | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 |  | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 |  | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 |  | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 |  | Unverified