SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 176-200 of 399 papers

Title | Status | Hype
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data | | 0
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation | | 0
Bootstrapping Cognitive Agents with a Large Language Model | | 0
Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code | | 0
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering | | 0
Learning Electromagnetic Metamaterial Physics With ChatGPT | | 0
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | | 0
Collaborative ontology sharing and editing | | 0
Collective inference of the truth of propositions from crowd probability judgments | | 0
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval | | 0
Comparative Insights from 12 Machine Learning Models in Extracting Economic Ideology from Political Text | | 0
Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners | | 0
ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis | | 0
Constructing Enhanced Mutual Information for Online Class-Incremental Learning | | 0
Context and Humor: Understanding Amul advertisements of India | | 0
Controversy Rules - Discovering Regions Where Classifiers (Dis-)Agree Exceptionally | | 0
CoRA: Collaborative Information Perception by Large Language Model's Weights for Recommendation | | 0
DAML-ST5: Low Resource Style Transfer via Domain Adaptive Meta Learning | | 0
Data structuring for the ontological modelling of wind energy systems | | 0
Deep Prompt Multi-task Network for Abuse Language Detection | | 0
Differentially Private Distributed Learning for Language Modeling Tasks | | 0
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning | | 0
Generating Question Relevant Captions to Aid Visual Question Answering | | 0
Distributed Fine-tuning of Language Models on Private Data | | 0
DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified