SOTAVerified

General Knowledge

This task aims to evaluate the ability of a model to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 251-300 of 399 papers

Title | Status | Hype
Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects | - | 0
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval | - | 0
TURNER: The Uncertainty-based Retrieval Framework for Chinese NER | - | 0
Collective inference of the truth of propositions from crowd probability judgments | - | 0
MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs | - | 0
MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | - | 0
Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System | - | 0
Understanding Inequality of LLM Fact-Checking over Geographic Regions with Agent and Retrieval models | - | 0
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model | - | 0
MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning | - | 0
MoST: Multi-modality Scene Tokenization for Motion Prediction | - | 0
Motif-Based Prompt Learning for Universal Cross-Domain Recommendation | - | 0
Collaborative ontology sharing and editing | - | 0
Multilingual Tourist Assistance using ChatGPT: Comparing Capabilities in Hindi, Telugu, and Kannada | - | 0
Multi-task Federated Learning with Encoder-Decoder Structure: Enabling Collaborative Learning Across Different Tasks | - | 0
Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation | - | 0
Neural Discourse Relation Recognition with Semantic Memory | - | 0
Neural Regularized Domain Adaptation for Chinese Word Segmentation | - | 0
Shifted Autoencoders for Point Annotation Restoration in Object Counting | - | 0
Universal Item Tokenization for Transferable Generative Recommendation | - | 0
Nudging: Inference-time Alignment of LLMs via Guided Decoding | - | 0
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | - | 0
One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video | - | 0
On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code | - | 0
Organizing Linked Data Quality Related Methods | - | 0
Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering | - | 0
Learning Electromagnetic Metamaterial Physics With ChatGPT | - | 0
A Joint Planning and Learning Framework for Human-Aided Decision-Making | - | 0
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering | - | 0
Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code | - | 0
PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking | - | 0
PASS-FC: Progressive and Adaptive Search Scheme for Fact Checking of Comprehensive Claims | - | 0
What Would You Ask When You First Saw a^2+b^2=c^2? Evaluating LLM on Curiosity-Driven Questioning | - | 0
Utilisation d'une base de connaissances de spécialité et de sens commun pour la simplification de comptes-rendus radiologiques (Radiological text simplification using a general knowledge base) | - | 0
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models | - | 0
Pilot: Building the Federated Multimodal Instruction Tuning Framework | - | 0
Video Question Answering Using CLIP-Guided Visual-Text Attention | - | 0
Bootstrapping Cognitive Agents with a Large Language Model | - | 0
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment | - | 0
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation | - | 0
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data | - | 0
BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer | - | 0
Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain | - | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | - | 0
Proceedings of the ISCA/ITG Workshop on Diversity in Large Speech and Language Models | - | 0
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning | - | 0
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning | - | 0
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | - | 0
Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian | - | 0
BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | - | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | - | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | - | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | - | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | - | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | - | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | - | Unverified
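To make the "few-shot, k=5" and "Accuracy" columns concrete, here is a minimal Python sketch, assuming the task is scored as multiple choice: each question is preceded by k=5 solved exemplars, a model-supplied scoring function rates every answer option, and the highest-scoring option counts as the prediction. The Q:/A: prompt layout, the function names `build_few_shot_prompt` and `multiple_choice_accuracy`, and the `score` callable are all hypothetical. This is not the BIG-bench harness, and the reported numbers may come from a different prompt format or scoring rule.

```python
from typing import Callable, List, Sequence, Tuple

# (question, answer) pair used as a solved in-context exemplar.
Exemplar = Tuple[str, str]
# (question, answer options, index of the correct option).
Item = Tuple[str, List[str], int]


def build_few_shot_prompt(exemplars: Sequence[Exemplar], query: str, k: int = 5) -> str:
    """Prefix the query with the first k solved exemplars (hypothetical Q:/A: layout)."""
    if len(exemplars) < k:
        raise ValueError(f"need at least {k} exemplars, got {len(exemplars)}")
    shots = [f"Q: {q}\nA: {a}" for q, a in exemplars[:k]]
    shots.append(f"Q: {query}\nA:")
    return "\n\n".join(shots)


def multiple_choice_accuracy(
    score: Callable[[str, str], float],
    items: Sequence[Item],
    exemplars: Sequence[Exemplar],
    k: int = 5,
) -> float:
    """Percentage of items whose highest-scoring option is the gold option."""
    if not items:
        raise ValueError("items must be non-empty")
    correct = 0
    for question, options, gold in items:
        prompt = build_few_shot_prompt(exemplars, question, k)
        # score(prompt, option) stands in for a model's preference, e.g. a log-likelihood.
        scores = [score(prompt, option) for option in options]
        predicted = max(range(len(options)), key=lambda i: scores[i])
        correct += int(predicted == gold)
    return 100.0 * correct / len(items)


if __name__ == "__main__":
    demo_exemplars = [(f"Question {i}?", f"Answer {i}") for i in range(5)]
    demo_items = [
        ("What is the capital of France?", ["Paris", "Marseille"], 0),
        ("Which is the largest planet?", ["Mars", "Jupiter"], 1),
    ]
    # Toy scorer: prefers shorter options, so it gets the first item right and the second wrong.
    print(multiple_choice_accuracy(lambda p, o: -len(o), demo_items, demo_exemplars))  # 50.0
```

Passing a real model's log-likelihood of each option as `score` would turn this into a likelihood-ranked multiple-choice evaluation; the toy scorer in the demo only exists to show how the percentage is computed.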