SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 301-350 of 399 papers

| Title | Status | Hype |
| --- | --- | --- |
| Transformer Based Bengali Chatbot Using General Knowledge Dataset |  | 0 |
| TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation |  | 0 |
| TURNER: The Uncertainty-based Retrieval Framework for Chinese NER |  | 0 |
| Understanding Inequality of LLM Fact-Checking over Geographic Regions with Agent and Retrieval models |  | 0 |
| Universal Item Tokenization for Transferable Generative Recommendation |  | 0 |
| Utilisation d'une base de connaissances de spécialité et de sens commun pour la simplification de comptes-rendus radiologiques (Radiological text simplification using a general knowledge base) |  | 0 |
| Video Question Answering Using CLIP-Guided Visual-Text Attention |  | 0 |
| ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints |  | 0 |
| Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives |  | 0 |
| Visual Question Answering as Reading Comprehension |  | 0 |
| VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making |  | 0 |
| What's a Good Prediction? Challenges in evaluating an agent's knowledge |  | 0 |
| What Would You Ask When You First Saw a^2+b^2=c^2? Evaluating LLM on Curiosity-Driven Questioning |  | 0 |
| When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering |  | 0 |
| "When Words Fail, Emojis Prevail": Generating Sarcastic Utterances with Emoji Using Valence Reversal and Semantic Incongruity |  | 0 |
| MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs |  | 0 |
| MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning |  | 0 |
| Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System |  | 0 |
| Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model |  | 0 |
| MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning |  | 0 |
| MoST: Multi-modality Scene Tokenization for Motion Prediction |  | 0 |
| Motif-Based Prompt Learning for Universal Cross-Domain Recommendation |  | 0 |
| Multilingual Tourist Assistance using ChatGPT: Comparing Capabilities in Hindi, Telugu, and Kannada |  | 0 |
| Multi-task Federated Learning with Encoder-Decoder Structure: Enabling Collaborative Learning Across Different Tasks |  | 0 |
| Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation |  | 0 |
| Neural Discourse Relation Recognition with Semantic Memory |  | 0 |
| Neural Regularized Domain Adaptation for Chinese Word Segmentation |  | 0 |
| Shifted Autoencoders for Point Annotation Restoration in Object Counting |  | 0 |
| Nudging: Inference-time Alignment of LLMs via Guided Decoding |  | 0 |
| One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video |  | 0 |
| On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code |  | 0 |
| Organizing Linked Data Quality Related Methods |  | 0 |
| Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering |  | 0 |
| A Joint Planning and Learning Framework for Human-Aided Decision-Making |  | 0 |
| PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking |  | 0 |
| Luminoso at SemEval-2018 Task 10: Distinguishing Attributes Using Text Corpora and Relational Knowledge | Code | 0 |
| MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Code | 0 |
| Leveraging Large Language Models for Automated Dialogue Analysis | Code | 0 |
| Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge | Code | 0 |
| Efficient Transfer Learning for Video-language Foundation Models | Code | 0 |
| Unveiling Causal Reasoning in Large Language Models: Reality or Mirage? | Code | 0 |
| Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Code | 0 |
| Task-Driven and Experience-Based Question Answering Corpus for In-Home Robot Application in the House3D Virtual Environment | Code | 0 |
| ContextFlow++: Generalist-Specialist Flow-based Generative Models with Mixed-Variable Context Encoding | Code | 0 |
| BnMMLU: Measuring Massive Multitask Language Understanding in Bengali | Code | 0 |
| Avoiding Copyright Infringement via Large Language Model Unlearning | Code | 0 |
| Learning to Understand Phrases by Embedding the Dictionary | Code | 0 |
| Efficient Relation-aware Neighborhood Aggregation in Graph Neural Networks via Tensor Decomposition | Code | 0 |
| GenKnowSub: Improving Modularity and Reusability of LLMs through General Knowledge Subtraction | Code | 0 |
| SciDeBERTa: Learning DeBERTa for Science Technology Documents and Fine-Tuning Information Extraction Tasks | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 |  | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 |  | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 |  | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 |  | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 |  | Unverified |
| 6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 |  | Unverified |
| 7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 |  | Unverified |
| 8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 |  | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 |  | Unverified |
| 10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 |  | Unverified |