SOTAVerified

General Knowledge

This task aims to evaluate the ability of a model to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 351–399 of 399 papers

| Title | Status | Hype |
|---|---|---|
| Test-Time Self-Adaptive Small Language Models for Question Answering | Code | 0 |
| Connecting a French Dictionary from the Beginning of the 20th Century to Wikidata | Code | 0 |
| What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge | Code | 0 |
| Pruning neural network models for gene regulatory dynamics using data and domain knowledge | Code | 0 |
| Effective Skill Unlearning through Intervention and Abstention | Code | 0 |
| Learning to Learn Variational Semantic Memory | Code | 0 |
| Domain Generalization via Model-Agnostic Learning of Semantic Features | Code | 0 |
| Dive into the Resolution Augmentations and Metrics in Low Resolution Face Recognition: A Plain yet Effective New Baseline | Code | 0 |
| Comprehensive Fair Meta-learned Recommender System | Code | 0 |
| A Comparison of Prompt Engineering Techniques for Task Planning and Execution in Service Robotics | Code | 0 |
| Knowledge graphs for empirical concept retrieval | Code | 0 |
| Should We Really Edit Language Models? On the Evaluation of Edited Language Models | Code | 0 |
| Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling | Code | 0 |
| Joey NMT: A Minimalist NMT Toolkit for Novices | Code | 0 |
| Distribution-aware Noisy-label Crack Segmentation | Code | 0 |
| Distilling Stereo Networks for Performant and Efficient Leaner Networks | Code | 0 |
| Patching as Translation: the Data and the Metaphor | Code | 0 |
| PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization | Code | 0 |
| What Makes Cryptic Crosswords Challenging for LLMs? | Code | 0 |
| Are Large Language Models a Good Replacement of Taxonomies? | Code | 0 |
| Planning Safety Trajectories with Dual-Phase, Physics-Informed, and Transportation Knowledge-Driven Large Language Models | Code | 0 |
| Integrating Semantic Knowledge to Tackle Zero-shot Text Classification | Code | 0 |
| Commonsense Knowledge in Word Associations and ConceptNet | Code | 0 |
| Improving Personalized Search with Regularized Low-Rank Parameter Updates | Code | 0 |
| Towards Difficulty-Agnostic Efficient Transfer Learning for Vision-Language Models | Code | 0 |
| HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models | Code | 0 |
| Can ChatGPT Enable ITS? The Case of Mixed Traffic Control via Reinforcement Learning | Code | 0 |
| How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities | Code | 0 |
| Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension | Code | 0 |
| WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models | Code | 0 |
| PROL : Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning | Code | 0 |
| G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks | Code | 0 |
| Visual Question Answering: A Survey of Methods and Datasets | Code | 0 |
| From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering | Code | 0 |
| Quantized Prompt for Efficient Generalization of Vision-Language Models | Code | 0 |
| World Knowledge in Multiple Choice Reading Comprehension | Code | 0 |
| Foundation X: Integrating Classification, Localization, and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis | Code | 0 |
| Fed-CO2: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning | Code | 0 |
| Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model | Code | 0 |
| Exploring Recommendation Capabilities of GPT-4V(ision): A Preliminary Case Study | Code | 0 |
| REFinD: Relation Extraction Financial Dataset | Code | 0 |
| Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic | Code | 0 |
| Exploiting Adapters for Cross-lingual Low-resource Speech Recognition | Code | 0 |
| RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Code | 0 |
| Towards Knowledge-Augmented Visual Question Answering | Code | 0 |
| ExplainCPE: A Free-text Explanation Benchmark of Chinese Pharmacist Examination | Code | 0 |
| Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph | Code | 0 |
| DAGPrompT: Pushing the Limits of Graph Prompting with a Distribution-aware Graph Prompt Tuning Approach | Code | 0 |
| Survey on Abstractive Text Summarization: Dataset, Models, and Metrics | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified |
| 6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified |
| 7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified |
| 8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified |
| 10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified |