SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 251-300 of 399 papers

Title | Status | Hype
ASLseg: Adapting SAM in the Loop for Semi-supervised Liver Tumor Segmentation |  | 0
Shifted Autoencoders for Point Annotation Restoration in Object Counting |  | 0
Towards Difficulty-Agnostic Efficient Transfer Learning for Vision-Language Models | Code | 0
AcademicGPT: Empowering Academic Research |  | 0
Towards Few-shot Out-of-Distribution Detection |  | 0
PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization | Code | 0
Exploring Recommendation Capabilities of GPT-4V(ision): A Preliminary Case Study | Code | 0
Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI |  | 0
SAGE: Smart home Agent with Grounded Execution |  | 0
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model | Code | 0
Test-Time Self-Adaptive Small Language Models for Question Answering | Code | 0
Motif-Based Prompt Learning for Universal Cross-Domain Recommendation |  | 0
Learning to Adapt SAM for Segmenting Cross-domain Point Clouds |  | 0
Dobby: A Conversational Service Robot Driven by GPT-4 |  | 0
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning |  | 0
Key Factors Affecting European Reactions to AI in European Full and Flawed Democracies |  | 0
Assessing Look-Ahead Bias in Stock Return Predictions Generated By GPT Sentiment Analysis |  | 0
Leveraging Large Language Models for Automated Dialogue Analysis | Code | 0
Learning to Model the World with Language |  | 0
Multilingual Tourist Assistance using ChatGPT: Comparing Capabilities in Hindi, Telugu, and Kannada |  | 0
A new algorithm for Subgroup Set Discovery based on Information Gain |  | 0
ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis |  | 0
Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI |  | 0
Can ChatGPT Enable ITS? The Case of Mixed Traffic Control via Reinforcement Learning | Code | 0
Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph | Code | 0
REFinD: Relation Extraction Financial Dataset | Code | 0
ExplainCPE: A Free-text Explanation Benchmark of Chinese Pharmacist Examination | Code | 0
Investigating Forgetting in Pre-Trained Representations Through Continual Learning |  | 0
Score: A Rule Engine for the Scone Knowledge Base System |  | 0
On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code |  | 0
"When Words Fail, Emojis Prevail": Generating Sarcastic Utterances with Emoji Using Valence Reversal and Semantic Incongruity |  | 0
Generative Meta-Learning for Zero-Shot Relation Triplet Extraction |  | 0
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval |  | 0
Stop Words for Processing Software Engineering Documents: Do they Matter? |  | 0
Video Question Answering Using CLIP-Guided Visual-Text Attention |  | 0
Exploit CAM by itself: Complementary Learning System for Weakly Supervised Semantic Segmentation |  | 0
Dive into the Resolution Augmentations and Metrics in Low Resolution Face Recognition: A Plain yet Effective New Baseline | Code | 0
Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers |  | 0
KAER: A Knowledge Augmented Pre-Trained Language Model for Entity Resolution |  | 0
DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning |  | 0
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment |  | 0
Efficient Relation-aware Neighborhood Aggregation in Graph Neural Networks via Tensor Decomposition | Code | 0
G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks | Code | 0
Rethinking Two Consensuses of the Transferability in Deep Learning |  | 0
Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling | Code | 0
World Knowledge in Multiple Choice Reading Comprehension | Code | 0
Evident: a Development Methodology and a Knowledge Base Topology for Data Mining, Machine Learning and General Knowledge Management |  | 0
Dominance-based Rough Set Approach, basic ideas and main trends |  | 0
Towards Ontology Reshaping for KG Generation with User-in-the-Loop: Applied to Bosch Welding |  | 0
BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 |  | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 |  | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 |  | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 |  | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 |  | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 |  | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 |  | Unverified