SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 251–300 of 399 papers

Title | Status | Hype
KAER: A Knowledge Augmented Pre-Trained Language Model for Entity Resolution |  | 0
PASS-FC: Progressive and Adaptive Search Scheme for Fact Checking of Comprehensive Claims |  | 0
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models |  | 0
Pilot: Building the Federated Multimodal Instruction Tuning Framework |  | 0
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment |  | 0
Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain |  | 0
Proceedings of the ISCA/ITG Workshop on Diversity in Large Speech and Language Models |  | 0
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning |  | 0
Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian |  | 0
QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions |  | 0
Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training |  | 0
Rethinking Two Consensuses of the Transferability in Deep Learning |  | 0
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning |  | 0
SAGE: Smart home Agent with Grounded Execution |  | 0
SAM-Guided Robust Representation Learning for One-Shot 3D Medical Image Segmentation |  | 0
Learning to Adapt SAM for Segmenting Cross-domain Point Clouds |  | 0
SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation |  | 0
Sample-Efficient Behavior Cloning Using General Domain Knowledge |  | 0
Scalable Multi-Domain Adaptation of Language Models using Modular Experts |  | 0
Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI |  | 0
Score: A Rule Engine for the Scone Knowledge Base System |  | 0
scReader: Prompting Large Language Models to Interpret scRNA-seq Data |  | 0
Sculpting [CLS] Features for Pre-Trained Model-Based Class-Incremental Learning |  | 0
Semi-Supervised Medical Image Segmentation via Knowledge Mining from Large Models |  | 0
Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models |  | 0
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge |  | 0
Some Epistemological Problems with the Knowledge Level in Cognitive Architectures |  | 0
Specifying Conceptual Models Using Restricted Natural Language |  | 0
Spirit Distillation: A Model Compression Method with Multi-domain Knowledge Transfer |  | 0
Spoken Conversational Search for General Knowledge |  | 0
STELLA: Towards Protein Function Prediction with Multimodal LLMs Integrating Sequence-Structure Representations |  | 0
Stop Words for Processing Software Engineering Documents: Do they Matter? |  | 0
Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering |  | 0
Successive POI Recommendation via Brain-inspired Spatiotemporal Aware Representation |  | 0
TabMCQ: A Dataset of General Knowledge Tables and Multiple-choice Questions |  | 0
Teaching Uncertainty Quantification in Machine Learning through Use Cases |  | 0
Tencent AI Lab Machine Translation Systems for WMT20 Chat Translation Task |  | 0
Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers |  | 0
The Scaling Law for LoRA Base on Mutual Information Upper Bound |  | 0
The Wisdom of Crowds in the Recollection of Order Information |  | 0
The World in My Mind: Visual Dialog with Adversarial Multi-modal Feature Encoding |  | 0
Thinking LLMs: General Instruction Following with Thought Generation |  | 0
TOV: The Original Vision Model for Optical Remote Sensing Image Understanding via Self-supervised Learning |  | 0
Towards a Continuous Knowledge Learning Engine for Chatbots |  | 0
Towards Few-shot Out-of-Distribution Detection |  | 0
Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs |  | 0
Towards Ontology Reshaping for KG Generation with User-in-the-Loop: Applied to Bosch Welding |  | 0
Transaction Logic with (Complex) Events |  | 0
Transferable Natural Language Interface to Structured Queries aided by Adversarial Generation |  | 0
Transfer learning of chaotic systems |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 |  | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 |  | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 |  | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 |  | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 |  | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 |  | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 |  | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 |  | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 |  | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 |  | Unverified