SOTAVerified

General Knowledge

This task aims to evaluate the ability of a model to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 151-175 of 399 papers

Title | Status | Hype
Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding | - | 0
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models | - | 0
Enabling Autonomic Microservice Management through Self-Learning Agents | - | 0
FlexiCrackNet: A Flexible Pipeline for Enhanced Crack Segmentation with General Features Transfered from SAM | - | 0
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering | - | 0
Sample-Efficient Behavior Cloning Using General Domain Knowledge | - | 0
DAGPrompT: Pushing the Limits of Graph Prompting with a Distribution-aware Graph Prompt Tuning Approach | Code | 0
Pilot: Building the Federated Multimodal Instruction Tuning Framework | - | 0
How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization | - | 0
LLM4WM: Adapting LLM for Wireless Multi-Tasking | - | 0
Generating Diverse Q&A Benchmarks for RAG Evaluation with DataMorgana | - | 0
Comparative Insights from 12 Machine Learning Models in Extracting Economic Ideology from Political Text | - | 0
Collective inference of the truth of propositions from crowd probability judgments | - | 0
Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization | - | 0
KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration | - | 0
The Scaling Law for LoRA Base on Mutual Information Upper Bound | - | 0
MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | - | 0
KnowRA: Knowledge Retrieval Augmented Method for Document-level Relation Extraction with Comprehensive Reasoning Abilities | - | 0
scReader: Prompting Large Language Models to Interpret scRNA-seq Data | - | 0
Survey on Abstractive Text Summarization: Dataset, Models, and Metrics | Code | 0
Extending TWIG: Zero-Shot Predictive Hyperparameter Selection for KGEs based on Graph Structure | - | 0
Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems | - | 0
MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning | - | 0
What Makes Cryptic Crosswords Challenging for LLMs? | Code | 0
TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | - | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | - | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | - | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | - | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | - | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | - | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | - | Unverified
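
The entries above report accuracy in a few-shot setting with k=5. This page does not say how prompts were built or how answers were scored, so the following is only a minimal sketch of one common setup: k solved examples are prepended to the question, and accuracy is the percentage of case-insensitive exact matches. The function names `build_few_shot_prompt` and `accuracy`, the "Q:/A:" prompt layout, and exact-match scoring are all assumptions made for illustration; the listed results may have been scored differently.

```python
from typing import List, Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]], question: str, k: int = 5
) -> str:
    """Prepend the first k solved (question, answer) pairs to an open question.

    The "Q:/A:" layout is a hypothetical format chosen for this sketch.
    """
    parts: List[str] = [f"Q: {q}\nA: {a}" for q, a in examples[:k]]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of predictions equal to their reference.

    Matching ignores case and surrounding whitespace (an assumption; other
    scoring rules, such as multiple-choice scoring, are also possible).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)
```

With this definition, a claimed score of 94.3 would correspond to roughly 943 correct answers per 1,000 questions.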