SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 151-200 of 399 papers

| Title | Status | Hype |
|---|---|---|
| Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding | | 0 |
| PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models | | 0 |
| FlexiCrackNet: A Flexible Pipeline for Enhanced Crack Segmentation with General Features Transfered from SAM | | 0 |
| Enabling Autonomic Microservice Management through Self-Learning Agents | | 0 |
| CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering | | 0 |
| Sample-Efficient Behavior Cloning Using General Domain Knowledge | | 0 |
| DAGPrompT: Pushing the Limits of Graph Prompting with a Distribution-aware Graph Prompt Tuning Approach | Code | 0 |
| Pilot: Building the Federated Multimodal Instruction Tuning Framework | | 0 |
| How to Complete Domain Tuning while Keeping General Ability in LLM: Adaptive Layer-wise and Element-wise Regularization | | 0 |
| Generating Diverse Q&A Benchmarks for RAG Evaluation with DataMorgana | | 0 |
| LLM4WM: Adapting LLM for Wireless Multi-Tasking | | 0 |
| Comparative Insights from 12 Machine Learning Models in Extracting Economic Ideology from Political Text | | 0 |
| Collective inference of the truth of propositions from crowd probability judgments | | 0 |
| Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization | | 0 |
| KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration | | 0 |
| The Scaling Law for LoRA Base on Mutual Information Upper Bound | | 0 |
| MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | | 0 |
| KnowRA: Knowledge Retrieval Augmented Method for Document-level Relation Extraction with Comprehensive Reasoning Abilities | | 0 |
| scReader: Prompting Large Language Models to Interpret scRNA-seq Data | | 0 |
| Survey on Abstractive Text Summarization: Dataset, Models, and Metrics | Code | 0 |
| Extending TWIG: Zero-Shot Predictive Hyperparameter Selection for KGEs based on Graph Structure | | 0 |
| Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems | | 0 |
| MoSLD: An Extremely Parameter-Efficient Mixture-of-Shared LoRAs for Multi-Task Learning | | 0 |
| What Makes Cryptic Crosswords Challenging for LLMs? | Code | 0 |
| TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation | | 0 |
| Adapter-based Approaches to Knowledge-enhanced Language Models -- A Survey | | 0 |
| GOT4Rec: Graph of Thoughts for Sequential Recommendation | | 0 |
| GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning | | 0 |
| Efficient Transfer Learning for Video-language Foundation Models | Code | 0 |
| MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Code | 0 |
| Exploring Zero-Shot Anomaly Detection with CLIP in Medical Imaging: Are We There Yet? | | 0 |
| SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | | 0 |
| Extracting Unlearned Information from LLMs with Activation Steering | | 0 |
| Evaluating Company-specific Biases in Financial Sentiment Analysis using Large Language Models | | 0 |
| A Comparison of Prompt Engineering Techniques for Task Planning and Execution in Service Robotics | Code | 0 |
| AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery | | 0 |
| Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code | | 0 |
| Fast constrained sampling in pre-trained diffusion models | | 0 |
| Should We Really Edit Language Models? On the Evaluation of Edited Language Models | Code | 0 |
| Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation | | 0 |
| Large Language Models as a Tool for Mining Object Knowledge | | 0 |
| Enhance Graph Alignment for Large Language Models | | 0 |
| MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation | | 0 |
| Thinking LLMs: General Instruction Following with Thought Generation | | 0 |
| Scalable Multi-Domain Adaptation of Language Models using Modular Experts | | 0 |
| Distribution-aware Noisy-label Crack Segmentation | Code | 0 |
| Nudging: Inference-time Alignment of LLMs via Guided Decoding | | 0 |
| Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation | | 0 |
| Mars: Situated Inductive Reasoning in an Open-World Environment | | 0 |
| Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified |
| 6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified |
| 7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified |
| 8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified |
| 10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified |