SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 176-200 of 399 papers

Title | Status | Hype
Adapter-based Approaches to Knowledge-enhanced Language Models -- A Survey | | 0
GOT4Rec: Graph of Thoughts for Sequential Recommendation | | 0
GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning | | 0
Efficient Transfer Learning for Video-language Foundation Models | Code | 0
MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Code | 0
Exploring Zero-Shot Anomaly Detection with CLIP in Medical Imaging: Are We There Yet? | | 0
SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | | 0
Extracting Unlearned Information from LLMs with Activation Steering | | 0
Evaluating Company-specific Biases in Financial Sentiment Analysis using Large Language Models | | 0
A Comparison of Prompt Engineering Techniques for Task Planning and Execution in Service Robotics | Code | 0
AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery | | 0
Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code | | 0
Fast constrained sampling in pre-trained diffusion models | | 0
Should We Really Edit Language Models? On the Evaluation of Edited Language Models | Code | 0
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation | | 0
Large Language Models as a Tool for Mining Object Knowledge | | 0
Enhance Graph Alignment for Large Language Models | | 0
MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation | | 0
Thinking LLMs: General Instruction Following with Thought Generation | | 0
Scalable Multi-Domain Adaptation of Language Models using Modular Experts | | 0
Distribution-aware Noisy-label Crack Segmentation | Code | 0
Nudging: Inference-time Alignment of LLMs via Guided Decoding | | 0
Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation | | 0
Mars: Situated Inductive Reasoning in an Open-World Environment | | 0
Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified