SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 101–125 of 399 papers

Title | Status | Hype
Large Language Models as a Tool for Mining Object Knowledge | | 0
Enhance Graph Alignment for Large Language Models | | 0
MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation | | 0
Scalable Multi-Domain Adaptation of Language Models using Modular Experts | | 0
Thinking LLMs: General Instruction Following with Thought Generation | | 0
Distribution-aware Noisy-label Crack Segmentation | Code | 0
Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation | | 0
Nudging: Inference-time Alignment of LLMs via Guided Decoding | | 0
DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection | Code | 1
Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning | Code | 3
Mars: Situated Inductive Reasoning in an Open-World Environment | | 0
Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners | | 0
Selective Aggregation for Low-Rank Adaptation in Federated Learning | Code | 2
Cascade Prompt Learning for Vision-Language Model Adaptation | Code | 3
What Would You Ask When You First Saw a^2+b^2=c^2? Evaluating LLM on Curiosity-Driven Questioning | | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0
E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models | Code | 1
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | | 0
Aligning Medical Images with General Knowledge from Large Language Models | Code | 1
How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models | Code | 1
DIAGen: Diverse Image Augmentation with Generative Models | Code | 1
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data | | 0
CoRA: Collaborative Information Perception by Large Language Model's Weights for Recommendation | | 0
Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Code | 0
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified