SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 101-150 of 399 papers

Title | Status | Hype
Large Language Models as a Tool for Mining Object Knowledge | - | 0
Enhance Graph Alignment for Large Language Models | - | 0
MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation | - | 0
Thinking LLMs: General Instruction Following with Thought Generation | - | 0
Scalable Multi-Domain Adaptation of Language Models using Modular Experts | - | 0
Distribution-aware Noisy-label Crack Segmentation | Code | 0
Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation | - | 0
Nudging: Inference-time Alignment of LLMs via Guided Decoding | - | 0
DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection | Code | 1
Mars: Situated Inductive Reasoning in an Open-World Environment | - | 0
Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning | Code | 3
Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners | - | 0
Selective Aggregation for Low-Rank Adaptation in Federated Learning | Code | 2
Cascade Prompt Learning for Vision-Language Model Adaptation | Code | 3
What Would You Ask When You First Saw a^2+b^2=c^2? Evaluating LLM on Curiosity-Driven Questioning | - | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | - | 0
E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models | Code | 1
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | - | 0
Aligning Medical Images with General Knowledge from Large Language Models | Code | 1
How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models | Code | 1
DIAGen: Diverse Image Augmentation with Generative Models | Code | 1
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data | - | 0
CoRA: Collaborative Information Perception by Large Language Model's Weights for Recommendation | - | 0
Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Code | 0
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning | - | 0
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? | Code | 1
Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian | - | 0
Can Editing LLMs Inject Harm? | Code | 1
Constructing Enhanced Mutual Information for Online Class-Incremental Learning | - | 0
An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph | - | 0
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Code | 2
Quantized Prompt for Efficient Generalization of Vision-Language Models | Code | 0
All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era | - | 0
Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects | - | 0
Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian | - | 0
ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models | Code | 1
SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation | - | 0
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Code | 1
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning | - | 0
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models | Code | 1
BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization | - | 0
Leveraging Large Language Models for enhanced personalised user experience in Smart Homes | - | 0
Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-Rank Decomposition | Code | 1
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA | - | 0
CityBench: Evaluating the Capabilities of Large Language Models for Urban Tasks | Code | 1
Exploring Safety-Utility Trade-Offs in Personalized Language Models | - | 0
Are Large Language Models a Good Replacement of Taxonomies? | Code | 0
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Code | 0
Avoiding Copyright Infringement via Large Language Model Unlearning | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | - | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | - | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | - | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | - | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | - | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | - | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | - | Unverified