SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 201-250 of 399 papers

Title | Status | Hype
What Would You Ask When You First Saw a^2+b^2=c^2? Evaluating LLM on Curiosity-Driven Questioning | - | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | - | 0
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | - | 0
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data | - | 0
CoRA: Collaborative Information Perception by Large Language Model's Weights for Recommendation | - | 0
Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models | Code | 0
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning | - | 0
Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian | - | 0
Constructing Enhanced Mutual Information for Online Class-Incremental Learning | - | 0
An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph | - | 0
Quantized Prompt for Efficient Generalization of Vision-Language Models | Code | 0
All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era | - | 0
Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects | - | 0
Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian | - | 0
SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation | - | 0
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning | - | 0
BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization | - | 0
Leveraging Large Language Models for enhanced personalised user experience in Smart Homes | - | 0
Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA | - | 0
Exploring Safety-Utility Trade-Offs in Personalized Language Models | - | 0
Are Large Language Models a Good Replacement of Taxonomies? | Code | 0
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Code | 0
Avoiding Copyright Infringement via Large Language Model Unlearning | Code | 0
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | - | 0
Learning from Natural Language Explanations for Generalizable Entity Matching | - | 0
Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers | - | 0
ContextFlow++: Generalist-Specialist Flow-based Generative Models with Mixed-Variable Context Encoding | Code | 0
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge | - | 0
MoST: Multi-modality Scene Tokenization for Motion Prediction | - | 0
Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs | - | 0
Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation | - | 0
Evaluating Consistency and Reasoning Capabilities of Large Language Models | - | 0
Learning Electromagnetic Metamaterial Physics With ChatGPT | - | 0
When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering | - | 0
Pretraining and Updates of Domain-Specific LLM: A Case Study in the Japanese Business Domain | - | 0
Knowledge graphs for empirical concept retrieval | Code | 0
Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge | Code | 0
Juru: Legal Brazilian Large Language Model from Reputable Sources | - | 0
Are LLMs Good Cryptic Crossword Solvers? | - | 0
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning | - | 0
Deep Prompt Multi-task Network for Abuse Language Detection | - | 0
K-Link: Knowledge-Link Graph from LLMs for Enhanced Representation Learning in Multivariate Time-Series Data | - | 0
Pruning neural network models for gene regulatory dynamics using data and domain knowledge | Code | 0
Bootstrapping Cognitive Agents with a Large Language Model | - | 0
Inductive Graph Alignment Prompt: Bridging the Gap between Graph Pre-training and Inductive Fine-tuning From Spectral Perspective | - | 0
GALA: Generating Animatable Layered Assets from a Single Scan | - | 0
INCPrompt: Task-Aware incremental Prompting for Rehearsal-Free Class-incremental Learning | - | 0
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling | - | 0
A Unified Industrial Large Knowledge Model Framework in Industry 4.0 and Smart Manufacturing | - | 0
Fed-CO2: Cooperation of Online and Offline Models for Severe Data Heterogeneity in Federated Learning | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | - | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | - | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | - | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | - | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | - | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | - | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | - | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | - | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | - | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | - | Unverified