
General Knowledge

This task aims to evaluate the ability of a model to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 151-200 of 399 papers

Title | Status | Hype
AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery | | 0
Advancing Retrieval-Augmented Generation for Persian: Development of Language Models, Comprehensive Benchmarks, and Best Practices for Optimization | | 0
A Dynamic Approach to Probabilistic Inference | | 0
A Factoid Question Answering System for Vietnamese | | 0
A Human-Centered Data-Driven Planner-Actor-Critic Architecture via Logic Programming | | 0
AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking | | 0
All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era | | 0
An Adaptive Deep Learning Framework for Day-ahead Forecasting of Photovoltaic Power Generation | | 0
An Ad-hoc graph node vector embedding algorithm for general knowledge graphs using Kinetica-Graph | | 0
Analysis of Watson's Strategies for Playing Jeopardy! | | 0
An Energy Ontology for Global City Indicators (ISO 37120) | | 0
AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis | | 0
Applying SoftTriple Loss for Supervised Language Model Fine Tuning | | 0
Are LLMs Good Cryptic Crossword Solvers? | | 0
Are Longer Prompts Always Better? Prompt Selection in Large Language Models for Recommendation Systems | | 0
A Self-Supervised Learning of a Foundation Model for Analog Layout Design Automation | | 0
Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources | | 0
ASLseg: Adapting SAM in the Loop for Semi-supervised Liver Tumor Segmentation | | 0
Assessing Look-Ahead Bias in Stock Return Predictions Generated By GPT Sentiment Analysis | | 0
A Unified Industrial Large Knowledge Model Framework in Industry 4.0 and Smart Manufacturing | | 0
Autonomous Intelligent Software Development | | 0
BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization | | 0
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming | | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0
BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer | | 0
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data | | 0
Boosting LLM Translation Skills without General Ability Loss via Rationale Distillation | | 0
Bootstrapping Cognitive Agents with a Large Language Model | | 0
Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code | | 0
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering | | 0
Learning Electromagnetic Metamaterial Physics With ChatGPT | | 0
Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving | | 0
Collaborative ontology sharing and editing | | 0
Collective inference of the truth of propositions from crowd probability judgments | | 0
Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval | | 0
Comparative Insights from 12 Machine Learning Models in Extracting Economic Ideology from Political Text | | 0
Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners | | 0
ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis | | 0
Constructing Enhanced Mutual Information for Online Class-Incremental Learning | | 0
Context and Humor: Understanding Amul advertisements of India | | 0
Controversy Rules - Discovering Regions Where Classifiers (Dis-)Agree Exceptionally | | 0
CoRA: Collaborative Information Perception by Large Language Model's Weights for Recommendation | | 0
DAML-ST5: Low Resource Style Transfer via Domain Adaptive Meta Learning | | 0
Data structuring for the ontological modelling of wind energy systems | | 0
Deep Prompt Multi-task Network for Abuse Language Detection | | 0
Differentially Private Distributed Learning for Language Modeling Tasks | | 0
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning | | 0
Generating Question Relevant Captions to Aid Visual Question Answering | | 0
Distributed Fine-tuning of Language Models on Private Data | | 0
DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified
2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified
3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified
4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified
5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified
6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified
7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified
8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified
9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified
10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified
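The leaderboard entries report accuracy under 5-shot prompting (k=5). The Python below is a minimal sketch of what that setup means. It assumes a simple "Q:/A:" prompt template and case-insensitive exact-match scoring. The helper names `build_k_shot_prompt` and `accuracy` and the demo questions are hypothetical illustrations; BIG-bench's own prompt format and scoring for this task may differ.

```python
from typing import Sequence


def build_k_shot_prompt(exemplars: Sequence[tuple[str, str]], question: str, k: int = 5) -> str:
    """Prepend the first k solved (question, answer) exemplars to the target question.

    Assumed "Q:/A:" template; not necessarily the template used by BIG-bench.
    """
    if len(exemplars) < k:
        raise ValueError(f"need at least {k} exemplars, got {len(exemplars)}")
    shots = [f"Q: {q}\nA: {a}" for q, a in exemplars[:k]]
    # The target question is left with an open "A:" for the model to complete.
    shots.append(f"Q: {question}\nA:")
    return "\n\n".join(shots)


def _normalize(text: str) -> str:
    """Assumed normalisation: trim whitespace and ignore case."""
    return text.strip().lower()


def accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Percentage of predictions that exactly match their target after normalisation."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy over zero examples")
    correct = sum(_normalize(p) == _normalize(t) for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)


# Hypothetical demo exemplars, for illustration only.
demo = [
    ("What is the capital of France?", "Paris"),
    ("How many legs does a spider have?", "8"),
    ("What is the chemical symbol for gold?", "Au"),
    ("Which planet is closest to the Sun?", "Mercury"),
    ("How many days are in a leap year?", "366"),
]
prompt = build_k_shot_prompt(demo, "What is the largest ocean on Earth?", k=5)
print(prompt)
print(accuracy(["Pacific", " paris "], ["pacific", "Paris"]))  # 100.0
```

Exact match is the simplest scoring rule. Multiple-choice formulations instead score which of the given options the model selects, which this sketch does not cover.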