SOTAVerified

General Knowledge

This task evaluates a model's ability to answer general-knowledge questions.

Source: BIG-bench

Papers

Showing 1-50 of 399 papers

| Title | Status | Hype |
| --- | --- | --- |
| Training Compute-Optimal Large Language Models | Code | 6 |
| Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives | Code | 5 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuray | Code | 3 |
| The Breeze 2 Herd of Models: Traditional Chinese LLMs Based on Llama with Vision-Aware and Function-Calling Capabilities | Code | 3 |
| Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning | Code | 3 |
| Cascade Prompt Learning for Vision-Language Model Adaptation | Code | 3 |
| RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | Code | 3 |
| SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More | Code | 3 |
| Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation | Code | 3 |
| VoiceBench: Benchmarking LLM-Based Voice Assistants | Code | 3 |
| A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal | Code | 3 |
| ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge | Code | 2 |
| A Survey of Personalized Large Language Models: Progress and Future Directions | Code | 2 |
| Adapting a Language Model While Preserving its General Knowledge | Code | 2 |
| Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning | Code | 2 |
| CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge | Code | 2 |
| ConceptNet 5.5: An Open Multilingual Graph of General Knowledge | Code | 2 |
| F-LMM: Grounding Frozen Large Multimodal Models | Code | 2 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Code | 2 |
| Selective Aggregation for Low-Rank Adaptation in Federated Learning | Code | 2 |
| Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Code | 2 |
| Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs | Code | 2 |
| MMA: Multi-Modal Adapter for Vision-Language Models | Code | 2 |
| CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model | Code | 2 |
| MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models | Code | 2 |
| Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation | Code | 2 |
| Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Code | 2 |
| Continual Pre-training of Language Models | Code | 2 |
| LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts | Code | 2 |
| GeoGalactica: A Scientific Large Language Model in Geoscience | Code | 1 |
| Generic Knowledge Boosted Pre-training For Remote Sensing Images | Code | 1 |
| GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model | Code | 1 |
| FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion | Code | 1 |
| A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Code | 1 |
| Generative Pre-Training from Molecules | Code | 1 |
| Go From the General to the Particular: Multi-Domain Translation with Domain Transformation Networks | Code | 1 |
| A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection | Code | 1 |
| Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation | Code | 1 |
| Better Question-Answering Models on a Budget | Code | 1 |
| Bert4XMR: Cross-Market Recommendation with Bidirectional Encoder Representations from Transformer | Code | 1 |
| EPVT: Environment-aware Prompt Vision Transformer for Domain Generalization in Skin Lesion Recognition | Code | 1 |
| HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models | Code | 1 |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1 |
| DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | Code | 1 |
| DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration | Code | 1 |
| BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models | Code | 1 |
| BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation | Code | 1 |
| DIAGen: Diverse Image Augmentation with Generative Models | Code | 1 |
| Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation | Code | 1 |
| Aligning Medical Images with General Knowledge from Large Language Models | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | | Unverified |
| 2 | Gopher-280B (few-shot, k=5) | Accuracy | 93.9 | | Unverified |
| 3 | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | | Unverified |
| 4 | Gopher-280B (few-shot, k=5) | Accuracy | 84.8 | | Unverified |
| 5 | Gopher-280B (few-shot, k=5) | Accuracy | 84.2 | | Unverified |
| 6 | Gopher-280B (few-shot, k=5) | Accuracy | 84.1 | | Unverified |
| 7 | Gopher-280B (few-shot, k=5) | Accuracy | 83.9 | | Unverified |
| 8 | Gopher-280B (few-shot, k=5) | Accuracy | 83.3 | | Unverified |
| 9 | Gopher-280B (few-shot, k=5) | Accuracy | 81.8 | | Unverified |
| 10 | Gopher-280B (few-shot, k=5) | Accuracy | 81 | | Unverified |