SOTAVerified

World Knowledge

Papers

Showing 76-100 of 818 papers

Title | Status | Hype
ASER: A Large-scale Eventuality Knowledge Graph | Code | 1
Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization | Code | 1
Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition | Code | 1
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors | Code | 1
InGram: Inductive Knowledge Graph Embedding via Relation Graphs | Code | 1
Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds | Code | 1
LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application | Code | 1
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances | Code | 1
AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework | Code | 1
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token | Code | 1
Breaking NLI Systems with Sentences that Require Simple Lexical Inferences | Code | 1
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge | Code | 1
HeadlineCause: A Dataset of News Headlines for Detecting Causalities | Code | 1
Can LLMs' Tuning Methods Work in Medical Multimodal Domain? | Code | 1
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering | Code | 1
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs? | Code | 1
Imagine This! Scripts to Compositions to Videos | Code | 1
Knowledge Editing through Chain-of-Thought | Code | 1
FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification | Code | 1
Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge | Code | 1
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Code | 1
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains | Code | 1
GRILLBot In Practice: Lessons and Tradeoffs Deploying Large Language Models for Adaptable Conversational Task Assistants | Code | 1
BLADE: Benchmarking Language Model Agents for Data-Driven Science | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
Page 4 of 33

No leaderboard results yet.