World Knowledge

Papers

Showing 51–100 of 818 papers

Title | Status | Hype
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change | Code | 2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Code | 2
On Softmax Direct Preference Optimization for Recommendation | Code | 2
Agent Planning with World Knowledge Model | Code | 2
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Code | 2
ConTextTab: A Semantics-Aware Tabular In-Context Learner | Code | 2
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | Code | 2
RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit | Code | 2
Learnable Item Tokenization for Generative Recommendation | Code | 2
MeaCap: Memory-Augmented Zero-shot Image Captioning | Code | 2
ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human | Code | 2
A Survey on Knowledge Graphs: Representation, Acquisition and Applications | Code | 2
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | Code | 2
GreaseLM: Graph REASoning Enhanced Language Models for Question Answering | Code | 2
A Synthetic Dataset for Personal Attribute Inference | Code | 2
LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments | Code | 2
Aligning AI With Shared Human Values | Code | 2
Language Representations Can be What Recommenders Need: Findings and Potentials | Code | 2
Measuring Massive Multitask Language Understanding | Code | 2
Is ChatGPT a Good Recommender? A Preliminary Study | Code | 1
KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs | Code | 1
Can OOD Object Detectors Learn from Foundation Models? | Code | 1
Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs | Code | 1
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation | Code | 1
SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector | Code | 1
ASER: A Large-scale Eventuality Knowledge Graph | Code | 1
Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization | Code | 1
Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition | Code | 1
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors | Code | 1
InGram: Inductive Knowledge Graph Embedding via Relation Graphs | Code | 1
Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds | Code | 1
LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application | Code | 1
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances | Code | 1
AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework | Code | 1
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token | Code | 1
Breaking NLI Systems with Sentences that Require Simple Lexical Inferences | Code | 1
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge | Code | 1
HeadlineCause: A Dataset of News Headlines for Detecting Causalities | Code | 1
Can LLMs' Tuning Methods Work in Medical Multimodal Domain? | Code | 1
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering | Code | 1
Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs? | Code | 1
Imagine This! Scripts to Compositions to Videos | Code | 1
Knowledge Editing through Chain-of-Thought | Code | 1
FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification | Code | 1
Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge | Code | 1
F-ViTA: Foundation Model Guided Visible to Thermal Translation | Code | 1
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains | Code | 1
GRILLBot In Practice: Lessons and Tradeoffs Deploying Large Language Models for Adaptable Conversational Task Assistants | Code | 1
BLADE: Benchmarking Language Model Agents for Data-Driven Science | Code | 1
FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1
Page 2 of 17

No leaderboard results yet.