| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | AllImage Segmentation | CodeCode Available | 2 | 5 |
| PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change | Jun 21, 2022 | Common Sense ReasoningDiversity | CodeCode Available | 2 | 5 |
| MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark | Dec 19, 2024 | MMLUMultiple-choice | CodeCode Available | 2 | 5 |
| Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | May 24, 2024 | Common Sense ReasoningLanguage Modelling | CodeCode Available | 2 | 5 |
| Measuring Massive Multitask Language Understanding | Sep 7, 2020 | Elementary MathematicsMulti-task Language Understanding | CodeCode Available | 2 | 5 |
| Aligning AI With Shared Human Values | Aug 5, 2020 | Ethicsreinforcement-learning | CodeCode Available | 2 | 5 |
| ExpeL: LLM Agents Are Experiential Learners | Aug 20, 2023 | Decision MakingTransfer Learning | CodeCode Available | 2 | 5 |
| Language Representations Can be What Recommenders Need: Findings and Potentials | Jul 7, 2024 | Collaborative FilteringContrastive Learning | CodeCode Available | 2 | 5 |
| Can AI Assistants Know What They Don't Know? | Jan 24, 2024 | MathOpen-Domain Question Answering | CodeCode Available | 2 | 5 |
| A Synthetic Dataset for Personal Attribute Inference | Jun 11, 2024 | AttributeAuthor Profiling | CodeCode Available | 2 | 5 |
| Free-form language-based robotic reasoning and grasping | Mar 17, 2025 | FormRobotic Grasping | CodeCode Available | 2 | 5 |
| A Survey on Knowledge Graphs: Representation, Acquisition and Applications | Feb 2, 2020 | Graph EmbeddingGraph Representation Learning | CodeCode Available | 2 | 5 |
| Learnable Item Tokenization for Generative Recommendation | May 12, 2024 | DiversityWorld Knowledge | CodeCode Available | 2 | 5 |
| MeaCap: Memory-Augmented Zero-shot Image Captioning | Mar 6, 2024 | Caption GenerationImage Captioning | CodeCode Available | 2 | 5 |
| LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments | Jun 24, 2024 | World Knowledge | CodeCode Available | 2 | 5 |
| Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | Jan 18, 2022 | Robot Task PlanningWorld Knowledge | CodeCode Available | 2 | 5 |
| RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit | Jun 8, 2023 | Answer GenerationFact Checking | CodeCode Available | 2 | 5 |
| CapsFusion: Rethinking Image-Text Data at Scale | Oct 31, 2023 | World Knowledge | CodeCode Available | 2 | 5 |
| Grasp-Anything: Large-scale Grasp Dataset from Foundation Models | Sep 18, 2023 | DiversityRobotic Grasping | CodeCode Available | 2 | 5 |
| Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach | Jun 6, 2023 | Decision MakingSequential Decision Making | CodeCode Available | 1 | 5 |
| KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs | Sep 9, 2021 | Common Sense ReasoningLanguage Modelling | CodeCode Available | 1 | 5 |
| Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs | Dec 10, 2024 | Knowledge GraphsRAG | CodeCode Available | 1 | 5 |
| Is ChatGPT a Good Recommender? A Preliminary Study | Apr 20, 2023 | Recommendation SystemsWorld Knowledge | CodeCode Available | 1 | 5 |
| SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector | Dec 14, 2023 | Knowledge DistillationObject | CodeCode Available | 1 | 5 |
| ASER: A Large-scale Eventuality Knowledge Graph | May 1, 2019 | Knowledge GraphsWorld Knowledge | CodeCode Available | 1 | 5 |
| Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models | Apr 9, 2024 | Few-Shot LearningLanguage Modelling | CodeCode Available | 1 | 5 |
| Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models | May 15, 2024 | AI AgentWorld Knowledge | CodeCode Available | 1 | 5 |
| Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors | Nov 20, 2022 | Model EditingWorld Knowledge | CodeCode Available | 1 | 5 |
| Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds | May 27, 2023 | Task PlanningWorld Knowledge | CodeCode Available | 1 | 5 |
| Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation | Jul 20, 2023 | Open-Domain Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application | May 7, 2024 | Collaborative FilteringLanguage Modeling | CodeCode Available | 1 | 5 |
| Imagine This! Scripts to Compositions to Videos | Apr 10, 2018 | RetrievalWorld Knowledge | CodeCode Available | 1 | 5 |
| AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework | Aug 26, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition | Oct 8, 2020 | Question AnsweringWorld Knowledge | CodeCode Available | 1 | 5 |
| InGram: Inductive Knowledge Graph Embedding via Relation Graphs | May 31, 2023 | Entity EmbeddingsGraph Embedding | CodeCode Available | 1 | 5 |
| Breaking NLI Systems with Sentences that Require Simple Lexical Inferences | May 6, 2018 | World Knowledge | CodeCode Available | 1 | 5 |
| A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge | Jun 3, 2022 | Question AnsweringVisual Question Answering | CodeCode Available | 1 | 5 |
| Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs? | Aug 20, 2023 | Knowledge GraphsWorld Knowledge | CodeCode Available | 1 | 5 |
| How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances | Oct 11, 2023 | World Knowledge | CodeCode Available | 1 | 5 |
| I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token | Dec 9, 2024 | World Knowledge | CodeCode Available | 1 | 5 |
| Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization | Aug 30, 2021 | Abstractive Text SummarizationReinforcement Learning (RL) | CodeCode Available | 1 | 5 |
| Knowledge Editing through Chain-of-Thought | Dec 23, 2024 | knowledge editingWorld Knowledge | CodeCode Available | 1 | 5 |
| Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias | May 9, 2024 | Data VisualizationLanguage Modeling | CodeCode Available | 1 | 5 |
| Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge | Apr 6, 2021 | Common Sense ReasoningWorld Knowledge | CodeCode Available | 1 | 5 |
| A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering | Nov 13, 2023 | Decision MakingExplanation Generation | CodeCode Available | 1 | 5 |
| Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | Sep 17, 2024 | Active LearningDiversity | CodeCode Available | 1 | 5 |
| BLADE: Benchmarking Language Model Agents for Data-Driven Science | Aug 19, 2024 | BenchmarkingDecision Making | CodeCode Available | 1 | 5 |
| Counterfactual reasoning: Testing language models' understanding of hypothetical scenarios | May 26, 2023 | counterfactualCounterfactual Reasoning | CodeCode Available | 1 | 5 |
| Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language | Mar 1, 2021 | SentenceWorld Knowledge | CodeCode Available | 1 | 5 |
| CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | Sep 27, 2024 | Reinforcement Learning (RL)World Knowledge | CodeCode Available | 1 | 5 |