| Multimodal Chain-of-Thought Reasoning in Language Models | Feb 2, 2023 | Hallucination, Language Modelling | Code Available | 4 |
| Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Feb 5, 2024 | Science Question Answering, Text-to-Video Generation | Code Available | 4 |
| Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs | Aug 23, 2023 | counterfactual, Question Answering | Code Available | 3 |
| Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Nov 14, 2023 | Image-based Generative Performance Benchmarking, Language Modeling | Code Available | 2 |
| Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering | Sep 20, 2022 | Multimodal Deep Learning, Multimodal Reasoning | Code Available | 2 |
| Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models | May 24, 2023 | Chatbot, Natural Language Understanding | Code Available | 2 |
| Honeybee: Locality-enhanced Projector for Multimodal LLM | Dec 11, 2023 | MME, Science Question Answering | Code Available | 2 |
| SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation | May 16, 2024 | Open-Ended Question Answering, Question Answering | Code Available | 1 |
| Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training | Nov 23, 2023 | Multimodal Reasoning, Science Question Answering | Code Available | 1 |
| A Survey on Interpretable Cross-modal Reasoning | Sep 5, 2023 | Cross-Modal Retrieval, Decision Making | Code Available | 1 |