| From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation | Nov 21, 2023 | Explanation Generation, Visual Question Answering (VQA) | Unverified | 0 |
| SelfEval: Leveraging the discriminative nature of generative models for evaluation | Nov 17, 2023 | Attribute, Visual Reasoning | Unverified | 0 |
| The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task | Nov 15, 2023 | Visual Reasoning | Unverified | 0 |
| Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method | Nov 14, 2023 | ARC, Dimensionality Reduction | Code Available | 0 |
| Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels | Nov 12, 2023 | Pathfinder, Visual Reasoning | Unverified | 0 |
| Visual Commonsense based Heterogeneous Graph Contrastive Learning | Nov 11, 2023 | Contrastive Learning, Question Answering | Unverified | 0 |
| Towards A Unified Neural Architecture for Visual Recognition and Reasoning | Nov 10, 2023 | Object, object-detection | Unverified | 0 |
| Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting | Oct 28, 2023 | Relation, Visual Reasoning | Unverified | 0 |
| OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning | Oct 28, 2023 | Data Augmentation, Out-of-Distribution Generalization | Unverified | 0 |
| ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese | Oct 27, 2023 | Information Retrieval, Natural Language Queries | Code Available | 0 |
| Multimodal Representations for Teacher-Guided Compositional Visual Reasoning | Oct 24, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Superpixel Semantics Representation and Pre-training for Vision-Language Task | Oct 20, 2023 | Self-Supervised Learning, Superpixels | Unverified | 0 |
| Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Sep 21, 2023 | Cross-Modal Retrieval, Image Captioning | Code Available | 0 |
| Visual Question Answering in the Medical Domain | Sep 20, 2023 | Contrastive Learning, Medical Visual Question Answering | Unverified | 0 |
| A Continual Learning Paradigm for Non-differentiable Visual Programming Frameworks on Visual Reasoning Tasks | Sep 18, 2023 | Continual Learning, Visual Reasoning | Unverified | 0 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Sep 10, 2023 | Coreference Resolution, Image Retrieval | Code Available | 0 |
| On the Potential of CLIP for Compositional Logical Reasoning | Aug 30, 2023 | Logical Reasoning, Visual Reasoning | Unverified | 0 |
| EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | Aug 23, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| Seeing the Intangible: Survey of Image Classification into High-Level and Abstract Categories | Aug 21, 2023 | Classification, Clustering | Unverified | 0 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matching, Object Localization | Unverified | 0 |
| Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning | Aug 18, 2023 | Visual Reasoning | Unverified | 0 |
| Multimodal Analysis Of Google Bard And GPT-Vision: Experiments In Visual Reasoning | Aug 17, 2023 | Common Sense Reasoning, Optical Character Recognition | Unverified | 0 |
| Learning logic programs by discovering higher-order abstractions | Aug 16, 2023 | Inductive logic programming, Program Synthesis | Code Available | 0 |
| Learning Abstract Visual Reasoning via Task Decomposition: A Case Study in Raven Progressive Matrices | Aug 12, 2023 | Visual Reasoning | Code Available | 0 |
| Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks | Jul 31, 2023 | Image Retrieval, Object | Unverified | 0 |
| LOIS: Looking Out of Instance Semantics for Visual Question Answering | Jul 26, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Grounded Object Centric Learning | Jul 18, 2023 | Object, Object Discovery | Unverified | 0 |
| Does Visual Pretraining Help End-to-End Reasoning? | Jul 17, 2023 | image-classification, Image Classification | Unverified | 0 |
| Abstracting Concept-Changing Rules for Solving Raven's Progressive Matrix Problems | Jul 15, 2023 | Answer Generation, Answer Selection | Unverified | 0 |
| Look, Remember and Reason: Grounded reasoning in videos with language models | Jun 30, 2023 | Object, object-detection | Unverified | 0 |
| Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Jun 29, 2023 | Image-text Retrieval, Machine Translation | Code Available | 0 |
| PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture | Jun 26, 2023 | Visual Reasoning, Zero-shot Generalization | Unverified | 0 |
| A Survey on Multimodal Large Language Models | Jun 23, 2023 | Hallucination, In-Context Learning | Unverified | 0 |
| V-LoL: A Diagnostic Dataset for Visual Logical Learning | Jun 13, 2023 | Diagnostic, Logical Reasoning | Code Available | 0 |
| A Domain-Independent Agent Architecture for Adaptive Operation in Evolving Open Worlds | Jun 9, 2023 | Minecraft, Visual Reasoning | Unverified | 0 |
| Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding | Jun 9, 2023 | Few-Shot Learning, image-classification | Code Available | 0 |
| Systematic Visual Reasoning through Object-Centric Relational Abstraction | Jun 4, 2023 | Object, Systematic Generalization | Code Available | 0 |
| Simple Token-Level Confidence Improves Caption Correctness | May 11, 2023 | Hallucination, Image Captioning | Unverified | 0 |
| Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs | May 10, 2023 | Scene Understanding, Visual Reasoning | Unverified | 0 |
| Visual Transformation Telling | May 3, 2023 | Dense Video Captioning, Video Captioning | Code Available | 0 |
| The role of object-centric representations, guided attention, and external memory on generalizing visual relations | Apr 14, 2023 | Relation, Visual Reasoning | Unverified | 0 |
| Boosting Cross-task Transferability of Adversarial Patches with Visual Relations | Apr 11, 2023 | Image Captioning, Object Recognition | Unverified | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | Unverified | 0 |
| Explainable AI And Visual Reasoning: Insights From Radiology | Apr 6, 2023 | Diagnostic, Explainable Artificial Intelligence (XAI) | Unverified | 0 |
| Navigating to Objects Specified by Images | Apr 3, 2023 | Navigate, Visual Reasoning | Unverified | 0 |
| Curriculum Learning for Compositional Visual Reasoning | Mar 27, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| 3D Concept Learning and Reasoning from Multi-View Images | Mar 20, 2023 | Question Answering, Visual Question Answering | Unverified | 0 |
| Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation | Mar 10, 2023 | Image Generation, multimodal generation | Code Available | 0 |
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification, image-classification | Unverified | 0 |
| Abstract Visual Reasoning Enabled by Language | Mar 7, 2023 | ARC, Visual Reasoning | Unverified | 0 |