| Improving Selective Visual Question Answering by Learning from Your Peers | Jun 14, 2023 | Question Answering, Visual Question Answering | Code Available | 1 |
| Scalable Neural-Probabilistic Answer Set Programming | Jun 14, 2023 | Probabilistic Programming, Question Answering | Code Available | 1 |
| Global and Local Semantic Completion Learning for Vision-Language Pre-training | Jun 12, 2023 | cross-modal alignment, Image-text Retrieval | Code Available | 1 |
| Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark | Jun 10, 2023 | Image-text Retrieval, Medical Report Generation | Code Available | 1 |
| Modular Visual Question Answering via Code Generation | Jun 8, 2023 | Code Generation, In-Context Learning | Code Available | 1 |
| An Approach to Solving the Abstraction and Reasoning Corpus (ARC) Challenge | Jun 6, 2023 | ARC, Question Answering | Code Available | 1 |
| Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! | Jun 6, 2023 | counterfactual, Data Augmentation | Code Available | 1 |
| Revisiting the Role of Language Priors in Vision-Language Models | Jun 2, 2023 | Image-text matching, Image-text Retrieval | Code Available | 1 |
| Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models | May 31, 2023 | Cross-Modal Retrieval, Question Answering | Code Available | 1 |
| Multi-Scale Attention for Audio Question Answering | May 29, 2023 | Audio Question Answering, Question Answering | Code Available | 1 |
| CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | May 27, 2023 | Image Captioning, Image Retrieval | Code Available | 1 |
| Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models | May 24, 2023 | document understanding, Image Captioning | Code Available | 1 |
| The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models | May 24, 2023 | Language Modelling, Math | Code Available | 1 |
| MemeCap: A Dataset for Captioning and Interpreting Memes | May 23, 2023 | Image Captioning, Meme Captioning | Code Available | 1 |
| What Makes for Good Visual Tokenizers for Large Language Models? | May 20, 2023 | Image Captioning, Object Counting | Code Available | 1 |
| VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models | May 20, 2023 | Multiple-choice, Question Answering | Code Available | 1 |
| Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | May 19, 2023 | Dense Captioning, Image Captioning | Code Available | 1 |
| MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts | May 18, 2023 | Medical Visual Question Answering, Question Answering | Code Available | 1 |
| PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | May 17, 2023 | Benchmarking, Diagnostic | Code Available | 1 |
| What You See is What You Read? Improving Text-Image Alignment Evaluation | May 17, 2023 | Image Generation, Image to text | Code Available | 1 |
| Combo of Thinking and Observing for Outside-Knowledge VQA | May 10, 2023 | Decoder, Question Answering | Code Available | 1 |
| Vision-Language Models in Remote Sensing: Current Progress and Future Trends | May 9, 2023 | Image Captioning, Image Generation | Code Available | 1 |
| Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining | Apr 26, 2023 | cross-modal alignment, Medical Visual Question Answering | Code Available | 1 |
| A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering | Apr 26, 2023 | Decoder, Knowledge Distillation | Code Available | 1 |
| SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery | Apr 19, 2023 | Question Answering, Scene Segmentation | Code Available | 1 |