| NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | Sep 15, 2024 | Contrastive Learning, cross-modal alignment | —Unverified | 0 |
| New Ideas and Trends in Deep Multimodal Content Understanding: A Review | Oct 16, 2020 | Cross-Modal Retrieval, Deep Learning | —Unverified | 0 |
| NEWSKVQA: Knowledge-Aware News Video Question Answering | Feb 8, 2022 | Common Sense Reasoning, Management | —Unverified | 0 |
| NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning | Jul 9, 2018 | General Classification, Machine Translation | —Unverified | 0 |
| Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable Visual Question Answering | Sep 23, 2019 | Inductive Learning, Logical Reasoning | —Unverified | 0 |
| Normalized and Geometry-Aware Self-Attention Network for Image Captioning | Mar 19, 2020 | Image Captioning, Machine Translation | —Unverified | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking, Document AI | —Unverified | 0 |
| Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks | Jan 1, 2018 | Memorization, Question Answering | —Unverified | 0 |
| Object-based reasoning in VQA | Jan 29, 2018 | Object, object-detection | —Unverified | 0 |
| Object-Centric Diagnosis of Visual Reasoning | Dec 21, 2020 | Diagnostic, Object | —Unverified | 0 |
| Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases | Oct 21, 2024 | Object, Question Answering | —Unverified | 0 |
| OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | Sep 5, 2024 | Autonomous Driving, Motion Planning | —Unverified | 0 |
| OMCAT: Omni Context Aware Transformer | Oct 15, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | —Unverified | 0 |
| OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval | May 10, 2025 | Cross-Modal Retrieval, Question Answering | —Unverified | 0 |
| On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization | May 24, 2022 | Descriptive, Image Captioning | —Unverified | 0 |
| OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities | Sep 17, 2024 | cross-modal alignment, Question Answering | —Unverified | 0 |
| One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering | Nov 4, 2024 | Continual Learning, Question Answering | —Unverified | 0 |
| On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints | Sep 25, 2019 | Data Augmentation, Question Answering | —Unverified | 0 |
| On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints | Sep 30, 2019 | Data Augmentation, Question Answering | —Unverified | 0 |
| On the Cognition of Visual Question Answering Models and Human Intelligence: A Comparative Study | Oct 4, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| On the Effects of Video Grounding on Language Models | Oct 1, 2022 | Image Captioning, Question Answering | —Unverified | 0 |
| On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering | Jan 11, 2022 | POS, Question Answering | —Unverified | 0 |
| On the Flip Side: Identifying Counterexamples in Visual Question Answering | Jun 3, 2018 | Question Answering, Visual Question Answering | —Unverified | 0 |
| On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering | Feb 24, 2020 | Question Answering, Referring Expression | —Unverified | 0 |
| On the Limitations of Vision-Language Models in Understanding Image Transforms | Mar 12, 2025 | Question Answering, Video Generation | —Unverified | 0 |