| Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources | Apr 1, 2025 | GPU, Large Language Model | —Unverified | 0 | 0 |
| Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy | Feb 27, 2025 | Large Language Model, Minecraft | —Unverified | 0 | 0 |
| Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training | Mar 31, 2025 | GPU, Language Modeling | —Unverified | 0 | 0 |
| ORQA: A Benchmark and Foundation Model for Holistic Operating Room Modeling | May 19, 2025 | Graph Generation, Knowledge Distillation | —Unverified | 0 | 0 |
| OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography | Aug 30, 2024 | Computed Tomography (CT), Diagnostic | —Unverified | 0 | 0 |
| PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis | Aug 18, 2024 | Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA) | —Unverified | 0 | 0 |
| Parking, Perception, and Retail: Street-Level Determinants of Community Vitality in Harbin | Jun 5, 2025 | Large Language Model, Morphological Analysis | —Unverified | 0 | 0 |
| PHRASED: Phrase Dictionary Biasing for Speech Translation | Jun 10, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks | Mar 6, 2025 | document understanding, Language Modeling | —Unverified | 0 | 0 |
| Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model | Apr 9, 2025 | Image Quality Assessment, Image Restoration | —Unverified | 0 | 0 |