| Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward | Apr 1, 2024 | Instruction FollowingLanguage Modeling | CodeCode Available | 2 |
| Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Mar 29, 2024 | Instruction FollowingLanguage Modelling | CodeCode Available | 2 |
| Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis | Mar 28, 2024 | Change DetectionLanguage Modelling | CodeCode Available | 2 |
| Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving | Mar 28, 2024 | Autonomous DrivingLanguage Modeling | CodeCode Available | 2 |
| DreamLIP: Language-Image Pre-training with Long Captions | Mar 25, 2024 | Contrastive LearningImage-text Retrieval | CodeCode Available | 2 |
| RepairAgent: An Autonomous, LLM-Based Agent for Program Repair | Mar 25, 2024 | Language ModellingLarge Language Model | CodeCode Available | 2 |
| LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Mar 22, 2024 | Language ModellingLarge Language Model | CodeCode Available | 2 |
| LLM3:Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Mar 18, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| SelfIE: Self-Interpretation of Large Language Model Embeddings | Mar 16, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Mar 15, 2024 | EgoSchemaForm | CodeCode Available | 2 |