| Automated radiotherapy treatment planning guided by GPT-4Vision | Jun 21, 2024 | In-Context Learning, Language Modelling | Unverified | 0 |
| The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge | Jun 18, 2024 | Few-Shot Object Detection, Language Modeling | Unverified | 0 |
| TRINS: Towards Multimodal Language Models that Can Read | Jun 10, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Efficient Indirect LLM Jailbreak via Multimodal-LLM Jailbreak | May 30, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model | May 28, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | May 27, 2024 | Instruction Following, Language Modeling | Unverified | 0 |
| V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | May 24, 2024 | Language Modelling, Large Language Model | Code Available | 0 |
| AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | May 23, 2024 | cross-modal alignment, Language Modelling | Unverified | 0 |
| Layout Generation Agents with Large Language Models | May 13, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition | May 7, 2024 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |