| Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | Aug 5, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions | Mar 20, 2025 | 2D Object Detection, Distributed Computing | Code Available | 1 |
| Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning | Nov 18, 2024 | Attribute, Compositional Zero-Shot Learning | Code Available | 1 |
| LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge | Nov 20, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Voice Jailbreak Attacks Against GPT-4o | May 29, 2024 | Language Modelling, Large Language Model | Code Available | 1 |
| LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations | Dec 9, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | Dec 20, 2024 | Cancer Classification, Chatbot | Code Available | 1 |
| Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics | Jan 16, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks | Oct 24, 2024 | Image Classification | Unverified | 0 |
| DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation | Dec 10, 2024 | Image Generation, Language Modelling | Unverified | 0 |