| OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions | May 27, 2025 | Audio-Visual Synchronization, Conversational Response Generation | Unverified | 0 |
| What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models | May 26, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging | May 26, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes | May 26, 2025 | DeepFake Detection, Face Generation | Unverified | 0 |
| Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | May 26, 2025 | image-classification, Image Classification | Code Available | 0 |
| MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | May 26, 2025 | Image Retrieval, Large Language Model | Unverified | 0 |
| Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion | May 26, 2025 | Denoising, Image Generation | Code Available | 1 |
| OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model | May 25, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning | May 23, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding | May 22, 2025 | Language Modeling, Language Modelling | Code Available | 2 |