| OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions | May 27, 2025 | Audio-Visual Synchronization, Conversational Response Generation | Unverified | 0 |
| Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | May 26, 2025 | Image Classification | Code Available | 0 |
| What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models | May 26, 2025 | Language Modelling | Unverified | 0 |
| Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes | May 26, 2025 | DeepFake Detection, Face Generation | Unverified | 0 |
| MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | May 26, 2025 | Image Retrieval, Large Language Model | Unverified | 0 |
| OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model | May 25, 2025 | Language Modelling | Unverified | 0 |
| HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning | May 23, 2025 | Large Language Model, Multimodal Large Language Model | Unverified | 0 |
| LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning | May 22, 2025 | Language Modelling | Unverified | 0 |
| Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification | May 21, 2025 | Data Augmentation, Large Language Model | Unverified | 0 |
| Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval | May 21, 2025 | Attribute, Image Retrieval | Unverified | 0 |