
TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model

2025-07-08

Yujie Hu, Xuanyu Zhang, Weiqi Li, Jian Zhang


Abstract

Virtual try-on has made significant progress in recent years. This paper addresses how to achieve multifunctional virtual try-on guided solely by text instructions, including both full outfit changes and local editing. Previous methods primarily relied on end-to-end networks to perform a single try-on task, lacking versatility and flexibility. We propose TalkFashion, an intelligent try-on assistant that leverages the comprehension capabilities of large language models to analyze user instructions, determine which task to execute, and activate the corresponding processing pipeline. Additionally, we introduce an instruction-based local repainting model that eliminates the need for users to manually provide masks. With the help of multimodal models, this approach achieves fully automated local editing, enhancing the flexibility of editing tasks. Experimental results demonstrate better semantic consistency and visual quality compared with current methods.
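The abstract describes an LLM that routes a free-form text instruction to one of several pipelines (full outfit change vs. mask-free local repainting). A minimal sketch of that control flow, with all names hypothetical and a trivial keyword stub standing in for the LLM classifier, might look like:

```python
# Hypothetical sketch of TalkFashion-style instruction routing.
# All function names are illustrative, not from the paper; the keyword
# heuristic below is a runnable stand-in for the LLM-based task analysis.

def classify_instruction(instruction: str) -> str:
    """Stand-in for the LLM that decides which pipeline to run."""
    text = instruction.lower()
    # Mentions of a garment part suggest a local edit rather than a
    # full outfit change (a real system would let the LLM decide).
    if any(k in text for k in ("sleeve", "collar", "neckline", "pattern")):
        return "local_edit"
    return "full_outfit"

def run_pipeline(instruction: str) -> str:
    """Dispatch the instruction to the matching try-on pipeline."""
    task = classify_instruction(instruction)
    if task == "local_edit":
        # Instruction-based repainting: a multimodal model would locate
        # the region to edit, so the user never supplies a mask.
        return f"local repainting for: {instruction!r}"
    # Full outfit change: a try-on network replaces the whole garment.
    return f"full outfit try-on for: {instruction!r}"

print(run_pipeline("make the sleeves shorter"))
print(run_pipeline("put her in a red evening dress"))
```

The point of the sketch is the dispatch structure: the language model's output selects a pipeline, rather than one end-to-end network handling every instruction.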
