Text-Guided Human Image Manipulation via Image-Text Shared Space


Text is a new way to guide human image manipulation. Albeit natural and flexible, text usually suffers from inaccuracy in spatial description, ambiguity in the description of appearance, and incompleteness. We in this paper address these issues. To overcome inaccuracy, we use structured information (e.g., poses) to help identify correct location to manipulate, by disentangling the control of appearance and spatial structure. Moreover, we learn the image-text shared space with derived disentanglement to improve accuracy and quality of manipulation, by separating relevant and irrelevant editing directions for the textual instructions in this space. Our model generates a series of manipulation results by moving source images in this space with different degrees of editing strength. Thus, to reduce the ambiguity in text, our model generates sequential output for manual selection. In addition, we propose an efficient pseudo label loss to enhance editing performance when the text is incomplete. We evaluate our method on various datasets and show its precision and interactiveness to manipulate human images.

In IEEE Transation on Pattern Analysis and Machine Intelligence
Ying-Cong Chen
Ying-Cong Chen
Assistant Professor

Ying-Cong Chen is an Assistant Professor at AI Thrust, Information Hub of Hong Kong University of Science and Technology (Guangzhou), and also an affliliated Assistant Professor at Department of Computer Science and Engineering, Hong Kong University of Science and Technology.