Previous Master Project: Text-Driven Manipulation for Disentangled Representation of Fashion Imagery
The project focuses on a multi-modal fashion image synthesis task that offers finer control over the generated images. Concretely, imagine changing one semantic region of a garment (say, recoloring the sleeves) while all other regions remain unchanged. This problem can be framed as generating multi-modal results at the level of individual semantic parts while leaving the remaining parts untouched, a task called Semantically Multimodal Image Synthesis (SMIS) [1].
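The control principle behind SMIS can be summarized in a few lines. The toy generator below is a hedged, minimal PyTorch sketch, not the GroupDNet architecture of [1]: it assumes one latent code per semantic class, so that resampling a single code changes only the pixels of the corresponding region. All class counts, dimensions, and module names here are illustrative assumptions.

import torch
import torch.nn as nn

NUM_CLASSES = 8   # e.g. background, hair, skin, upper clothes, ... (assumed)
LATENT_DIM = 64   # per-class latent size (assumed)

class ToySMISGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # One tiny head per semantic class; the model of [1] uses grouped
        # convolutions instead, but the control principle is the same.
        self.heads = nn.ModuleList(
            [nn.Linear(LATENT_DIM, 3) for _ in range(NUM_CLASSES)]
        )

    def forward(self, layout, codes):
        # layout: (B, NUM_CLASSES, H, W) one-hot semantic masks
        # codes:  (B, NUM_CLASSES, LATENT_DIM), one latent per class
        out = torch.zeros(layout.shape[0], 3, *layout.shape[2:])
        for c in range(NUM_CLASSES):
            rgb = self.heads[c](codes[:, c])          # (B, 3)
            mask = layout[:, c:c + 1]                 # (B, 1, H, W)
            out = out + mask * rgb[:, :, None, None]  # paint this region only
        return out

gen = ToySMISGenerator()
layout = torch.zeros(1, NUM_CLASSES, 16, 16)
layout[:, 3] = 1.0                      # pretend class 3 covers the image
codes = torch.randn(1, NUM_CLASSES, LATENT_DIM)
img_a = gen(layout, codes)
codes[:, 3] = torch.randn(LATENT_DIM)   # resample the class-3 code only
img_b = gen(layout, codes)              # pixels outside class 3 are identical

Because each region is synthesized from its own code, resampling (or, later, text-editing) one code yields multi-modal results for that semantic part while every other part stays fixed.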
This study feeds into a compelling line of research on controlling, through SMIS, what is modified in different regions of a fashion image [1]. The student will control low-level variations such as color and sewing-pattern pieces via text-to-image synthesis over the generated images. A plausible way to approach the problem is through Natural Language Processing (NLP), using either the Contrastive Language-Image Pre-training (CLIP) model, as in StyleCLIP [2], or DALL·E [3], applied to the MPV (Multi-Pose Virtual Try-on) dataset; a hedged sketch of the CLIP-driven route follows.
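Below is a minimal sketch of the latent-optimization variant of StyleCLIP [2], assuming OpenAI's clip package (pip install git+https://github.com/openai/CLIP.git) and a stand-in generator in place of a pretrained StyleGAN; the prompt, dimensions, and optimizer settings are illustrative assumptions. A frozen CLIP model scores how well the generated image matches the text, and only the latent code is optimized.

import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP package (assumed installed)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
for p in model.parameters():
    p.requires_grad_(False)              # CLIP stays frozen

# Stand-in for a pretrained StyleGAN: maps a 512-d latent to a 3x224x224
# image in [0, 1]. In the project this would be a real fashion generator.
generator = torch.nn.Sequential(
    torch.nn.Linear(512, 3 * 224 * 224), torch.nn.Sigmoid()
).to(device)
for p in generator.parameters():
    p.requires_grad_(False)              # generator stays frozen too

text = clip.tokenize(["a red dress with short sleeves"]).to(device)
with torch.no_grad():
    text_feat = F.normalize(model.encode_text(text).float(), dim=-1)

# CLIP's input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device)

w = torch.randn(1, 512, device=device, requires_grad=True)
opt = torch.optim.Adam([w], lr=0.05)

for step in range(100):
    img = generator(w).view(1, 3, 224, 224)
    img = (img - mean[:, None, None]) / std[:, None, None]
    img_feat = F.normalize(model.encode_image(img).float(), dim=-1)
    loss = 1.0 - (img_feat * text_feat).sum()   # CLIP cosine distance
    opt.zero_grad()
    loss.backward()
    opt.step()

The full StyleCLIP objective additionally penalizes the L2 distance of the edited latent from the starting one and adds an identity-preservation loss; in the SMIS setting one would analogously constrain the semantic regions the text does not mention, which ties this step back to the task above.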
Required qualifications: Proficiency in machine learning and data science. Applicants are expected to have passed KTH courses such as Machine Learning, Project Course in Data Science, and Generative Adversarial Networks, or equivalent. Confidence in Python, TensorFlow, and NLP is a merit.
References:
[1] Zhu Z., Xu Z., You A., Bai X. Semantically Multi-Modal Image Synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5467-5476.
[2] Patashnik O., Wu Z., Shechtman E., Cohen-Or D., Lischinski D. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2085-2094.
[3] Ramesh A., Pavlov M., Goh G., Gray S., Voss C., Radford A., Chen M., Sutskever I. Zero-Shot Text-to-Image Generation. arXiv preprint arXiv:2102.12092, 2021.