Previous Master Project: Text-Driven Manipulation for Disentangled Representation of Fashion Imagery
The project focuses on a multi-modal fashion image synthesis task that offers finer control over the generated images. Concretely, imagine changing one semantic region of a garment (say, recoloring the sleeves) while all other regions remain unchanged. This problem can be framed as generating multi-modal results at the level of individual semantic parts while leaving the remaining parts untouched, a task called Semantically Multimodal Image Synthesis (SMIS) [1].
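The control principle behind SMIS can be summarized in a few lines. The toy generator below is a hedged, minimal PyTorch sketch, not the GroupDNet architecture of [1]: it assumes one latent code per semantic class, so that resampling a single code changes only the pixels of the corresponding region. All class counts, dimensions, and module names here are illustrative assumptions.

import torch
import torch.nn as nn

NUM_CLASSES = 8   # e.g. background, hair, skin, upper clothes, ... (assumed)
LATENT_DIM = 64   # per-class latent size (assumed)

class ToySMISGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # One tiny head per semantic class; the model of [1] uses grouped
        # convolutions instead, but the control principle is the same.
        self.heads = nn.ModuleList(
            [nn.Linear(LATENT_DIM, 3) for _ in range(NUM_CLASSES)]
        )

    def forward(self, layout, codes):
        # layout: (B, NUM_CLASSES, H, W) one-hot semantic masks
        # codes:  (B, NUM_CLASSES, LATENT_DIM), one latent per class
        out = torch.zeros(layout.shape[0], 3, *layout.shape[2:])
        for c in range(NUM_CLASSES):
            rgb = self.heads[c](codes[:, c])          # (B, 3)
            mask = layout[:, c:c + 1]                 # (B, 1, H, W)
            out = out + mask * rgb[:, :, None, None]  # paint this region only
        return out

gen = ToySMISGenerator()
layout = torch.zeros(1, NUM_CLASSES, 16, 16)
layout[:, 3] = 1.0                      # pretend class 3 covers the image
codes = torch.randn(1, NUM_CLASSES, LATENT_DIM)
img_a = gen(layout, codes)
codes[:, 3] = torch.randn(LATENT_DIM)   # resample the class-3 code only
img_b = gen(layout, codes)              # pixels outside class 3 are identical

Because each region is synthesized from its own code, resampling (or, later, text-editing) one code yields multi-modal results for that semantic part while every other part stays fixed.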
This study feeds into a compelling line of research on controlling, through SMIS, what is modified in different regions of a fashion image [1]. The student will control low-level variations such as color and sewing-pattern pieces via text-to-image synthesis over the generated images. A plausible way to approach the problem is through Natural Language Processing (NLP), using either the Contrastive Language-Image Pre-training (CLIP) model, as in StyleCLIP [2], or DALL·E [3], applied to the MPV (Multi-Pose Virtual Try-on) dataset; a hedged sketch of the CLIP-driven route follows.
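Below is a minimal sketch of the latent-optimization variant of StyleCLIP [2], assuming OpenAI's clip package (pip install git+https://github.com/openai/CLIP.git) and a stand-in generator in place of a pretrained StyleGAN; the prompt, dimensions, and optimizer settings are illustrative assumptions. A frozen CLIP model scores how well the generated image matches the text, and only the latent code is optimized.

import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP package (assumed installed)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
for p in model.parameters():
    p.requires_grad_(False)              # CLIP stays frozen

# Stand-in for a pretrained StyleGAN: maps a 512-d latent to a 3x224x224
# image in [0, 1]. In the project this would be a real fashion generator.
generator = torch.nn.Sequential(
    torch.nn.Linear(512, 3 * 224 * 224), torch.nn.Sigmoid()
).to(device)
for p in generator.parameters():
    p.requires_grad_(False)              # generator stays frozen too

text = clip.tokenize(["a red dress with short sleeves"]).to(device)
with torch.no_grad():
    text_feat = F.normalize(model.encode_text(text).float(), dim=-1)

# CLIP's input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device)

w = torch.randn(1, 512, device=device, requires_grad=True)
opt = torch.optim.Adam([w], lr=0.05)

for step in range(100):
    img = generator(w).view(1, 3, 224, 224)
    img = (img - mean[:, None, None]) / std[:, None, None]
    img_feat = F.normalize(model.encode_image(img).float(), dim=-1)
    loss = 1.0 - (img_feat * text_feat).sum()   # CLIP cosine distance
    opt.zero_grad()
    loss.backward()
    opt.step()

The full StyleCLIP objective additionally penalizes the L2 distance of the edited latent from the starting one and adds an identity-preservation loss; in the SMIS setting one would analogously constrain the semantic regions the text does not mention, which ties this step back to the task above.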
Required qualifications: Proficiency in machine learning and data science. Applicants are expected to have passed KTH courses such as Machine Learning, Project Course in Data Science, and Generative Adversarial Networks, or equivalent. Confidence in Python, TensorFlow, and NLP is a merit.
References:
[1] Zhu Z., Xu Z., You A., Bai X. Semantically Multi-Modal Image Synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5467-5476.
[2] Patashnik O., Wu Z., Shechtman E., Cohen-Or D., Lischinski D. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2085-2094.
[3] Ramesh A., Pavlov M., Goh G., Gray S., Voss C., Radford A., Chen M., Sutskever I. Zero-Shot Text-to-Image Generation. arXiv preprint arXiv:2102.12092, 2021.