
Correspondence Estimation in Human Face and Posture Images

Time: Fri 2014-10-10 10.00

Location: Kollegiesalen, Brinellvägen 8, KTH, Stockholm

Subject area: Computer Science

Doctoral student: Vahid Kazemi, CVAP

Opponent: Professor Tim Cootes, The University of Manchester, UK

Supervisor: Prof. Stefan Carlsson

Abstract

Many computer vision tasks, such as object detection, pose estimation, and alignment, are directly related to the estimation of correspondences over instances of an object class. Other tasks, such as image classification and verification, may not be completely solved by correspondence estimation but can benefit greatly from it.

This thesis presents practical approaches for tackling the correspondence estimation problem, with an emphasis on deformable objects. The methods presented in this thesis vary greatly in their details, but they all use a combination of generative and discriminative modeling to estimate the correspondences from input images efficiently. While the methods described in this work are generic and can be applied to any object, our experiments focus on two object classes of high importance: the human body and faces.

When dealing with the human body, we are mostly interested in estimating a sparse set of landmarks, specifically the locations of the body joints. We use pictorial structures to model the articulation of the body parts generatively and learn efficient discriminative models to localize the parts in the image. This is a common approach explored in many previous works. We further extend this hybrid approach by introducing higher-order terms to deal with the double-counting problem, and we provide an algorithm for solving the resulting non-convex problem efficiently. In another work we explore multi-view pose estimation, where we have multiple calibrated cameras and want to determine the pose of a person in 3D by aggregating 2D information. This is done efficiently by discretizing the 3D search space and using a 3D pictorial structures model to perform the inference.
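As an illustration of the basic pictorial structures idea mentioned above, the following is a minimal sketch of exact inference in a standard tree-structured model over a discrete set of candidate locations. It is not the thesis's method: it omits the higher-order terms and the multi-view extension. The function names (`pairwise_cost`, `infer_pose`), the quadratic deformation cost, and the toy data are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def pairwise_cost(parent_loc: Point, child_loc: Point, offset: Point) -> float:
    """Quadratic deformation cost (an assumed form): penalize deviation of
    the child from its expected offset relative to the parent."""
    dx = child_loc[0] - parent_loc[0] - offset[0]
    dy = child_loc[1] - parent_loc[1] - offset[1]
    return dx * dx + dy * dy


def infer_pose(
    candidates: Dict[str, List[Point]],      # candidate locations per part
    unary: Dict[str, List[float]],           # appearance cost per candidate
    children: Dict[str, List[str]],          # tree structure: part -> children
    offsets: Dict[Tuple[str, str], Point],   # expected (parent, child) offset
    root: str,
) -> Tuple[Dict[str, Point], float]:
    """Min-sum dynamic programming on a tree: returns the lowest-cost pose."""
    # (child, parent candidate index) -> best child candidate index
    best_child_idx: Dict[Tuple[str, int], int] = {}

    def upward(part: str) -> List[float]:
        # Cost of the best subtree configuration for each candidate of `part`.
        cost = list(unary[part])
        for child in children.get(part, []):
            child_cost = upward(child)
            offset = offsets[(part, child)]
            for i, ploc in enumerate(candidates[part]):
                best_j, best = min(
                    (
                        (j, child_cost[j] + pairwise_cost(ploc, cloc, offset))
                        for j, cloc in enumerate(candidates[child])
                    ),
                    key=lambda t: t[1],
                )
                cost[i] += best
                best_child_idx[(child, i)] = best_j
        return cost

    root_cost = upward(root)
    root_idx = min(range(len(root_cost)), key=root_cost.__getitem__)

    pose: Dict[str, Point] = {}

    def downward(part: str, idx: int) -> None:
        # Backtrack the stored argmins from the root to the leaves.
        pose[part] = candidates[part][idx]
        for child in children.get(part, []):
            downward(child, best_child_idx[(child, idx)])

    downward(root, root_idx)
    return pose, root_cost[root_idx]


# Toy example with two parts; all numbers are made up for illustration.
candidates = {"torso": [(0, 0), (10, 10)], "head": [(0, -5), (10, 5), (3, 3)]}
unary = {"torso": [0.0, 1.0], "head": [0.5, 0.0, 0.0]}
children = {"torso": ["head"]}
offsets = {("torso", "head"): (0, -5)}

pose, cost = infer_pose(candidates, unary, children, offsets, root="torso")
print(pose, cost)
```

In this toy case the unary costs come from hypothetical part detectors, which play the discriminative role, while the tree of expected offsets plays the generative role. Because the model is a tree, the dynamic program finds the global minimum; adding higher-order terms, as the thesis does, breaks this structure and requires a different solver.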

In contrast to the human body, faces have a much more rigid structure, and it is relatively easy to detect the major parts of the face such as the eyes, nose, and mouth. Performing dense correspondence estimation on faces under various poses and lighting conditions, however, is still challenging. In a first work we deal with this variation by partitioning the face into multiple parts and learning a separate regressor for each part. In another work we take a fully discriminative approach and learn a global regressor from image to landmarks; to compensate for the lack of training data, we augment the training set with a large number of synthetic images. While we have shown strong performance on the standard face datasets for correspondence estimation, in many scenarios the RGB signal is distorted by poor lighting conditions and becomes almost unusable. This problem is addressed in another work, where we explore the use of the depth signal for dense correspondence estimation. Here again a hybrid generative/discriminative approach is used to perform accurate correspondence estimation in real time.