
Project Proposals

Estimation of Volumetric Depth from Multiview Video

Capturing multiple images of the same object from different viewpoints permits the estimation of the object's underlying 3D geometry. Widely used are so-called depth maps, which indicate, for each pixel in the image, the shortest distance from the camera plane to the object. With multiple depth maps from different viewpoints, we obtain multiple depth estimates for the same 3D point on the object. In general, these multiple depth values are not consistent [1] and result in an ambiguous description of the underlying 3D geometry.
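
As a concrete illustration of such an inconsistency, the sketch below back-projects a pixel of view A into 3D using its depth estimate, re-projects the point into view B, and compares the predicted depth against view B's own depth map. This is only a minimal sketch of the general idea; the camera matrices, the transform T_a2b, and the tolerance are illustrative placeholders, not the specific consistency test of [1].

```python
import numpy as np

def depth_consistent(u, v, depth_a, depth_map_b, K_a, K_b, T_a2b, tol=0.01):
    """Check whether the depth of pixel (u, v) in view A agrees with view B.

    K_a, K_b : 3x3 camera intrinsics (illustrative placeholders)
    T_a2b    : 4x4 rigid transform from A's camera frame to B's
    tol      : relative depth tolerance (an assumed threshold)
    """
    # Back-project pixel (u, v) with its depth into A's camera frame.
    p_a = depth_a * (np.linalg.inv(K_a) @ np.array([u, v, 1.0]))
    # Transform the 3D point into B's camera frame.
    p_b = (T_a2b @ np.append(p_a, 1.0))[:3]
    if p_b[2] <= 0:                       # point lies behind camera B
        return False
    # Project into view B and read B's own depth estimate there.
    uv_b = (K_b @ p_b) / p_b[2]
    ub, vb = int(round(uv_b[0])), int(round(uv_b[1]))
    h, w = depth_map_b.shape
    if not (0 <= ub < w and 0 <= vb < h):
        return False                      # falls outside view B
    # Consistent if B's measured depth matches the predicted depth p_b[2].
    return abs(depth_map_b[vb, ub] - p_b[2]) <= tol * p_b[2]
```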

The multiple depth measurements can be used to construct a volumetric description of the underlying geometry of the 3D object [2]. However, this description will not be unique due to the inconsistent depth measurements. Hence, estimating the volumetric depth at each time instance from multiview video is a challenge. With multiview video, we have not only the neighboring views at the current time instance but also neighboring views at past and future time instances. It is therefore desirable to develop estimation techniques for volumetric depth that exploit both neighboring views and multiple time instances.
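
One simple way to turn multiple depth maps into a volumetric description is to let every depth sample vote for the voxel it falls into, so that consistently measured surface regions accumulate high confidence. The sketch below follows this idea under assumed conventions (a world-aligned voxel grid given by origin and voxel_size, and a 4x4 camera-to-world transform); it is an illustration of the principle, not the confidence model of [2].

```python
import numpy as np

def accumulate_confidence(volume, depth_map, K, cam_to_world,
                          origin, voxel_size):
    """Vote each depth sample of one view into a voxel confidence grid.

    volume     : (X, Y, Z) float array of accumulated confidence
    origin     : world coordinate of voxel (0, 0, 0) (assumed layout)
    voxel_size : edge length of a voxel in world units
    """
    h, w = depth_map.shape
    K_inv = np.linalg.inv(K)
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3)
    d = depth_map.reshape(-1)
    valid = d > 0
    # Back-project all valid pixels into camera coordinates ...
    pts_cam = (K_inv @ pix[valid].T) * d[valid]
    # ... and transform them into world coordinates.
    ones = np.ones(pts_cam.shape[1])
    pts_world = (cam_to_world @ np.vstack([pts_cam, ones]))[:3]
    # Each 3D sample increments the confidence of the voxel containing it.
    idx = np.floor((pts_world.T - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(volume.shape)), axis=1)
    np.add.at(volume, tuple(idx[inside].T), 1.0)
    return volume
```

Summing votes over all views and time instances concentrates confidence on the true surface, which is one reason neighboring time instances are worth exploiting.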

In this project, a group of students will work to design, implement, and evaluate algorithms for estimating volumetric depth data. The design shall achieve accurate estimates of the volumetric depth. The quality will be assessed by extracting enhanced depth maps from the estimated volumetric depth and comparing them to the provided ground-truth depth maps. The implementation will be evaluated on given data sets. The results will be used to assess whether the design goals have been achieved.
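
As an illustration of the assessment step, the extracted depth maps could be compared to ground truth with per-pixel error statistics. The metric choices below (RMSE and a relative bad-pixel rate with an assumed threshold) are placeholders for whatever metrics the project adopts.

```python
import numpy as np

def depth_map_error(estimated, ground_truth, bad_thresh=0.05):
    """Compare an extracted depth map to a ground-truth depth map.

    bad_thresh is an assumed relative-error threshold for the
    bad-pixel rate; any project-specific metric could replace it.
    """
    valid = ground_truth > 0                        # ignore holes
    err = estimated[valid] - ground_truth[valid]
    rmse = np.sqrt(np.mean(err ** 2))
    bad = np.mean(np.abs(err) > bad_thresh * ground_truth[valid])
    return rmse, bad
```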

For more information on projects in this area, please direct your inquiries to Markus Flierl or Pravin K. Rana.

References:

[1] P.K. Rana and M. Flierl, Depth Consistency Testing for Improved View Interpolation, Proc. IEEE International Workshop on Multimedia Signal Processing, Saint-Malo, France, Oct. 2010.

[2] S. Parthasarathy, A. Chopra, E. Baudin, P.K. Rana, and M. Flierl, Denoising of Volumetric Depth Confidence for View Rendering, Proc. 3DTV-Conference, Zurich, Switzerland, Oct. 2012.

 

Mobile Visual Search using Stereo Features

Visual search offers users interactive and semantic access to real-world objects. With the integration of digital cameras into mobile devices, image-based information retrieval for mobile visual search [1] is developing rapidly. The challenges of mobile image retrieval are rooted in bandwidth constraints and the limited computational capability of mobile devices.

Usually, a single query image is used for mobile visual search, and the well-known Scale Invariant Feature Transform (SIFT) [2] is used to extract relevant image features. The server receives the extracted and encoded image features from the mobile device, decodes them, and matches them against the image features stored in the image database. For matching, the descriptor vector of each image feature is used. The reliability of matching is usually improved by geometric verification with random sample consensus (RANSAC) [3].
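
The sketch below illustrates this standard pipeline with OpenCV: SIFT keypoints and descriptors are extracted, descriptor vectors are matched with Lowe's ratio test, and RANSAC verifies that the surviving matches fit a common homography. The threshold choices (ratio 0.75, reprojection error 5.0) are conventional defaults, not values prescribed by the project.

```python
import cv2
import numpy as np

def match_query_to_database_image(query_img, db_img):
    """Sketch of the standard pipeline: SIFT features [2], descriptor
    matching, and RANSAC-based geometric verification [3]."""
    sift = cv2.SIFT_create()
    kp_q, des_q = sift.detectAndCompute(query_img, None)
    kp_d, des_d = sift.detectAndCompute(db_img, None)
    # Match descriptor vectors; keep matches passing Lowe's ratio test.
    matcher = cv2.BFMatcher()
    candidates = matcher.knnMatch(des_q, des_d, k=2)
    good = [m for m, n in candidates if m.distance < 0.75 * n.distance]
    if len(good) < 4:                     # a homography needs >= 4 points
        return 0
    src = np.float32([kp_q[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_d[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC rejects matches that do not fit a common homography.
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return 0 if mask is None else int(mask.sum())   # inlier count as score
```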

To improve the success rate of visual search, i.e., the retrieval rate, we allow queries that use a pair of stereo images to be matched against a set of multiview images of the same object at the server. To this end, we extract so-called stereo features from the stereo pair. The challenge is then to use these stereo features efficiently to maximize the retrieval rate.
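
Since the precise construction of a stereo feature is part of the design space, the sketch below shows one plausible, hypothetical variant: SIFT keypoints that match consistently across the left and right images, kept together with both descriptors and their disparity. Every detail here (the ratio threshold, the use of disparity) is an assumption for illustration, not the project's definition.

```python
import cv2

def extract_stereo_features(left_img, right_img, ratio=0.75):
    """Hypothetical stereo features: SIFT keypoints matched across a
    stereo pair, each kept with both descriptors and its disparity."""
    sift = cv2.SIFT_create()
    kp_l, des_l = sift.detectAndCompute(left_img, None)
    kp_r, des_r = sift.detectAndCompute(right_img, None)
    matches = cv2.BFMatcher().knnMatch(des_l, des_r, k=2)
    stereo_features = []
    for m, n in matches:
        if m.distance < ratio * n.distance:          # Lowe's ratio test
            xl, yl = kp_l[m.queryIdx].pt
            xr, yr = kp_r[m.trainIdx].pt
            # For a rectified pair, the disparity xl - xr encodes depth.
            stereo_features.append({
                "left_desc": des_l[m.queryIdx],
                "right_desc": des_r[m.trainIdx],
                "disparity": xl - xr,
            })
    return stereo_features
```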

In this project, a group of students will work to design, implement, and evaluate algorithms for mobile visual search using stereo features. Using our dataset “Stockholm Buildings”, the rate of building retrieval shall be maximized by using stereo feature queries. The implementation will be evaluated on a given test dataset. The results will be compared to the retrieval rate achieved with monocular query images only.
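
The comparison could, for instance, score each method by the fraction of queries whose top-ranked database object carries the correct label. A minimal sketch, assuming a generic query-versus-entry scoring function (such as the RANSAC inlier count from the earlier sketch):

```python
def retrieval_rate(queries, database, score):
    """Fraction of queries whose best-scoring database entry has the
    correct label. `queries` holds (query, true_label) pairs and
    `database` holds (entry, label) pairs; `score` is any assumed
    query-vs-entry scoring function."""
    hits = 0
    for query, true_label in queries:
        best = max(database, key=lambda item: score(query, item[0]))
        hits += (best[1] == true_label)
    return hits / len(queries)
```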

For more information on projects in this area, please direct your inquiries to Markus Flierl or Haopeng Li.

References:

[1] B. Girod, V. Chandrasekhar, D.M. Chen, N.M. Cheung, R. Grzeszczuk, Y. Reznik, G. Takacs, S.S. Tsai, and R. Vedantham, Mobile Visual Search, IEEE Signal Processing Magazine, vol. 28, no. 4, pp. 61–76, July 2011.

[2] D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[3] M. Fischler and R. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM, vol. 24, no. 6, pp. 381–395, June 1981.

 
