Invariant receptive fields under natural image transformations

When a visual agent observes three-dimensional objects in the world with a two-dimensional light sensor (the retina), the image data are subject to basic image transformations:

  • local scaling transformations caused by objects of different size and at different distances to the observer,
  • local image deformations caused by variations in the viewing direction relative to the object,
  • local Galilean transformations caused by relative motions between the object and the observer, and
  • local intensity transformations caused by illumination variations.
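As a rough formalization of the four cases above, the transformations can be written to first order as follows. The notation is an assumption made here for illustration and is not taken from the text: f is the image, x the image position, t time, s a scaling factor, A an invertible 2×2 matrix, v an image velocity, and a an intensity gain (a simplified, purely multiplicative illumination model).

```latex
\begin{align*}
\text{scaling:}            \quad & f'(x') = f(x),       \quad x' = s\,x,     \quad s > 0 \\
\text{affine deformation:} \quad & f'(x') = f(x),       \quad x' = A\,x,     \quad \det A \neq 0 \\
\text{Galilean:}           \quad & f'(x', t') = f(x, t), \quad x' = x + v\,t, \quad t' = t \\
\text{intensity:}          \quad & f'(x) = a\,f(x),     \quad a > 0
\end{align*}
```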

Nevertheless, we perceive the world as stable and use visual perception based on brightness patterns for inferring properties of objects in the surrounding world.

Figure 2 from Lindeberg (2013) 'A computational theory of visual receptive fields, Biological Cybern

We have developed a general framework for handling this inherent variability in visual data caused by natural image transformations, and for computing visual representations that are invariant (stable) under them.

Based on symmetry properties of the environment and additional assumptions regarding the internal structure of computations in an idealized vision system, we have formulated a normative theory for visual receptive fields. We have shown that families of idealized receptive fields can be derived from the requirement that the vision system must be able to compute invariant image representations under natural image transformations.

There are very close similarities between the receptive fields predicted from our theory and receptive fields found by cell recordings in mammalian vision, including (i) spatial on-center-off-surround and off-center-on-surround receptive fields in the retina and the LGN, (ii) simple cells with spatial directional preference in V1, (iii) spatio-chromatic double-opponent cells in V1, (iv) space-time separable spatio-temporal receptive fields in the LGN and V1 and (v) non-separable space-time tilted receptive fields in V1.
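To make two of these receptive field types concrete, the following minimal sketch builds idealized kernels from Gaussian derivatives. It assumes that a centre-surround field can be approximated by a (negated) Laplacian of Gaussian and a directionally selective simple cell by a first-order directional derivative of an isotropic Gaussian. The function names, kernel size and `sigma` values are hypothetical choices for illustration, not parameters given in the text.

```python
import numpy as np

def gaussian_2d(size, sigma):
    """Return coordinate grids and an isotropic 2-D Gaussian sampled on them."""
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return x, y, g

def center_surround(size=21, sigma=2.0):
    """Negated Laplacian of Gaussian: positive centre, negative surround.

    Multiplying the result by -1 gives the off-centre/on-surround variant.
    """
    x, y, g = gaussian_2d(size, sigma)
    return -((x**2 + y**2 - 2.0 * sigma**2) / sigma**4) * g

def directional_derivative(size=21, sigma=2.0, angle=0.0):
    """First-order Gaussian derivative along the direction `angle` (radians)."""
    x, y, g = gaussian_2d(size, sigma)
    u = np.cos(angle) * x + np.sin(angle) * y  # coordinate along the preferred direction
    return -(u / sigma**2) * g

on_center = center_surround()
oriented = directional_derivative(angle=np.pi / 4)
```

The centre-surround kernel is rotationally symmetric, whereas the directional derivative is antisymmetric about its centre, which is what gives it a preferred orientation.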

Figure 11 from Lindeberg (2013) 'Invariance of visual operations at the level of receptive fields',

Thereby, our theory shows that it is possible to predict properties of visual neurons from a principled axiomatic theory. The receptive field families generated by this theory can also constitute a general basis for expressing visual operations for computational modelling of visual processes and for computer vision algorithms.

Specifically, our notions of scale selection from local extrema over scale of scale-normalized derivative responses, together with affine or Galilean normalization by affine shape adaptation or Galilean velocity adaptation (or, alternatively, detection of affine invariant or Galilean invariant fixed points over filter families in affine or spatio-temporal scale space), provide a general framework for computing scale invariant, affine invariant and Galilean invariant image features and image descriptors. This framework serves generic purposes in computer vision and also offers plausible mechanisms for achieving invariance to natural image transformations in computational models of biological vision.
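The idea of scale selection from extrema over scale of scale-normalized derivative responses can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not the implementation used in the cited work: smoothing uses a sampled Gaussian kernel with scale parameter t (the variance), the second derivative is a central difference, the response is |t · L_xx|, and all function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_kernel(t):
    """Sampled, unit-sum 1-D Gaussian with variance t (scale parameter t = sigma^2)."""
    radius = int(np.ceil(4.0 * np.sqrt(t)))
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * t))
    return g / g.sum()

def scale_normalized_response(signal, t):
    """Magnitude of the scale-normalized second derivative, |t * L_xx|, at scale t."""
    smoothed = np.convolve(signal, gaussian_kernel(t), mode="same")
    second = np.zeros_like(smoothed)
    second[1:-1] = smoothed[2:] - 2.0 * smoothed[1:-1] + smoothed[:-2]  # central difference
    return np.abs(t * second)

def select_scale(signal, position, scales):
    """Return the candidate scale at which the normalized response at `position` is largest."""
    responses = [scale_normalized_response(signal, t)[position] for t in scales]
    return scales[int(np.argmax(responses))]

def gaussian_blob(width, n=801):
    """Test signal: a Gaussian bump of standard deviation `width`, centred at index n // 2."""
    x = np.arange(n) - n // 2
    return np.exp(-x**2 / (2.0 * width**2))

scales = np.geomspace(1.0, 1024.0, 81)  # candidate scales, logarithmically spaced
t_small = select_scale(gaussian_blob(4.0), 400, scales)
t_large = select_scale(gaussian_blob(8.0), 400, scales)
print(t_small, t_large, t_large / t_small)
```

Doubling the width of the blob should roughly quadruple the selected scale t, that is, double the selected standard deviation, so the selected scale follows the size of the image structure. A ratio that tracks the image structure in this way is what makes the resulting features scale invariant.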

Figure 12 from Lindeberg (2013) 'Invariance of visual operations at the level of receptive fields',