Scale-Space Theory


Scale-space theory with applications: Selected publications sorted by subject


Review articles

Basic theory of scale-space representation

Axiomatic theories for continuous and discrete scale-space, as well as for foveal scale-space. A general theoretical framework for modelling the deep structure of how image features are related over scales and for how to measure the lifelength of image structures over scales, with general validity for both continuous and discrete signals.
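As a minimal illustration of the basic construction, a Gaussian scale-space representation of a sampled image can be sketched as follows (the function and variable names are ours, and `scipy.ndimage.gaussian_filter` with a sampled Gaussian is used as a simple stand-in for the discrete scale-space kernels treated in the theory):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(image, t_values):
    """Gaussian scale-space representation L(.; t) at the scales in t_values.

    The scale parameter t is the variance of the Gaussian kernel,
    so the standard deviation is sigma = sqrt(t).
    """
    return [gaussian_filter(image.astype(float), sigma=np.sqrt(t))
            for t in t_values]

# A small synthetic image: one bright square on a dark background.
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0

levels = scale_space(img, t_values=[1.0, 4.0, 16.0])
# Coarser scales progressively suppress fine structure, so the
# maximum response decreases with increasing t.
```

For truly discrete signals, the theory instead prescribes the discrete analogue of the Gaussian kernel; the sampled Gaussian above is used only to keep the sketch short.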

Computational modelling of visual receptive fields

Cell recordings of neurons in the primary visual cortex (V1) have shown that mammalian vision has developed receptive fields tuned to different sizes and orientations in the image domain, as well as to different image velocities in space-time. We show how such families of idealized receptive field profiles can be derived mathematically from a small set of basic assumptions that correspond to structural properties of the environment. We also show how basic invariance properties of a visual system can be obtained already at the level of receptive fields, and that the different shapes of receptive field profiles found in biological vision can be explained from a requirement that the visual system should be covariant or invariant to the natural types of image transformations that occur in the environment.
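To give a concrete flavour of such idealized receptive field models, the following sketch builds separable Gaussian-derivative profiles over the spatial domain (the function names are ours, and only derivative orders up to two are included for illustration):

```python
import numpy as np

def gaussian_derivative_kernel(sigma, order, radius=None):
    """Sampled 1-D Gaussian derivative of the given order.

    Separable products of such kernels give idealized linear
    receptive-field models over the spatial domain (only orders
    0, 1 and 2 are included in this sketch).
    """
    if radius is None:
        radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    if order == 0:
        return g
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        return (x**2 - sigma**2) / sigma**4 * g
    raise ValueError("only orders 0-2 implemented in this sketch")

# An oriented receptive-field profile as an outer product:
# smoothing along y combined with a first-order derivative along x.
sigma = 2.0
g0 = gaussian_derivative_kernel(sigma, 0)
g1 = gaussian_derivative_kernel(sigma, 1)
rf = np.outer(g0, g1)
# The profile is antisymmetric along x, like an odd simple-cell model.
```

Anisotropic (affine) covariance additionally requires different scale parameters along the two axes; the isotropic case above is the simplest member of the family.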

Computational modelling of auditory receptive fields

A scale-space theory is developed for auditory signals, showing how temporal and spectro-temporal receptive fields can be derived by necessity, with good qualitative similarity to biological receptive fields in the inferior colliculus (ICC) and the primary auditory cortex (A1).

Feature detection, automatic scale selection and scale-invariant image features

Feature detection methods based on the combination of Gaussian derivative operators at multiple scales. Special focus is given to the problem of scale selection, in order to adapt the local scales of processing to the local image structure. Specifically, the notion of automatic scale selection based on local extrema over scales of gamma-normalized derivatives makes it possible to define scale-invariant image features. The use of such scale-invariant image features allows the vision system to automatically handle the unknown scale variations that may occur in real-world image data, due to objects of different physical size as well as objects with different distances to the camera.

This theory, which includes the definition of scale-invariant feature detectors from scale-space extrema of the scale-normalized Laplacian and the scale-normalized determinant of the Hessian, constitutes the theoretical basis for the scale-invariant properties of the SIFT and SURF descriptors. The difference-of-Gaussians operator in the SIFT descriptor can be seen as an approximation of the scale-normalized Laplacian, and the blob detector in the SURF descriptor can be seen as an approximation of the scale-normalized determinant of the Hessian, with the underlying second-order Gaussian derivative operators replaced by Haar wavelets. In addition, we have proposed additional scale-invariant interest point detectors based on other Hessian feature strength measures, and a scale-invariant corner detector based on the scale-normalized rescaled level-curve curvature.
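A minimal sketch of automatic scale selection with the scale-normalized Laplacian (here with normalization power gamma = 1, and with our own function names and synthetic test image) can look as follows:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def laplacian_scale_selection(image, sigmas):
    """Select the scale at which the scale-normalized Laplacian
    t * Laplacian(L), with t = sigma^2, assumes its strongest
    response over both image positions and scales."""
    responses = np.stack([
        (s ** 2) * gaussian_laplace(image.astype(float), sigma=s)
        for s in sigmas
    ])
    # The response is negative for bright blobs, so take the magnitude.
    idx = np.unravel_index(np.argmax(np.abs(responses)), responses.shape)
    return sigmas[idx[0]], tuple(int(i) for i in idx[1:])

# For a bright Gaussian blob of standard deviation 4, the maximum over
# scales is attained at sigma = 4 (t = 16), at the blob centre.
x = np.arange(64)
X, Y = np.meshgrid(x, x)
blob = np.exp(-((X - 32) ** 2 + (Y - 32) ** 2) / (2 * 4.0 ** 2))

best_sigma, location = laplacian_scale_selection(blob, [1, 2, 3, 4, 5, 6, 8])
```

The selected scale tracks the physical size of the underlying structure, which is what makes the resulting image features scale invariant.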

Object recognition

Approaches to object recognition based on histograms of receptive field responses computed based on the scale-space framework.

Deep networks

Deep networks that handle scaling transformations and other natural image transformations in a theoretically well-founded manner, preferably in terms of provable covariance and invariance properties.

Multi-scale processing of temporal data including temporal and spatio-temporal scale-space as well as temporal scale selection

Temporal and spatio-temporal scale-space concepts as well as methods for temporal and spatio-temporal scale selection.

Video analysis

Methods for video analysis based on histograms of spatio-temporal receptive field responses, computed within the scale-space framework and with fully time-causal and time-recursive image operations over the temporal domain.
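The time-causal and time-recursive smoothing primitive can be sketched as a cascade of first-order integrators coupled in series (a minimal formulation of ours; the time constants in `mus` are assumed values for illustration only):

```python
import numpy as np

def first_order_integrator(signal, mu):
    """One time-recursive first-order integrator with time constant mu:

        f_out[t] = f_out[t-1] + (f[t] - f_out[t-1]) / (1 + mu),

    a strictly time-causal smoothing step that only accesses the past.
    """
    out = np.empty(len(signal), dtype=float)
    prev = 0.0
    for t, value in enumerate(signal):
        prev = prev + (value - prev) / (1.0 + mu)
        out[t] = prev
    return out

def time_causal_smoothing(signal, mus):
    """Cascade of first-order integrators: coarser temporal scales are
    obtained by composing such recursive filters in series."""
    out = np.asarray(signal, dtype=float)
    for mu in mus:
        out = first_order_integrator(out, mu)
    return out

# Feeding in a unit impulse reveals the equivalent temporal kernel:
# it is non-negative, has unit mass, and peaks after a causal delay.
impulse = np.zeros(200)
impulse[0] = 1.0
kernel = time_causal_smoothing(impulse, mus=[1.0, 2.0, 4.0])
```

In contrast to a truncated Gaussian, such a cascade never accesses future samples, which is essential for real-time video analysis.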

Spatio-temporal image features, image descriptors, velocity adaptation and Galilean diagonalization with application to recognition of motion patterns, human actions and spatio-temporal events

Direct methods for recognizing spatio-temporal events and associated activities based on the local spatio-temporal image structure, without explicit inclusion of tracking mechanisms or other temporal trajectories. To handle a priori unknown motions relative to the observer, a general notion of local velocity adaptation is introduced. For parameterizing the spatio-temporal second-moment matrix/structure tensor and other related spatio-temporal image descriptors, we propose the notion of Galilean diagonalization, which gives a much more natural parameterization of purely spatial components and combined spatio-temporal relations compared to previous approaches in terms of eigenvalues, which correspond to a non-physical rotation of space-time. These works also include the first formulation of local scale-adapted histograms of spatio-temporal gradients and optic flow, which can be seen as generalizations of the SIFT descriptor from space to space-time.
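For one spatial dimension plus time, the idea behind Galilean diagonalization can be sketched as follows (a minimal formulation of ours; the full theory concerns 2+1-dimensional descriptors). Under a Galilean transformation x' = x - v*t, the gradient components transform as L_x' = L_x and L_t' = L_t + v*L_x, so choosing v = -mu_xt / mu_xx cancels the mixed component of the second-moment matrix:

```python
import numpy as np

def galilean_diagonalize(mu_xx, mu_xt, mu_tt):
    """Galilean diagonalization of a 1+1-D spatio-temporal
    second-moment matrix [[mu_xx, mu_xt], [mu_xt, mu_tt]].

    The velocity v = -mu_xt / mu_xx zeroes the mixed component,
    separating the purely spatial part from the residual temporal
    variation left after velocity adaptation.
    """
    v = -mu_xt / mu_xx
    mu_tt_prime = mu_tt + 2 * v * mu_xt + v**2 * mu_xx
    return v, np.array([[mu_xx, 0.0], [0.0, mu_tt_prime]])

# A purely translating 1-D pattern with velocity 2 has L_t = -2 * L_x,
# so the estimated velocity recovers 2 and the residual temporal
# component vanishes (the tensor is rank one).
Lx = np.random.default_rng(0).standard_normal(1000)
Lt = -2.0 * Lx
v, M = galilean_diagonalize((Lx * Lx).mean(),
                            (Lx * Lt).mean(),
                            (Lt * Lt).mean())
```

The residual component M[1, 1] measures how much of the temporal variation cannot be explained by a constant-velocity motion, which is what makes the parameterization physically interpretable.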

Estimation of affine image deformations and direct computation of cues to surface shape including the theories for multi-scale second moment matrices/structure tensors and affine shape adaptation

Theories and algorithms for shape from texture and shape from disparity gradients based on local affine deformations of 2-D brightness patterns. Specifically, this framework includes a theory for local affine normalization of local image descriptors by affine shape adaptation, which makes it possible to define affine invariant image features and to perform affine invariant image and feature matching. These papers also outline the theory for multi-scale second-moment matrices, also referred to as multi-scale structure tensors.
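A minimal sketch of a multi-scale second-moment matrix, with Gaussian derivatives computed at a local scale and their outer products averaged over a Gaussian window at a coarser integration scale (the function names are ours, and the scale parameters are variances, t = sigma^2):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def second_moment_matrix(image, local_scale, integration_scale):
    """Multi-scale second-moment matrix (structure tensor).

    Returns the three components mu_xx, mu_xy, mu_yy of the symmetric
    2x2 matrix at every image point.
    """
    img = image.astype(float)
    s = np.sqrt(local_scale)
    # Gaussian derivatives via derivative-of-Gaussian filtering;
    # order=(0, 1) differentiates along axis 1 (x), (1, 0) along axis 0 (y).
    Lx = gaussian_filter(img, sigma=s, order=(0, 1))
    Ly = gaussian_filter(img, sigma=s, order=(1, 0))
    si = np.sqrt(integration_scale)
    mu_xx = gaussian_filter(Lx * Lx, sigma=si)
    mu_xy = gaussian_filter(Lx * Ly, sigma=si)
    mu_yy = gaussian_filter(Ly * Ly, sigma=si)
    return mu_xx, mu_xy, mu_yy

# On a pattern varying only along x, the tensor is strongly anisotropic:
# mu_xx dominates while mu_yy stays near zero.
x = np.arange(64)
stripes = np.tile(np.sin(2 * np.pi * x / 8), (64, 1))
mu_xx, mu_xy, mu_yy = second_moment_matrix(stripes, 1.0, 16.0)
```

In affine shape adaptation, the local shape of the smoothing kernels is then iteratively matched to this matrix, which is what yields affine-invariant image features.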

Structure and motion estimation (including visual control based on the 3-D hand mouse)

Methods for computing 3-D structure and motion from rigid point and line configurations that are projected from 3-D to 2-D using an affine projection model. The papers and the patent applications also show how a controlled object A can be steered using motion estimates computed by visually observing another controlling object B (visual servoing), which we used for developing methods for human-computer interaction based on 2-D and 3-D hand gestures.

Hand tracking and gesture recognition

Methods for real-time tracking of hand motions and recognition of hand poses based on scale-invariant image features, including the use of hand gestures for controlling other equipment with no interface other than the user's own hand gestures. A real-time prototype system was demonstrated for the general public already in 2001, letting visitors try for themselves what it is like to control objects at a distance using just hand gestures.

Medical image analysis

Methods for detecting brain activations in functional PET images and for automatically segmenting the brain from other tissue in an MRI image of a human head. In the European project Neurogenerator, we also developed a database with functional PET and fMRI images and cytoarchitectonically classified anatomical regions in the brain, including tools for meta-analysis to relate the functionally activated regions from different tasks to corresponding cytoarchitectonically defined neuroanatomical regions in the brain.


Applications of scale-space techniques to more specific computer vision problems:

External links

Further reading

Further publications on these and related topics are available from: