Covariant and invariant deep networks
We study and develop deep networks that handle scaling transformations and other image transformations in a theoretically well-founded manner, ideally with provable covariance and invariance properties.
Specifically, we study the ability of deep networks to generalize to previously unseen scales that are not spanned by the training data. For this purpose, we have developed two classes of scale-covariant and scale-invariant networks, based on either (i) applying multiple rescaled continuous filters to the original input image or (ii) applying the same primitive discrete deep network to multiple rescaled copies of the input image.
Gaussian derivative networks
According to the first approach, based on rescaling continuous models of the image filters, we have developed Gaussian derivative networks, constructed by coupling parameterized linear combinations of Gaussian derivatives in cascade, with non-linear ReLU stages in between and a final stage of max pooling over the different scale channels. Since the learned parameters in the linear combinations of Gaussian derivatives are shared between the scale channels, the raw scale channels are provably scale covariant. The final output after max pooling over the scale channels is, in addition, provably scale invariant. Experimentally, we have demonstrated that the approach supports scale generalization, with good ability to classify image patterns at scales not present in the training data.
- Lindeberg (2022) "Scale-covariant and scale-invariant Gaussian derivative networks", Journal of Mathematical Imaging and Vision, 64(3): 223-242.
- Lindeberg (2021) "Scale-covariant and scale-invariant Gaussian derivative networks", Proc. SSVM 2021: Scale Space and Variational Methods in Computer Vision, Springer LNCS 12679: 3-14.
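As an illustration, the core computation in one such network can be sketched as follows, assuming scale-normalized Gaussian derivatives up to order two with normalization parameter γ. The function names, weight layout, and single-layer setting are hypothetical simplifications; the actual networks couple several such layers in cascade:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_derivative_layer(image, weights, sigma, gamma=1.0):
    """A learned linear combination of scale-normalized Gaussian
    derivatives up to order two, at scale sigma (illustrative sketch)."""
    # derivative orders along (rows, cols), i.e. (y, x)
    orders = [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
    out = np.zeros(image.shape, dtype=float)
    for w, (oy, ox) in zip(weights, orders):
        norm = sigma ** (gamma * (oy + ox))  # scale normalization factor
        out += w * norm * gaussian_filter(image, sigma, order=(oy, ox))
    return out

def gaussian_derivative_network(image, weights, sigmas):
    """Apply the same layer with shared weights in every scale channel,
    ReLU, then max-pool over the scale channels. Sharing the weights
    makes the channels scale covariant; the max pooling makes the
    final output scale invariant."""
    channels = [np.maximum(gaussian_derivative_layer(image, weights, s), 0.0)
                for s in sigmas]
    return np.max(np.stack(channels), axis=0)
```

The key design point is that `weights` is the same object in every scale channel; only `sigma` varies, so a rescaled input pattern produces a matching response in another channel.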
Scale-invariant scale-channel networks
According to the second approach, we have developed a class of scale-channel networks based on applying the same primitive discrete deep network to multiple rescaled copies of the input image, followed by max pooling or average pooling over the scale channels. For such networks, it is also possible to give formal proofs of scale covariance and scale invariance. The resulting foveated network architectures handle scaling transformations between the training data and the test data well, over the range of scale factors for which there are supporting scale channels; in our experiments, for scale factors up to 8:
- Jansson and Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536.
- Jansson and Lindeberg (2021) "Exploring the ability of CNNs to generalise to previously unseen scales over wide scale ranges", Proc. International Conference on Pattern Recognition (ICPR 2020), pages 1181-1188, extended version in arXiv:2004.01536.
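The second construction can be sketched in a few lines, assuming a primitive network `net` that maps an image to a fixed-size output vector; the function name and the bilinear resampling choice are illustrative assumptions, not the exact implementation used in the papers:

```python
import numpy as np
from scipy.ndimage import zoom

def scale_channel_output(image, net, scale_factors):
    """Apply the same primitive network `net` to multiple rescaled
    copies of the input image, then max-pool over the scale channels."""
    responses = []
    for s in scale_factors:
        rescaled = zoom(image, s, order=1)  # bilinear resampling (assumed)
        responses.append(net(rescaled))     # net: image -> fixed-size vector
    return np.max(np.stack(responses), axis=0)
```

Average pooling over the channels is obtained by replacing the final `np.max` with `np.mean`.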
Characterization of invariance properties of spatial transformer networks
We have also performed an in-depth study of the ability of spatial transformer networks to support true invariance properties. First, we have shown that spatial transformers that apply purely spatial transformations to CNN feature maps do not support true invariance; only spatial transformer networks that transform the input do. We have then performed a systematic study of how these properties affect classification performance. Specifically, we have investigated spatial transformer architectures that use more complex features for computing the image transformations that map the input data to a reference frame, and demonstrated that these new architectures lead to better experimental performance:
- Finnveden, Jansson and Lindeberg (2021) "Understanding when spatial transformer networks do not support invariance, and what to do about it", Proc. International Conference on Pattern Recognition (ICPR 2020), pages 3427-3434, extended version in arXiv:2004.11678.
- Jansson, Maydanskiy, Finnveden and Lindeberg (2020) "Inability of spatial transformations of CNN feature maps to support invariant recognition", arXiv preprint arXiv:2004.14716.
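The first result can be illustrated with a toy example: for a non-isotropic filter followed by a ReLU, spatially transforming the feature map differs from computing the feature map of the transformed input, so a spatial transformer acting on feature maps cannot undo the transformation as if the input itself had been aligned. The single-layer setup and the Sobel filter are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import convolve

def features(img, kernel):
    """A toy one-layer 'CNN': convolution followed by ReLU."""
    return np.maximum(convolve(img, kernel, mode='constant'), 0.0)

rng = np.random.default_rng(0)
img = rng.random((16, 16))
sobel_x = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]])

# Rotating the computed feature map ...
rotated_features = np.rot90(features(img, sobel_x))
# ... is not the same as computing features of the rotated input,
# since the non-isotropic filter does not commute with the rotation.
features_of_rotated = features(np.rot90(img), sobel_x)
```

For pure translations the two orders of operation would agree (translation covariance of convolution), which is why the failure only shows up for richer spatial transformations.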
Scale-covariant biologically inspired hierarchical networks
In earlier studies, we have shown how provably scale-covariant hand-crafted networks can be defined in general by coupling scale-space operations in cascade. Specifically, we have studied a biologically inspired sub-class of such networks in more detail, obtained by coupling models of complex cells, in terms of quasi quadrature measures, in cascade. Experimentally, we have evaluated these networks on texture classification under scaling transformations, for scale factors up to 4:
- Lindeberg (2020) "Provably scale-covariant continuous hierarchical networks by coupling scale-normalized differential entities in cascade", Journal of Mathematical Imaging and Vision, 62(1): 120–148.
- Lindeberg (2019) "Provably scale-covariant hierarchical networks by coupling quasi quadrature measures in cascade", Proc. SSVM 2019: Scale Space and Variational Methods in Computer Vision, Springer LNCS 11603: 328-340.
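A quasi quadrature measure of this kind can be sketched as follows, combining scale-normalized first- and second-order Gaussian derivative energies; the exact normalization exponents and the relative weight `C` follow the general form in the papers but should be treated as assumptions here:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def quasi_quadrature(image, sigma, C=1.0, gamma=1.0):
    """Sketch of a quasi quadrature measure: a complex-cell-like energy
    combining scale-normalized first- and second-order derivatives."""
    t = sigma ** 2  # scale parameter (variance of the Gaussian kernel)
    Lx  = gaussian_filter(image, sigma, order=(0, 1))
    Ly  = gaussian_filter(image, sigma, order=(1, 0))
    Lxx = gaussian_filter(image, sigma, order=(0, 2))
    Lxy = gaussian_filter(image, sigma, order=(1, 1))
    Lyy = gaussian_filter(image, sigma, order=(2, 0))
    grad_energy = t ** gamma * (Lx**2 + Ly**2)
    hess_energy = t ** (2 * gamma) * (Lxx**2 + 2 * Lxy**2 + Lyy**2)
    return np.sqrt(grad_energy + C * hess_energy)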