ICA applied to color and stereo image data

Patrik Hoyer and Aapo Hyvärinen

The above results, on the link between simple-cell receptive fields and ICA applied to natural images, were first demonstrated for static, monochrome (grey-scale) images. Subsequently, it has been shown that the features learned from dynamic image data (video) also strongly resemble simple-cell receptive fields (see, e.g., van Hateren and Ruderman, 1998; Olshausen, 2000). The goal of this project was to extend the analysis to consider the effect of chromatic and stereo information.

As data, we used a set of natural images in full colour. Three images from this dataset are shown below:

As in the standard approach of applying ICA to image data, we model our data (small image patches sampled from the images) with a linear model, and estimate the basis vectors that give sparse, independent stochastic coefficients. The only difference is that each patch now has three channels (RGB) instead of one (brightness):
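To make the data representation concrete, here is a minimal sketch (not the code we actually used; the patch size is an assumption) of how a colour patch can be vectorized by stacking its R, G and B channel values into one data vector, so that the familiar linear model x = As applies unchanged apart from the higher dimensionality:

```python
import numpy as np

PATCH = 8  # assumed patch size (pixels per side); the size actually used may differ

def sample_colour_patches(image_rgb, n_patches, rng=None):
    """Sample random patches from an H x W x 3 colour image and vectorize them.

    Each patch becomes one data vector of length 3 * PATCH**2 by stacking its
    R, G and B channel values, so the linear ICA model x = A s keeps its usual
    form; only the dimensionality of x is three times that of the grey-scale case.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    H, W, _ = image_rgb.shape
    X = np.empty((n_patches, 3 * PATCH ** 2))
    for i in range(n_patches):
        r = rng.integers(0, H - PATCH + 1)
        c = rng.integers(0, W - PATCH + 1)
        patch = image_rgb[r:r + PATCH, c:c + PATCH, :]   # PATCH x PATCH x 3
        X[i] = patch.transpose(2, 0, 1).reshape(-1)       # stack the R, G, B planes
    return X
```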

Of course, these three-channel patches can still be displayed as ordinary colour images by superimposing the channels, and that is how we display them in what follows.

Having sampled a large number of image patches from our natural scene data, we estimate the linear ICA model given above, and visualize the basis vectors:
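As a rough illustration of this step, the sketch below uses scikit-learn's FastICA as a stand-in for our own estimation code (the number of components and the preprocessing details are assumptions); each column of the estimated mixing matrix is reshaped back into a PATCH x PATCH x 3 patch so that the channels can be superimposed for display.

```python
import numpy as np
from sklearn.decomposition import FastICA

PATCH = 8  # assumed patch size, matching the sketch above

def estimate_colour_basis(X, n_components=160):
    """Estimate ICA basis vectors from vectorized colour patches (one patch per row of X).

    Returns the basis patches as an array of shape (n_components, PATCH, PATCH, 3),
    rescaled to [0, 1] so that the three channels can be superimposed for display.
    """
    ica = FastICA(n_components=n_components,
                  whiten="unit-variance", random_state=0)
    ica.fit(X)
    A = ica.mixing_                 # each column of A is one basis vector
    basis = A.T.reshape(n_components, 3, PATCH, PATCH).transpose(0, 2, 3, 1).copy()
    # Rescale each basis patch to [0, 1] so it can be shown as a colour image.
    basis -= basis.min(axis=(1, 2, 3), keepdims=True)
    basis /= basis.max(axis=(1, 2, 3), keepdims=True)
    return basis
```

Choosing n_components smaller than the data dimension means FastICA performs a PCA-based dimension reduction as part of its whitening step, which plays a role similar to the usual preprocessing in this kind of image analysis.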

Examining the basis closely reveals that the features found are very similar to earlier results on monochrome image data, i.e. the basis patches resemble Gabor functions. The decomposition clearly separates into three different channels: red-green, blue-yellow, and monochrome (bright-dark) features.

How do these features compare with V1 receptive fields, in terms of colour coding? This is a difficult question, as there are quite conflicting results on the chromatic dimension of simple-cell receptive fields. However, the ICA representation does seem to be in agreement with many physiological findings. See our paper (Hoyer and Hyvärinen, 2000) for some discussion on this issue.

To see if the ICA model can account for the binocular properties of simple cells in V1, we estimated the model from a set of stereo images of natural scenes. One such stereo pair is shown below:

The left image should be seen with the left eye, and the right image with the right eye (i.e. uncrossed viewing). Note the subtle but important differences in the images due to the three-dimensional viewing geometry.

We simulate fixations (by finding matching points), and then sample corresponding image patches in the two images. This gives a distribution of disparities (centred on zero) in the data. We then model this data by the linear ICA model (here, the top patch is the patch from the left-eye image, and the bottom patch is the corresponding one from the right-eye image):
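The construction of the stereo data vectors could be sketched as follows (a simplified illustration, not the code we actually used: the matched fixation points are assumed to be given, and the stereo images are assumed to be grey-scale). The same ICA estimation as above can then be applied to these vectors.

```python
import numpy as np

PATCH = 8  # assumed patch size, as in the earlier sketches

def sample_stereo_patches(left_img, right_img, fixations):
    """Build stereo data vectors around matched fixation points.

    `fixations` holds (row, col_left, col_right) triplets where the two
    grey-scale images have been matched, so the disparity at each patch centre
    is roughly zero; the scene geometry still produces a distribution of
    disparities within the patches. Each data vector is the vectorized
    left-eye patch followed by the corresponding right-eye patch.
    """
    half = PATCH // 2
    X = []
    for r, cl, cr in fixations:
        lp = left_img[r - half:r + half, cl - half:cl + half]
        rp = right_img[r - half:r + half, cr - half:cr + half]
        if lp.shape == (PATCH, PATCH) and rp.shape == (PATCH, PATCH):
            X.append(np.concatenate([lp.reshape(-1), rp.reshape(-1)]))
    return np.array(X)
```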

Estimating the model leads to the following kinds of features:

Each pair of patches (horizontal neighbours) corresponds to one basis vector. It is readily seen that the features exhibit varying degrees of 'ocular dominance': some basis vectors code for features present in one eye only, whereas others code for features present in both views. In addition, the preferred spatial frequency and orientation of the features are matched between the two eyes. This is quite similar to the representation in the visual cortex. Finally, the features also show some selectivity to disparity: some represent zero disparity, whereas others represent positive or negative disparities.
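One simple way to quantify the degree of ocular dominance, sketched below under the assumption that each basis vector stores its left-eye half before its right-eye half (as in the data vectors above), is to compare the energy in the two halves:

```python
import numpy as np

def ocular_dominance_index(basis_vector):
    """Crude ocular dominance index for one stereo basis vector.

    Assumes the first half of the vector holds the left-eye patch and the
    second half the right-eye patch (as in the stereo data vectors above).
    Returns a value in [-1, 1]: +1 means a purely left-eye feature, -1 a
    purely right-eye feature, and values near 0 a balanced binocular one.
    """
    half = basis_vector.size // 2
    left_energy = float(np.sum(basis_vector[:half] ** 2))
    right_energy = float(np.sum(basis_vector[half:] ** 2))
    return (left_energy - right_energy) / (left_energy + right_energy)
```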

References

 

