2013/TheRipplePondRecognitionNetworkWithSKIM

The Ripple Pond Recognition Network Using SKIM

Project members: Tara Julia Hamilton, Jonathan Tapson, Greg Cohen

The Ripple Pond Network (RPN) is a simply connected spiking neural network that, operating together with the SKIM network, enables rapid, unsupervised, scale- and rotation-invariant object recognition using efficient spatio-temporal spike coding. In this project we combined the recently submitted RPN (published on arXiv - attached) with the recently published SKIM algorithm to demonstrate rotation-invariant recognition of digits from the MNIST handwritten digit database, using a simple jpeg camera and a small, low-power microcontroller - the mbed.

The RPN

The image below summarises the operation of the RPN. For more information on how it works, see the attached pre-print from arXiv.

In this project we used the mbed and a simple jpeg camera to capture an image and map it onto the static disc. The jpeg image was converted into a 24-bit bitmap image with a resolution of 40 px by 40 px. This is very small, but we were limited by the amount of flash memory on the mbed, and converting the jpeg image to a bmp image used a lot of memory. This 40 x 40 image was converted to a monochrome image by averaging across the RGB channels. This is not the most effective method of generating monochrome bitmaps, but it is the most computationally efficient. The monochrome bitmap is then remapped onto the static disc (the points on the spiral arms in the image above map to pixels in the image); this remapping is performed by a look-up table. The black pixels are "on" and are stepped out to the edge of the disc, where they are summed to create a temporal pattern (TP) (see image above). This TP is rotation invariant. Using normalization, we could also make the TP scale invariant; however, due to time and hardware constraints this was not demonstrated here.
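To make the pixel-to-disc remapping and TP generation concrete, here is a minimal C++ sketch. The RGB averaging matches the conversion described above; the disc geometry (number of spiral arms and radial steps) and the look-up table contents are hypothetical placeholders, not the values used in the project.

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr int W = 40, H = 40;  // image resolution used on the mbed
constexpr int ARMS = 16;       // assumed number of spiral arms on the disc
constexpr int STEPS = 20;      // assumed radial steps (points per arm)

// RGB averaging: not the best monochrome conversion, but the cheapest.
uint8_t toGray(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint8_t>((r + g + b) / 3);
}

// Temporal pattern: at each radial step, count the "on" (black) pixels
// across all arms. Stepping outward in time yields one TP sample per step.
std::array<int, STEPS> temporalPattern(
    const std::vector<uint8_t>& mono,   // thresholded 40x40 image, 1 = black
    const int lut[ARMS][STEPS])         // pixel index for each (arm, step)
{
    std::array<int, STEPS> tp{};
    for (int s = 0; s < STEPS; ++s)
        for (int a = 0; a < ARMS; ++a)
            tp[s] += mono[lut[a][s]];
    return tp;
}
```

Because each TP sample sums over all arms at a given radius, rotating the input image (which permutes pixels between arms at the same radial step) leaves the pattern unchanged, which is the source of the rotation invariance.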

SKIM

SKIM is a network that can be trained to learn spatio-temporal patterns. More details on the operation of SKIM are given on the dendritic group main project page. The SKIM network was trained with rescaled images (rescaled from 28 px x 28 px to 40 px x 40 px) from the MNIST database. Due to time constraints, hardware memory constraints and the time taken to rescale the images, we trained the SKIM with a limited dataset of 7s and 0s. Two example rescaled MNIST images are shown below.
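The rescaling from the native 28 px x 28 px MNIST resolution to the 40 px x 40 px disc resolution can be done in several ways; the interpolation method actually used in the project is not stated, so the following is a minimal nearest-neighbour sketch only.

```cpp
#include <cstdint>
#include <vector>

// Nearest-neighbour upscale from 28x28 (MNIST) to 40x40 (static disc).
// Each destination pixel is mapped back to the nearest source pixel.
std::vector<uint8_t> rescale28to40(const std::vector<uint8_t>& src) {
    constexpr int SRC = 28, DST = 40;
    std::vector<uint8_t> dst(DST * DST);
    for (int y = 0; y < DST; ++y) {
        for (int x = 0; x < DST; ++x) {
            int sy = y * SRC / DST;   // nearest source row
            int sx = x * SRC / DST;   // nearest source column
            dst[y * DST + x] = src[sy * SRC + sx];
        }
    }
    return dst;
}
```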

The SKIM network was trained using 21 images for each data class (0s, 7s) from the MNIST database. This network was then implemented in C++ and loaded onto the mbed, so that the entire RPN and SKIM pipeline ran on the small mbed board.
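A key property of SKIM is that only the linear output weights are learned, typically by a pseudoinverse solution over the hidden-layer responses. The sketch below shows that readout step alone, solving the regularised normal equations (A^T A + lambda I) w = A^T y by Gaussian elimination; the synaptic kernels, random input projections and hidden nonlinearity of the full SKIM network are omitted, and the function name is our own.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Solve for the linear readout weights w of a SKIM-style network:
// A is the (samples x hidden) matrix of hidden-layer responses, y the targets.
// We form the regularised normal equations and solve them directly.
std::vector<double> solveReadout(const std::vector<std::vector<double>>& A,
                                 const std::vector<double>& y,
                                 double lambda = 1e-6) {
    const int n = static_cast<int>(A.size());
    const int h = static_cast<int>(A[0].size());
    // M = A^T A + lambda I,  b = A^T y
    std::vector<std::vector<double>> M(h, std::vector<double>(h, 0.0));
    std::vector<double> b(h, 0.0), w(h, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < h; ++j) {
            b[j] += A[i][j] * y[i];
            for (int k = 0; k < h; ++k) M[j][k] += A[i][j] * A[i][k];
        }
    for (int j = 0; j < h; ++j) M[j][j] += lambda;
    // Gaussian elimination with partial pivoting.
    for (int c = 0; c < h; ++c) {
        int p = c;
        for (int r = c + 1; r < h; ++r)
            if (std::fabs(M[r][c]) > std::fabs(M[p][c])) p = r;
        std::swap(M[c], M[p]);
        std::swap(b[c], b[p]);
        for (int r = c + 1; r < h; ++r) {
            double f = M[r][c] / M[c][c];
            for (int k = c; k < h; ++k) M[r][k] -= f * M[c][k];
            b[r] -= f * b[c];
        }
    }
    // Back-substitution.
    for (int c = h - 1; c >= 0; --c) {
        double s = b[c];
        for (int k = c + 1; k < h; ++k) s -= M[c][k] * w[k];
        w[c] = s / M[c][c];
    }
    return w;
}
```

Because this solve is the only training step, the heavy lifting can be done offline (here, in Matlab) and just the resulting weights shipped to the mbed.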

Results

Preliminary results looked at the effectiveness of the RPN at producing TPs that are separable by the SKIM across different rotations. The figure below shows the TPs for 6s and 4s taken from the MNIST database at different rotations, and indicates the separability of the TPs.

The images we transformed are given below.

The hardware testing was performed with the following set-up.

The camera only captured jpeg images, so they were converted to bmp on the mbed microcontroller. Because the mbed has very limited memory, only the top left-hand corner of the jpeg image was converted. The test image set-up, an example jpeg image from the mbed and the corresponding bitmap image are shown below.

An example temporal pattern corresponding to the bitmap image above, calculated on the microcontroller, is given below.

The learning was performed directly on 40 px x 40 px images from the MNIST database in Matlab. The images and their subsequent transformations into TPs were run through the trained SKIM network for 0s and 7s (we only had time to convert this data), and the results are given below.

Here we see the zeros recognised on the left, in blue, and the sevens recognised on the right, in red. The sevens are recognised with greater precision than the zeros; however, there is clear separation between the classes. This recognition was done with images taken directly from the computer rather than from the camera, and the scaling and centring were not carefully controlled. Also, the SKIM was trained on only 21 examples for each class (0 and 7), so with more data (the MNIST database has 60k training images!) we would have a much more robust network.

The data collected from the hardware covered all orientations, and this clearly posed no problem for recognition, indicating that the RPN with SKIM is indeed rotation invariant.

This work shows that the RPN is an excellent way to transform data for view-invariant recognition, and that the SKIM algorithm is well suited to learning noisy, variable data.

Attachments