
Visual recognition

Projects

Visual recognition using webCam and spiking network

Qiuyan Peng, Vaibhav Garg, John Harris

Static card visual recognition is a procedure in the main loop of the Hearts game.

Card images with a resolution of 100×100 pixels are captured by the webcam and then pre-processed by high-pass filtering and thresholding to create a binary image. For training, we use 104 classifiers (52 for cards in the upward orientation and 52 for the downward orientation). Each classifier is a neuron with 100×100 input synapses. The synapse strengths (weights) are trained once from the 104 patterns in the training set, one pattern per classifier: a synapse is excitatory (positive) if its input pixel has binary value 1 and inhibitory (negative) otherwise. The weights are also normalized by the number of excitatory and inhibitory synapses, respectively.
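The weight rule above can be illustrated with a minimal Python/NumPy sketch. It assumes each training pattern is a 100×100 array of 0/1 values and reads "normalized by the number of excitatory/inhibitory synapses" as giving every excitatory synapse +1/N_exc and every inhibitory synapse -1/N_inh. The function names are hypothetical; this is not the project's actual code.

```python
import numpy as np

def train_weights(template: np.ndarray) -> np.ndarray:
    """Build one classifier's synaptic weights from a binary card template.

    Assumption: pixels equal to 1 get excitatory weight +1/N_exc and pixels
    equal to 0 get inhibitory weight -1/N_inh, so the two groups sum to +1 and -1.
    """
    t = template.astype(bool)
    n_exc = int(t.sum())
    n_inh = t.size - n_exc
    return np.where(t, 1.0 / max(n_exc, 1), -1.0 / max(n_inh, 1))

def train_all(templates):
    """Stack the weights of every classifier (e.g. 52 cards x 2 orientations = 104)."""
    return np.stack([train_weights(t) for t in templates])
```

Because each group of synapses sums to a fixed value, a card with many edge pixels does not automatically drive its classifier harder than a card with few.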

During testing, a card image is again captured and converted into a binary image, from which the input spike trains are generated. The input neuron for each pixel fires at a uniform rate of 20 Hz if and only if the corresponding pixel value is 1, which is the case for pixels around edges. The 100×100 spike trains are then fed into the 104 trained classifiers, each modeled as a leaky integrator in the simulation. A good match between the input pattern and a classifier makes that output neuron easier to drive to firing, and thus gives it a higher firing rate. Finally, a simple winner-take-all (WTA) process selects the most likely card (number and suit). In addition, probability information based on the output rates is computed and sent out for later sensory fusion.
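The test-time pipeline can be sketched as a small leaky integrate-and-fire simulation followed by winner-take-all. This is a minimal sketch under assumptions not stated above: the 20 Hz input trains are regular with a random phase per pixel, the simulation step is 1 ms, the membrane time constant (20 ms) and firing threshold (0.2) are illustrative values, each output neuron resets to zero after a spike, and the "probability information" is taken to be the output rates normalized to sum to 1. The weights are a (classifiers × pixels) matrix such as one built with the training rule described above; function names are hypothetical.

```python
import numpy as np

def spike_trains(binary_img, rate_hz=20.0, dt=1e-3, duration=1.0, rng=None):
    """Regular spike trains: every pixel equal to 1 fires once per 1/rate_hz seconds.

    Assumption: each active pixel gets a random phase so the inputs are not all
    synchronous; pixels equal to 0 never fire. Returns a (steps, pixels) bool array.
    """
    rng = np.random.default_rng() if rng is None else rng
    active = binary_img.ravel().astype(bool)
    steps = int(round(duration / dt))
    period = int(round(1.0 / (rate_hz * dt)))          # 50 steps at 20 Hz with 1 ms steps
    phase = rng.integers(0, period, size=active.size)
    t = np.arange(steps)[:, None]
    return ((t % period) == phase[None, :]) & active[None, :]

def classify(spikes, weights, tau=0.02, dt=1e-3, threshold=0.2):
    """Leaky integrate-and-fire output neurons followed by winner-take-all.

    weights: (n_classifiers, pixels). tau and threshold are assumed values.
    Returns (winner index, output rates in Hz, rate-based probabilities).
    """
    decay = np.exp(-dt / tau)
    v = np.zeros(weights.shape[0])
    counts = np.zeros(weights.shape[0])
    for s in spikes:                                   # one time step at a time
        v = v * decay + weights[:, s].sum(axis=1)      # leak, then add weighted input spikes
        fired = v >= threshold
        counts[fired] += 1
        v[fired] = 0.0                                 # reset after an output spike
    rates = counts / (len(spikes) * dt)
    total = rates.sum()
    probs = rates / total if total > 0 else np.full_like(rates, 1.0 / rates.size)
    return int(np.argmax(rates)), rates, probs
```

Usage: `winner, rates, probs = classify(spike_trains(card_img), W)`, where `card_img` and `W` stand for the binarized test image and the stacked classifier weights.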

Here is a figure showing the output rate of each classifier neuron during testing, for each input pattern in the training set. The matched classifier has the highest output rate.

The classification result for the training set by WTA is shown in the figure below.

When tested with newly placed cards outside the training set, the classifier still recognizes almost all of the cards, provided the lighting and other environmental conditions are similar. The test on newly placed cards can be viewed in the video of the whole card game (link?).

Visual recognition using a jiggling DVS

Daniel Lofaro, Tobi Delbruck

Phase coding using a region-based coupled MRF model for visual image processing

Kazuki Nakada, John Harris

Fast DVS recognizer using FPAER

Marco Rodrigues

The goal of this project was to develop a fast, hardware-implemented French card recognizer. In this project we used a DVS128 sensor (2); a pan-tilt unit to shake the DVS, because it only “sees” motion; an FPAER, which is a stack of four USB-AER boards (1), each an AER board based on an FPGA; and a computer devoted to card classification.

The system works as follows. The DVS sends events corresponding to motion over a parallel AER cable to the first USB-AER board, which is configured as an AER monitor. The events then go to the next board, configured as a background activity filter (similar to the jAER filter (3) but described in VHDL). The third board is devoted to cluster tracking (4). The cluster tracking information is downloaded to the PC, which runs a classifier based on the relative positions of the clusters (each cluster corresponds to one of the card's pips). The classifier returns the most probable card.
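The background activity filter idea can be sketched in software as follows. This is a simplified illustration of the principle behind jAER's filter (pass an event only if a neighboring pixel was active recently), not a transcription of the jAER code or of the VHDL on the USB-AER board; the 10 ms support window and the function name are assumptions.

```python
import numpy as np

def background_activity_filter(events, dt_us=10_000, size=128):
    """Keep an event only if one of its 8 neighbors fired within dt_us before it.

    events: iterable of (timestamp_us, x, y) in time order.
    """
    last = np.full((size + 2, size + 2), -np.inf)      # padded map: last event time per pixel
    kept = []
    for t, x, y in events:
        xp, yp = x + 1, y + 1                          # shift into padded coordinates
        neigh = last[yp - 1:yp + 2, xp - 1:xp + 2].copy()
        neigh[1, 1] = -np.inf                          # ignore the pixel's own history
        if t - neigh.max() <= dt_us:                   # supported by recent neighbor activity
            kept.append((t, x, y))
        last[yp, xp] = t
    return kept
```

Isolated noise events have no recent neighbor and are dropped, while the edges of a moving card produce spatially and temporally correlated events that pass.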

The cluster tracker can track only up to 8 objects, so only cards from ace to 8 can be recognized. The performance of the system is shown in the following table.

Card   Black   Red
A       80 %   100 %
2      100 %   100 %
3      100 %    90 %
4      100 %   100 %
5      100 %    90 %
6      100 %    90 %
7       70 %    70 %
8       30 %    20 %

See video demo: https://neuromorphs.net/nm/attachment/wiki/2010/snapc10/FastCardPaco.wmv
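The text only says that the PC classifier uses the relative positions of the tracked clusters. One plausible way to do that, sketched here under stated assumptions, is to center and scale the cluster centroids and pick the stored pip layout they match best. The template layouts passed in are hypothetical data, the function names are hypothetical, and this is not necessarily how the project's classifier worked.

```python
import numpy as np

def normalize(points):
    """Center the cluster centroids and scale them to unit RMS radius."""
    p = np.asarray(points, dtype=float)
    p = p - p.mean(axis=0)
    scale = np.sqrt((p ** 2).sum(axis=1).mean())
    return p / scale if scale > 0 else p

def layout_distance(a, b):
    """Symmetric mean nearest-neighbor distance between two point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def classify_clusters(centroids, templates):
    """Return the label of the template layout closest to the tracked clusters.

    templates: dict label -> list of (x, y) pip positions (hypothetical data).
    Only templates with as many pips as tracked clusters are compared.
    """
    obs = normalize(centroids)
    best, best_d = None, np.inf
    for label, tpl in templates.items():
        if len(tpl) != len(obs):
            continue
        d = layout_distance(obs, normalize(tpl))
        if d < best_d:
            best, best_d = label, d
    return best
```

Normalizing position and scale makes the match independent of where the card sits in the field of view and of its distance from the sensor.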

References:

(1) F. Gómez-Rodríguez, R. Paz, A. Linares-Barranco, M. Rivas, L. Miró, G. Jiménez, A. Civit. “AER tools for Communications and Debugging”. Proc. IEEE ISCAS06. Kos, Greece, May 2006.

(2) P. Lichtsteiner, et al., "A 128×128 120dB 30mW Asynchronous Vision Sensor that Responds to Relative Intensity Change," ISSCC Dig. of Tech. Papers, San Francisco, 2006, pp. 508-509 (27.9).

(3) jAER project wiki: http://sourceforge.net/apps/trac/jaer/wiki

(4) F. Gómez-Rodríguez, L. Miró-Amarante, F. Diaz-del-Rio, A. Linares-Barranco, G. Jimenez. “Real time objects tracking using a bio-inspired processing cascade architecture”. Proc. IEEE ISCAS10. Paris, France, May 2010.




Using a silicon retina for playing card recognition
“System 2: 2D FFT based”
By Daniel M. Lofaro
(Created and conducted at the 2010 Neuromorphic Workshop in Telluride, CO)
2010-07-15
The goal of this project was to create a system that uses the biologically inspired DVS-128 silicon retina (SR) to detect a playing card and determine the card’s value and suit. The DVS-128 is an event-based vision sensor that detects changes in relative intensity in a live scene. An important attribute of this sensor is that it is asynchronous, meaning that it sends event data (changes in the image) as they happen rather than at a fixed frame rate. Because the silicon retina only detects changes in relative intensity, it will only “see” moving or changing objects, so the card (or the sensor) must be moving for the system to see it. This is similar to how our eyes work: if we stare at an unchanging scene for an extended period without eye movement, the scene is perceived to dull and then fade away. Saccades (quick, simultaneous movements of both eyes in the same direction) are believed to help alleviate this.

System 2 consists of a DVS-128 silicon retina (simulating the human retina) mounted on a pan-tilt unit (simulating the eye’s saccades), so the silicon retina can rotate about the x and y axes, which are defined in the image plane of the silicon retina’s imaging sensor. System 2 is shown in the image below.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s2/image001.jpg

Above is System 2. On the right is the silicon retina with its pan-tilt unit (SR-PT). Attached to the SR-PT is a light-blocking shield (the brown board), which blocks the building's overhead lights because they flicker at a frequency between 100 Hz and 120 Hz, which is within the SR’s pass band. A constant light source is supplied by a high-intensity white flashlight. The playing card is placed on a monotone background (in this case white, but any color will suffice). The SR-PT moves slightly relative to the card, simulating the eye’s saccades. An image produced in real time by the SR-PT is shown below.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s2/image002.jpg

Grey pixels represent no events. White pixels represent events where the intensity increased (by more than a specified relative threshold), and black pixels represent events where the intensity decreased (by more than a specified relative threshold). Note that each frame of the above image is built from 750 events; that is, the image is redrawn from the most recent 750 events.
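The frame rendering described above can be sketched as follows, assuming each event carries a polarity of +1 (intensity increase) or -1 (intensity decrease). As a simplification, when a pixel fires more than once within the window, the most recent event sets its color; jAER's own renderer may accumulate events differently. The function name is hypothetical.

```python
import numpy as np

def render_frame(events, n_events=750, size=128):
    """Render the most recent n_events as a grey/white/black image.

    events: list of (x, y, polarity) in arrival order, polarity +1 (ON) or -1 (OFF).
    Returns values 0.5 = no event (grey), 1.0 = ON (white), 0.0 = OFF (black).
    """
    frame = np.full((size, size), 0.5)
    for x, y, pol in events[-n_events:]:               # only the latest n_events count
        frame[y, x] = 1.0 if pol > 0 else 0.0
    return frame
```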

How to determine the card:

The playing card is determined by:
- Making a template to check the cards against
- Checking the unknown card against the template
Making the template:
- Record 10k events for each card
- Make a 2D event array (EA) whose (x, y) indices are the pixel locations. The value at each index is the number of times an event occurred at that pixel within the 10k events.
- Normalize the EA
- Take the 512-point 2D FFT of each EA (52 in total). This creates a unique 2D frequency spectrum for each card. Due to the nature of the 2D FFT, its magnitude does not depend on where the card lies in the frame and is unchanged by a 180° rotation (an upside-down card). It is important to note that each of the four suits (spades, hearts, clubs, diamonds) has a different shape and thus a different 2D FFT. In addition, the number of “pips” on each card also changes its 2D FFT, giving each card a unique 2D FFT.
- Normalize the 2D FFT of the EA.
- This set of 2D FFTs on the EA will be known as the gold standard (GS) or template for each of the matching tests.
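A minimal NumPy sketch of the template steps follows. It assumes the EA is normalized to unit sum, that the comparison uses the magnitude of the zero-padded 512×512 FFT, and that the spectrum is normalized to unit L2 norm; the function names and the `recordings` input format are hypothetical.

```python
import numpy as np

SENSOR = 128  # DVS-128 resolution
NFFT = 512    # "512 point 2D FFT": the 128x128 EA is zero-padded to 512x512

def event_array(xs, ys, size=SENSOR):
    """Count how many of the recorded events fell on each pixel (the EA)."""
    ea = np.zeros((size, size))
    np.add.at(ea, (np.asarray(ys), np.asarray(xs)), 1)   # unbuffered, so repeated pixels add up
    return ea

def card_spectrum(ea, nfft=NFFT):
    """Normalize the EA, take the 2D FFT magnitude, and normalize it to unit norm."""
    ea = ea / ea.sum()                                   # assumed EA normalization: unit sum
    mag = np.abs(np.fft.fft2(ea, s=(nfft, nfft)))
    return mag / np.linalg.norm(mag)

def build_gold_standard(recordings):
    """recordings: dict card label -> (xs, ys) of its ~10k events (hypothetical input)."""
    return {label: card_spectrum(event_array(xs, ys)) for label, (xs, ys) in recordings.items()}
```

Taking the magnitude discards the phase, which is where a shift of the card within the frame shows up; this is what makes the template insensitive to card position.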
Checking the unknown card against the template:
- Record 10k events for the unknown card
- Make a 2D unknown event array (UEA) whose (x, y) indices are the pixel locations. The value at each index is the number of times an event occurred at that pixel within the 10k events.
- Take the 512 point 2D FFT of the UEA.
- Normalize the 2D FFT of the UEA.
- Find the error between the normalized 2D FFT of the UEA and each of the 52 normalized 2D FFTs in the GS.
- The GS entry with the lowest error (i.e. the highest correlation) identifies the unknown card.
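Matching can be sketched as a squared-error search over the gold standard. If both spectra u and g have unit L2 norm, then ||u - g||^2 = 2 - 2 (u · g), which is why the lowest error coincides with the highest correlation. Using squared error as the error measure is an assumption, and the function name is hypothetical.

```python
import numpy as np

def identify(unknown_spec, gold_standard):
    """Return the GS label whose spectrum has the lowest squared error to the unknown.

    Both inputs are assumed to be unit-norm spectra, so
    error = 2 - 2 * correlation and the lowest error is the highest correlation.
    """
    errors = {label: float(np.sum((unknown_spec - ref) ** 2))
              for label, ref in gold_standard.items()}
    best = min(errors, key=errors.get)
    return best, errors
```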
An example of the 2D FFTs of playing cards is shown below. The cards are (from top left to bottom right) the 2 of clubs (2C), 2 of spades (2S), 2 of hearts (2H), and 2 of diamonds (2D). Note that the distinct shape of each 2D FFT is caused by the different shapes of the suits. Suits with a triangular point (such as spades, hearts, and diamonds) have higher frequency content than more rounded suits such as clubs, which have lower frequency content. Spades and hearts have both rounded and pointed features, so they have both high and low frequency content.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s2/image003.jpg

Results: The above method was tested with a GS created from two separate decks of cards. System 2 was also tested with a Gaussian blur applied to the GS, with the blur size ranging from 1 to 5. A test set of 104 cards (different from those used to create the GS) was used to measure identification accuracy. Below is a plot of the percentage of correct identifications vs. Gaussian blur size.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s2/image004.jpg
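The Gaussian blur applied to the GS could look like the sketch below. The write-up does not define the blur "size", so here it is assumed to be the standard deviation in FFT bins, with the kernel truncated at three standard deviations and the blurred spectrum renormalized to unit norm; the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size: float) -> np.ndarray:
    """1D Gaussian kernel; assumption: blur 'size' is used as sigma in FFT bins."""
    radius = int(np.ceil(3 * size))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * size ** 2))
    return k / k.sum()

def blur_spectrum(spec: np.ndarray, size: float) -> np.ndarray:
    """Separable Gaussian blur of a 2D spectrum (rows, then columns), renormalized."""
    k = gaussian_kernel(size)
    out = np.apply_along_axis(np.convolve, 1, spec, k, mode="same")
    out = np.apply_along_axis(np.convolve, 0, out, k, mode="same")
    return out / np.linalg.norm(out)
```

Blurring the templates spreads each spectral peak over neighboring bins, so small differences between the GS deck and the test deck cost less error.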

There was little difference for blur sizes from 1 to 3. The average correct identification rate is 85%. Conclusion: System 2 was able to consistently identify the correct playing card out of a deck of 52 with 85% accuracy.







Using a silicon retina for playing card recognition
“System 1: FFT based”
By Daniel M. Lofaro

(Created and conducted at the 2010 Neuromorphic Workshop in Telluride, CO)
2010-07-15

The goal of this project was to create a system that uses the biologically inspired DVS-128 silicon retina (SR) to detect a playing card and determine the card’s value and suit. The DVS-128 is an event-based vision sensor that detects changes in relative intensity in a live scene. An important attribute of this sensor is that it is asynchronous, meaning that it sends event data (changes in the image) as they happen rather than at a fixed frame rate. Because the silicon retina only detects changes in relative intensity, it will only “see” moving or changing objects, so the card must be moving for the system to see it. This is similar to how our eyes work: if we stare at an unchanging scene for an extended period without eye movement, the scene is perceived to dull and then fade away. Saccades (quick, simultaneous movements of both eyes in the same direction) are believed to help alleviate this.

To keep the recordings consistent, the playing card should move in a constant direction with a constant velocity or constant acceleration, and the silicon retina should stay at a fixed location for all of the tests. In this case the DVS-128 silicon retina (lens: 12 mm, f/2.8) was placed 19 cm from the card, directly facing it. The card slides down a vertical slide, as seen in the picture below; the slide keeps the card's direction constant and its entry velocity and acceleration consistent between runs. The picture below depicts the system (System 1).

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s1/image001.jpg

The card slides down the slide and passes in front of the silicon retina. The resulting changes in intensity, as read by the silicon retina, create an image in three dimensions: the x and y dimensions and time (t). The silicon retina has a resolution of 128 by 128, and within a given time span multiple events can occur at the same pixel location. Below is a plot of the time span when the card is dropped and passes through the field of view of the silicon retina: the two horizontal axes are the x-y pixel coordinates, the vertical axis is time, and each blue dot represents an event at the specified pixel location.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s1/image002.jpg

The period of time (t) with significantly more events is when the card fell through the silicon retina’s field of view; the other, sparse events are noise. A close-up of the period when the card passed through the silicon retina’s field of view is shown below.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s1/image003.jpg
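Finding the event period can be sketched by binning event timestamps and keeping the bins that are much busier than the sparse noise floor. The 5 ms bin width and the factor of 5 over the median bin count are assumed values, not taken from the experiment, and the function name is hypothetical.

```python
import numpy as np

def find_event_period(timestamps_us, bin_us=5_000, factor=5.0):
    """Return (start_us, end_us) of the burst of events produced by the falling card.

    A bin belongs to the event period if its count exceeds factor times the
    median bin count (the noise level, floored at 1). Returns None if no bin qualifies.
    """
    t = np.asarray(timestamps_us)
    edges = np.arange(t.min(), t.max() + bin_us + 1, bin_us)
    counts, edges = np.histogram(t, bins=edges)
    busy = np.flatnonzero(counts > factor * max(np.median(counts), 1))
    if busy.size == 0:
        return None
    return int(edges[busy[0]]), int(edges[busy[-1] + 1])
```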

It is plain to see that there is structure in the event period (EP), and further examination of the falling trajectory will help us identify the card. Below is a plot of the number of events that occurred at each location on the x-y plane during the EP. The color scale is normalized, with red for a large number of events and blue for a low number of events.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s1/image004.jpg

In the above plot there is a well-defined strip down the center along the x axis. This indicates that there is one column of “pips” (card suit symbols).
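Counting such stripes can be sketched as thresholding a smoothed 1D event profile and counting its separate above-threshold runs; applied to the x coordinates of the events in the EP, it counts pip columns. The smoothing width, the 30% relative threshold and the function name are assumptions.

```python
import numpy as np

def count_stripes(coords, size=128, smooth=3, rel_threshold=0.3):
    """Count separated bands of high event density along one axis.

    coords: one coordinate per event in the EP (e.g. x, to count pip columns).
    """
    profile = np.bincount(np.asarray(coords), minlength=size).astype(float)
    profile = np.convolve(profile, np.ones(smooth) / smooth, mode="same")  # box smoothing
    above = profile > rel_threshold * profile.max()
    # A stripe starts wherever 'above' switches from False to True.
    starts = above & ~np.concatenate(([False], above[:-1]))
    return int(starts.sum())
```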

Below is the y-t plot.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s1/image005.jpg

There are two well-defined stripes in the y-t plot, showing that there are two rows of “pips” on the card that was dropped. The number of rows and the number of columns present can help us discern the card number (ace through ten). Below is an example of a card that shows only one column and two rows; it is a two.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s1/image006.jpg

Next is a card that shows one column and one row; it is an ace.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s1/image007.jpg

Next is a card that shows three columns and five rows; it is an eight. The eight and nine are special cases because both have three columns and five rows. The density of events in the center column determines whether it is an eight or a nine: a lower density relative to the two side columns indicates a nine.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s1/image008.jpg
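The eight/nine rule can be sketched by comparing event counts in the three pip columns. On a standard deck the eight has three pips in each side column and two in the center (ratio 2/3), while the nine has four per side and one in the center (ratio 1/4), so this sketch uses an assumed threshold of 0.45, roughly halfway between them. The column x ranges are assumed to be known (for example, from a stripe detector), and the function name is hypothetical.

```python
import numpy as np

def eight_or_nine(xs, column_edges, ratio=0.45):
    """Decide between an eight and a nine from event density per pip column.

    xs: x coordinate of each event in the EP.
    column_edges: three (lo, hi) x ranges for the left, center and right columns.
    A center column much less dense than the average side column means a nine.
    """
    xs = np.asarray(xs)
    left, center, right = [np.count_nonzero((xs >= lo) & (xs < hi)) for lo, hi in column_edges]
    return 9 if center < ratio * 0.5 * (left + right) else 8
```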

Below is a chart defining each card value by the number of columns and rows present.

http://dasl.mem.drexel.edu/~danLofaro/events/telluride/2010/s1/image009.jpg

Conclusion:

Overall this method worked; however, it was unable to detect face cards or the suit. It did show that the silicon retina can determine the card value (ace through ten) while the card is visible for less than 80 ms.
