Results from Attention Topic Area

A central problem for modern information processing systems is that they can gather more information from the environment than they can process in real time. Sensory surfaces on the body and in the inner organs expanded over evolution much faster than central processing capacity, so the gathered signals cannot all be processed in parallel. Specializations such as the human eye, with its high-resolution central fovea and its ability to rotate within the orbit, and the hand, with its high density of touch receptors, both accompanied by an over-representation of these parts in the central processing unit (the brain), partially solve this problem.

However, such hard-wired specializations are not enough to relieve the overload of the central processing unit (the brain). Fortunately, nervous systems have evolved to solve this problem quickly, efficiently and thoroughly. Essentially all animals, including insects, have developed mechanisms of selective attention. Attention is a primary cognitive function, and arguably one of the most important for understanding the world around us. Without limiting, in a smart and situation-dependent manner, the amount of information that is processed in detail, higher-level cognitive processing would be impossible. Auditory attention is used to “hear out” a desired signal, yet it can be captured by sounds that are particularly salient. Visual attention is used to explore the world, directing processing to the portions of the visual field needed to understand the scene. Importantly, the inputs from these different modalities must be integrated and taken into account for making a “good” behavioral decision.

Attention is one of the most important cognitive processes. Not only does it help us select the input we need to perform a task, it also prioritizes peripheral inputs so we can accomplish our tasks with limited cognitive resources. Towards this goal, the attention group limited itself to four main projects, described below.

Much of our work was directed at how to put together a complete system that performs attention-driven scene analysis. Most existing attention work (e.g., Reynolds and Heeger) is directed at modeling low-level aspects of attention. At the neural level we see tuning curves that change in one of three different ways: the response broadens or narrows, its gain changes, or the curve shifts entirely. This is important work, but it doesn't tell us how attention works at the higher level.
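The three low-level modulation types can be sketched with an idealized Gaussian tuning curve. This is only an illustrative toy, not code from any of the projects, and all parameter values (preferred orientation, tuning width, gain) are arbitrary choices:

```python
import numpy as np

def tuning_curve(theta, pref=0.0, width=20.0, gain=1.0):
    """Idealized Gaussian tuning curve: response vs. stimulus angle (degrees)."""
    return gain * np.exp(-0.5 * ((theta - pref) / width) ** 2)

theta = np.linspace(-90, 90, 181)          # stimulus angles, 1-degree steps
baseline = tuning_curve(theta)             # unattended response

# Three idealized attentional effects on the same neuron:
gain_change = tuning_curve(theta, gain=1.5)   # multiplicative gain increase
narrowing   = tuning_curve(theta, width=12.0) # sharpened (narrowed) tuning
shift       = tuning_curve(theta, pref=10.0)  # shifted preferred stimulus
```

Comparing each modulated curve against `baseline` shows the signature of each effect: the gain change scales the peak, narrowing suppresses the flanks while preserving the peak, and the shift moves the peak location.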

Three subprojects looked at different system-level aspects of attention. The first project, "Attention While Eye Tracking," studied whether subjects were better able to attend to a visual change in a retina-centered or a space-centered reference frame. Subjects can clearly attend to a number of different aspects of a visual signal. In this study we wanted to know whether subjects could better be directed to follow a position fixed on their retina (using an eye-tracking paradigm so that the fovea was always centered on a known fixation point), or whether an ego-centric spatial filter was better.
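The distinction between the two reference frames amounts to a coordinate transform conditioned on gaze position. The following is a minimal sketch of that transform (a toy illustration, not the experimental code; all coordinates and gaze positions are made up):

```python
import numpy as np

def to_retinal(world_xy, gaze_xy):
    """Retina-centered coordinates: target position relative to current gaze."""
    return np.asarray(world_xy) - np.asarray(gaze_xy)

def to_world(retinal_xy, gaze_xy):
    """Space-centered (screen) coordinates recovered from retinal ones."""
    return np.asarray(retinal_xy) + np.asarray(gaze_xy)

# A cue fixed on the retina lands at a different screen location whenever the
# eye moves; a cue fixed in space changes its retinal position instead.
gaze_a = np.array([0.0, 0.0])
gaze_b = np.array([5.0, -3.0])
target = np.array([10.0, 2.0])
```

With the eye at `gaze_a` the target falls at retinal position (10, 2); after a saccade to `gaze_b` the same screen location falls at retinal position (5, 5), which is exactly the confound the eye-tracking paradigm controls for.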

The second project, "Spike Based Visual Saliency," looked at the impact of spiking representations on neural models of attention. How is visual saliency computed with spikes, and do the spikes themselves affect the computation?
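One common way to pose this question, sketched below, is latency coding: a center-surround contrast map drives integrate-and-fire neurons, so the most salient location spikes first. This is a toy construction of ours to illustrate the idea, not the project's actual model; the box-filter sizes, threshold, and time step are arbitrary:

```python
import numpy as np

def center_surround(img, c=1, s=3):
    """Crude center-surround contrast: small local mean minus wider local mean."""
    def box(im, r):
        out = np.zeros_like(im, dtype=float)
        h, w = im.shape
        for y in range(h):
            for x in range(w):
                out[y, x] = im[max(0, y - r):y + r + 1,
                               max(0, x - r):x + r + 1].mean()
        return out
    return np.abs(box(img, c) - box(img, s))

def first_spike_times(saliency, threshold=1.0, dt=1.0, steps=200):
    """Integrate-and-fire: each pixel's saliency drives one neuron; more
    salient locations reach threshold, and hence spike, earlier."""
    v = np.zeros_like(saliency)
    t_spike = np.full(saliency.shape, np.inf)
    for step in range(steps):
        v += saliency * dt
        fired = (v >= threshold) & np.isinf(t_spike)
        t_spike[fired] = step * dt
    return t_spike

# A single bright dot on a dark background is the most salient location...
img = np.zeros((9, 9))
img[4, 4] = 1.0
t = first_spike_times(center_surround(img))
# ...so neurons near the dot fire first, while uniform regions never fire.
```

In this coding scheme the spike *timing* carries the saliency ranking, which is one concrete sense in which spikes could change the computation relative to a rate-based map.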

The third project looked at the auditory features used when operating under an attentional spotlight. As described above, the neural data suggest that the input representation changes to emphasize the most important part of the stimulus. This is unfortunate for engineered systems, since many classification systems depend on global features. The computer vision world has had great success with local features such as SIFT and SURF, but most audio classification is done with a global feature known as MFCC. Changing the input representation via an attentional spotlight would lead to vastly different MFCC vectors, and very poor recognition scores. This project therefore looked at a form of "High Level Saliency," which used high-level features for saliency, and also explored local auditory features for recognition.
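The fragility of global spectral features under a spotlight can be shown numerically. The sketch below uses a simplified stand-in for MFCC (DCT of log band energies over the whole spectrum, with a hand-rolled type-II DCT); the band count, random energies, and spotlight gain are all arbitrary illustrative choices:

```python
import numpy as np

def dct_ii(x):
    """Type-II DCT, the transform used in MFCC extraction."""
    n = np.arange(len(x))
    return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * len(x))))
                     for k in range(len(x))])

def mfcc_like(band_energies):
    """Simplified cepstral feature: DCT of log band energies (full spectrum)."""
    return dct_ii(np.log(band_energies + 1e-8))

rng = np.random.default_rng(0)
bands = rng.uniform(0.5, 1.0, 20)     # one frame of mel-like band energies

spotlight = np.ones(20)
spotlight[5:10] = 4.0                 # attentional gain on a few bands

global_feat   = mfcc_like(bands)              # feature without attention
attended_feat = mfcc_like(bands * spotlight)  # feature under the spotlight

# A *local* log-energy feature restricted to the attended bands is merely
# shifted by the (constant) spotlight gain, so its shape is preserved.
local_feat     = np.log(bands[5:10] + 1e-8)
local_attended = np.log((bands * spotlight)[5:10] + 1e-8)
```

Because the DCT mixes every band into every coefficient, boosting a few bands perturbs the entire global vector, while the local feature changes only by a constant offset, which is the intuition behind preferring local features under attentional modulation.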

Finally, the last project prepared a sandbox for an attention system. The subject's goal is to understand the most "important" conversation at a cocktail party. This mimics a real party, where each listener tries to follow the conversation most relevant to them. Relevance and importance are hard to define, so for the purposes of this sandbox we asked listeners to listen to multiple overlapping two-digit sentences (One nine. Three four.) and then pick out the most important sentence. This formed our "Real-time Attention Driven Scene Analysis Sandbox."
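A stimulus generator for such a sandbox can be sketched in a few lines. Here pure tones stand in for spoken digits purely for illustration; the sample rate, durations, pitch mapping, and onsets are our assumptions, not the sandbox's actual stimuli:

```python
import numpy as np

SR = 8000  # sample rate (Hz), illustrative

def digit_tone(digit, dur=0.3):
    """Stand-in for a spoken digit: a pure tone whose pitch encodes the digit."""
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * (300 + 50 * digit) * t)

def two_digit_sentence(d1, d2, gap=0.1):
    """A 'sentence' of two digits separated by a short pause."""
    silence = np.zeros(int(SR * gap))
    return np.concatenate([digit_tone(d1), silence, digit_tone(d2)])

def cocktail_mixture(sentences, onsets_s):
    """Overlap several sentences at the given onsets (seconds) into one scene."""
    ends = [int(SR * o) + len(s) for s, o in zip(sentences, onsets_s)]
    mix = np.zeros(max(ends))
    for s, o in zip(sentences, onsets_s):
        start = int(SR * o)
        mix[start:start + len(s)] += s
    return mix

# Two overlapping two-digit sentences: "One nine." and "Three four."
scene = cocktail_mixture(
    [two_digit_sentence(1, 9), two_digit_sentence(3, 4)],
    onsets_s=[0.0, 0.2],
)
```

The listener's (or system's) task is then to report the digits of whichever overlapping sentence is designated most important, which is the shape of the sandbox task described above.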

Subproject Pages