
The Telluride CogRob experiment

The CogRob experiment was conceived in an effort to (a) investigate whether the current state of the art is adequate for integrating the different components of a cognitive system into a working artifact and (b), if so, to demonstrate for the first time a system that integrates visual processing, auditory processing, action (robotics), reasoning and language (i.e. signals and symbols). The experiment takes us away from the traditional way of approaching perceptual and cognitive problems.

Indeed, in computer vision, for example, the basic problem is, given an image (or an image sequence), to recognize the objects or human actions present. The usual approach is to output labels with an associated probability (or confidence), and that is the end of the process. The case is similar for audio processing. Our cognitive robot, in contrast, is supposed to continuously look at the table with its objects and tools, as well as at the human actions taking place. It should continuously segment the scene and continuously assign recognition labels to objects and events, and these labels are continuously analyzed by the reasoning engine, which either accepts the results or provides the perceptual modules with feedback in order to obtain new data and resolve uncertainties. Much like vision, the auditory analysis sub-system of the robot performs similar operations, using as input both the passive sounds produced as objects interact with each other and the active acoustic data from the micro-Doppler sonar (a bat-like system).

This fundamentally changes the problems surrounding neuromorphic cognition. Before CogRob, the vision and auditory processes would independently provide an answer, and it would be the final one (albeit a probabilistic one). With CogRob, the vision and auditory processes still provide answers, but as evidence in two separate channels that remain amenable to revision, as instructed by the Language-Reasoning Executive, which checks compatibilities and reasons about the different possibilities. If necessary, the robot may move to obtain a better view of the scene, focus on the tool in the hand of an agent, or reposition its auditory and visual sensorium to improve the reliability of the earlier decision processes. Thus, the problem of understanding a scene (recognizing objects and actions, planning and acting, making decisions) becomes easier if all components are considered together in an active agent. This is perhaps the biggest, if somewhat counter-intuitive, lesson from CogRob. There is a lot of structure in the world, and most of that structure is encapsulated in the language-reasoning system. It is this structure that allows the system to recover from perceptual mistakes, something foreign to current state-of-the-art approaches.
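The accept-or-revise cycle can be illustrated with a small sketch. The class, the field names and the 0.7 threshold below are illustrative assumptions, not the actual CogRob reasoner; the point is only that incompatible or weak evidence triggers a feedback request rather than a final answer.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    label: str         # e.g. "hammer" or "hammering"
    confidence: float  # probability reported by a perceptual module

def reconcile(vision: Hypothesis, audio: Hypothesis) -> dict:
    """Accept a label when the two evidence channels agree, otherwise request new data."""
    if vision.label == audio.label and min(vision.confidence, audio.confidence) > 0.7:
        return {"accept": vision.label}
    # Incompatible or weak evidence: ask the perceptual modules to re-observe,
    # e.g. move for a better view or focus on the tool in the agent's hand.
    return {"feedback": "re-observe", "candidates": [vision.label, audio.label]}

# Example: vision and audition disagree, so the reasoner asks for new data.
print(reconcile(Hypothesis("hammer", 0.55), Hypothesis("saw", 0.80)))
```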

CogRob was implemented as a cognitive dialogue (message passing between the visual executive, the auditory executive and the language executive) on top of ROS, the Robot Operating System. Although corners were cut and many decisions had to be made in order to deliver a working demo on time, CogRob can deal, with minor extensions to the software, with hundreds of objects and actions, making it the first ever integrated cognitive machine and pointing the way to a new era of bio-inspired cognitive robotics.
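A minimal sketch of this message passing is shown below, assuming a rospy node for the language executive and JSON-encoded String messages on hypothetical topics (/cogrob/vision/hypothesis, /cogrob/audio/hypothesis, /cogrob/feedback); the actual CogRob message definitions and topic names are not reproduced here.

```python
import json
import rospy
from std_msgs.msg import String

def make_callback(channel, evidence, feedback_pub):
    def callback(msg):
        # Each perceptual executive is assumed to publish JSON hypotheses,
        # e.g. {"label": "hammer", "confidence": 0.62}.
        evidence[channel] = json.loads(msg.data)
        vision, audio = evidence.get("vision"), evidence.get("audio")
        if vision and audio:
            if vision["label"] == audio["label"]:
                rospy.loginfo("accepted: %s", vision["label"])
            else:
                # The cognitive dialogue: incompatible evidence triggers a
                # feedback request back to the perceptual executives.
                feedback_pub.publish(String(data=json.dumps(
                    {"request": "re-observe",
                     "conflict": [vision["label"], audio["label"]]})))
    return callback

def main():
    rospy.init_node("language_executive")
    evidence = {}
    feedback_pub = rospy.Publisher("/cogrob/feedback", String, queue_size=10)
    rospy.Subscriber("/cogrob/vision/hypothesis", String,
                     make_callback("vision", evidence, feedback_pub))
    rospy.Subscriber("/cogrob/audio/hypothesis", String,
                     make_callback("audio", evidence, feedback_pub))
    rospy.spin()

if __name__ == "__main__":
    main()
```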

CogRob worked almost flawlessly. Even though the individual components (object recognition, action segmentation and recognition) often made mistakes, the final outcome, after the Reasoner's work and feedback, was almost always correct.