Doing social robotics with the iCub (or how to build a flirtbot)

Kevin Mazurek, Mehdi Khamassi, Timmer Horiuchi, Ulysses Bernardet, Sergi Bermudez i Badia

The overall objective of this project was to focus on the fundamental role that social interaction and communication play in how biological systems have evolved. This approach requires putting neurobiological models of perception, decision making, and learning in their functional context, i.e. the context of social interaction. To this end, we set out to build a “simple” robotic pet that displays goal-oriented behavior and that a user can interact with in a meaningful way.

The project required a mixed software/hardware approach, using both neurobiological models and standard signal processing and machine learning software. The main robotic platform used in this project was the iCub robot (courtesy of Prof. Alexandre Bernardino, ISR/IST Lisbon, Portugal). The iCub is the humanoid robot developed at IIT as part of the EU project RobotCub and subsequently adopted by more than 20 laboratories worldwide. It has 53 motors that move the head, arms and hands, waist, and legs. It can see and hear, and it has a sense of proprioception (body configuration) and of movement (using accelerometers and gyroscopes).

The iCub robot is controlled via the YARP (Yet Another Robot Platform, http://eris.liralab.it/yarp/) middleware, which allows us to fully control head orientation in 3 DoF, eye vergence, and 9 facial expressions while reading out the camera video stream in real time (Figure 1). In addition to the iCub robot, we implemented a real-time user analysis based on the SHORE toolkit (http://www.iis.fraunhofer.de/en/bf/bsy/produkte/shore/). The tool uses a video stream to analyze gender, age, and basic emotions (angry, happy, sad, surprised). For integration into the overall system, we added a YARP interface to the framework. To track people's position in space relative to the iCub robot, we used the AnTS tracking system (Bermúdez i Badia, 2003-2011), a versatile camera-based tracking system, with an overhead USB camera (PSEye, Sony, Tokyo, Japan) covering an area of about 4 x 3 meters.

Figure 1: iCub head (panel (b) from Ruesch et al., 2008)

All of these software and hardware systems were integrated using iqr (Bernardet & Verschure, 2010) (http://iqr.sourceforge.net/?file=kop1.php). iqr is a simulation software for graphically designing and controlling large-scale neuronal models. Simulations in iqr can control real-world devices in real time. iqr can be extended with new neuron and synapse types and with custom interfaces to hardware.

Starting from an existing model of the basal ganglia built on reinforcement learning (RL) techniques (Khamassi et al., 2005), we explored how this mechanism could be exploited in a context of social interaction. The main particularity of using an RL approach in this context is that the state space in which the robot “navigates” to achieve its goals is defined externally, by the state of a human with respect to the robot. Therefore, transitions from state to state can only be achieved through social interaction, by influencing the other's behavior. A learning algorithm allows the robot to acquire the appropriate behaviors in such an environment. In RL there are two main approaches: model-based learning and model-free learning. A model-based approach requires an estimate of how actions lead to state changes, which is not always attainable; the alternative is a model-free approach. In the model-free approach, the robot learns which behavior to take in a given state from the rewards that follow, and this is captured by a Q value. A Q value represents the gain associated with performing a specific action in a given state, and it changes based on past behaviors and rewards (see Khamassi et al., 2005 for more details). The action selection mechanism that chooses a behavior (u) given the current internal state of the iCub (x) therefore uses the following policy to maximize the expected reward:
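A standard softmax (Boltzmann) form of such a policy is given here as an assumption, since the exact expression and its parameters are not stated in this text. The inverse temperature \beta > 0 is a hypothetical parameter introduced for illustration:

```latex
P(u \mid x) = \frac{\exp\big(\beta\, Q(x, u)\big)}{\sum_{u'} \exp\big(\beta\, Q(x, u')\big)}
```

With a large \beta the choice approaches the greedy behavior with the greatest Q value; with a small \beta the behaviors are chosen more uniformly, which favors exploration.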

The robot would typically choose the behavior with the greatest Q value, but to encourage exploration of other options a softmax policy is used. This policy has been implemented in an iqr module in which two different neural inputs set the input state and the reward matrix. The module infers the number of input states and output behaviors from the size of these neural inputs. The output of the module goes to another neural population, in which the single activated neuron determines the action the robot will take (Figure 2).
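The softmax selection and the reward-driven change of Q values described above can be sketched in a few lines of Python. This is a minimal tabular sketch, not the iqr module itself: the function names, the inverse temperature `beta`, the learning rate `alpha`, and the discount factor `gamma` are all assumptions introduced for illustration.

```python
import math
import random


def softmax_probs(q_values, beta=1.0):
    """Turn a list of Q values into selection probabilities (softmax)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_row, beta=1.0, rng=random):
    """Sample an action index from the softmax distribution over one Q row."""
    probs = softmax_probs(q_row, beta)
    return rng.choices(range(len(q_row)), weights=probs, k=1)[0]


def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update (assumed form; parameters are hypothetical)."""
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```

A Q table here is a list of rows, one per input state, each row holding one value per behavior. Higher `beta` makes the choice nearly greedy; lower `beta` spreads the choice across behaviors.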

Figure 2: Interaction paradigm and associated reinforcement learning system. a) The goal of the iCub robot is to engage in an interaction with a human that ultimately leads the human to be smiling in front of the robot. b) A Q value reinforcement learning system associates the person's state (input state, 13 dimensions) to the action space (11 dimensions) that triggers the iCub behaviors. Since 3 independent behavioral systems exist in the iCub (vocalization, head movement, and facial expressions), the Q value matrix that associates input-action patterns is divided into 3 winner-take-all (WTA) networks.
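The split into three winner-take-all networks can be illustrated with a small Python sketch. The grouping below uses the behaviors named in this report, but the identifiers and data layout are hypothetical, and a deterministic maximum stands in for the neuronal WTA dynamics.

```python
# Hypothetical grouping of behaviors into the three independent systems.
ACTION_GROUPS = {
    "head": ["turn_left", "turn_right"],
    "voice": ["go_away", "come_closer", "kiss"],
    "face": ["neutral", "smile"],
}


def winner_take_all(q_by_action):
    """Pick, within each behavioral system, the action with the highest Q value.

    q_by_action maps an action name to its Q value for the current state.
    Returns a dict mapping each group name to its winning action.
    """
    return {
        group: max(actions, key=lambda a: q_by_action[a])
        for group, actions in ACTION_GROUPS.items()
    }
```

Because each group selects its own winner, the robot can turn its head, vocalize, and change facial expression at the same time.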

The system implemented in iqr consists of 157 neuronal layers with a total of 18200 neurons. Integrate-and-fire, shunting inhibition, linear threshold, sigmoidal, and random spike neuron types were used (Figure 3). The input sensors measured the proximity of a human to the iCub (absent, near, far), his/her position relative to the head orientation (right, left), and his/her facial expression (smile, no smile). All possible combinations of these measurements of the human's status constitute the input space of the RL system. These are then associated with the behavioral action space by means of rewards. The possible behaviors were moving the head (right, left), speaking (“go away”, “come closer”, and a kiss sound), and facial expressions (neutral face, smile). Neuronal networks implemented in iqr translated those behaviors into low-level motor/voice commands that could be executed by the robot.
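As an illustration of how the sensor measurements listed above combine into a discrete input space, the following Python sketch enumerates their Cartesian product and encodes a reading as a one-hot vector. The variable names and the one-hot encoding are assumptions; the encoding used in the iqr model is not specified here.

```python
from itertools import product

# Measurement categories as listed in the text.
PROXIMITY = ("absent", "near", "far")
SIDE = ("left", "right")
EXPRESSION = ("no_smile", "smile")

# Every combination of the three measurements is one discrete state.
STATES = list(product(PROXIMITY, SIDE, EXPRESSION))
STATE_INDEX = {state: i for i, state in enumerate(STATES)}


def encode(proximity, side, expression):
    """Return a one-hot vector with a 1 at the index of the observed state."""
    vec = [0] * len(STATES)
    vec[STATE_INDEX[(proximity, side, expression)]] = 1
    return vec
```

A plain product of these categories gives 12 combinations; the state encoding actually used in the model may differ from this simple enumeration.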

Figure 3: Overall structure of the iqr neuronal system that implements the iCub social learning. Sensor readings classify the human behavior into a discretized input space. The reinforcement learning network associates the input space with high-level behaviors that are then translated into low-level iCub motor commands.

After a period (30 minutes) of interaction with a human, the robot learned the head sensory-motor contingencies (right-left head turning) needed to gaze at people. It also learned not to tell people “go away”, and it associated “come closer” with a prediction of reward. Moreover, it generalized the hard-coded reward delivered by the iCub (“kisses”) to neighbouring states. This resulted in a behavior that could be described as “flirting”. When the human was far away, or close but not visible to the iCub cameras, the iCub attracted the human by calling “come closer”. Once the human was close and the face was visible to the iCub, it would produce a kiss sound. If this triggered a smile from the human, the iCub would smile back.

Originally, we designed the system with a high-dimensional input x output space (13 x 11), which made it very difficult to train the iCub successfully in a rather uncontrolled environment. In addition, we found that the iCub needs to display minimally structured behavior from the beginning in order to explore the whole state space. Without any priors about which actions are meaningful, the learning process would be extremely long and the learnt behaviors could be very difficult to interpret. The RL approach used in this project, Q-learning, is model-free. This can be seen as beneficial, since it does not require us to understand which behaviors of the iCub trigger which state changes. However, a model-based approach could help us understand how people change their behavior in response to iCub actions.


Khamassi, M., Lachèze, L., Girard, B., Berthoz, A., & Guillot, A. (2005). Actor-critic models of reinforcement learning in the basal ganglia: From natural to artificial rats. Adaptive Behavior, Special Issue Towards Artificial Rodents, 13(2), 131-148.

Bermúdez i Badia, S. (2003-2011). AnTS (Version 2) [software]. Retrieved from sergibermudez.blogspot.com

Bernardet, U., & Verschure, P. F. M. J. (2010). iqr: A Tool for the Construction of Multi-level Simulations of Brain and Behaviour. Neuroinformatics, 8(2), 113-34. Humana Press Inc. doi:10.1007/s12021-010-9069-7

Ruesch, J., Lopes, M., Bernardino, A., Hornstein, J., Santos-Victor, J., & Pfeifer, R. (2008). Multimodal saliency-based bottom-up attention a framework for the humanoid robot iCub. ICRA 2008 - IEEE International Conference on Robotics and Automation (pp. 962-967). Pasadena, CA, USA: IEEE. doi:10.1109/ROBOT.2008.4543329