We address the problem of identifying whether a child playing an interactive game in a small group is speaking to an animated or robotic character or conferring with a friend. Determining the addressee is critical for turn-taking. We explore a machine learning approach that uses a Support Vector Machine (SVM) to integrate audio and visual features that we believe can be sensed accurately. We extend the basic model with a simple form of group information, limited speech recognition, and limited game state to improve classification accuracy. Our results demonstrate high accuracy in detecting when the character is being addressed. This model improves our understanding of how children behave as a group when interacting with an agent.