Interpreting and Generating Gestures with Embodied Human Computer Interactions
Anonymous
Abstract
In this paper, we discuss the role that gesture plays for an embodied intelligent virtual agent (IVA) in the context of multimodal task-oriented dialogues with a human. We have developed a simulation platform, VoxWorld, for modeling and building Embodied Human-Computer Interactions (EHCI), in which communication is facilitated through language, gesture, action, facial expressions, and gaze tracking. We believe that EHCI is a fruitful approach for studying and enabling robust interaction and communication between humans and intelligent agents and robots. Gesture, language, and action are generated and interpreted by an IVA in a situated meaning context, which facilitates grounded and contextualized interpretations of communicative expressions in a dialogue. The framework enables multiple methods for evaluating gesture generation and recognition. We discuss four separate scenarios involving the generation of non-verbal behavior in dialogue: (1) deixis (pointing) gestures, generated to request information about an object, a location, or a direction when performing a specific action; (2) iconic action gestures, generated to clarify how (in what manner) to perform a specific task; (3) affordance-denoting gestures, generated to describe how the IVA can interact with an object, even when it does not know what the object is or what it might be used for; and (4) direct situated actions, where the IVA responds to a command or request by acting in the environment directly.