Enabling a machine to understand the motion and behaviour of a particular actor is an important task in machine vision. The problem has many applications, in domains such as motion retargeting, robot navigation, healthcare, psychology, and augmented reality applications such as games. In this paper we demonstrate a human-robot interaction system based on a gestural query, in which the computer's response is a computer-generated video of another human's movement. This work differs from other recent video retargeting systems in that it is not meant to modify the target video as such, but rather to query a video database for the most responsive segment through a process of gestural interpretation. For this purpose we developed a generative video system capable of extracting a latent representation of free movements such as dance and expressive gesture, and of querying and re-editing multiple retrieved video segments in response to an input movement query. One of the main challenges in this approach is finding the 'units' of continuous movement input, so that the style of the target video and the relevant aspects of the query video are related in a meaningful way. We describe a gestural motif extraction system that combines deep feature learning with structural similarity analysis to enable such query-based human-computer motion interaction.