RT Other
SR 00
ID 10.1109/ICASSP49357.2023.10096097
A1 Xu, Songpei
A1 Kaul, Chaitanya
A1 Ge, Xuri
A1 Murray-Smith, Roderick
T1 Continuous Interaction With a Smart Speaker via Low-Dimensional Embeddings of Dynamic Hand Pose
AB This paper presents a new continuous interaction strategy with visual feedback of hand pose and mid-air gesture recognition and control for a smart music speaker, which utilizes only 2 video frames to recognize gestures. Frame-based hand pose features from MediaPipe Hands, containing 21 landmarks, are embedded into a 2 dimensional pose space by an autoencoder. The corresponding space for interaction with the music content is created by embedding high-dimensional music track profiles to a compatible two-dimensional embedding. A PointNet-based model is then applied to classify gestures which are used to control the device interaction or explore music spaces. By jointly optimising the autoencoder with the classifier, we manage to learn a more useful embedding space for discriminating gestures. We demonstrate the functionality of the system with experienced users selecting different musical moods by varying their hand pose.
SN 9781728163277
LK https://eprints.gla.ac.uk/298426/