Rossetto, F., Dalton, J. and Murray-Smith, R. (2023) Generating Multimodal Augmentations with LLMs from Song Metadata for Music Information Retrieval. In: 1st Workshop on Large Generative Models Meet Multimodal Application (LGM3A), Ottawa, Canada, 2 November 2023, pp. 51-59. ISBN 9798400702839 (doi: 10.1145/3607827.3616842)
Full text not currently available from Enlighten.
Abstract
In this work we propose a set of new automatic text augmentations that leverage Large Language Models applied to song metadata to improve music information retrieval tasks. Compared to recent works, our proposed methods use large language models and copyright-free corpora from web sources, enabling us to release the knowledge sources collected. We show that combining these representations with the audio signal yields a 21% relative improvement on five of six datasets across genre classification, emotion recognition and music tagging, achieving state-of-the-art results on three (GTZAN, FMA-Small and Deezer). We demonstrate the benefit of injecting external knowledge sources by comparing them with intrinsic text representation methods that rely only on the sample's information.
| Item Type: | Conference Proceedings |
|---|---|
| Status: | Published |
| Refereed: | Yes |
| Glasgow Author(s) Enlighten ID: | Murray-Smith, Professor Roderick and Dalton, Dr Jeff and Rossetto, Federico |
| Authors: | Rossetto, F., Dalton, J., and Murray-Smith, R. |
| College/School: | College of Science and Engineering > School of Computing Science |
| ISBN: | 9798400702839 |
| Related URLs: | |