This repository was archived by the owner on Jan 6, 2026. It is now read-only.

@pkmital pkmital commented May 23, 2017

Here is an implementation of using queues for the WaveNet decoder in NSynth as described in:
Ramachandran, P., Le Paine, T., Khorrami, P., Babaeizadeh, M., Chang, S., Zhang, Y., … Huang, T. (2017). Fast Generation For Convolutional Autoregressive Models, 1–5.
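
As a rough illustration of the queue idea from that paper (a toy sketch, not the code in this PR): each dilated causal convolution layer keeps a FIFO queue of its last `dilation` inputs, so producing one new sample costs one small update per layer instead of re-running the whole receptive field. The scalar weights, the dilations, and the names `QueuedDilatedLayer`, `naive_dilated_layer`, and `LAYER_SPECS` below are all made up for illustration; the real decoder uses learned multi-channel filters and feeds each generated sample back in as the next input.

```python
from collections import deque


class QueuedDilatedLayer:
    """One causal convolution layer (filter width 2) evaluated one step at a time."""

    def __init__(self, w_past, w_cur, dilation):
        self.w_past = w_past
        self.w_cur = w_cur
        # The queue holds the last `dilation` inputs; zeros stand in for padding.
        self.queue = deque([0.0] * dilation)

    def step(self, x):
        past = self.queue.popleft()  # input seen `dilation` steps ago
        self.queue.append(x)         # remember the current input for later steps
        return self.w_past * past + self.w_cur * x


def naive_dilated_layer(xs, w_past, w_cur, dilation):
    """Reference: the same layer computed over the whole sequence at once."""
    return [
        w_past * (xs[t - dilation] if t >= dilation else 0.0) + w_cur * xs[t]
        for t in range(len(xs))
    ]


# Hypothetical (w_past, w_cur, dilation) triples, chosen only for illustration.
LAYER_SPECS = [(0.5, 0.25, 1), (-0.3, 0.8, 2), (0.1, -0.6, 4)]


def run_queued(xs):
    """Push samples through the stack one at a time, as a generator would."""
    layers = [QueuedDilatedLayer(*spec) for spec in LAYER_SPECS]
    outputs = []
    for x in xs:
        for layer in layers:  # one cheap update per layer per sample
            x = layer.step(x)
        outputs.append(x)
    return outputs


def run_naive(xs):
    """Apply each layer to the full sequence, layer by layer."""
    for w_past, w_cur, dilation in LAYER_SPECS:
        xs = naive_dilated_layer(xs, w_past, w_cur, dilation)
    return xs


signal = [0.1 * t for t in range(16)]
queued_out = run_queued(signal)
naive_out = run_naive(signal)
```

Both paths produce the same outputs; the queued version only touches one cached value per layer at each time step, which is where the speed-up during sample-by-sample generation comes from.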

This should let you encode using the existing NSynth model and then synthesize from any encoding much faster than the current approach. I get about 100 samples per second with this method (not at all an accurate measurement), which means a 4 second clip at 16 kHz (64,000 samples) can be synthesized in roughly 10 minutes, which isn't terrible. You can also use this to explore different encodings, for example from interpolation, or to encode your own sounds and explore their syntheses much more easily than before.
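
The timing estimate is simple arithmetic on the rough throughput figure quoted above. A small sketch, where the helper name `synthesis_minutes` is made up for illustration and the 100 samples/second rate is the informal measurement, not a benchmark:

```python
SAMPLE_RATE_HZ = 16000          # NSynth audio sample rate used in the estimate
CLIP_SECONDS = 4                # length of the clip to synthesize
GENERATED_PER_SECOND = 100      # rough, informal throughput figure


def synthesis_minutes(clip_seconds, sample_rate_hz, generated_per_second):
    """Estimate wall-clock minutes needed to generate a clip sample by sample."""
    total_samples = clip_seconds * sample_rate_hz
    return total_samples / generated_per_second / 60.0


total_samples = CLIP_SECONDS * SAMPLE_RATE_HZ  # 64000 samples
minutes = synthesis_minutes(CLIP_SECONDS, SAMPLE_RATE_HZ, GENERATED_PER_SECOND)
print("%d samples, about %.1f minutes" % (total_samples, minutes))
```

At these numbers the estimate comes out to a little under 11 minutes, so any change in throughput scales the wait time inversely.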

There is no CLI tool, I'm afraid, but I'm hoping someone else can develop one to make this easier for others! This PR just includes a simple Python module, magenta.models.nsynth.wavenet.generate, with a function synthesize that shows how to use the FastGenerationConfig to load an audio file, encode it, and then synthesize from the encoding.

Lastly, I wasn't familiar with the BUILD system so please let me know if that looks okay.

@jesseengel jesseengel self-requested a review May 23, 2017 22:09
@jesseengel jesseengel self-assigned this May 23, 2017
@jesseengel jesseengel left a comment

Awesome submission! Just a first comment: I think a couple of the functions could be moved over to utils.py. I'm going to run this PR through our internal linters and let you know if anything needs to be changed.

You should probably add a py_binary to run the program from the command line. It can be super simple, something like...

# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
r"""DOC STRING HERE
"""
# internal imports
import tensorflow as tf

from magenta.models.nsynth.wavenet.generate import synthesize

FLAGS = tf.app.flags.FLAGS

# No sensible default for the input; pass --wav_file on the command line.
tf.app.flags.DEFINE_string("wav_file", None, "Path to input file.")
tf.app.flags.DEFINE_string("out_file", "synthesis.wav", "Path to output file.")
tf.app.flags.DEFINE_string("ckpt_path", "model.ckpt-200000", "Path to checkpoint.")
tf.app.flags.DEFINE_integer("sample_length", 64000, "Input file size in samples.")
tf.app.flags.DEFINE_integer("synth_length", 64000, "Output file size in samples.")
tf.app.flags.DEFINE_string("log", "INFO",
                           "The threshold for what messages will be logged: "
                           "DEBUG, INFO, WARN, ERROR, or FATAL.")


def main(unused_argv=None):
  tf.logging.set_verbosity(FLAGS.log)
  # Pass the parsed flags through instead of hard-coding the defaults.
  synthesize(wav_file=FLAGS.wav_file,
             ckpt_path=FLAGS.ckpt_path,
             out_file=FLAGS.out_file,
             sample_length=FLAGS.sample_length,
             synth_length=FLAGS.synth_length)

if __name__ == "__main__":
  tf.app.run()

import numpy as np


def inv_mu_law(x, mu=255.0):
These functions should probably be loaded from / added to utils.py. You could just rename them as inv_mu_law_numpy() for example.

return out


def load_audio(wav_file, sample_length=64000):
These functions should probably be loaded from / added to utils.py. You could just rename them as inv_mu_law_numpy() for example.

@jesseengel jesseengel left a comment

LGTM, after my commits ;).

@jesseengel jesseengel merged commit bd5f28b into magenta:master Jun 12, 2017