Fast WaveNet generation using queues (NSynth) (CLA) #669
Awesome submission! Just a first comment: I think a couple of the functions could be moved over to utils.py. I'm going to run this PR through our internal linters and let you know if anything needs to be changed.
You should probably add a py_binary to run the program from the command line. It can be super simple, something like...
```python
# Copyright 2017 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
r"""Synthesizes audio with NSynth's fast WaveNet generation.

Encodes --wav_file with the checkpoint at --ckpt_path, then writes the
synthesized audio to --out_file.
"""
# internal imports
import tensorflow as tf
from magenta.models.nsynth.wavenet.generate import synthesize

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string("wav_file", None, "Path to input file.")
tf.app.flags.DEFINE_string("out_file", "synthesis.wav", "Path to output file.")
tf.app.flags.DEFINE_string("ckpt_path", "model.ckpt-200000",
                           "Path to checkpoint.")
tf.app.flags.DEFINE_integer("sample_length", 64000,
                            "Input file size in samples.")
tf.app.flags.DEFINE_integer("synth_length", 64000,
                            "Output file size in samples.")
tf.app.flags.DEFINE_string("log", "INFO",
                           "The threshold for what messages will be logged: "
                           "DEBUG, INFO, WARN, ERROR, or FATAL.")


def main(unused_argv=None):
  tf.logging.set_verbosity(FLAGS.log)
  # Pass the flag values through rather than hard-coding them.
  synthesize(wav_file=FLAGS.wav_file,
             ckpt_path=FLAGS.ckpt_path,
             out_file=FLAGS.out_file,
             sample_length=FLAGS.sample_length,
             synth_length=FLAGS.synth_length)


if __name__ == "__main__":
  tf.app.run()
```
```python
def inv_mu_law(x, mu=255.0):
```
These functions should probably be loaded from / added to utils.py. You could just rename them as inv_mu_law_numpy() for example.
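Following that suggestion, a NumPy version could live in utils.py next to the TensorFlow ops. Below is a minimal sketch of the standard mu-law companding pair, assuming audio already scaled to [-1, 1]. inv_mu_law_numpy is the name suggested above and mu_law_numpy is a hypothetical companion; the PR's own inv_mu_law may instead take quantized integer codes, which would first need rescaling to [-1, 1].

```python
import numpy as np


def mu_law_numpy(x, mu=255.0):
    """Compress audio in [-1, 1] with mu-law companding (output also in [-1, 1])."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)


def inv_mu_law_numpy(y, mu=255.0):
    """Expand mu-law values in [-1, 1] back to linear audio in [-1, 1]."""
    y = np.asarray(y, dtype=np.float64)
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu


# Round trip: expanding the compressed signal recovers the original.
x = np.linspace(-1.0, 1.0, 9)
print(np.allclose(inv_mu_law_numpy(mu_law_numpy(x)), x))  # True
```

Keeping both directions side by side makes the round trip easy to unit test.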
```python
def load_audio(wav_file, sample_length=64000):
```
These functions should probably be loaded from / added to utils.py. You could just rename them as inv_mu_law_numpy() for example.
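For illustration, here is a minimal sketch of a NumPy-only loader that could sit in utils.py, using the standard-library wave module. It assumes 16-bit mono PCM already at the model's sample rate (the description mentions 16 kHz) and does not resample, unlike librosa-based loading. load_audio_pcm16 is a hypothetical name, not the PR's function.

```python
import wave

import numpy as np


def load_audio_pcm16(wav_file, sample_length=64000):
    """Read 16-bit mono PCM as float32 in [-1, 1), trimmed or zero-padded to sample_length."""
    with wave.open(wav_file, "rb") as f:
        if f.getsampwidth() != 2 or f.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        frames = f.readframes(f.getnframes())
    # WAV PCM samples are little-endian signed 16-bit integers.
    audio = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    audio = audio[:sample_length]
    # Zero-pad short files so every example has the same length.
    return np.pad(audio, (0, sample_length - len(audio)))
```

Fixing the length up front keeps the encoder input shape constant, which matches the sample_length argument in the signature above.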
…librosa for wavfile loading
jesseengel left a comment
LGTM, after my commits ;).
Here is an implementation of queue-based fast generation for the WaveNet decoder in NSynth, as described in:
Ramachandran, P., Le Paine, T., Khorrami, P., Babaeizadeh, M., Chang, S., Zhang, Y., … Huang, T. (2017). Fast Generation For Convolutional Autoregressive Models, 1–5.
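The core idea of the queues can be illustrated with a toy NumPy sketch, which is not the PR's TensorFlow implementation. A width-2 dilated causal layer only needs its input from d steps ago. So instead of recomputing the whole receptive field at every step, each layer can keep a FIFO of its last d inputs. The toy assumes scalar channels, one weight pair per layer, and tanh activations. DILATIONS, WEIGHTS, layer, naive_stack and FastStack are all hypothetical names.

```python
from collections import deque

import numpy as np

# Toy causal stack: kernel width 2, scalar channels (illustrative assumptions).
DILATIONS = [1, 2, 4, 8]
WEIGHTS = np.random.default_rng(0).normal(size=(len(DILATIONS), 2))


def layer(x_past, x_now, w):
    """One width-2 dilated conv tap followed by tanh."""
    return np.tanh(w[0] * x_past + w[1] * x_now)


def naive_stack(x):
    """Recompute every layer over the whole sequence (zero-padded history)."""
    h = np.asarray(x, dtype=np.float64)
    for d, w in zip(DILATIONS, WEIGHTS):
        # past[t] = h[t - d], or 0 before the sequence starts.
        past = np.concatenate([np.zeros(d), h])[:len(h)]
        h = layer(past, h, w)
    return h


class FastStack:
    """Incremental generation: each layer caches its last d inputs in a queue."""

    def __init__(self):
        # Queues start full of zeros, matching the zero padding in naive_stack.
        self.queues = [deque([0.0] * d, maxlen=d) for d in DILATIONS]

    def step(self, x_now):
        """Consume one input sample and return the top layer's output."""
        h = float(x_now)
        for q, w in zip(self.queues, WEIGHTS):
            x_past = q[0]  # this layer's input from d steps ago
            q.append(h)    # push the current input; maxlen drops the old q[0]
            h = float(layer(x_past, h, w))
        return h


x = np.random.default_rng(1).normal(size=50)
stack = FastStack()
fast = np.array([stack.step(v) for v in x])
print(np.allclose(fast, naive_stack(x)))  # True
```

Each fast step does one update per layer, whereas the naive path reprocesses the entire history. In real autoregressive generation, the value passed to step() at time t+1 would be a sample drawn from the output at time t. Feeding a fixed sequence here just makes the two paths easy to compare.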
This should let you encode using the existing NSynth model and then synthesize from any encoding much faster than the current approach. You can generate a 4 second audio file in about 10 minutes this way, which isn't terrible. I get roughly 100 samples per second with this method (not an accurate measurement), and a 4 second clip at 16 kHz is 64,000 samples, or about 640 seconds of generation. This also makes it much easier than before to explore different encodings, for example interpolations, or to encode your own sounds and listen to their syntheses.
There is no CLI tool, I'm afraid, but I'm hoping someone else can develop one to make this easier for others! This PR just includes a simple Python module, magenta.models.nsynth.wavenet.generate, with a function synthesize that shows how to use the FastGenerationConfig to load an audio file, encode it, and then synthesize from the encoding.
Lastly, I wasn't familiar with the BUILD system, so please let me know if that looks okay.