The FonaDyn Handbook

Version 2.4

Sten Ternström

with Dennis Johansson and Andreas Selamtzis

Division of Speech, Music and Hearing


School of Electrical Engineering and Computer Science
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
© Sten Ternström 2017-2022 - [email protected]

Disclaimer: FonaDyn and its supporting materials are provided to the public
domain, free of charge and with no commitment to further support. Much effort has
been invested in the development, and the code has no risky aspects that we are
aware of. Still, all software is likely to have errors of one kind or another. We are of
course interested in learning of any bugs or other problems that you might come
across.

You may install, run or modify FonaDyn only if you agree that you will not hold the
author S.T. nor co-authors Dennis Johansson and Andreas Selamtzis responsible for
any loss of data or function that may be incurred as a consequence of installing or
using FonaDyn.

Licence: A link to the European Union Public Licence v1.2 can be found below. Any
and all use of FonaDyn must comply with this licence.

EUPL v1.2: https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12

Mandatory citation: publications of any research that you perform using FonaDyn
or any derivative thereof must cite the following article. Likewise, distribution of any
software that you develop using FonaDyn or any derivative thereof must observe the
licensing conditions, and also cite the following article:

Ternström S, Johansson D, Selamtzis A (2018). FonaDyn - a system for real-time
analysis of the electroglottogram, over the voice range. SoftwareX number 7, 2018,
pp 74-80. DOI: 10.1016/j.softx.2018.03.002

The above is an Open Access article that can be accessed at this link:
http://linkinghub.elsevier.com/retrieve/pii/S235271101830030X ,
where also major updates to FonaDyn can be downloaded when available.
Minor updates can be found at www.kth.se/profile/stern.

FonaDyn version: 2.4.9

The FonaDyn Handbook ii Version 2.4.9


Contents

0 Overview
1 PART ONE - Getting started
1.1 Hardware requirements
1.2 Software setup
2 PART TWO - Theory
2.1 Introduction
2.2 Signal processing overview
2.3 Audio processing
2.4 EGG processing
2.5 EGG time-domain metrics
2.6 EGG harmonic-domain analysis
2.7 Sample entropy of cycle data
2.8 Clustering
2.9 Limitations
3 PART THREE - Using FonaDyn
3.1 Window layout
3.2 Recording
3.3 SPL calibration
3.4 Listening from maps
3.5 Files
3.6 EGG Shape Analysis
3.7 Using FonaDyn with RME audio interfaces
3.8 Known problems
Acknowledgments
References

0 Overview

FonaDyn was first conceived to analyze electroglottographic (EGG) signals over
all or part of the voice range, and thereby to visualize the great variability that exists
within and between individual voices. FonaDyn has since evolved, and is still evolving,
into a more general system for voice exploration and measurement. It gives real-time
visual feedback of EGG metrics and phonation types, in the paradigm of the voice
range profile (VRP) [1]. ‘Phonation types’ may refer to modal/falsetto, or to any other
types of phonatory differences, including gradings within a single voice register.
FonaDyn can be used to pursue many kinds of research questions on phonation, voice
source dynamics, source-filter interaction and effects of therapy and/or training. It
can also be used interactively, giving visual feedback on EGG and acoustic
characteristics of the voice, or as a data-acquisition front-end with extensive preview
facilities. The only constraint is that the phonation must be reasonably periodic
throughout. Of course, the researcher/educator/clinician needs to understand in
some detail what FonaDyn does, and how to use it. That is the main goal of this
handbook.
In addition to reporting several conventional voice metrics, FonaDyn can automatically
categorise EGG shapes, and assign them to layers in the voice map, using a
simple form of ‘machine learning’. The learning feature has several important advantages.
The categories need not be known in advance; rather, FonaDyn helps you to
find them. The classification does not initially rely on any prescribed thresholds, but
rather emerges from the data. With a careful choice of the clustering parameters, it is
possible to produce a classification or stratification of the EGG that is specific to the
research question at hand.
FonaDyn is not a stand-alone application; it runs from within the SuperCollider
system, which must be installed first. SuperCollider is a free open-source software
development tool that runs on macOS, Windows and Linux. It was originally developed
by James McCartney for the computer music community. And “SuperCollider”
happens to be an apt name for a tool that analyses vocal fold collisions. For the hardware, you
will need a fast computer, a high-end digital audio interface, a good microphone, an
EGG system with an analog line output, and a quiet room in which to record.
SuperCollider can use any Windows- or Mac-compatible audio interface, but not
laboratory data acquisition boards. With some optional hardware, it is nevertheless
possible to acquire also non-audio signals, in parallel and in synchrony with the voice
and EGG signals.

This handbook has three parts:


• Part 1 describes how to get started with FonaDyn: the installation and the
hardware requirements.
• Part 2 describes the theory and design choices underlying FonaDyn.
• Part 3 describes the user interface and the most common handling procedures.


1 PART ONE - Getting started

1.1 Hardware requirements


• A high-performance computer running Windows 7 or higher, or an Apple Macintosh
running macOS, or a Linux system (which requires more effort and expertise).
The screen resolution needs to be at least 1200 × 800 pixels; bigger is better. We
recommend having two display screens, since other programs will often be used at
the same time.
• Important: On laptops running Windows, click the battery icon in the task bar,
and pull the performance slider to the maximum.
• A low-noise microphone at a fixed distance from the mouth. It is important to
verify that no audible hum, hiss or other extraneous sound is present, by listening
through the system, at high gain.
• A high-quality digital audio interface that can be configured to have a microphone
signal on the first input and a line-level signal on the second input. We
recommend the RME line of USB/Firewire audio interfaces (www.rme-audio.com).
Others may work just as well.
• An electroglottograph that has an analog output for the EGG signal. Note that this
handbook does not cover the use of your EGG device as such. Please observe its
instructions carefully, for how it must be powered, maintained and connected for
use.
• For prompting of the subject, a pair of closed circumaural headphones.
• For parallel acquisition of non-audio signals, such as aerodynamic data, see
section 3.2.7.

1.2 Software setup

IMPORTANT: If your computer is centrally managed by an IT department, there
are several things that they have to know about, in order to install SuperCollider and
FonaDyn in a practical way. These are listed in section 1.2.6.

If you have not already done so, you will need to install the driver and control
software that came with your digital audio interface or sound card, and become
reasonably familiar with it.
Then install SuperCollider on your system, as described below. Once you have
installed SuperCollider itself, you should probably play around with it a bit, if only to
learn how to run anything at all. You need to know at least how to evaluate a line of
SuperCollider code, so that you can install and start FonaDyn. There is a good Help
system built into SuperCollider. There is also a separate website at doc.sccode.org
that you can read from anywhere, with the same content. See [2] for the most gentle
introduction to writing SC code.
Installing FonaDyn will also incorporate a number of FonaDyn-specific documents
into the SuperCollider Help system. These will be useful mostly if you are
interested in how the FonaDyn software works, under the hood.


1.2.1 Versions of SuperCollider

SuperCollider is maintained by an active community of voluntary developers.
New versions are released about once a year. There are versions for the major
operating systems, and also in some cases separate 32-bit and 64-bit versions. From
the FonaDyn user’s perspective, there is little or no difference between them. The
versions mentioned here were the current versions in January 2022.
The SClang source code for FonaDyn is portable across all platforms. The
FonaDyn ‘UGen’ plugins, however, come in different versions for each platform.
FonaDyn version 2.4.9 is supplied with both 32-bit and 64-bit plugins. It has been
developed on Windows and tested on MacOS and Linux.

With the release of version 3.12.0 of SuperCollider, a new version of one of the
supporting code libraries (libsndfile, now at 1.0.31) was adopted, and FonaDyn had to
follow suit, breaking the backward compatibility. With FonaDyn 2.3.0 and higher,
you must use SuperCollider 3.12.0 or higher.


1.2.2 Windows

Installing SuperCollider

1. Visit the SC download web page [3] and download one of the following two
components (the SC version number may have changed).

If you are running a 32-bit Windows, for instance on a breadboard computer,
choose the 32-bit version. On the current versions of 64-bit Windows, either
of these will work. (sn*) refers to the ‘SuperNova’ multi-processor option,
which is not used by FonaDyn.

2. Install the current version of SuperCollider, by running the downloaded .EXE
file.

3. Back on the SC download web page, scroll down to Plugins and Extensions
and click on sc3-plugins. On the new page, click on the button at Download
latest release. Download the .ZIP archive that matches your choice at (1).
sc3-plugins-3.11.1-Windows-64bit-VS.zip
sc3-plugins-3.11.1-Windows-32bit-VS.zip
4. Open the ZIP and read the README file, to choose the right storage location.
Then install the sc3-plugins, by extracting the other contents of the
downloaded .ZIP file into the appropriate Extensions directory. We
recommend that you install the sc3-plugins into the directory for all users. On
recent versions of Windows, this will be the directory
C:\ProgramData\SuperCollider\Extensions.

5. See section 1.2.6 about configuring your system’s anti-virus tools. We have
found that some anti-virus scanning products can slow down FonaDyn too
much. It is important to find the right level of protection.


1.2.3 MacOS

Installing SuperCollider

1. Visit the SC web download page [3]. Download one of the following (the SC
version may have changed):

This will download a zip file that contains the Mac installer package and some
README files. Read them. (sn*) refers to the ‘SuperNova’ multi-processor
option, which is not used by FonaDyn.

2. To get the sc3-plugins, scroll down to Plugins and Extensions and
click on sc3-plugins. On the new page, click on the button at Download
latest release. Download the file sc3-plugins-3.11.1-macOS-signed.zip.

3. In the sc3-plugins .ZIP archive, read the README file, to choose the right
location. Then install the sc3-plugins, by extracting the other contents of
the downloaded .ZIP file into the appropriate Extensions directory. We
recommend that you install the sc3-plugins into the directory for all users.
On recent versions of macOS, this is
/Library/Application Support/SuperCollider/Extensions.

1.2.4 Linux
Linux runs on many diverse platforms. At this writing, the FonaDyn-specific
plugins have not yet been compiled for any Linux (older versions did run). Please
contact the author at [email protected] if you are interested in using FonaDyn on some
implementation of Linux, and/or in helping us make that build. You will then be
provided with the plugin source codes, and with some guidance.
There is a Linux distribution for SuperCollider [3], but, as is customary with
Linux, it is provided in source code form only. This means that you have to download
the whole source and build SuperCollider itself, before you can install FonaDyn on
Linux. Know-how and experience of making such builds will be necessary. You will
need to find version 3.12.0 or higher in the Releases link at
supercollider.github.io/download. Some of the Linux packages there are still for older
versions of SuperCollider. You will need also to build the sc3-plugins.


1.2.5 Installing FonaDyn into SuperCollider

1. If you are about to update to a newer version of FonaDyn, you must first
uninstall your current version by evaluating the line
FonaDyn.uninstall;

This deletes all files in the installed folders, so do not save anything else
there that you want to keep.
2. Download FonaDynInstall-version.zip using the download link that you
have obtained from the SoftwareX repository, or directly from us. The
most recent version is available at https://www.kth.se/profile/stern .
3. Unzip only the contained directories FonaDyn and FonaDynTools into
an Extensions directory, preferably your own SuperCollider User
Extensions directory. On recent versions of Windows, this will be
C:\Users\<username>\AppData\Local\SuperCollider\Extensions.
The directory C:\Users\<username>\AppData is often hidden, but if you
type it into the address bar in the Windows Explorer, it will open.
On MacOS, this will be the directory
/Users/<username>/Library/Application Support/SuperCollider/Extensions.

4. The remaining contents of FonaDynInstall-version.zip are various useful
files that are not SuperCollider code. Unzip and store them wherever you
like, except in the …/Extensions directory or any of its subdirectories.
5. Evaluate the sclang code Platform.userAppSupportDir ; the path to this
folder will appear in the Post window.
6. Unzip the file startup.scd.example.txt and adapt the settings to your
system. Save it as startup.scd in the folder from (5).
7. Run SuperCollider. Wait for the "Post window" to display this line:

*** Welcome to SuperCollider 3.12.1. *** For help press Ctrl-D.

8. In a new code window (File | New or Ctrl+N), type and then evaluate the
line
FonaDyn.install;

This will perform a few checks, and copy a few additional files to their
proper locations.
9. You should now be able to start FonaDyn, by evaluating the line
FonaDyn.run;

The FonaDyn directory contains source code and Help files that are specific to
FonaDyn. You can read these help files using the built-in SuperCollider help system,
under “Browse | FonaDyn”. The FonaDynTools directory contains supporting code
that was written for FonaDyn, and that may be of interest also to developers of other
SC programs. You can read the corresponding help files similarly, under “Browse |
Tools”. The FonaDynTools directory also contains subfolders with the platform-
specific dynamic-link libraries for the FonaDyn UGens.


1.2.6 Centrally managed installation

If your computer is centrally managed, it may have access policies in effect that
will stop SuperCollider from doing certain things. Please show this section 1.2 of the
handbook to your IT-support department, so that they can install SuperCollider with
the necessary privileges, etc. Have them note especially the following:
SuperCollider is a software development environment in which it is necessary for
users to be able to create/modify files in certain privileged locations and/or functions.
At run-time, it does real-time processing of audio files, which needs to be fast, while
being harmless from a malware point of view.
There are three executable modules: scsynth.exe, sclang.exe and scide.exe,
which communicate with each other using a network protocol, and they must be
allowed to do so.
For performance reasons, it may also be necessary to include their processes in
the list of processes that are excluded from anti-virus monitoring. In particular, when
the DSP server, scsynth.exe, is booted for the first time in a Windows session, the
Windows Defender anti-virus protection can take up to two minutes before allowing
scsynth.exe to run (subsequent boots are quick). This is very annoying, and most
users will want to exclude the processes “scsynth.exe” and “sclang.exe” for this reason.
We have noted also that anti-virus products from F-Secure will need to be specially
configured for usability with SuperCollider. They valiantly try to scan everything in
real time, which usually does not work.
For its signal processing functions, SuperCollider uses a large number of small
‘plugin’ DLLs that have the filename extension .SCX (on Linux: “.SO”). On rare
occasions it may happen that an anti-virus program will quarantine one of these,
causing SuperCollider to complain that a particular module is not found. We have
never known this actually to be malign, but of course your policies must be upheld.
You will have to decide what to do about it.
For Windows, FonaDyn is supplied with plugins for both 64-bit and 32-bit
versions of SuperCollider. The appropriate set of plugins is selected automatically
during installation using the routine FonaDyn.install, as described above.
On macOS, the FonaDyn installation replaces the sc3-plugin PitchDetection.scx
with one that has been recompiled to use the FFTW3F library (statically linked), rather
than macOS's own vDSP library; the two give different results. Running
FonaDyn.uninstall will restore the original .scx file.


1.2.7 Audio device configuration
FonaDyn requires one stereo pair in and one stereo pair out. If your audio device
has only two inputs and outputs, then these are what FonaDyn will use. If you are
using an external multichannel audio interface (recommended), then your operating
system will probably list more than one pair of audio inputs and one pair of audio
outputs. By default, FonaDyn will use the first pair of inputs and the first pair of
outputs. It can be configured to record on any combination of multiple inputs and
outputs, provided that they all sit on the same audio interface hardware (or on
separate, but electrically synchronized devices). A list of currently available audio
devices is printed in the Post window, whenever the server process SCSYNTH is booted.
If you will not be using SuperCollider for anything else, you can specify 2 ins and
2 outs in the SC startup file. For instructions on how to do this, see the SC
documentation for the startup file. The startup file has its own entry in the File menu
of SCIDE, so it is easy to find. Activating more ins and outs than are needed incurs an
unnecessary CPU load.

Figure 1. Typical recording setup.

A typical experimental setup is shown in Figure 1. The first (left) input channel
must receive the signal from a microphone, via a microphone preamplifier. Many
audio interfaces have built-in microphone preamps, on two or four inputs. You will
need to calibrate the gain of the microphone channel (→3.2).


The second (right) input channel must receive the line-level signal from the
electroglottograph. Unlike the microphone signal, the EGG signal level does not need
to be calibrated. However, it should be adjusted so as to prevent clipping. Note that
some EGG devices have a large voltage swing on their outputs, typically ±10 volts,
which is too strong for most audio interfaces. If the EGG device’s output level cannot
be turned down enough, you may need to pass it via an attenuator to the audio
interface. If your audio interface has adjustable input sensitivity, choose the least
sensitive setting for the EGG signal.
As for outputs: if “Echo/Playback” is active, FonaDyn will play a copy of the audio
signal (live or from file) to the left-hand output and a copy of the unprocessed EGG
signal (live or from file) to the right-hand output.

1.2.8 Using alternative audio inputs


On some audio interfaces, the microphone pre-amps are not on the first two
inputs. You can tell FonaDyn to use any other two inputs for voice and/or EGG by
inserting a statement in the file startup.scd, like this:

FonaDyn.config(inputVoice: 8, inputEGG: 9);

Inputs in SuperCollider are numbered from zero, so “inputVoice: 8” means to fetch
the microphone signal on the ninth channel of the audio interface. You will also need
to enable all the inputs up to the chosen ones with a line in the startup file:

Server.local.options.numInputBusChannels = 10;

There can be multiple calls to FonaDyn.config(…) in the startup file, and it can
control several options (see also →3.1.4, →3.3.6, and the topic “FonaDyn” in the
SuperCollider Help System).


2 PART TWO - Theory

2.1 Introduction
FonaDyn allows you to explore how the acoustic and EGG signals vary with the
fundamental frequency (fo) and the sound pressure level (SPL). It does this by
computing various features of the signals and mapping these by colour onto the fo-
SPL plane, also called the ‘voice field’.
The time-domain features of the audio and the EGG signals are all scalar
numbers that are mapped straight to a colour scale. The resulting ‘voice maps’ unfold
in real time as the subject vocalizes, or as an existing recording is analyzed. The
signal preconditioning and the computation of the time-domain features are
described in the rest of this section, and in [4].
The harmonic-domain representation (→2.6) used for the EGG waveform is in
the form of high-dimensional vectors that cannot be assigned to colours as such. They
are first clustered to ordinal cluster numbers, which then are mapped to colours. The
clustering can be automatic, or, an experienced operator can cajole the clustering into
categories that are of interest for a particular application. Normally such an analysis
has these stages:
(1) record to file
(2) learn a set of EGG waveshapes from the recording
(3) use that set to classify other EGG waveshapes
(4) export the results for further treatment.
Here, stages (1) to (3) are done in FonaDyn. The “further treatment” in (4) is done
with the tools of your choice, e.g. MS Excel or Matlab. To facilitate research work,
practically all signals and results from FonaDyn can be exported to files in common
formats (.csv, .wav, .aiff), for subsequent processing in other software.

2.2 Signal processing overview


The audio and EGG signals are processed in parallel, see Figure 2. From the
microphone audio signal, six metrics are derived: the signal level, the fundamental
frequency fo, the fo periodicity (or “clarity”), the crest factor, the spectrum balance
and the cepstral peak prominence (CPP). The level and fo serve to specify the plot
position in the voice field. The clarity metric is used as a periodicity gate: ranging
from 0 to 1, it must be close to 1.0 for the EGG processing to proceed. The default
threshold is 0.96. The crest factor gives an approximate but useful indication of the
high frequency content of the voice signal [7]. For open vowels, it is somewhat similar
to the maximum flow declination rate (MFDR). The spectrum balance is the ratio of
acoustic powers above 2 kHz and below 1.5 kHz, as separated by two filters that slope
by 24 dB/octave. The CPP is a
metric that reflects the periodicity of the audio signal spectrum.
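
As an illustration of two of these metrics, the following minimal Python sketch computes a crest factor and applies a clarity gate. It assumes the conventional definition of crest factor (peak amplitude divided by RMS amplitude, expressed in dB) and assumes that the gate passes values at or above the threshold; the function names are hypothetical and are not part of FonaDyn.

```python
import math

CLARITY_THRESHOLD = 0.96  # default threshold quoted in the text


def crest_factor_db(samples):
    """Crest factor: peak amplitude over RMS amplitude, in dB (assumed definition)."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


def passes_clarity_gate(clarity, threshold=CLARITY_THRESHOLD):
    """True if the signal is periodic enough for EGG processing to proceed.
    Whether FonaDyn compares with >= or > is an assumption here."""
    return clarity >= threshold


# One full period of a sine wave, 1000 samples
sine = [math.sin(2.0 * math.pi * n / 1000) for n in range(1000)]
sine_crest_db = crest_factor_db(sine)  # about 3.01 dB for a pure sine
```

A pure sine wave has the crest factor of about 3 dB; waveforms with sharper peaks, and hence more high-frequency content, give higher values.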
The gain of EGG signals is difficult to calibrate, and also typically varies a lot (due
to varying larynx height, for instance). Therefore, all EGG metrics computed by
FonaDyn are normalized such that the EGG amplitude is disregarded. Only the EGG
wave shape is considered to be of interest, and hence only a coarse adjustment of the
EGG signal gain is needed.


Figure 2. Overview of the signal processing in FonaDyn.

The EGG signal is first segmented into cycles. From here on, the data rate is
reduced, being proportional to the EGG cycle rate, rather than to the sampling rate.
Each cycle is analyzed individually, both in the time domain, and in the harmonic
domain. The harmonic analysis uses a Discrete Fourier Transform (DFT) for its N
lowest Fourier components, where N is typically 10 or less.
The underlying ideas for the harmonic-domain analysis were first proposed by
Selamtzis and Ternström [5]. The input EGG signal is first segmented into individual
cycles, each of which is analysed with the Discrete Fourier Transform. The resulting
harmonic magnitudes and phases are then input to a so-called K-means clustering
algorithm. This algorithm treats the data as point coordinates in a high-dimensional
space, and allocates each point to the cluster whose centroid is closest in this space.
That centroid is then updated to include the new point. This gives rise to a given
number of categories, into which the cycles are classified. The clustering is ‘learned’
in real time, without any prior knowledge of EGG waveshapes. The clusters are
colour-coded for display in the voice map, with a separate layer for each cluster. The
number of categories, or clusters, must be chosen to be appropriate to the research
question, usually 2 to 5. Arriving at that choice usually requires some exploratory
passes over the data. ‘Learned’ cluster data can then be saved, reloaded and used to
classify new signals.
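
The assign-then-update step described above can be sketched in Python as follows. This is a generic online K-means update on toy two-dimensional data, not FonaDyn's own code: the initial centroids, the running-mean update rule and the plain Euclidean distance are all assumptions made for illustration, and the way FonaDyn encodes phases for the distance computation is not covered.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid with the smallest Euclidean distance to point."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def online_kmeans_step(point, centroids, counts):
    """Assign point to its nearest cluster, then update that centroid in place."""
    k = nearest_centroid(point, centroids)
    counts[k] += 1
    # Running mean: move the centroid towards the new point by 1/count
    centroids[k] = [c + (p - c) / counts[k] for c, p in zip(centroids[k], point)]
    return k


# Toy 2-D data; in FonaDyn the points would be high-dimensional per-cycle vectors
centroids = [[0.0, 0.0], [10.0, 10.0]]  # hypothetical initial centroids
counts = [1, 1]                         # each centroid counts as one prior point
labels = [online_kmeans_step(p, centroids, counts)
          for p in [(1.0, 1.0), (9.0, 11.0), (0.0, 2.0)]]
```

Classifying with previously ‘learned’ clusters corresponds to calling only nearest_centroid, leaving the centroids unchanged.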
Each Fourier component is a complex number with a real and an imaginary part
(cartesian representation), or, equivalently, with a magnitude and a phase (polar
representation). To minimize the influence of the variable gain of the EGG signal,
relative magnitudes and phases are used instead. The first Fourier component, that
is, the fundamental, is taken as the reference. For components 2…N, the relative
magnitude levels and the phase differences to component 1 are taken as the input to
a clustering algorithm. This means that the overall EGG amplitude does not affect the
clustering. In addition to the N first Fourier components, the system estimates the
level of that residual EGG signal power which is not accounted for by the first N
Fourier components, and uses that as an extra clustering dimension.
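
The following Python sketch illustrates why this normalization makes the features independent of the EGG gain. It computes the N lowest DFT components of one cycle by direct summation, then the level and phase of components 2…N relative to component 1, and a residual level. The exact definitions of the phase difference (here simply the phase of the ratio to component 1) and of the residual (here the AC power not captured by the N components, relative to the power of the fundamental) are assumptions, as are all function names.

```python
import cmath
import math

EPS = 1e-12  # floor that keeps log10() away from zero


def harmonic(cycle, k):
    """k-th DFT component of one cycle, by direct summation."""
    length = len(cycle)
    return sum(x * cmath.exp(-2j * math.pi * k * n / length)
               for n, x in enumerate(cycle)) / length


def relative_features(cycle, n_harmonics):
    """Levels (dB) and phases (rad) of harmonics 2..N relative to harmonic 1,
    plus the residual level (dB). Assumes a non-zero fundamental."""
    comps = [harmonic(cycle, k) for k in range(1, n_harmonics + 1)]
    fundamental = comps[0]
    levels_db = [20.0 * math.log10(max(abs(c) / abs(fundamental), EPS))
                 for c in comps[1:]]
    phase_diffs = [cmath.phase(c / fundamental) for c in comps[1:]]
    mean = sum(cycle) / len(cycle)
    total_power = sum((x - mean) ** 2 for x in cycle) / len(cycle)
    harmonic_power = sum(2.0 * abs(c) ** 2 for c in comps)
    residual = max(total_power - harmonic_power, 0.0)
    fundamental_power = 2.0 * abs(fundamental) ** 2
    residual_db = 10.0 * math.log10(max(residual / fundamental_power, EPS))
    return levels_db, phase_diffs, residual_db


# Synthetic cycle of 100 samples: harmonics 1 and 2, plus a weak third harmonic
theta = [2.0 * math.pi * n / 100 for n in range(100)]
cycle = [math.sin(t) + 0.5 * math.sin(2 * t + 0.3) + 0.1 * math.sin(3 * t)
         for t in theta]
levels_db, phase_diffs, residual_db = relative_features(cycle, 2)
```

With N = 2, the third harmonic ends up in the residual. Multiplying the whole cycle by any gain factor leaves all three outputs unchanged.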
In parallel with the clustering, the time series of per-cycle DFT data are analyzed
also for their ‘sample entropy’, or SampEn for short. This is a metric of phonatory
stability that is designed to detect abrupt changes, such as register breaks, while
suppressing minor instabilities.
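
For readers unfamiliar with sample entropy, here is a minimal Python sketch of the generic SampEn(m, r) computation on a single series of numbers. It is not FonaDyn's implementation: the parameter defaults, the absolute tolerance r, and the choice to report an infinite value when no matches exist are assumptions made for illustration.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """Generic SampEn(m, r): -ln(A / B), where B and A count the pairs of
    templates of length m and m + 1 that match within tolerance r
    (Chebyshev distance), excluding self-matches."""
    n = len(series)

    def count_matches(length):
        # The same number of templates (n - m) is used for both lengths
        templates = [series[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                distance = max(abs(a - b)
                               for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    matches += 1
        return matches

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # no matches: reported here as infinite
    return -math.log(a / b)


regular = [1.0, 2.0] * 5              # perfectly repetitive series
with_break = [0.0] * 6 + [1.0] * 6    # one abrupt change in mid-series
sampen_regular = sample_entropy(regular)
sampen_break = sample_entropy(with_break)
```

The perfectly repetitive series gives a SampEn of zero, while the series with one abrupt change gives a positive value.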

2.3 Audio processing

2.3.1 Input selection


FonaDyn takes a microphone audio signal on the first A/D channel and an EGG
signal on the second A/D channel; or, takes its input from audio files with at least
those two channels. Valid input audio file formats are those supported by the
‘libsndfile’ library, in other words, almost any format. The FonaDyn sampling rate is
44100 Hz per channel. FonaDyn’s own recordings of Voice+EGG are stored in 16-bit
integer format .WAV files.
When recording from the live inputs, the ‘raw’ microphone and EGG signals are
written unchanged to the output file, with no prior processing. This is to avoid
repeated preprocessing when analyzing a recording rather than the live inputs. It also
enables the recordings to be used by other analysis systems, with no constraints
having been imposed on the signals. When analyzing a first-generation FonaDyn
recording, the digital input signals are identical to the signals that were obtained ‘live’
from the A/D-converters (→3.2.6).

2.3.2 Audio preprocessing


A second-order high-pass Butterworth filter at 30 Hz is always applied to the
microphone signal, to suppress low-frequency rumble. The filter characteristic is
essentially equivalent to a C-weighting filter for level measurements. This filter is not
linear-phase, and does not need to be.
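
A filter of this kind can be sketched in Python as a single biquad section. The coefficient formulas below are the widely used bilinear-transform design for a second-order high-pass with Q = 1/√2 (the Butterworth case); they are offered as an assumption about what such a filter could look like, not as FonaDyn's actual implementation, and the function names are hypothetical.

```python
import cmath
import math


def butterworth_highpass(cutoff_hz, sample_rate_hz):
    """Coefficients (b, a) of a second-order Butterworth high-pass biquad,
    designed with the bilinear transform and normalized so that a[0] = 1."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    q = 1.0 / math.sqrt(2.0)  # Butterworth: maximally flat passband
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 + cos_w0) / (2.0 * a0),
         -(1.0 + cos_w0) / a0,
         (1.0 + cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def gain_db(b, a, freq_hz, sample_rate_hz):
    """Magnitude response of the biquad at freq_hz, in dB."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)  # z^-1
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))


b, a = butterworth_highpass(30.0, 44100.0)
gain_at_cutoff = gain_db(b, a, 30.0, 44100.0)  # about -3 dB at the cutoff
```

The response is about 3 dB down at 30 Hz and essentially flat at 1 kHz, so the voice band is left untouched while rumble below the cutoff is attenuated at 12 dB/octave.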


2.3.3 Voice map metrics
From the microphone audio signal, six metrics are derived: the signal level in dB,
the fundamental frequency fo, the fo periodicity (or “clarity”), the crest factor, the
spectrum balance and the cepstral peak prominence (smoothed).

• The level of the audio signal serves to provide the vertical plot position in the
voice maps. It is computed as follows.
1. The audio signal is high-pass filtered as described above.
2. The periods in the audio signal are found by double-sided peak-picking,
following Dolanský [8].
3. The RMS of the signal is computed period by period. No smoothing is applied.
4. The RMS values are converted to decibels down, relative to full scale. These are
the values that are written to the log file (→3.5.3).

The SPL is measured for every phonatory period, so that each period can be
assigned a position in the fo-SPL plane (the voice field). Hence the SPL measurement
in FonaDyn corresponds most closely to a C-weighted “fast” reading; but it
is faster. Calibrating the signal level to true sound pressure level (SPL) is very
important for meaningful comparisons across recordings, both your own
and those of others (→3.2). The audio signal in the *_Voice_EGG.wav files is
expected to be scaled such that a peak-to-peak full scale sine wave corresponds to
±20 Pa peak, which corresponds to 117 dB SPL RMS. You need to calibrate the gain
in your signal chain so that this is true (→3.2), or, be prepared to correct your
results in post-processing.
Some supplementary detail on how the SPL is represented internally is
warranted here. As recommended by the Union of European Phoniatricians (UEP), the
SPL scale in FonaDyn extends from 40 to 120 dB as measured at a distance of
0.30 m. By convention in digital audio, the full-scale signal amplitude is the
floating-point value ±1.0, which corresponds to the maximum peak amplitude
that the A/D-converters can handle. In acoustics, levels are computed from the
RMS of a signal rather than from the peak amplitude. However, the RMS value of a
signal depends on its waveform. Only a symmetrical square wave with negligible
transition times has an RMS amplitude that is equal to the peak amplitude. For all
other waveforms, the RMS value is smaller than the peak amplitude. Most often
we use sine waves when calibrating the level, and the RMS level of a sine wave is
3 dB below its peak amplitude. Hence the highest sine wave SPL that can be
represented in FonaDyn using the nominal calibration is 117 dB. For recording
louder voices, see 3.3.6.

• The fo value serves to provide the horizontal plot position in the voice map. In
FonaDyn, fo is measured in semitones, using MIDI note numbers with fractions.
MIDI = 57.0 corresponds to fo = 220.00 Hz. No calibration of fo is needed. It is
computed by an algorithm based on autocorrelation via the FFT [6] (in the
SuperCollider ‘UGen’ Tartini). The value of fo is updated at intervals of about
23 ms. This fo extraction algorithm is the best that we have come across; it is good
at ignoring strong overtones, for example. However, it is also quite sensitive and
will find periodicity where other algorithms might not. In particular, if there is an
AC mains hum in the audio signal, even a weak one, there will be a risk of locking
on to that when the voice fo is at some multiple of the mains frequency (which is
50 or 60 Hz, depending on where your lab is). Therefore it is always very
important not to have hum in your signal chain.

• The clarity metric is used as a periodicity tester. Ranging from 0 to 1, it must be
above a certain threshold for the EGG processing to proceed. The default
threshold is 0.96, which is a fairly strict requirement, corresponding to quite
stable phonation. Mathematically, the ‘clarity’ is the value of the autocorrelation
function of the audio signal, at a delay of one cycle at the estimated fo. Like fo, it is
computed by the UGen Tartini.

• The crest factor gives a simple but useful indication of the high frequency content
of the voice signal [7]. For open vowels only, it is a close cousin of the maximum
flow declination rate (MFDR). It is computed as the ratio of the peak amplitude to
the RMS amplitude, for every phonatory cycle. A sine wave has a crest factor of
2  1.414 ( 3 dB), while a very bright voice signal can reach a crest factor greater
than 4 ( 12 dB).

• The spectrum balance SB is the ratio of the acoustic power above 2 kHz to that
below 1.5 kHz, as separated by two filters that slope by 24 dB/octave. The two
powers are smoothed at 50 Hz and then the level difference High-Low in dB is
computed. The spectrum balance is usually negative and typically increases
(becomes less negative) with increasing vocal effort. In the literature, there are
several variants for how spectrum balance is defined, so watch out for that. In our
experience, the exact filter cutoff frequencies have only a minor effect on SB, at
least for the vowel /a/.

• The cepstral peak prominence smoothed (CPPs) is a metric that is higher the
more regularity and the less noise there is in the audio signal. It is reported in dB
and will typically range from 0 dB in a severely dysphonic voice to +20 dB in a
very ‘clean’ voice. As such it tends to correlate (negatively) with various dysphonias.
However, like all other voice metrics, CPPs varies with SPL and fo, which
is why it is instructive to see it on a voice map. For more details, the reader is
referred to the scientific literature on this metric.
In FonaDyn, CPPs is computed for voiced segments only, as computing the CPP
for unvoiced segments is not meaningful [19]. As for all the other metrics, voicing
is detected on the basis of the 'Clarity' metric. For running speech, you may wish
to lower the default Clarity threshold to about 0.9. This will also somewhat lower
the CPPs, because a greater proportion of irregular and/or noisy phonations will
be accepted.



Many settings need to be chosen in order to compute the CPPs, and these will
affect the outcome. In FonaDyn, the cepstrum parameters have been set as
follows, partly following Awan et al (2013)[20].

- The initial FFT is 2048 points, Hanning window.
- The cepstrum is 1024 points, resulting in 512 quefrency bins.
- The cepstrum is converted to dB before the linear trend line is computed.
- The frame rate is 43.07 Hz, so a new estimate of CPPs is produced every 23 ms.
(Results from unvoiced segments are discarded.)
- Voice fundamental frequencies from 60 Hz to 880 Hz are covered.
- Temporal smoothing is implemented as one first order low-pass per quefrency
bin. The filter coefficient is 0.7, which corresponds to about 7 cepstrum frames
averaged in time. The smoothing window in time is an exponential decay rather
than a rectangular average.
- Quefrency smoothing is a mean of 11 bins, corresponding to a quefrency window
of 0.5 ms.
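
The figures in this list can be cross-checked with a little arithmetic. The sketch below assumes a hop of 1024 samples between frames (which reproduces the stated 43.07 Hz) and assumes that one quefrency bin spans 1/(fs/2) seconds (which reproduces the stated 0.5 ms window). Both assumptions are ours, inferred from the numbers above rather than taken from the FonaDyn source code.

```python
fs = 44100.0              # sampling rate (section 2.3.1)
hop = 1024                # assumed hop between analysis frames, in samples
frame_rate = fs / hop     # frames per second
frame_interval_ms = 1000.0 / frame_rate

# Assumption: one quefrency bin spans 1/(fs/2) seconds.
bin_ms = 1000.0 / (fs / 2.0)
smoothing_window_ms = 11 * bin_ms   # mean over 11 quefrency bins

def quefrency_bin(fo_hz):
    """Index of the quefrency bin nearest to one period of fo."""
    return round((1000.0 / fo_hz) / bin_ms)

lowest_bin = quefrency_bin(880.0)   # shortest period that is searched
highest_bin = quefrency_bin(60.0)   # longest period that is searched
```

Under these assumptions, the search range from 60 Hz to 880 Hz fits within the 512 available quefrency bins.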

The settings for smoothing and bin ranges are easy to change, but this has to be
done in the SCLANG code, in the file VRPSDVRP.SC.
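
The definitions of level, fo in semitones and crest factor given in the list above can be checked numerically on a single period. The following is a minimal sketch in Python, not FonaDyn code; the function names are hypothetical, and the nominal calibration (full scale 1.0 corresponding to 20 Pa) is taken from the text.

```python
import math

FULL_SCALE_PA = 20.0      # nominal calibration: full scale 1.0 corresponds to 20 Pa
P_REF = 20e-6             # reference sound pressure, 20 µPa

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def spl_db(cycle):
    """SPL of one period, assuming the nominal calibration above."""
    return 20.0 * math.log10(rms(cycle) * FULL_SCALE_PA / P_REF)

def crest_factor(cycle):
    """Ratio of peak amplitude to RMS amplitude over one period."""
    return max(abs(s) for s in cycle) / rms(cycle)

def hz_to_midi(fo_hz):
    """Fractional MIDI note number; 440 Hz is note 69."""
    return 69.0 + 12.0 * math.log2(fo_hz / 440.0)

# One period of a full-scale sine wave, 441 samples long (100 Hz at 44100 Hz).
cycle = [math.sin(2.0 * math.pi * n / 441) for n in range(441)]
```

For this full-scale sine, spl_db(cycle) comes out at about 117 dB and crest_factor(cycle) at about 1.414, in agreement with the text, and hz_to_midi(220.0) returns 57.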



2.4 EGG processing

2.4.1 Preprocessing
The absolute EGG peak amplitude is monitored during recording (→3.2.3). If it
comes within 0.5 % of full scale, an EGG CLIPPING warning is flashed on the screen.
Even a slightly clipped signal would compromise the spectral analysis for the
clustering, while a signal that is completely out of range is invalid.
The EGG signal is then
1. high-pass filtered at 100 Hz (-3 dB @ 80 Hz), using a 1024-point linear phase
FIR filter. This practically eliminates the large near-DC content that is common
in EGG signals. To do so is necessary for the Qci computation. However, it also
means that the lower that fo descends below 100 Hz, the more the EGG
waveform will be distorted. So, interpret low bass note EGGs with caution.
2. median-filtered over 9 sample points; this reduces somewhat the low-level
‘crackle’ that is common on some EGG devices, especially when the battery cells
are aging;
3. low-pass filtered at 10 kHz, using a “brick wall” 64-point linear phase FIR filter
(-80 dB @ 12 kHz).

This EGG preprocessing introduces a 34 ms delay, which is accounted for
internally. It also limits the range in fo : downwards, to 100 Hz, and upwards to
10 kHz divided by the number of DFT components in the analysis.

Figure 3. Frequency responses of the EGG filters: high-pass (yellow) and low-pass
(blue). The 9-point median filter (dotted) affects only low-level noise. The log scale
frequency labels are at 10, 100, 1000 and 10000 Hz.



2.4.2 Period segmentation

FonaDyn implements two strategies for period segmentation: double-sided peak-following,
after Dolanský [8]; and phase tracking. The double-sided peak follower
is the simpler one; it is applied to the differentiated EGG signal (dEGG). The dEGG is
computed as the differences between consecutive sample points. The positive peak-
picker selects a point somewhere just prior to a local positive maximum of the dEGG
signal. The ‘circuit’ then waits for a local negative maximum to occur, before allowing
the next positive dEGG peak to trigger a new period. Using the dEGG rather than the
EGG makes the triggering point somewhat less dependent on fo , but also more
sensitive to noise.
Sudden changes in dEGG amplitude (which are common) may lead to one or a
few lost cycles. Sudden increases in dEGG amplitude may cause the preceding cycle
time to be estimated as slightly shorter.
The default method, however, is based on the phase portrait [4]. The EGG
signal is time-integrated and then paired with the original to form a complex-valued
signal of the integrated EGG. By using the integrated signal (rather than the more
common Hilbert transform, which is somewhat similar to the derivative of the EGG),
we obtain better rejection of low-level noise, and the cycle trigger point is usually very
close to positive-going zero-crossings in the EGG signal. The phase of this pseudo-
analytic signal is now computed as the arctangent of the complex pair; and then that
phase signal is subjected to the Dolanský method described above. The phase wrap-around
from π to −π typically occurs very near the dEGG peak, giving a convenient
transient in the phase signal. This method for cycle segmentation generally performs
better than peak-picking with noisy signals, as well as with fluctuating amplitudes,
and it seems to give more robust segmentation overall. With unusual EGG signals,
however, which may have multiple deep inflexion points per cycle, this method is
sensitive to multiple loops in the phase portrait, and may be less reliable than the
dEGG peak-following method.
Cycles longer than 20 ms (50 Hz) or shorter than 0.23 ms (4410 Hz) are rejected
at this stage.
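
The idea behind the phase-portrait method can be sketched as follows. This is a simplified illustration, not FonaDyn's implementation: the time integral is approximated by a running sum with the mean removed, the sign convention of the complex pair is an assumption chosen so that the wrap-around falls on positive-going zero crossings, and a plain wrap detector stands in for the Dolanský peak follower. All function names are hypothetical.

```python
import math

def phase_portrait_triggers(egg):
    """Indices where the phase of the pair (integral, -EGG) wraps from +pi to -pi."""
    # Running sum as a crude time integral; mean removal keeps the portrait centred.
    integral, acc = [], 0.0
    for s in egg:
        acc += s
        integral.append(acc)
    mean = sum(integral) / len(integral)
    integral = [v - mean for v in integral]

    # Assumed sign convention: real part = integrated EGG, imaginary part = -EGG.
    phase = [math.atan2(-x, i) for x, i in zip(egg, integral)]
    return [n for n in range(1, len(phase)) if phase[n] - phase[n - 1] < -math.pi]

def accept_cycle(length_samples, fs=44100.0):
    """Keep only cycles between 0.23 ms and 20 ms long."""
    seconds = length_samples / fs
    return 0.00023 <= seconds <= 0.020

# Synthetic test signal: ten periods of a 100 Hz sine at 44100 Hz.
egg = [math.sin(2.0 * math.pi * n / 441 + 0.3) for n in range(4410)]
triggers = phase_portrait_triggers(egg)
periods = [b - a for a, b in zip(triggers, triggers[1:])]
```

On this synthetic sine, the triggers fall one period (441 samples) apart, at the positive-going zero crossings, and every such period passes accept_cycle.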

2.5 EGG time-domain metrics


A somewhat more detailed discussion of these metrics is given in reference [4].
These metrics are computed directly from the conditioned EGG signal.

2.5.1 The contact quotient Qci


The contact quotient is that fraction of a whole EGG cycle for which the vocal
folds can be said to be in contact; hence, it can range from zero to one. Numerous
schemes have been proposed for calculating it [9]. Here, we are content to find a
quantification that represents the relative amount of contacting over the cycle,
without regard for the actual instants of opening or closing, nor for the possible
complementarity with the glottal flow waveform.



Consider an EGG pulse normalized to length 1, with the cycle amplitude
normalized to the range 0…1 . The area under this normalized pulse can also range
from 0 to 1. This area can be estimated for arbitrary pulse shapes, so it does not
require a single peak, nor well-defined opening and closing events, nor a contacting
threshold that would always be in some sense arbitrary. We will call this metric the
Qci , for “quotient of contact by integration”. The figure below shows how it is
computed.

Figure 4. Computation of Qci . If the non-normalized EGG signal is DC free, with a
mean of zero over the cycle, then the integral of the normalized pulse can be
computed simply as min/(min - max) .
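
A minimal sketch of this computation, assuming that one EGG cycle is available as a list of samples. The function names are hypothetical and the code is an illustration of the definition, not an excerpt from FonaDyn.

```python
import math

def qci(cycle):
    """Contact quotient by integration: mean of the pulse after scaling to 0...1."""
    lo, hi = min(cycle), max(cycle)
    normalized = [(s - lo) / (hi - lo) for s in cycle]
    return sum(normalized) / len(normalized)

def qci_dc_free(cycle):
    """Shortcut min/(min - max), valid only when the cycle has zero mean."""
    lo, hi = min(cycle), max(cycle)
    return lo / (lo - hi)

# A sinusoidal cycle (no vocal fold contact) and a narrow rectangular pulse.
sine = [math.sin(2.0 * math.pi * n / 441) for n in range(441)]
pulse = [0.0] * 8 + [1.0] * 2
```

For the sine, both functions return 0.5; for the rectangular pulse, qci returns 0.2, which is the fraction of the cycle spent at the high level.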

For typical pulse shapes, Qci correlates well with other quotient metrics, and
hence gives much the same information. A commonly used scheme is the one
proposed by David Howard (here: the CQH): find the positive peak in the dEGG, and
then the next point on the EGG that has an amplitude of 3/7 or less of the peak-to-
peak amplitude. The duration between these points is divided by the period time, to
yield the CQH. We have found that for EGG waveshapes that have a contacting phase,
there is a practically linear relationship between Qci and CQH:
Qci ≈ 0.75 · CQH + 0.1
For the common case of CQH = 0.4, which is typical of normal speech in male and
female adults, Qci is also 0.4 . The usefulness of Qci lies in quantifying unusual pulse
shapes that might, for instance, cross a prescribed threshold more than once per
cycle.
Notice that the Qci metric has the same problem as other definitions of the contact
quotient, in that when vocal fold contact ceases, the waveform becomes low in
amplitude and nearly sinusoidal. For such a waveshape, Qci will “erroneously” tend
towards 0.5 (and CQH toward 0.523), which would correspond to rather a large
amount of contact, if interpreted as normal phonation. You need to keep this in mind
when considering the Qci in the voice map display. The next metric does not have this
problem.



2.5.2 The QΔ metric
The QΔ metric gives the maximum positive slope of the cycle-normalized EGG
signal. This is a metric of the relative rate of change of contact area. It is not the same
thing as the speed with which the vocal folds are moving towards each other,
although often there will be a co-variation of these two speeds. The QΔ value is
normalized such that a sine wave receives a slope of 1 . The wave shapes and their
colour mapping in the voice map are shown in the table below.

Mean QΔ    EGG shape and slope    Interpretation
1…1.5      (image)                Nearly sinusoidal: no vocal fold collision. This would be at very low EGG amplitudes.
1.5…3      (image)                Incomplete contacting: just above the threshold of collision.
3…10       (image)                Normal range of contacting rates in modal voice.
N          (image)                Max rate of contacting using N harmonics (synthesized with phases = 0 and amplitudes 1/k).

Because the EGG is high-pass filtered (DC free), the peak of the EGG derivative
usually coincides closely in time with positive-going zero crossings in the EGG. The
exception is for partial contacting (the second row in the table above).
At low fo, the QΔ metric converges toward the iNAQ metric proposed by C. Herbst
in Enflo et al (2016), which in turn is similar to the inverse of the Normalized
Amplitude Quotient (NAQ), introduced by P. Alku, that is often used to characterize
glottal flow.
FonaDyn computes the QΔ metric from two different versions of the EGG pulse,
with different results. The QΔ metric as given in FonaDyn’s voice maps and in the Log
files is computed from the input EGG waveform, in a fast computation. However,
because of the 10 kHz brick-wall pre-conditioning filter, this QΔ has an inherent
dependency on fo : the higher the fo, the more high harmonics will be filtered out.
Hence, for a constant EGG wave shape, this QΔ decreases in proportion to increasing
fo . Also, it is rather sensitive to noise at low EGG amplitudes, which typically
prevents it from descending completely down to its minimum value of 1.
Another QΔ is the one shown in the waveform resynthesis display of the Clusters
panel (→3.1.3). This value is computed from the resynthesized waveform, which has a
constant number N of harmonics (→2.6). Therefore this QΔ is smaller, but it does not
depend on fo , for values of fo < 10 kHz / N . Also, it is less sensitive to noise in low-
level EGG signals and therefore can descend close to 1 when there is little or no vocal
fold contact. Although this is the preferred variant of QΔ , it would take a prohibitive
amount of time to compute on every cycle. (In the Clusters panel, it is recomputed
only at the redrawing rate of the graphics.) Therefore, it is not saved by FonaDyn; but
it can be obtained by post-processing Log files in Matlab.

2.5.3 The Ic metric


The ‘index of contacting’ Ic was introduced by Ternström [4]. It is a combination
of Qci and QΔ , namely Ic = Qci × log10(QΔ). The index of contacting has the property of
approaching zero for a pure sine waveshape, as in very weak phonation, when there is
vocal fold vibration without collision; and it becomes approximately one when Qci
and QΔ are both high, as in very strong phonation. The Ic metric, too, is computed
from the input waveform rather than the resynthesized one. At this writing, the
physical interpretation of the Ic metric remains to be formulated.
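
To make the definitions of QΔ and Ic concrete, here is a minimal sketch that evaluates both on one cycle. The discrete normalization used (peak sample-to-sample difference, times the cycle length, divided by π times the peak-to-peak amplitude) is our assumption, chosen so that a sine yields 1; FonaDyn's own computation may differ in detail.

```python
import math

def q_delta(cycle):
    """Peak positive slope of the normalized cycle, scaled so that a sine gives 1."""
    n = len(cycle)
    span = max(cycle) - min(cycle)
    # Differences wrap around, so that the last-to-first step is included.
    steepest = max(cycle[(i + 1) % n] - cycle[i] for i in range(n))
    # A sine of unit period and unit peak-to-peak height has a peak slope of pi.
    return steepest * n / (span * math.pi)

def index_of_contacting(qci, qd):
    """Ic = Qci x log10(Q-delta)."""
    return qci * math.log10(qd)

sine = [math.sin(2.0 * math.pi * i / 441) for i in range(441)]
```

For the sine, q_delta is very close to 1, so Ic is very close to zero whatever the value of Qci. With Qci = 0.6 and QΔ = 10, Ic equals 0.6.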

2.6 EGG harmonic-domain analysis

2.6.1 DFT analysis


A DFT analysis is performed of the lowest (and strongest) frequency components
of the EGG signal. For N components, the program computes N value pairs
(magnitude and phase) cycle-synchronously over each EGG cycle. N can be chosen
as 2…20; typically it will be 6…10. Because the DFT frame length adapts to the EGG
cycle length, the DFT components are equivalent to the harmonics of the EGG signal.
A small quantization error arises here, in that the true EGG period time is not exactly
an integer multiple of the sampling interval. In practice, though, such errors are
absorbed by the subsequent statistical clustering.
The Fast Fourier Transform is not used, because the frame length here must be
exactly one EGG cycle (to the nearest sampling interval), and because N is small.
Instead, the desired DFT cosine and sine terms for the most recently completed EGG
cycle are computed directly in the time domain. This also results in a constant CPU
load for short and long cycles.

2.6.2 EGG waveform DFT components


For each EGG cycle that has been judged as valid, the first Fourier component
(the ‘fundamental’) is taken as a reference, for both magnitude and phase. The
magnitudes, or absolute values, of the complex Fourier components k ∈ [2…N] are
computed, and their ratios to the magnitude of Fourier component 1 (the fundamental)
are expressed as level differences Lk . Using only the relative harmonic levels
eliminates the effect of a variable gain on the EGG signal.
Similarly, the phases of the Fourier components k ∈ [2…N] are computed relative
to the phase of the fundamental, i.e., Δφk = φk − φ1 . The reason for computing the
phases relative to φ1 is that the cycle detection algorithm finds a cycle triggering
point, whose location in the EGG cycle will depend somewhat on the current wave
shape. By using the relative phases, this dependency is prevented from affecting the
DFT results.
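
The cycle-synchronous analysis described in sections 2.6.1 and 2.6.2 can be sketched as below: the first N Fourier components are computed directly in the time domain over exactly one cycle, and then expressed relative to the fundamental. This is an illustration in Python with hypothetical function names, not the FonaDyn implementation.

```python
import cmath
import math

def cycle_harmonics(cycle, n_harmonics):
    """Complex amplitudes of harmonics 1..N, computed over exactly one cycle."""
    t = len(cycle)
    return [
        (2.0 / t) * sum(x * cmath.exp(-2j * math.pi * k * n / t)
                        for n, x in enumerate(cycle))
        for k in range(1, n_harmonics + 1)
    ]

def relative_descriptors(harmonics):
    """Level differences (dB) and phase differences (rad) re the fundamental."""
    ref = harmonics[0]
    out = []
    for h in harmonics[1:]:
        level_db = 20.0 * math.log10(abs(h) / abs(ref))
        dphi = cmath.phase(h) - cmath.phase(ref)
        dphi = (dphi + math.pi) % (2.0 * math.pi) - math.pi   # wrap into [-pi, pi)
        out.append((level_db, dphi))
    return out

# Test cycle: fundamental plus a second harmonic at half amplitude, phase 0.7 rad.
T = 441
cycle = [math.cos(2 * math.pi * n / T) + 0.5 * math.cos(4 * math.pi * n / T + 0.7)
         for n in range(T)]
descriptors = relative_descriptors(cycle_harmonics(cycle, 2))
```

For this test cycle, the single descriptor is a level difference of about −6 dB and a phase difference of 0.7 rad, regardless of any overall gain applied to the cycle.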



2.6.3 Residual energy
Since N is typically small, most of the high-frequency information in the EGG
signal is lost. However, the overall spectrum slope of the EGG signal is typically quite
uniform towards high harmonics, and thus the higher harmonics all tend to carry
much the same information. Also, at low EGG signal levels, the higher harmonics
often descend below the analog noise floor of the EGG hardware. In FonaDyn, the
residual energy in all the harmonics > N, plus any noise, is estimated by summing the
energies in the N lowest harmonics, subtracting that from the total energy of the
EGG signal, and again expressing the difference as a level relative to that of the
fundamental. For low EGG signal amplitudes, H will tend to represent the relative
level of the noise floor, rather than the relative level of the omitted harmonics.
This analysis method emphasizes the overall periodic contacting pattern of the
vocal folds. It will tend to disregard very brief or transient events that are manifest
mostly at high frequencies, such as multiple contacting events that may occur very
closely in time (small fractions of a cycle).

2.6.4 Phase of the fundamental


Since the phase of the residual energy H is indeterminate, that variable is used
instead for the phase of the fundamental. This value is needed in order to reconstruct
the EGG waveforms from the cluster centroid values. The phase of the fundamental is
taken relative to the cycle trigger point found by the cycle detection. We do not want
this phase to affect the clustering, though, so it is first down-weighted by 0.001.

2.6.5 Representation of phase differences


Since the phase angles Δφk are expressed in the range [−π, π) , a phase whose
value crosses ±π will cause a 2π jump. This must be avoided, since it could give rise
to falsely disjunct data clusters. One could take the absolute phase difference, folded
over π , thereby avoiding jumps. The clustering would then improve, but some
information is lost, and it would not be possible to reconstruct EGG waveforms from
the cluster centroid values. Therefore, each relative phase Δφk = φk − φ1 is instead
represented by the value pair (cos(Δφk), sin(Δφk)) . This eliminates the discontinuities,
at the cost of an extra clustering dimension per Fourier descriptor.
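
The effect of this representation is easy to demonstrate. In the sketch below (hypothetical function names, for illustration only), two phases that are nearly identical but lie on opposite sides of the wrap-around are far apart as angles, yet close together as (cos, sin) pairs; and the angle can still be recovered from the pair.

```python
import math

def phase_as_pair(dphi):
    """Represent a phase angle as the pair (cos, sin)."""
    return (math.cos(dphi), math.sin(dphi))

def pair_distance(p, q):
    """Euclidean distance between two (cos, sin) pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Two nearly identical phases on opposite sides of the wrap-around.
a = math.pi - 0.01
b = -math.pi + 0.01

angle_gap = abs(a - b)                                        # close to 2*pi
pair_gap = pair_distance(phase_as_pair(a), phase_as_pair(b))  # about 0.02

# The angle is recoverable from the pair, which the folded absolute value is not.
c, s = phase_as_pair(a)
recovered = math.atan2(s, c)
```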

From here on, the EGG harmonic-domain processing splits into two paths: DFT
component clustering, and cycle-rate sample entropy estimation based on the DFT
components. The entropy estimation is described first.

2.7 Sample entropy of cycle data


The ‘sample entropy’ of a signal [11][12][13] is an interesting metric that has
found numerous biomedical applications. For brevity, it is often called SampEn. The
SampEn is low for a regular, self-similar signal and high when a signal is transient,
erratic, or noisy. While others [14] with some success have taken the SampEn of high-
rate, isochronous signals such as sampled EGG or audio, we have found that the
SampEn metric is particularly effective for phonation data that are cycle-synchronous.
SampEn has a threshold or ‘tolerance’ parameter that keeps it at zero
while phonation is stable, even with changing pitch, but when ‘something unusual’
happens, the SampEn peaks. This ‘something’ could be a voice break, such as those
occurring between chest and falsetto voice [5], or other instabilities in phonation.
The SampEn is low when the vocal fold contacting pattern is stable, and increases
with increasing instability. Unlike conventional jitter and shimmer measures, which
report on changes in period time and amplitude, respectively, the cycle-rate SampEn
is fairly insensitive to such variations. This is because FonaDyn normalizes the EGG
pulses in amplitude and period time, prior to computing the SampEn. Also,
aerodynamic turbulence in the larynx does not contribute substantially to the
SampEn, since it is the VF contacting signal that is analyzed. SampEn reports on
cycle-to-cycle changes in the EGG waveform shape only. Typically, SampEn decreases
substantially at the onset of vocal fold collision, and tends to decrease further as vocal
effort increases, analogously to what has been observed for other perturbation
metrics such as jitter.
In FonaDyn, the harmonic-domain analysis produces a vector of values that is
updated on every phonatory cycle, when we obtain new values of the level and the
phase, for all harmonics. Each element in this vector provides the input to one of
several SampEn estimators, running in parallel for all levels and phases of the first
few harmonics. The final SampEn value is simply the sum of the SampEns for all
levels and phases. The algorithm used for estimating the SampEn is given in [15]. The
number of harmonics is selectable; typically it will be smaller than for the clustering
analysis. A local SampEn is computed over a short sliding window of w glottal cycles,
where the integer w can be chosen by the user (‘Window’). The window is advanced
one cycle at a time, so a new value is obtained on every cycle. Another parameter is
the sub-sequence length (‘Length’). The ‘Tolerance’ setting is analogous to a noise
threshold. See also section 3.6.8.
For computing the sample entropy of the phases, we have potentially the same
wrap-around problem as described in section 2.6.5. Jumps of ±2π would give large
but meaningless peaks in the SampEn. Here, this is avoided by substituting the phase
with its absolute value. This gives a continuous signal, for all phases; and the
resulting ambiguity is of little consequence for the SampEn estimation.
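
For readers unfamiliar with SampEn, the following sketch implements the textbook definition on a single series of per-cycle values, and evaluates it over a sliding window. It is not the algorithm of reference [15] and is not taken from FonaDyn. The mapping of the settings ‘Window’, ‘Length’ and ‘Tolerance’ onto the arguments window, m and r is our reading of the text, and FonaDyn additionally sums the SampEn of several level and phase series.

```python
import math

def sample_entropy(x, m, r):
    """Textbook SampEn(m, r): -ln(A/B), with Chebyshev distance between templates."""
    n_templates = len(x) - m

    def count_matches(length):
        count = 0
        for i in range(n_templates):
            for j in range(i + 1, n_templates):
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)        # matching pairs of templates of length m
    a = count_matches(m + 1)    # pairs that still match at length m + 1
    if a == 0 or b == 0:
        return float("inf")     # undefined; reported here as infinity
    return -math.log(a / b)

def sliding_sample_entropy(series, window, m, r):
    """One SampEn value per cycle, over the most recent `window` values."""
    return [sample_entropy(series[i - window:i], m, r)
            for i in range(window, len(series) + 1)]

stable = [0.0, 1.0] * 10                                      # perfectly regular
broken = [0.0, 1.0] * 4 + [0.0] + [0.0, 1.0] * 5 + [0.0]      # one irregularity
```

The regular series gives a SampEn of zero, while the series with a single irregularity gives a positive value.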

2.8 Clustering

2.8.1 Clustering dimensions


As described above, for each harmonic k ∈ [2…N] , FonaDyn computes the
relative level, and the cosine and sine of the relative phase. For each EGG cycle, this
results in 3×(N-1) values. Using these values, plus the estimated level of the residual
energy and the value pair [ cos(φ1), sin(φ1) ] , the statistical clustering is performed in a
space with 3×N dimensions. This is done using a real-time implementation of the
algorithm ‘online Hartigan k-means’ [10]. Compared to other methods for clustering,
the k-means method has these advantages: (1) it computes quickly even in many
dimensions, and (2) the number of points already in the clusters does not affect the
classification, only the centroid updates. The latter means that, in a data set with
thousands of EGG cycles, a small minority of cycles of an unusual shape can still give
rise to a cluster of their own, especially if they occur early in the recording. A typical
example is the weak sinusoidal cycles at onset and offset of voicing.
For the clustering to be effective, all dimensions should have values extending
over roughly the same numerical range. For this reason, the level difference values as
fed to the clusterer are expressed in Bels rather than decibels. This makes for a better
match to the (cos, sin) values, which are always in the range [-1…1].
Note that none of fo, SPL and total EGG amplitude are input as parameters to the
clustering algorithm. Also, the positions in the clustering space of the cluster
centroids are not related to the locations of the coloured regions in the voice map.
Rather, each centroid represents a particular EGG pulse shape. This means that
affinity to a cluster is not given by proximity in the fo/SPL-plane. However, different
EGG pulse shapes do tend to occupy connected regions in the voice map, typically
with some overlap. It is these regions that make up FonaDyn’s ‘cluster maps’ of the
EGG signal.
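
The bookkeeping of the clustering dimensions can be sketched as follows. The function name and argument layout are hypothetical, and the way in which the phase of the fundamental is down-weighted (a factor 0.001 applied to its cosine and sine) is our assumption, based on section 2.6.4.

```python
import math

def clustering_vector(levels_db, dphis, residual_db, phi1, phi1_weight=0.001):
    """Assemble one point in the 3N-dimensional clustering space.

    levels_db, dphis: relative levels (dB) and phases (rad) of harmonics 2..N.
    """
    point = []
    for level_db, dphi in zip(levels_db, dphis):
        point += [level_db / 10.0, math.cos(dphi), math.sin(dphi)]   # dB -> Bels
    point.append(residual_db / 10.0)
    # Assumed form of the down-weighting of the phase of the fundamental.
    point += [phi1_weight * math.cos(phi1), phi1_weight * math.sin(phi1)]
    return point

N = 6
point = clustering_vector([-6.0] * (N - 1), [0.5] * (N - 1), -30.0, 0.1)
```

With N = 6, the vector has 3×(N-1) + 1 + 2 = 18 elements, that is, 3×N.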

2.8.2 Resynthesis
In order to facilitate the user’s interpretation of the clustering, the cluster
centroid values of levels, cos and sin are used also to resynthesize and display the
approximated cycle waveform (section 3.1.3), by addition of cosines. A potential
problem here is that the cluster centroid values for sin and cos, since they are not
strictly paired, no longer necessarily fulfil the trigonometric identity cos²(φ) + sin²(φ)
= 1 . Another potential problem is that the probability distributions of sin and cos are
very far from rectangular. In practice, though, tests with synthetic waveforms
(triangle, sawtooth, square) have shown that such reconstruction works well enough.
Note that the EGG signal is resynthesized so that the EGG parameters can be
evaluated, cycle by cycle. The resynthesized cycles are not concatenated back into a
contiguous signal.
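
A minimal sketch of such a resynthesis is given below. It assumes that the centroid levels are in Bels relative to the fundamental, takes each phase as atan2(sin, cos) so that pairs which are not exactly on the unit circle are tolerated, and sets the phase of the fundamental to zero for simplicity. The function name and these choices are ours, not FonaDyn's.

```python
import math

def resynthesize(centroid_levels_bel, centroid_cos, centroid_sin, n_points=80):
    """One cycle rebuilt by adding cosines; harmonic 1 has amplitude 1 and phase 0."""
    # A level of L Bels corresponds to an amplitude ratio of 10^(L/2).
    amps = [1.0] + [10.0 ** (lb / 2.0) for lb in centroid_levels_bel]
    # atan2 tolerates (cos, sin) pairs that do not lie exactly on the unit circle.
    phases = [0.0] + [math.atan2(s, c) for c, s in zip(centroid_cos, centroid_sin)]
    return [
        sum(a * math.cos(2.0 * math.pi * (k + 1) * i / n_points + p)
            for k, (a, p) in enumerate(zip(amps, phases)))
        for i in range(n_points)
    ]

# Centroid with one extra harmonic, 6.02 dB (0.602 B) below the fundamental,
# whose (cos, sin) pair has drifted off the unit circle.
wave = resynthesize([-0.602], [0.9], [0.0])
```

Here the second harmonic has half the amplitude of the fundamental and zero relative phase, so the first point of the cycle is about 1.5.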

2.9 Limitations
The analysis considers only the N ≤ 20 lowest harmonics of the EGG signal, and
typically 5 or 10 are used. This means that some high-frequency aspects, such as
multiple contacting events in quick succession, might not be resolved.
The preconditioning of the EGG signal means that the fo should be 80 Hz or
greater, and not higher than 10000/N Hz. The more these bounds are exceeded, the
more the clustering will start to depend on fo .
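
As a small worked example of these bounds, the hypothetical helper below returns the fo range for a given number of harmonics, using the 80 Hz and 10 kHz figures from the paragraph above.

```python
def fo_bounds_hz(n_harmonics, lower_hz=80.0, lowpass_hz=10000.0):
    """Range of fo within which all N analysed harmonics pass the EGG filters."""
    return lower_hz, lowpass_hz / n_harmonics

low, high = fo_bounds_hz(10)
```

With N = 10 harmonics, the upper bound is 1000 Hz; with N = 20, it drops to 500 Hz.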



3 PART THREE - Using FonaDyn

3.1 Window layout

3.1.1 Main screen


The FonaDyn main screen is a bit like the control panel of a machine. It has no
pull-down menus; almost all controls are visible (Figure 5).

Row 1: Graphs layout, Output directory, Settings
Row 2: Inputs: live signals, file, file batch, or script; Keep Data; Start/Stop, Pause; clock
Row 3: Outputs: playback, record to file, log file (+ optional time files)
Panels: Selected time plots (values during the last 1-10 s); Incoming EGG waveform
oscilloscope; Voice map panel (SPL vertical, fo horizontal); Cluster centroids;
Pulse shapes or cycle counts, per cluster

Figure 5. The main screen of FonaDyn, with the colour scheme “Nordic Light”.

The top row holds general settings. The second row selects the signal for input,
controls START/STOP/PAUSE, and displays the time since START. The buttons in the
third row control the output options. The four subpanels with graphs show the
analysis results, in real time, as described below.
You can use Tab or Shift+Tab to move the keyboard focus from one control to the
next. Numbers can be edited with the keyboard, by dragging the mouse cursor
vertically, or with the up-arrow and down-arrow keys. Ctrl and Shift increase the step
size. There are no text fields that can be edited. All file names are entered by browsing
to files, or are generated by FonaDyn.

3.1.2 Choice of Source


The most common use case is to analyze pre-recorded files, so the default choice
here is Source: From file. You can browse to the file in the usual Open File dialog,
or drag it from a file folder outside FonaDyn and drop it onto the file name field. Only
files whose names end in “_Voice_EGG.wav” can be dropped in this way.
In research, one often has reason to run and re-run analyses on multiple files. For
small batches with the same analysis settings, you can queue a moderate number of
input signal files, using Source: Batch multiple files. This choice of source is
useful also for a list of signal files to be conveniently accessed in a presentation, for
instance. The file that is selected in the list will be the first to play when ►START is
pressed.
For visualizing and recording live signals, choose Source: Record. For
recording to take place when ►START is pressed, you must also first press the
Record button. See section 3.2 for more on recording. Or, you can press ►START
without recording, if you are interested only in the real-time displays of live signals,
as for visual feedback of speech or singing.
Finally, the Source list also contains the option Run script. A script is a text file
that you create, with commands for setting the many analysis parameters, and for
running and saving analyses. This lets you set up batches that can run for hours,
while you are doing something else. Scripts are described in section 3.5.5.

3.1.3 The ‘Moving EGG’


This panel displays the incoming EGG cycles in real time, like a triggered
oscilloscope. The signal shown is the EGG signal after pre-processing (→2.4). The
period length is normalized to the width of the display frame, so it is not affected by
the voice fundamental frequency. By default, the peak-to-peak amplitude is also
normalized, to the height of the display frame. To see the actual EGG amplitude,
relative to full scale, choose Normalize: Off. This is useful for checking the
amplitude of the EGG input signal.
The display draws the n=5 most recent cycles with a fading gray scale. The
cycle curves are rendered as k=80 straight line segments. To modify this
rendering, press Settings… (→3.1.7) and check the box Show additional
diagnostic features. This activates the display of some extra control fields.
Change n by entering a different value for Count. Change k by entering a
different value for Samples. Increasing Count or Samples may give a nicer
image, but also increases the processing load.
The rightmost symbol at the top indicates the type of cycle segmentation
(→2.4.2), with Φ for the phase tracker, and Λ for the peak follower. You can
change this option, too, in the Settings… dialog box.

Tip: you can press Alt+M to hide or show the Moving EGG panel.



3.1.4 The Voice Map
The Voice Map panel displays a voice map of one of several acoustic and EGG
metrics (Figure 6, Figure 7). The horizontal axis is the fo in semitones (57 MIDI =
220 Hz); or, right-click to display the fo axis in Hz. The vertical axis is the sound level
in dB. This must be calibrated to correspond to SPL @ 0.3 m microphone distance
(→3.2).

Each cell is 1 decibel high and one semitone wide. The axis extents are fixed, to
facilitate comparisons between maps; however, by default, the aspect ratio is not
fixed. It will change when you resize the main window, or rearrange the panel
layout. Although changing the aspect ratio runs against the UEP recommendations,
it saves space on the display screen. If you prefer to fix the aspect ratio to
2:1, then insert into the startup file the statement

FonaDyn.config(fixedAspectRatio: true);

For the examples in Figures 6-8, an amateur male singer repeated soft-loud-soft
/ɑ:/ vowels on several constant pitches over more than an octave. This recording took
about 6 minutes. Figures 6-8 show the many layers of a voice map, each of which
maps a different metric. Metrics (a) to (e) are derived from the audio signal (→2.3.3),
while the rest are derived from the EGG signal. (a) ‘Density’, where darkest gray
means >10000 EGG cycles. (b) ‘Clarity’ showing accepted cycles (green) and rejected
cycles (gray). (c) The mean of the crest factor (peak-to-RMS ratio) of the audio signal,
where red means 4 (corresponding to 12 dB). (d) The mean spectrum balance, being
the ratio in dB between the signal powers above 2 kHz and below 1.5 kHz. (e) The
cepstrum peak prominence smoothed (CPPs) in dB.
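
As a worked example of metric (c), the crest factor and its decibel equivalent can be computed as follows. This is only a sketch of the definition (peak-to-RMS ratio); the function names are illustrative, and FonaDyn's own windowing and averaging are not reproduced here.

```python
import math

def crest_factor(samples):
    """Peak-to-RMS ratio of a block of samples."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak / rms

def to_db(amplitude_ratio):
    """Express an amplitude ratio in decibels."""
    return 20.0 * math.log10(amplitude_ratio)

# One full period of a sine wave, sampled at 1000 points.
sine = [math.sin(2.0 * math.pi * i / 1000) for i in range(1000)]
cf = crest_factor(sine)
print(round(cf, 3), round(to_db(cf), 2))  # 1.414 3.01
print(round(to_db(4.0), 2))               # 12.04
```

A pure sine has the low crest factor of about 3 dB, while the value 4 that is shown in red corresponds to about 12 dB.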

(a) Density (b) Clarity (c) Crest factor (d) Spectrum Balance (e) CPP-smoothed

Figure 6. The first five layers of the Voice Map display are derived from the audio signal.

(e) Sample entropy (depends on settings) (f) Normalized peak dEGG (QΔ) (g) Contact quotient (Qci) (h) Index of contacting (Ic)

Figure 7. The remaining layers of the Voice Map display are derived from the EGG signal.

Figure 7 shows the layers for the metrics derived from the EGG signal. (e) The
cycle-rate sample entropy; (f) the normalized peak dEGG QΔ; (g) the contact
quotient by integration Qci; and (h) the index of contacting Ic (→2.5).

(i) Dominant cluster regions (j) Regions of clusters 1-5, with the corresponding EGG wave shapes

Figure 8. A compilation of Voice Map displays of the cluster extents (layers 10…10+N)

Figure 8 shows an example of how the automatic clustering of the EGG wave
shapes is mapped into the voice field (→2.7). The default is to use 5 clusters, coded by
colour. Each cluster has an average wave shape, and its own region in the map
(Figure 8j). The Density plot from Figure 6(a) is in the background. Interestingly, we
see that the vocal folds can vibrate without contacting at SPLs as high as 60-70 dB (or
more, at higher pitches); as shown by the blue area in Figure 7(h) or the purple area
in Figure 8(j). For this example, the wave shapes that correspond to each colour
can also be seen in Figure 11(c). After the automatic clustering, the colours have been
sorted manually (→3.6.5) in order of descending vocal effort: ‘red’ cluster
waveshapes, strongest phonation; ‘yellow’ cluster waveshapes, here firm phonation
with full vocal fold contact; ‘green’ and ‘blue’ cluster waveshapes, here fairly soft
phonation with brief contact; ‘purple’ cluster waveshapes, softest phonation with no
vocal fold contact, as indicated by a nearly sinusoidal EGG.
Figure 8 (i) shows the overlay of all five waveshape regions, with the dominant
EGG waveshape cluster by colour. The ‘dominant’ waveshape is the one to produce
the most cycles in a given cell. Less colour saturation (whiter colour) signifies more
co-existence of wave shapes.
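
The notion of a ‘dominant’ cluster can be illustrated with a short Python sketch. The function name is hypothetical, clusters are indexed from 0 here, and the returned share of the dominant cluster is only a plausible stand-in for the colour saturation; the handbook does not state how FonaDyn computes the saturation.

```python
def dominant_cluster(cycle_counts):
    """Return (index, share) of the cluster with the most cycles in one cell.

    cycle_counts: per-cluster cycle counts for a single map cell.
    Ties are resolved in favour of the lowest index (an assumption).
    """
    total = sum(cycle_counts)
    if total == 0:
        return None, 0.0  # empty cell: no dominant cluster
    index = max(range(len(cycle_counts)), key=lambda k: cycle_counts[k])
    return index, cycle_counts[index] / total

print(dominant_cluster([120, 30, 0, 0, 0]))  # (0, 0.8)
print(dominant_cluster([40, 35, 25, 0, 0]))  # (0, 0.4)
```

In the second cell the dominant cluster accounts for less than half of the cycles, which is the situation that the display would render with a paler colour.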

All the data shown in Figures 6, 7 and 8 can be saved to a single .CSV text file, for
further analysis in other software (→3.5.3). Press Save Map to open a standard File
Save dialog box. Choose a filename. If the filename you enter does not end in .csv,
then FonaDyn will append _VRP.csv to the name. If the filename you enter does end
in .csv, then FonaDyn will not change it.
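
The naming rule just described can be written out as a small Python function. It is a sketch of the rule as stated, not FonaDyn's own code; the function name is hypothetical, and treating the extension test as case-sensitive is an assumption.

```python
def output_name(entered, suffix):
    """Apply the naming rule: append the suffix unless the name ends in .csv."""
    if entered.endswith(".csv"):  # assumed to be a case-sensitive test
        return entered
    return entered + suffix

print(output_name("subject01", "_VRP.csv"))      # subject01_VRP.csv
print(output_name("subject01.csv", "_VRP.csv"))  # subject01.csv
```

The same pattern applies to cluster files, with the suffix _clusters.csv (→3.1.6).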
You can load a previously saved map by pressing Load Map. This is useful for
inspecting maps from earlier recordings. It also lets you accumulate more data into
an existing map: just check the Keep data box before starting. To accumulate more
data, you must also continue to use the same cluster data as were used to make the
loaded map, or the clustering/classification results will be meaningless.
When you press Save Image, a partial screen dump is generated and shown in a
preview window of its own. When you press ‘F’, a Save File dialog appears. Also, a list
of available image file formats is printed in SuperCollider’s Post window. Choose a
format, and type it in as the extension to the image filename that you specify. The
image as shown on screen will be saved to the chosen format. Or, you can leave the
preview window open, for easy visual comparison with your next voice map. Press ESC
to close it. Several image windows can be open at the same time. Press ‘C’ to close all
of them at once.

Tip: you can press Alt+V at any time to hide or show the Voice Map panel.

Tip: you can open a map file whose name ends in _VRP.csv by dragging the file from
its folder and dropping it onto the Load Map button.

3.1.5 The Plots

The Plots panel can show time series of up to five metrics. The live display scrolls,
showing the most recent two seconds by default. You can change the duration of the
plot (1 to 10 seconds) by dragging the grid sideways with the mouse, even when the
plot is running. Durations longer than 2 seconds need to be set before FonaDyn is
started. Each fleck represents data from one phonatory cycle. This makes the display
the most CPU-intensive of the graphics, especially if the time axis is long; so if your
computer is having problems keeping up, limit the number of curves, or hide this
panel entirely (Alt-P).

Figure 9. The Plots panel. The figure shows a short excerpt from an upward glissando by
a male subject. The arrow (added here) indicates a voice break from modal into falsetto
voice. To change the displayed duration, drag the mouse cursor sideways on the grid.
Here the Sample Entropy (the lilac curve) was based on ΔL2, ΔL3, Δφ2 and Δφ3.

You can choose from the three time-domain EGG metrics (→2.5), plus the audio
spectrum balance, and the sample entropy of the EGG harmonic-domain data.
Whenever the ‘clarity’ metric of the audio signal drops under the threshold, there will
be a gap in all of these traces.
In the Plots panel, the Sample Entropy controls at top right appear only when the
Sample Entropy box is checked. The SampEn metric is computed from the cycle-time
series of the EGG harmonic magnitudes and phases. By default, only the first four
harmonics (i.e. 2…5, relative to the fundamental) are used. The value shown by the
scrolling lilac curve is the sum of those eight SampEn values (one each for the
magnitude and the phase of the four harmonics). The default values for
Tolerance, Window length and sequence Length (the last two in EGG cycles) have
been found by trial and error to work fairly well, but they may need to be adjusted in
different scenarios.

Increasing the Tolerance suppresses the influence of small variations, for instance
if you want to detect only voice breaks. Setting the Tolerance to something small
(<0.1, with Length=1) allows analysis of the stability of the phonated tones.
Increasing the Window makes a smoother curve and reduces the temporal resolution.
Increasing the Length can make the SampEn peaks more localized.
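
To make the roles of these three parameters more concrete, here is a minimal textbook implementation of sample entropy in Python. It is a sketch under stated assumptions, not FonaDyn's implementation: the mapping of Tolerance to `r`, of Length to `m`, and of Window to the number of values in `x` is an interpretation of the control names, and FonaDyn computes its metric on the cycle series of harmonic levels and phases rather than on a single raw sequence.

```python
import math

def sample_entropy(x, m, r):
    """Textbook SampEn of sequence x, template length m, tolerance r."""
    n = len(x)

    def matching_pairs(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        pairs = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                distance = max(abs(a - b)
                               for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    pairs += 1
        return pairs

    b = matching_pairs(m)      # matches of length m
    a = matching_pairs(m + 1)  # matches that persist for one more value
    if a == 0 or b == 0:
        return math.inf        # undefined: no matches within tolerance
    return math.log(b / a)

regular = [0, 1] * 5
irregular = [0, 0, 1, 0, 0, 1, 0, 0, 2]
print(sample_entropy(regular, 1, 0.1))              # 0.0
print(round(sample_entropy(irregular, 1, 0.1), 3))  # 1.163
```

A perfectly repetitive sequence yields zero, while a sequence with an unexpected value yields a positive result. A larger `r` lets more templates count as matches, which is why raising the Tolerance hides small variations.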

The entire time history of these and all other metrics can be saved into a
multichannel Log file, cycle by cycle, or at an isochronous frame rate. For making and
saving customized time-series plots of any metric in FonaDyn, create an Analysis Log
file, and then use Matlab or some other software to customize your graphs from that.
The Matlab function FonaDynPlotLogFile demonstrates how to do this.

Tip: you can press Alt+P at any time to hide or show the Plots panel.

3.1.6 The Clusters

The Clusters panel shows information related to the EGG waveshape clustering. It
has five buttons along its top:

Button         States               Action

Init           Relearn              On START, clear the current cluster data
               Pre-learned          On START, keep the currently loaded cluster data,
                                    for continued learning, or for classification
Learning       On                   Continue learning, updating the centroids
               Off                  Classify incoming data, without updating the centroids
Reset Counts   <push when running>  Set the cycle counts of all clusters to zero, now. All
                                    centroids are initialized to the current DFT values –
                                    but they soon diverge. Useful for initializing, and for
                                    clearing spurious undesired centroids.
               <push when stopped>  Arm for an Auto Reset: clustering begins as soon as
                                    some stable phonation is detected.
Load Clusters  Load                 Load a cluster data file, for initialization or for
                                    classification
               Unload               Clear all cluster data
Save Clusters  <push>               Save the cluster data to a file (*_clusters.csv)

If the filename you enter for Save does not end in .csv, FonaDyn will append
_clusters.csv to the name. If the filename that you enter does end in .csv, then
FonaDyn will not change it.
A cluster centroid is a set of values that describe the average location of all the
cycle points in that cluster’s space. In the centroids display, each colour corresponds
to one centroid (and, equivalently, to one cluster). In Figure 10, one centroid (yellow)
is shown. The thin gray line shows a running average of the most recent EGG pulses
that were assigned to this cluster.

Figure 10. The graphs in the Clusters panel.

The grid to the left shows the levels and phases of the EGG harmonics, relative to
the fundamental. By definition, then, the fundamental or first partial tone has the
level 0 dB and the phase 0 radians. The gray ‘1’ at top center (0 dB, zero phase) is
thus implicit for the first partial, and is shown for visual reference only. The data
points are at the top-left corner of the glyphs. In Figure 10, the second harmonic ‘2’
has a relative level of about -10 dB and a relative phase of about -0.25π radians.
Finally, the ‘H’ shows the relative level of the residual power of all Higher harmonics,
and also the pHase of the fundamental, relative to the cycle trigger point. If you were
to string out the digits in sequence from left to right, keeping their heights, you would
get the typical power spectrum of the EGG pulses in the given cluster.
The graph to the right has three different display modes, as shown in Figure 11. By
clicking on this graph, you can toggle between a bar graph, a single clustered
waveform or all clustered waveforms. The horizontal scroll bar, too, selects one or all
clusters.
The rippling which can be seen in the resynthesized (colour) curves is an artefact
of truncating the spectrum after a few harmonics. It is known as the Gibbs
phenomenon, and it is not a property of the EGG signal. If it annoys you, it can be
suppressed by checking the button Settings… | Suppress Gibbs’ ringing
(→3.1.7). This affects only the display, not the analysis. Because this option
performs a form of low-pass filtering, it will also affect the displayed value of QΔ
next to the curve. However, that value is not saved anywhere, and the values
stored in the voice map are unchanged.
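
The ripple caused by truncating a harmonic series, and the effect of a gentle low-pass weighting, can be demonstrated with a square wave in Python. This is a generic illustration only. The handbook does not say which method FonaDyn uses for suppression; the Lanczos sigma weighting shown here is one common technique, chosen as an assumption for the sake of the example.

```python
import math

def square_partial_sum(t, max_harmonic, smooth=False):
    """Fourier partial sum of a unit square wave (odd harmonics only)."""
    total = 0.0
    for k in range(1, max_harmonic + 1, 2):
        weight = 1.0
        if smooth:
            # Lanczos sigma factor, tapering the higher harmonics.
            arg = math.pi * k / (max_harmonic + 1)
            weight = math.sin(arg) / arg
        total += weight * math.sin(k * t) / k
    return 4.0 / math.pi * total

times = [2.0 * math.pi * i / 2000 for i in range(2000)]
raw_peak = max(square_partial_sum(t, 9) for t in times)
smooth_peak = max(square_partial_sum(t, 9, smooth=True) for t in times)
print(round(raw_peak, 3), round(smooth_peak, 3))
```

With harmonics up to the ninth, the unweighted sum overshoots the plateau value of 1 by roughly 18 %, while the weighted sum overshoots by only a few percent. The price is a less steep edge, which is why a metric that depends on the steepest slope of the displayed curve changes too.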

a) Bar graph mode, with EGG cycle counts, per cluster. The number at top left is
the scale of the vertical axis, which rescales automatically. Click on one of the
bars to display (b). Click near the top of this panel to display (c).

b) EGG wave resynthesis display, with one waveshape selected. Time and amplitude
are cycle-normalized. Vocal fold contact area increases upwards. Green line: EGG
pulse resynthesized from one of the cluster centroids. Gray line: a running
average of recent pulses in this cluster. The values of Qci and QΔ shown here are
computed from this resynthesized waveform, and are not saved. Click on the graph
to display (a).

c) EGG wave resynthesis display, with all waveshapes selected. Time and amplitude
are cycle-normalized. Here, for example, the sinusoidal purple signal (no vocal
fold contact) is actually much weaker than the others, while the red one is the
strongest. Click on the graph to display (a).

Figure 11. The display modes of the Cluster panel.

Tip: you can press Alt+C to hide or show the Clusters panel.

Tip: you can open a clusters file whose name ends in _clusters.csv by dragging the
file from its folder and dropping it onto the Load Clusters button.

3.1.7 The Settings

To reduce screen clutter, some diverse settings that are less frequently used have
been placed in a separate dialog box.

Figure 12. The Settings… box. Click the blue text to open the license text from the
Internet.

Cycle separation method selects Phase tracker or Peak follower (→2.4.2).

Clarity threshold allows you to adjust the threshold for cycle regularity (→2.3.3).
This refers to the regularity of the audio signal, not the EGG signal. The default is
0.96. With soft or breathy voices, or with running speech, you may want to lower this
setting. You can use the Clarity layer in the voice map to see which phonations are
rejected from analysis.

Play the EGG signal on the second output, please. The default is to play the
audio signal on both outputs.

Keep input file name →3.5.2.

Show additional diagnostic features. This enables a number of additional
output file types, allows the tweaking of the Moving EGG display, turns on extra
logging to the SuperCollider post window, and maybe one or two other things.

Write _Gates file →3.5.3.

Suppress Gibbs’ ringing in resynthesized EGG shapes →3.1.6.

Record inputs: This is a list of the hardware inputs used for recording. The first
number is the input for the audio signal (the microphone), and the second number is
the input for the EGG signal. Normally these are 0 and 1, respectively, but you can
specify other inputs if that is more convenient. Both brackets [ ] must be present, and
the numbers must be separated by commas. Also, you can record additional audio-
rate input signals, simply by listing more hardware inputs before the end bracket. The
inputs do not need to be contiguous nor in any particular sequence. All the signals
will be recorded in parallel tracks in the _Voice_EGG.wav file, at the same sampling
rate (44100 Hz). When opening such a file for analysis, FonaDyn will read the audio
signal from the first track, and the EGG signal from the second track. Any remaining
tracks will be ignored.

Record extra inputs: This is similar to the previous item, except that an
_Extra.wav output file will be created, synchronized with the _Voice_EGG.wav file, and
containing the specified signals sampled at the rate chosen in the Hz box (→3.2.7).
This is intended for acquiring slow physiological signals in parallel with the audio and
the EGG. It is also possible to sample the _Extra file at full speed (44100 Hz), but that
will consume a lot of disk space.

Colours selects one of the available colour schemes. Since a black background often
works poorly in print, you may prefer the ‘Nordic Light’ colour scheme for screen-
dump illustrations that will be printed.
Save all settings. If checked, all settings, not just the ones in this dialog box, are
saved when FonaDyn is closed. To resume with those saved settings in the next
session, start it using FonaDyn.rerun instead of FonaDyn.run (→3.5.4). The check box
Save all settings works only once: it is not restored on FonaDyn.rerun, since you
would then risk overwriting a good setup by mistake, when FonaDyn is closed the
next time.
A more flexible but also more complex way of setting up (and running) FonaDyn
automatically is to use a command script (→3.5.5).

3.1.8 Comparing Voice Maps
The foremost selling point of voice mapping in general is the notion of comparing
voice maps to each other; in particular, before and after an intervention with a voice
patient or student. FonaDyn helps you to do so by optionally displaying up to four
maps simultaneously: BEFORE, NOW, DIFF and TWIN. When displaying more than one
map, FonaDyn adds a row of buttons that control how you interact with these maps.

Figure 13. Controls for laying out multiple voice maps.

Multiple voice maps can be displayed as tiled next to each other, or stacked on top of
each other such that only the selected one is visible. By selecting BEFORE, NOW, DIFF
or TWIN as the ‘top of the stack’, you can readily compare the maps visually to each
other. The button at the top left toggles between the tiled view ( ┌┬┬┐ ) and the
stacked view ( ╒══╗ ).
In the stacked view, each coloured button brings the chosen map to the top of the
stack. When you select a layer in that map, that selection is propagated to the other
maps as well, except TWIN. In the tiled view, all maps are shown. The layer selection
is then propagated only from the map chosen with the buttons at the top.
To save screen space with multiple maps, you may wish to choose Show: One
graph at the very top left, as shown in Figure 13. This hides all the other graphs and
so frees up space in the FonaDyn window. The Show: All Gallery option instead
arranges the other graphs along the bottom.
NOW: The current voice map, into which FonaDyn records new data, is called the
NOW map. It is the same as the single map you normally use.
BEFORE: If you press Alt-B, a new voice map appears that initially is a copy of the
NOW map, complete with all its layers. Or, you can load a different _VRP.csv file into
the BEFORE map. Unlike NOW, BEFORE remains unchanged when FonaDyn is running.
It acts as a reference, typically for showing a pre-intervention status. You can now
record or load post-intervention data into the NOW map. Press Alt-B again to close
the view of the BEFORE map.
DIFF: Once you have two finished voice maps on display, you can press Alt-D to
open a third map, DIFF, in which the cell-by-cell differences between the NOW and
BEFORE maps are displayed. The direction of comparison is NOW minus BEFORE.
(For the QΔ metric, the post/pre ratio is displayed instead, because it is more
appropriate.) Green colour signifies an increase in the NOW map, red a decrease.
For most layers in the NOW and BEFORE maps, there is a corresponding difference
layer in the DIFF map. (For the layers Density, Clarity and Clusters, computing
the difference would make little sense.) The DIFF map is computed only when it is
created. It is not updated in real time. Press Alt-D again to close the DIFF map
display. There is not yet any facility for saving difference maps to a _VRP.csv file.
TWIN: Sometimes you may wish to observe how two voice metrics relate to each
other, even as the map is being acquired. Press Alt-T to display a TWIN map that is
updated in the same way as the NOW map, but can show some other metric (layer).
The TWIN map cannot be loaded or saved, because it contains the same data as the
NOW map. Only the displayed layer can be different. In the TWIN map, the layer
selection is always independent of that in the other maps. Press Alt-T again to close
the TWIN map.
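
The cell-by-cell comparison behind the DIFF map can be sketched in Python as follows. This is an illustration of the idea only. The data layout (a dictionary from cell to value), the function name, and the choice to compare only those cells that hold data in both maps are assumptions, not a description of FonaDyn's internals.

```python
def diff_layer(now, before, use_ratio=False):
    """Compare one layer of two voice maps, cell by cell.

    now, before: dicts mapping (midi, spl_db) cells to metric values.
    use_ratio: if True, return NOW / BEFORE instead of NOW - BEFORE.
    """
    result = {}
    for cell in now.keys() & before.keys():  # cells present in both maps
        if use_ratio:
            result[cell] = now[cell] / before[cell]
        else:
            result[cell] = now[cell] - before[cell]
    return result

before = {(57, 70): 0.40, (57, 71): 0.45}
now = {(57, 70): 0.46, (58, 70): 0.50}
print(diff_layer(now, before))                  # one cell, about +0.06
print(diff_layer({(57, 70): 3.0}, {(57, 70): 2.0}, use_ratio=True))
```

A positive difference, or a ratio above 1, corresponds to the green colouring described above; a negative difference, or a ratio below 1, to red.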

Press Action
Alt-T Toggles the display of a TWIN map that shows the same real-time data as the
NOW map, but you can select a different layer (metric).
Alt-B Toggles the display of a BEFORE map that does not change, but is convenient
for visual comparison.
Alt-D Toggles the display of a DIFF map that shows a snapshot of the cell-by-cell
differences between NOW and BEFORE. The DIFF map does not change when
NOW changes. Make a BEFORE map first, or DIFF will be empty.
Alt-V Toggles the display of all voice maps, multiple or not.
╒══╗ Toggles between Stacked or Tiled display of multiple maps.

3.2 Recording

3.2.1 Recording environment


Use a quiet room. FonaDyn uses the ‘clarity’ metric as a criterion for acceptance.
It acts like a gate, such that the EGG analysis will proceed whenever the audio signal
from the microphone is sufficiently periodic, even if it is very soft. The rationale for
this design is that the signal-to-noise ratio of the audio channel is typically much
higher than that of an EGG device; and while a weak EGG signal is often electrically
rather noisy, its lower harmonics can still be analyzed. It follows that not only the
subject’s voice but also other tonal sounds in the recording room may open the EGG
analysis gate, including other voices, a piano, or machinery such as air conditioning,
traffic outside, etc.
The recording environment should follow the established recommendations [16]
for VRP acquisition, with regard to ambient noise and room absorption. If a whirring
computer or any other source of tonal or soft fan noise must be present in the
recording room, then the microphone should be of a cardioid type and be pointed
exactly away from the noise source. In highly absorbent rooms this can be rather
effective, for noise from one direction only. If the subject needs prompting pitches,
auditory stimuli, or a background audio track, these should be presented over closed
circumaural headphones.
Headphones can also be useful for restoring some hearing-of-self in anechoic or
very dampened recording rooms. Some audio interfaces even have a built-in reverb
effects unit, which can help, if applied in moderation. Just be sure that the reverb
signal is not routed into the signal that is being recorded.

3.2.2 Normal setup


By default, the two output channels of your audio interface play a copy of what the
microphone is picking up. In this way, you can monitor the incoming audio signal on
headphones, and check that it is free from noise and hum. You may wish to connect a
loudspeaker instead, for listening to earlier recordings together with other people.
The button Playback/Echo turns the audio output On or Off. If you have a
loudspeaker connected, in the same room as the subject, it can be useful to choose Off
prior to recording, to prevent feedback. This must be done before pressing START.
Or, monitor with headphones.
Processing and display are started when you press START, but nothing is saved
to disk, by default. This is useful for checking levels, and for working interactively
with real-time feedback in a voice studio, for instance.

3.2.3 Check the EGG level


Connect the EGG device and strap the electrodes onto the subject’s neck. Turn on
the EGG device and ask the subject to phonate. For the white ‘Moving EGG’ panel,
choose Normalize: Off. Press START, and wait for that button to display STOP.
Adjust the output level of the EGG device such that you get a clearly visible signal
when the subject is phonating. Try also adjusting the electrode position vertically for
maximum signal. Ideally, the displayed curve should exercise the upper half of the
vertical axis, for the strongest signals.
FonaDyn monitors the EGG signal amplitude during recording. If the usable
range is exceeded, the words EGG CLIPPING are flashed for about a second. This
typically happens not so much for strong phonation, but more often if the larynx
moves a lot. It is usually the near-DC drifting that pushes the EGG signal out-of-
range of the A/D converter. If so, reduce the output level of your EGG device. If
necessary, insert a signal attenuator between the EGG device and the input to your
audio interface (sound card).
Once the EGG amplitude appears to be satisfactory, press STOP, and wait until
that button again displays START. Restore Normalize: to On.

3.2.4 Calibrate for SPL


At this stage in recording, it is time to calibrate for sound pressure level. This is
such an important topic that it is presented in a section of its own (→3.3), rather than
expanding upon it here. Please read that section first, and then continue here.

3.2.5 Record live signals


• In the top panel of FonaDyn, check that the Output Directory is the one where
you want new recordings to be stored. If it is not, use the adjacent Browse…
button to select another directory.
• In the Source list, select Record.
• Select Record: Ready. An empty text field appears, where the name of the new
file will be shown. You cannot choose the file name yourself, even though this field
is editable. FonaDyn creates a file name based on the current date and time
(→3.3.7). The timestamp makes the filename unique.
• Press START. After a moment, the Record button becomes bright red to
indicate that Recording is in progress.
• Have the subject perform the phonatory production procedure.
• Press STOP. After a moment (possibly quite a few seconds), the Record button
returns to the dull red Ready state.
• FonaDyn has now created the file and given it a name based on the current date
and time. You can copy the resulting file name of the _Voice_EGG.wav file from the
panel, and paste it into your log of the experiment, with a comment on what was
recorded. You can of course also rename the file afterwards, if you wish; but let
the new name end in _Voice_EGG.wav, if possible, because this simplifies the post-
analysis with FonaDyn.
• When done, select Record: Off. It is easy to forget it in the Ready state, in
which case new files will consume your disk space whenever you press START.

3.2.6 Re-recording during analysis


Even when you select a file for analysis rather than the live inputs, it is possible to
re-record the input file. This would seem rather pointless, but for two things. First,
the re-recorded file will receive a new time-stamped file name, with the same time
stamp as any other output files that you may be generating in the same run. This may
make it easier to track different analyses of the same file. Second, the re-recorded file
contains not the raw input EGG and voice signals, but rather the preconditioned
signals, which allows you to inspect the signals as they are after conditioning, if you
wish. The conditioned EGG signal will be about 34 ms delayed relative to the audio
track. To remind you that you are re-recording, the Record button turns orange
rather than red.

3.2.7 Recording additional signals in parallel


FonaDyn itself normally records a stereo WAV file only, with voice and EGG. If you
want to record additional signals in synchrony with voice and EGG, FonaDyn can do
that, too. Typically, one may wish to record also slow physiological signals such as
subglottal pressure, larynx height, or breathing-related signals. These cannot be
recorded by audio interfaces, because the latter block DC, and attenuate AC signals
below about 20 Hz. However, some audio interfaces offer so-called ADAT optical
connections, each of which adds another 8 inputs or outputs. Enthusiasts in the
music community for analog synthesizers have developed DC-coupled ADAT-linked
converters for slow control voltages [17].
Be aware that such ‘consumer’ devices are not generally certified for safe use with
body-contact transducers. Although malfunction is rare, a precaution would be to
power them from batteries rather than from the mains.

FonaDyn can acquire such signals, at a much lower sampling rate (default:
100 Hz), thereby saving a lot of disk space. No anti-aliasing is performed, so the
signals should already be band-limited to below half the sample rate (that is, below
50 Hz at the default rate). To activate recording of extra
channels, first activate the total required number of hardware inputs in the
SuperCollider startup file, and restart SCLANG. In FonaDyn, open the Settings dialog.
In the text field Record extra inputs, type a list of the inputs to which you have
connected the extra signals. The list must consist of comma-separated integers, and
must be enclosed in square brackets, like so: [10,11,12,13,14] . The inputs must
refer to active hardware inputs. The listed inputs do not have to be contiguous, nor
need they be in any particular order. The order that you specify is the order in which
the signal tracks will appear in the output file.
The output file will be a multichannel WAV file of 16-bit integer data, with as many
interleaved channels as you have specified in the list. The signals will be synchronized
with those in the _Voice_EGG file. The filename will have the same timestamp as the
_Voice_EGG file, followed by _Extra.wav. The sampling rate is nominally 100 Hz per
channel. You can request a different sampling rate in the popup menu. Only rates
that are an integer divisor of 44100 are allowed, or the extra channels could suffer
from sampling jitter. To turn off the recording of these extra channels, clear the text
field Record extra inputs in the Settings dialog box.
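
The constraint on the sampling rate is easy to check in Python. This sketch simply enumerates the integer divisors of 44100; whether the popup menu in FonaDyn offers all of them, or only a selection, is not stated here, and the function name is illustrative.

```python
AUDIO_RATE = 44100  # Hz, the audio sampling rate used by FonaDyn

def allowed_extra_rates(audio_rate=AUDIO_RATE):
    """All rates that divide the audio rate without remainder."""
    return [d for d in range(1, audio_rate + 1) if audio_rate % d == 0]

rates = allowed_extra_rates()
print(100 in rates, 1000 in rates)           # True False
print([r for r in rates if 50 <= r <= 500])  # candidate rates for slow signals
```

The default of 100 Hz qualifies because 44100 / 100 = 441 exactly, whereas 1000 Hz does not, since 44100 / 1000 = 44.1.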
It is also possible to record additional audio-rate channels, simply by listing the
desired inputs in the field Record inputs. Here, the first two numbers (usually
[0,1]) are reserved for the microphone and EGG signals. Record more channels
simply by listing more input numbers before the closing bracket; or fetch the mic and
EGG signals from other inputs by changing [0,1] to something else.
Note also that if the ‘singer mode’ (→3.3.6) is active, then the audio-rate channels
will all be stored with 24 bits per sample rather than 16 bits, consuming 50% more
disk space.
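
The disk space figures can be estimated with simple arithmetic, shown here in Python. The calculation covers the raw sample data only and ignores the small WAV file header; the five extra channels are an arbitrary example.

```python
def bytes_per_second(channels, rate_hz, bits_per_sample):
    """Raw data rate of an uncompressed PCM recording."""
    return channels * rate_hz * bits_per_sample // 8

voice_egg_16 = bytes_per_second(2, 44100, 16)  # 176400
voice_egg_24 = bytes_per_second(2, 44100, 24)  # 264600
extra_100hz = bytes_per_second(5, 100, 16)     # 1000

print(voice_egg_24 / voice_egg_16)             # 1.5
print(voice_egg_16 * 60 / 1e6)                 # 10.584 (MB per minute)
```

Five extra channels at 100 Hz thus add only about 1 kB per second, compared to roughly 176 kB per second for the 16-bit voice and EGG pair.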
Not all soundfile editors can display multichannel signals with different sampling
rates at the same time. You may want to try Sopran, a freeware program by Svante Granqvist
(www.tolvan.com) that does this, and much else.

3.2.8 Using FonaDyn without an EGG device


If you are interested in mapping only the acoustic metrics, or do not have an EGG
device at hand, FonaDyn still wants a periodic signal to work on. The simplest
workaround is to fake an ‘EGG’ signal by copying it from the voice input. To do so,
put this line into the SC startup file:

FonaDyn.config(inputVoice: 0, inputEGG: 0);

This means that both the voice and what FonaDyn thinks is the EGG will be taken
from the first hardware input (0, or substitute another input number, if that is where
you have connected the microphone). While all ‘EGG’ results will then be practically
random and meaningless, the displays of Density, Clarity, Crest factor, Spectrum
Balance and Cepstral Peak Prominence will be correct. Or, if you are curious, you can
test what FonaDyn makes of an accelerometer signal or a photoglottographic signal,
instead of the EGG. We haven’t yet had time to do that.

3.3 SPL calibration


FonaDyn assumes a fixed calibration of the signal level in the .WAV files it
analyzes (→2.3.3), and there is no facility for offsetting of the SPL after recording.
This is by design. By calibrating accurately for SPL at the time of recording, you can
be confident that all audio signals recorded in FonaDyn have the same SPL calibration.
In subsequent analyses, this saves a lot of time and reduces the potential for
errors. To facilitate your diligent compliance, this section describes the logic of
calibrating, and also how to use the interactive SPL calibration tool that is provided
with FonaDyn. It is started with the button Calibrate… that appears when you are
about to record.
In order for the SPL axis in the voice maps to be correct, a signal amplitude of ±1
(full scale) must correspond to an instantaneous peak pressure of ±20 Pa. The sound
pressure 1 Pa of a sine wave (=94 dB SPL re 20 µPa) at a distance of 30 cm in front of
the speaker must result in a signal amplitude of 0.050 RMS (0.071 peak) relative to
full scale, in the audio track of the recorded file. The recording microphone does not
actually have to be placed at 30 cm in front of the speaker; for instance, a headset
microphone can also be used. But the signal gain, wherever the microphone is, must
be adjusted so that this condition is met at that frontal position, when all sound that
arrives there is from the voice under measurement.
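
The numbers in the calibration condition follow directly from the definition of SPL, as this short Python calculation shows. The function names are illustrative and not part of FonaDyn; the only inputs taken from the text are the full-scale pressure of 20 Pa and the reference pressure of 20 µPa.

```python
import math

FULL_SCALE_PA = 20.0  # pressure that corresponds to amplitude 1.0
REF_PA = 20e-6        # reference pressure for dB SPL

def rms_to_spl(rms_full_scale):
    """SPL in dB for a signal with the given RMS amplitude re full scale."""
    return 20.0 * math.log10(rms_full_scale * FULL_SCALE_PA / REF_PA)

def gain_correction_db(target_spl, measured_spl):
    """Gain change in dB needed to bring a reading to the target level."""
    return target_spl - measured_spl

print(round(rms_to_spl(0.05), 2))                  # 93.98
print(round(rms_to_spl(1.0 / math.sqrt(2.0)), 2))  # 116.99
print(gain_correction_db(94.0, 91.5))              # 2.5
```

An RMS amplitude of 0.050 thus corresponds to 94 dB, and a full-scale sine wave to about 117 dB, which is the highest level that can be recorded without clipping. If a 94 dB calibration tone reads 91.5 dB, the microphone gain needs to be raised by 2.5 dB.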
The choice of calibration method will depend on your choice of microphone
placement, and on the equipment at your disposal. We will here describe four
calibration scenarios A…D, in order of decreasing precision. They are also summarized
in Table 1. Other criteria may be more important to you. These scenarios are all
supported by the walk-through SPL calibration tool provided with FonaDyn.
In scenarios A and B, the front mic can be anywhere during calibration, but
during recording it must be placed at a constant 0.3 m in front of the speaker. The
participant must remain still. In scenarios C and D, the voice is recorded through a
headset mic. In scenario C, the headset mic path gain is carefully matched to that of
a calibrated front mic. In scenario D, only a headset mic and an SPL meter at 0.3 m
are needed, but the calibration will not be as precise as in A, B or C. Always perform
the calibration for the scenario that you will be using for the recordings. Always
perform a level calibration for each new subject, and for each new placement of the
recording microphone.
In all scenarios, you will need to control the gain of the microphone signal – that is
the objective of the calibration. Depending on your audio interface, this adjustment
might be done with a physical knob that is usually located next to the microphone
connector, or an on-screen ‘knob’ in the interface’s control software, or both. Make
sure that you understand and document all the points in your signal chain at which
the microphone signal gain can be modified. Choose one of them as the primary gain
control, and document and fix the settings for the others.

Table 1. SPL calibration scenarios, with the pros (+) and cons (–) of each.

Scenario A. A front mic, with a matching level calibrator device that is sealed to
the microphone capsule, and produces 94 or 114 dB.
  + simple, stable, accurate calibration
  + sealed attachment eliminates ambient noise during calibration
  + a dB meter is not needed
  – voice must be recorded through the front mic, keeping a constant distance
  – fairly high cost of calibrator device

Scenario B. A front mic that is recorded with a tone generator playing over a near
loudspeaker and a dB meter held close to the microphone.
  + almost as good as A
  + a calibrator device is not needed (but the dB meter’s calibration must be valid)
  – more sensitive to ambient noise
  – the voice must be recorded through the front mic, keeping a constant distance

Scenario C. A headset mic, the gain of which is adjusted to give the same voice
signal strength for a sustained /a/ as that arriving through a front mic whose gain
has already been calibrated as in A or B. The front mic must be at 0.3 m when
adjusting the headset mic gain.
  + the exact mouth-to-mic distance of the headset need not be known
  + the distance is held constant, so the participant is free to move the head
    when recording
  – requires an extra microphone, and two mic inputs during calibration

Scenario D. A headset mic, the gain of which is adjusted to give the same on-screen
voice sound level for a sustained /a/ as that shown by a dB meter at 0.3 m in front
of the speaker.
  + low cost
  – with a sustained vowel, it is difficult to get a stable reading and to match
    accurately the on-screen SPL with the SPL on the dB meter. Not recommended.

To run any of the scenarios, press the Calibrate… button. A window appears like
the one in Figure 14, with an oscilloscope, a spectrum analyzer, several buttons,
and a terse version of the instructions that follow here.

Figure 14. The calibration tool, scenario A, with the input signal copied to four of six
inputs, and the gain on input zero correctly set. Note the sine wave coming from the
calibrator device at 94 dB RMS, 97 dB peak. The level meter tick marks are at
multiples of 10 dB.

3.3.1 A: Calibrator device that fits the microphone


This scenario is applicable if you are recording voice with a fixed microphone
distance of 0.3 m and you have a calibrator device that fits the microphone. There are
SPL calibrator tone generator devices that attach hermetically to laboratory
microphones. Such a device produces a 1 kHz sine wave at 114 dB (10 Pa RMS pressure) or
94 dB (1 Pa RMS pressure), when properly attached to the microphone.
1. From the drop-down list, select Scenario A.
2. Connect the front microphone to the first microphone input on your audio
interface.

3. Set the calibrator to 94 dB, turn it on, and listen for the tone (as a battery
check). Then turn it off again.
4. Attach the calibrator to the lab microphone, as exemplified in the on-screen
picture.
5. Press Next. A level meter appears on the screen, as a green vertical bar.
6. Turn on the calibrator. The level meter should quickly rise to a constant value.
7. On your audio interface, carefully adjust the microphone signal gain so that the
green level meter displays 94 dB.
8. Document the resulting gain setting(s) and then leave them untouched.
9. Stand the microphone at 30 cm in front of the participant’s mouth.
You have now calibrated the gain for recording voices using that microphone at a
constant 30 cm distance. With a little practice, this procedure takes less than one
minute.

3.3.2 B: Front microphone and SPL meter


In this scenario, you use an SPL (dB) meter, a small powered loudspeaker, and a
calibration noise that is provided by the computer.
1. From the drop-down list, select Scenario B.
2. Connect the second output channel of your audio interface to the powered
loudspeaker, and then turn the loudspeaker on.
3. Switch the dB meter to ‘Slow’ and to ‘A-weighting’.

Choosing ‘A’-weighting at this point reduces the influence of background noise
during the calibration. At the frequency peak of the calibration noise, there is no
difference between the A-weighting scale and a flat frequency response. The
levels shown in FonaDyn are not A-weighted.

4. Place the microphone 0.3 m in front of where the subject’s mouth will be, and
place the dB meter right next to it, as shown in the picture. The tip of the dB
meter should be close to the microphone (within 3 centimeters). The dB meter
does not need to point in the same direction as the microphone – dB meters are
omnidirectional. Rather, you must be able to see the dB meter display and the
computer screen at the same time. A stand for the dB meter will help.
5. Place the loudspeaker at least 0.5 m from, and facing the microphone+dB
meter combo. The exact distance is not critical. If the microphone has an
omnidirectional pickup pattern, the loudspeaker can be in any direction. If the
microphone has a cardioid pickup pattern, the loudspeaker should be
somewhere in front of the microphone.
6. Press Next. An on-screen Volume control appears, and a noise (band-passed
around 1 kHz) starts playing on the loudspeaker.

In any room with reverberation, a stationary sine wave would create an invisible
standing wave pattern, with narrow nodes (silent spots) in positions that are
impossible to predict. Even the 2 cm between the dB meter and the microphone
can make a difference. Using a narrow-band noise instead of a sine tone
minimizes the influence of such nodes, but adds a little instability to the sound
level. To stabilize the reading, the on-screen level meter is very slow.

7. Adjust the volume of the noise to be moderately loud, but not so loud that it
sounds distorted or discomforting to the ear. The actual dB level of the noise is
not important.
8. Press Next. An on-screen level meter appears that shows the level of the
incoming microphone signal.
9. Now be quiet, so that only the calibration noise reaches the microphone. On
your audio interface, or on its software control panel, adjust the input gain for
the mic signal so that the on-screen level meter shows the same value as the
value displayed on the external dB meter.
10. Document the resulting gain setting(s).

You have now calibrated the gain for recording voices using that microphone at a
constant 30 cm distance. The loudspeaker and the dB meter can be removed from the
scene. With a little practice, this procedure takes less than three minutes.

3.3.3 C: Headset microphone with a front reference microphone


A headset microphone keeps the microphone distance constant, allowing the
participant to move during recording, and it improves the ratio of voice to ambient
noise. However, the actual microphone distance is not the standard 0.30 m, and can
be tricky to measure accurately. In this scenario, you wish to use a headset
microphone, but with a level calibrated to that of a reference microphone at 0.30 m. The
reference microphone will need to use a spare microphone input on the same audio
interface.
1. Connect the headset microphone to the first microphone input on your audio
interface, and the front reference microphone to the spare microphone input.
Place the headset mic on the participant.
2. Run the command FonaDyn.calibrate.
3. Calibrate the gain of the front reference microphone as for scenario (A) or (B),
but on the spare microphone input instead of on the first one.
4. From the drop-down list, select Scenario C. Two on-screen level meters
appear, one for each microphone. Also, a third meter showing the level
difference appears.
5. Set the two Input fields so that the headset mic signal (first input) is on meter
one, and the front reference mic signal (spare input) is on meter two.
6. Have the participant vocalize on a stable, sustained /a/ vowel, keeping the
participant’s mouth at 0.30 m in front of the reference microphone. The
participant must not move, but the particular voice level and pitch are not
important.
7. While the participant is vocalizing, adjust the gain of the headset microphone
so that the on-screen level difference becomes as small as possible. The yellow
bar turns green when the level difference is smaller than 0.5 dB. This
balancing operation gives greater accuracy than can be had in Scenario D.

8. Document the resulting gain setting(s).

You have now calibrated the gain for recording voice using the headset microphone
as worn by this participant on this occasion. The participant is free to move the head
during recording. The recording will be at a level corresponding to that of a front
microphone fixed at 30 cm. With a little practice, this procedure (including step 3)
takes less than five minutes. The placement of the headset on the participant’s head
must remain the same during the subsequent recording.

3.3.4 D: Headset microphone and dB meter only


Because head-mounted microphones are very close to the mouth, their pick-up is
also very sensitive to small variations in that distance. You will need to make a
new calibration every time the headset microphone is adjusted or hung anew on the
participant. The participant’s own voice is used as the sound source.
1. Run the command FonaDyn.calibrate.
2. From the drop-down list, select Scenario D.
3. Have the subject wear the microphone boom and adjust the capsule so that it
is at least 7 cm from the center of the mouth, and to one side.
4. Switch the dB meter to ‘Slow’, and to ‘C-weighting’ or ‘Linear’. Stand the dB
meter so that its tip is 30 cm in front of the participant’s mouth, and orient the
meter so that you can read it while you are sitting at the computer.
Tip: to read the dB meter more easily while at the computer, you can
instead set up a USB webcam aimed at the dB meter, and use a webcam
app such as AMCap to display temporarily an on-screen image of the dB
meter, next to the on-screen level meter.
5. Press Next. An on-screen level meter appears.
6. Ask the participant to take a big breath and then sustain an /a/ vowel at a level
around 80 dB, as steady as possible. This may require some prior practice.
7. On your audio interface, or in its control panel app, adjust the input gain for
the mic signal such that the level displayed on-screen becomes the same as the
value displayed on the dB meter (to within the nearest decibel).
8. Document the resulting gain setting(s).

You have now made an approximate calibration of the gain for recording voice using
the headset microphone as worn by this participant on this occasion. With a little
practice, this procedure takes less than three minutes. The placement of the headset
on the participant’s head must remain constant during the subsequent recording.

3.3.5 Verifying a level calibration
To verify a level calibration from scenario A, make a recording of the calibration
tone while the calibrator is attached and active, and then inspect the tone in an audio
wave editor. A 114 dB SPL sine wave should have a peak amplitude of 0.71 relative to
the maximum (clipping) amplitude in the audio track, or 0.071 for 94 dB SPL. If this
is the case, FonaDyn will display the audio SPL correctly in dB relative to 20 µPa, for
that microphone, at 30 cm distance. If the participant maintains the distance of 0.3 m
to this microphone, the standards are respected.
You can also compare your recorded calibrator tones with the supplied files
114dB-100--800Hz_Voice_EGG.wav or 94dB-110--1760Hz_Voice_EGG.wav, which
contain sine waves in the audio channel at the correct amplitude for the indicated
sound pressure levels.
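
If you prefer a numerical check to visual inspection, the level of a recorded tone can be estimated from its samples. The Python sketch below is only an illustration, not a FonaDyn function: the name `samples_to_spl` is hypothetical, and a synthetic sine wave stands in for the audio channel of a real recording, whose samples are assumed to be scaled to ±1.

```python
import math

P_REF = 20e-6  # reference pressure, 20 µPa

def samples_to_spl(samples, full_scale_pa=20.0):
    """RMS level in dB SPL of samples that are scaled to ±1 full scale."""
    mean_square = sum(x * x for x in samples) / len(samples)
    rms_pa = math.sqrt(mean_square) * full_scale_pa
    return 20 * math.log10(rms_pa / P_REF)

# Synthetic stand-in for a recorded calibrator tone: a 1 kHz sine with a peak
# amplitude of 0.0707, one second long at 44.1 kHz (a whole number of periods).
fs = 44100
tone = [0.0707 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
level = samples_to_spl(tone)
print(f"Estimated level: {level:.1f} dB SPL")   # Estimated level: 94.0 dB SPL
```

With a real file, the samples would be those of the first (voice) channel, taken from a stretch where the calibrator tone is steady.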

3.3.6 Recording very loud voices


Very loud voices, such as operatic sopranos, can easily reach higher SPLs than
117 dB at 0.3 m. To accommodate this, FonaDyn can optionally handle a maximum
SPL of 140 dB. You request this “singerMode” in the file startup.scd, as follows:

// The SC class library must be recompiled when .config has changed.
FonaDyn.config(singerMode: true);

When this setting is in effect, several things change, as follows:


1. On voice maps, the SPL scale extends to 140 dB, and the horizontal line at 120 dB
SPL is drawn with a thicker line.
2. The voice signal amplitude as shown in soundfile editors will be 10 times smaller,
to give the extra 20 dB of headroom.
3. Recordings are saved to 24-bit files (*_Voice_EGG.wav) rather than 16-bit.
4. The full-scale amplitude in the audio track of *_Voice_EGG.wav files is assumed to
represent ±200 Pa (137 dB SPL for a sine wave), rather than ±20 Pa.
5. The FonaDyn.calibrate tool automatically adjusts to the larger range.

When the “singer mode” is not in effect (and by default it is not), then the following
applies:
6. FonaDyn will refuse to open voice map files (*_VRP.csv) containing SPLs higher
than 120 dB.
7. On opening 24-bit .wav files for analysis, FonaDyn will automatically switch to
“singer mode.”
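
The 117 dB and 137 dB limits follow directly from the full-scale pressures. As a rough check, the Python sketch below (the function name is an assumption, chosen for illustration) computes the highest SPL that an unclipped sine wave can have for a given full-scale pressure.

```python
import math

P_REF = 20e-6  # reference pressure, 20 µPa

def max_sine_spl(full_scale_pa):
    """Highest SPL (dB re 20 µPa) of an unclipped sine for a full-scale pressure."""
    rms_pa = full_scale_pa / math.sqrt(2)   # RMS of a sine that just reaches full scale
    return 20 * math.log10(rms_pa / P_REF)

normal = max_sine_spl(20.0)    # default mode, full scale = ±20 Pa
singer = max_sine_spl(200.0)   # singer mode, full scale = ±200 Pa
print(f"default: {normal:.0f} dB, singer mode: {singer:.0f} dB")
# default: 117 dB, singer mode: 137 dB
```

The factor of 10 in pressure is the 20 dB of extra headroom mentioned in item 2.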

3.3.7 Examining live signals
The SuperCollider IDE (Integrated Development Environment) has several ‘widgets’
that can be helpful when connecting and testing your setup. On the Server menu,
you will find the following items.
• Show Server Meter. This opens a live VU bar meter showing the live signal
activity on all active hardware inputs and outputs. If the number of inputs is not
what you expected, modify the SC startup file (on the File menu) and reboot the
interpreter. All inputs must be on the same hardware device, or on multiple
devices that are synchronized in hardware.
• Show Scope. This opens a simple oscilloscope on the outputs and inputs. The
first number is the first output, the second number is the number of outputs (and
inputs) shown. The numbering is zero-based. More information on the
oscilloscope is available in the SuperCollider documentation. For instance, the
shortcut key ‘i’ sets the scope to display all available inputs. This oscilloscope
does not have a triggering function.
• Show FreqScope. This opens a single-channel spectrum analyzer. You will need
to select the desired audio bus for analysis. More information on the spectrum
analyzer is available in the SuperCollider documentation.

Customized variants of these three widgets are combined in the accompanying
SPL calibration tool (→3.3). To run that tool in a window of its own, execute
FonaDyn.calibrate, typically when FonaDyn is closed. On a fast computer, you can
run FonaDyn first and then the SPL calibration tool at the same time. This lets you
see FonaDyn’s input and output signals ‘live’, which is very useful for checking signal
quality.
Internally, signals in SuperCollider run on so-called ‘buses’, rather like in a
conventional mixing desk. The first number box at the top left of the oscilloscope
controls the ‘bus’ number of the first signal. The second number controls the number
of consecutive bus signals shown. The first buses, starting at zero, are the output
buses. Their traces are coloured white. The following buses are the input buses,
which, like the level meters, are coloured in a rainbow scheme that depends on how
many inputs there are. The remaining buses, shown in gray, carry signals that are
internal to whichever program the SCSynth server is running at the moment.
The spectrum analyzer shows the power spectrum of the topmost signal in the
oscilloscope. Its dB scale is correct if calibration has been performed successfully.
There is a choice of linear or log frequency axis. The frequency axis can be zoomed
using the horizontal slider, but only if the whole window is stretched wide (this is a
bug in SC).

3.4 Listening from maps

3.4.1 Background
A voice map contains no temporal information. Each cell in the map contains the
value of a metric, averaged over an entire recording. Still, it can be interesting to
know what the voice sounded like when the participant was phonating at a particular
place on the voice map. Uniquely, FonaDyn can play back from the voice map – if you
have first made a matching _Log.aiff file.

Figure 15. When ‘map listening’ is enabled, you can shift-left-click in the map to hear
the sounds at the corresponding fo and SPL. The relative positions of those sounds
are shown in the audio track window, and the rectangle cursor shows the parts that
are selected.

A Log file contains cycle-by-cycle values of the time, fo, SPL, all the metrics, the
cluster number, and a few other things. With all this information, FonaDyn can
backtrack and find sounds at a specified fo and SPL in the signal file, and even match a
given EGG waveshape cluster.

3.4.2 Creating the Log file


To create a suitable Log file, follow these steps.
1. Make sure that your signal file’s name ends in _Voice_EGG.wav.
2. Choose the signal file in Source: From file.
3. Set the Output Directory to be that of the signal file.

4. If you want to apply EGG clustering for the particular individual, run a cluster
analysis of the signal file, as described in section 3.5.6, re-order the clusters if
so desired (→3.6.5), and choose Save Clusters.
5. Choose Load Clusters and select the appropriate _clusters.csv file. Its
full name does not need to match.
6. Set Learning: Off. This also enables the button Analysis Log. (For the
clusters to be consistent throughout the Log file, the latter must be created
with Learning: Off, keeping the cluster centroids constant, for classification.)
7. In the Settings…, check the option Keep input file name up to
_Voice_EGG.wav, then choose OK.
8. Press START and let the analysis run to completion. If you stop early, the Log
file will be too short. The Log file is created and saved automatically, with a
name matching that of the signal file. When the analysis has finished, the
audio signal will be displayed.
9. Choose Save Map to save also the new voice map. The voice map name does
not need to match (but it may help to pair it with the name of the clusters file).

3.4.3 Enabling listening from maps


The map-listening feature is still somewhat experimental, so we have made it
optional for the time being. To enable listening from maps, click the dark gray
Listen: off button at the top right of the voice map panel (this can be done only with
the mouse). The button becomes a lighter gray. With map-playing thus enabled,
FonaDyn performs a number of checks whenever you choose a new signal file for
analysis, as follows.
• The name of the chosen signal file must end in _Voice_EGG.wav.
• There must exist in the same directory a file with the same name but ending in
_Log.aiff.
• The _Log file must have a file system ‘modified’ time that is more recent than
that of the signal file. (So if you edit the signal file, you must also rebuild the
Log file.)
If any one of the above is false, the button will display Listen: no. If all of the above
are true, the button will display Listen: yes, and a new graph window will open, with
a waveform display of a copy of the audio channel of the signal file (Figure 15). To
disable listening, click the button again.
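
If you manage many recordings, it can be handy to test these three conditions outside FonaDyn before starting a session. The Python sketch below mirrors the checks as they are described above; it is not FonaDyn's own code, and the function name `can_listen` and the demo file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

SIGNAL_SUFFIX = "_Voice_EGG.wav"
LOG_SUFFIX = "_Log.aiff"

def can_listen(signal_path):
    """Return True if the three conditions listed above hold for signal_path."""
    signal = Path(signal_path)
    if not signal.name.endswith(SIGNAL_SUFFIX):
        return False
    base = signal.name[: -len(SIGNAL_SUFFIX)]
    log = signal.with_name(base + LOG_SUFFIX)      # same directory, same base name
    if not signal.is_file() or not log.is_file():
        return False
    # The Log file must have been modified more recently than the signal file.
    return log.stat().st_mtime > signal.stat().st_mtime

# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    wav = Path(tmp) / "demo_Voice_EGG.wav"
    log = Path(tmp) / "demo_Log.aiff"
    wav.write_bytes(b"")
    log.write_bytes(b"")
    os.utime(wav, (1_600_000_000, 1_600_000_000))  # (access, modified) times
    os.utime(log, (1_600_000_100, 1_600_000_100))  # Log is newer than the signal
    ok_fresh = can_listen(wav)
    os.utime(log, (1_500_000_000, 1_500_000_000))  # Log is older, as after an edit
    ok_stale = can_listen(wav)

print(ok_fresh, ok_stale)   # True False
```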

3.4.4 Playing from the voice map


When you left-click with the mouse on the voice map, the corresponding portions of
the audio waveform are highlighted. This mode of presentation was inspired by Per
Fallgren’s ingenious tool for browsing large amounts of audio [21]. If the voice map is
showing a cluster map, then FonaDyn matches not only to fo and SPL, but also to the
cluster number. If you shift-left-click, the matching portions will also be played. Press
the space bar to hear them again. To stop a playback, press the space bar one more
time.

3.4.5 What is played?
The Log file information is cycle-by-cycle, but it would make no sense to play
back the sound of a single matching cycle. Instead, FonaDyn creates a little sound
clip of at least 200 ms surrounding the matching cycle, and also merges
consecutive matching cycles into the same clip. The 200 ms minimum duration
includes 70 ms fade-in and fade-out, so as to avoid clicks; that is, onsets and offsets
are gradual. Overlapping clips are merged. Non-overlapping clips are cross-faded on
playback, to make the sound as smooth as possible. The clips are marked in the audio
waveform display by vertical background bars (Figure 15).
The search for matching sounds takes the click position and brackets it; the
default tolerance is ±1.5 dB and ±0.75 semitones. This region is shown on the voice
map as a rectangle cursor. It is the part of the voice field that is searched for playable
clips. This ‘search rectangle’ can be freely positioned; it does not snap to the cell
boundaries. Clicking in different places even within the same voice map cell will give
slightly different results. You can also enlarge or shrink the search rectangle, using
Ctrl+mousewheel actions.
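
The search and the clip merging can be pictured with a small Python sketch. It is a simplified illustration, not FonaDyn's implementation: the frame layout (time in seconds, fo in semitones, level in dB), the function names, and the choice to centre each clip on its matching cycle are all assumptions, and the fades and cross-fades are not modelled.

```python
def matching_times(frames, fo_click, spl_click, tol_st=0.75, tol_db=1.5):
    """Times of the frames (time_s, fo_semitones, spl_db) inside the search rectangle."""
    return [t for (t, fo, spl) in frames
            if abs(fo - fo_click) <= tol_st and abs(spl - spl_click) <= tol_db]

def merge_clips(times, min_dur=0.2):
    """Bracket each time with a clip of min_dur seconds and merge overlapping clips."""
    clips = []
    for t in sorted(times):
        start, end = t - min_dur / 2, t + min_dur / 2
        if clips and start <= clips[-1][1]:
            clips[-1][1] = max(clips[-1][1], end)   # overlaps: extend the previous clip
        else:
            clips.append([start, end])              # no overlap: start a new clip
    return [tuple(c) for c in clips]

# Four made-up cycles; the third lies outside the default search rectangle.
frames = [(1.00, 57.0, 70.0), (1.01, 57.2, 70.5), (1.50, 60.0, 80.0), (2.00, 57.5, 71.0)]
times = matching_times(frames, fo_click=57.0, spl_click=70.0)
clips = merge_clips(times)
print(clips)
```

Here the first two matching cycles end up in one clip and the last one in a clip of its own, which corresponds to two background bars in the waveform display.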
Because a time window that spans at least 200 ms is likely to contain cycles also
from adjacent cells or from several clusters, the matches are only approximate. The
better the match, the more transparent the search rectangle will be. When the
match is poor, the rectangle darkens. A clear rectangle means that the sounds that
you are hearing are very representative of the sounds that created the voice map at
that location. A dark rectangle means that other sounds might be mixed in with what
you hear.
You can zoom the audio window on the time axis, by doing a shift-right-
click-vertical-drag with the mouse. A zoomed window can be panned horizontally
using right-click-horizontal-drag. In the waveform window, the audio peak
amplitudes are shown in orange, and the RMS amplitudes in yellow. For convenient
listening, the copied audio has been normalized, so as to remove the level overhead
that is required for maintaining a calibrated SPL.
You can hide or show the audio window by typing Alt-L (for Listen). When the
audio window is hidden, you can still listen via the voice map.
If you are displaying multiple voice maps (→3.1.8), listening can be invoked
only from a NOW map or a TWIN map. The BEFORE and DIFF map types refer to (a
comparison with) a separate signal file.

3.5 Files
FonaDyn can export and import several types of data file that can be used for
post-processing and display with software such as Matlab or Microsoft Excel. Some
Matlab examples are provided in the ‘FonaDyn Extras’ folder. For more information
on the signal file formats, see also the online documentation for the class
VRPViewMainMenuOutput.

3.5.1 Files in general


The Browse buttons open a file system dialog for opening or saving files. When
FonaDyn is started, the suggested directory will be the default directory for
recordings as shown in the top line. Subsequent uses of Load or Save of files will
start in the directory you last used, for that kind of file. Note also that the Open File
dialog usually has at the top a drop-down list of recent locations, which can speed up
your navigation of the file system.
You can also specify a persistent default directory for recordings, which is useful,
because the standard location is rarely the one you want. To do so, add the following
line in the SC startup file, with your desired location in double quotes:

thisProcess.platform.recordingsDir_("C:/Recordings");

In SC code such as this, always use forward slashes ( / ) in pathnames, even on
Windows. Once you have restarted SuperCollider, FonaDyn will display this path in
the field Output Directory. Recordings and log files will be saved in that directory.

Figure 16. The usual sequences of activities in FonaDyn, and the types of files
they produce and require.

3.5.2 Signal files
When analyzing, FonaDyn processes either live inputs or an existing recording.
Input files do not need to have a “.wav” extension; many other sound-file formats are
supported (see footnote 2). The number of bits per sample can be 16 or more. The channel count is
normally two, with the voice audio in the first channel and the EGG in the second.
The sampling rate per channel must be 44.1 kHz. Note that if in the input signal file
the number of bits per sample is 24, then FonaDyn will switch to “singer mode”
(→3.3.6). At 16 bits per sample, which is the default, two-channel signal files
consume about 10 MB of disk space per minute. At 24 bits per sample, this rises to
about 15 MB per minute.
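
These figures can be reproduced with a few lines of Python, which may help when planning disk space for long sessions. The function name is an illustrative assumption, the file header is ignored, and the result is expressed in units of 2^20 bytes.

```python
def mebibytes_per_minute(bits_per_sample, channels=2, sample_rate=44100):
    """Uncompressed PCM data rate in MiB (2**20 bytes) per minute, header ignored."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return bytes_per_second * 60 / 2**20

mb16 = mebibytes_per_minute(16)
mb24 = mebibytes_per_minute(24)
print(f"16-bit: {mb16:.1f} MiB/min, 24-bit: {mb24:.1f} MiB/min")
# 16-bit: 10.1 MiB/min, 24-bit: 15.1 MiB/min
```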
By default, the output files that contain signals are automatically given a file
name beginning with a time stamp YYMMDD_HHMMSS_ (shown as * below). The time
is that at the start of recording; or, if analyzing a file, from the starting time of the
analysis (not the time-stamp of the analyzed recording – that would risk confusion if
the same file were analyzed repeatedly with different settings). All forms of file output
are optional, and all can be written in parallel. To the time stamp is appended a string
suffix indicating the type of the output file.
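
In post-processing, the time stamp can be recovered from the file name. The Python sketch below assumes that the name starts with the YYMMDD_HHMMSS_ pattern described above; the function name and the example file name are hypothetical.

```python
import re
from datetime import datetime

STAMP = re.compile(r"^(\d{6}_\d{6})_")   # YYMMDD_HHMMSS_ at the start of the name

def parse_time_stamp(file_name):
    """Return the datetime encoded at the start of file_name, or None."""
    match = STAMP.match(file_name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%y%m%d_%H%M%S")
    except ValueError:   # the digits do not form a valid date and time
        return None

started = parse_time_stamp("220315_101500_Voice_EGG.wav")   # hypothetical name
print(started)   # 2022-03-15 10:15:00
```

Files that were given another base name simply return None.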
If instead you prefer the output files to receive the same base name as the input
file, go to the Settings… box (→3.1.7) and check that option. The output files will
then inherit the original time stamp (or other name), which makes it easier to know
which files are derived from which input signals. This will work only for input files
whose names end in _Voice_EGG.wav.
Recordings are made into 2-channel files at 44.1 kHz per channel and 16 bits
resolution, named *_Voice_EGG.wav. Other formats are possible (→3.2.7, →3.3.6).
Analysis Log. This is a multichannel file called *_Log.aiff . It contains one
frame of data for every EGG cycle. Each frame has these tracks: time (s), fo
(semitones), signal level (dBFS), clarity (0…1), crest factor, spectrum balance (dB),
CPPs (dB), cluster number, SampEn, Ic, QΔ, Qci, and the levels (Bels) and phases
(radians) of all analyzed harmonics of the EGG signal. See also the online
documentation for the class VRPViewMainMenuOutput. The contents of Log files can be
easily plotted in Matlab using the function FonaDynPlotLogFile.m.
In the Log file, the harmonic levels are relative to full scale. The phases are
absolute, i.e., relative not to the fundamental but to the cycle detection trigger point.
The frame rate (or “sampling rate”) in this file is selectable as cycle-synchronous,
with new data for every EGG cycle, or one of 50, 100 or 200 Hz. Frames of accepted
cycles only (above the clarity threshold) are written to this file. That is why the time
track is helpful; without it, the absolute times of each cycle would be harder to
reconstruct (see below). The data are stored as 32-bit float values. These values are
not scaled down to ±1.0, as is the convention for normal audio files. Therefore, the
signals in many of the tracks will appear to be out-of-range, if a *_Log.aiff file is
opened in a multitrack audio editor.

Footnote 2: Although some features of FonaDyn do require the signal file name to end in
“_Voice_EGG.wav”, you can open other file types as well, and make voice maps of them.

If a file of type _Log.aiff is opened in other software, its sampling rate will
appear to be 44100 Hz, but this is not correct. The effective ‘sampling rate’ is the fo of
the analyzed signal. There is no standardized way of storing that meta-information
into an .AIFF file.
Physiological signals can be saved synchronously into multichannel
*_Extra.wav files, at up to 500 Hz per channel and 16 bits resolution (→3.2.7).

Figure 17. Summary of file types handled by FonaDyn 2.4.9.

3.5.3 Result files


Cluster data and voice map data are saved as text files. Their file names are not
given an automatic time stamp; rather, you must choose a full name yourself. You will
probably want to encode the relevant settings into the file name in your own way. The
same applies for image files.
Cluster data. When an analysis is completed, the centroids of the resulting
clusters can be saved and then reloaded. Typically, you would do this to continue
learning with more input files, or to classify other signals using the same cluster data,
or to use the cluster data in analyses outside of FonaDyn.
Figure 18 shows an example of a *_clusters.csv file opened in a spreadsheet,
with added comments. Matlab examples are provided that show how to resynthesize
EGG cycle shapes from this data.

Figure 18. Example of cluster data file, opened in a spreadsheet app. This example is for
5 clusters and 4 harmonics. The rows 7-15 and the colouring are explanatory only –
they do not appear in the *_clusters.csv file.

Voice map data. When an analysis is completed, the data underlying the voice
map can be saved to a CSV file (press Save Map). An example of a *_VRP.csv file,
opened as a spreadsheet, is shown in Figure 19. Each row corresponds to an occupied
cell in the voice map graph. The first two columns give the cell coordinates. The
following columns each correspond to one layer in the voice map. Matlab examples
are provided that show how to plot customized voice map charts from this data.

Figure 19. An example of a _VRP.csv file, opened in a spreadsheet app. Each row holds
data for one cell in the voice map. Columns A and B give the cell coordinates in semitones
and dB, column C the total number of cycles in the cell, D the most recent value of the clarity
metric, E the average of the crest factor of the audio signal, F the average level difference of
the spectrum balance (dB), G the mean CPP-smoothed (dB), H the mean value of the
SampEn metric, I, J and K the means of the metrics QΔ (a.k.a. dEGGmax), Qci and Ic, L the
number of the cluster with the largest number of cycles, M and onwards the cycle counts
per cluster. Only non-empty cells are included in this file.

In these voice map files, the order of the columns and the rows does not matter;
except that the ‘ClusterN’ columns must be the last columns and in ascending order.
Columns that are expected but missing will be displayed as a blank layer. Extra
columns are allowed in the file, and will not be displayed by FonaDyn. This could be
the case for files created with a different version of FonaDyn, or with other software.
For instance, you can add columns for your own purposes, and FonaDyn can still
display the map data.
The cycle counts in these files are absolute, that is, they are not scaled by fo. One
second of phonation results in 100 cycles at 100 Hz, but 400 cycles at 400 Hz. You
can obtain the approximate phonated time by dividing these cycle counts by the
corresponding fo, where $f_o = 220 \cdot 2^{(\mathrm{MIDI} - 57)/12}$ Hz. In these
voice map files, fo is quantized to whole semitones (6% increments), so the maximum
error in the duration for one cell would be ±3%.
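
As a worked example of this conversion, the following Python sketch turns a cell's cycle count into an approximate phonation time. The function names are illustrative assumptions; the formula is the one given above, in which MIDI note 57 corresponds to 220 Hz.

```python
def midi_to_hz(midi):
    """fo in Hz for a MIDI note number; MIDI 57 corresponds to 220 Hz."""
    return 220.0 * 2 ** ((midi - 57) / 12)

def phonated_seconds(cycle_count, midi):
    """Approximate time spent in a cell: cycles divided by cycles per second."""
    return cycle_count / midi_to_hz(midi)

print(midi_to_hz(69))              # 440.0
print(phonated_seconds(880, 69))   # 2.0
```

Summing `phonated_seconds` over all rows of a voice map file gives an estimate of the total phonated time in the recording.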
As the column separator in CSV files, FonaDyn uses the semicolon (;). If this is
inconvenient, you can change it in the source code file VRPMain.sc, although such a
change is not recommended. It is not possible in FonaDyn to change the decimal
character, which is always the period (.).
In Windows, the Region settings in the Control Panel allow you to choose the
decimal character, which is then applied system-wide by all apps that heed this
setting (FonaDyn does not). In addition, Microsoft Excel allows you to specify its own
decimal character, independently of the Control Panel setting. Not exactly simple, but
workable.
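If your spreadsheet program insists on decimal commas, one workaround is to convert a copy of the file before opening it. The Python sketch below is one possible way, not a FonaDyn feature; the function name to_decimal_comma is hypothetical. Because FonaDyn separates columns with semicolons, replacing the period in numeric fields does not clash with the separator. FonaDyn itself still expects periods, so keep the original file for loading into FonaDyn.

```python
def to_decimal_comma(text):
    """Rewrite the numeric fields of semicolon-separated text with decimal commas."""
    out_lines = []
    for line in text.splitlines():
        fields = []
        for field in line.split(";"):
            try:
                float(field)
            except ValueError:
                fields.append(field)                    # header or other text
            else:
                fields.append(field.replace(".", ","))  # numeric field
        out_lines.append(";".join(fields))
    return "\n".join(out_lines)

print(to_decimal_comma("MIDI;dB\n57;60.5"))
```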

3.5.4 Settings file


If Settings | Save all settings is checked when FonaDyn is closed, all settings
in the user interface are saved in a file with the name FonaDynSettings.SCarchive, in
the Platform.userAppSupportDir directory. Only one set of settings is saved. This file
is overwritten when the session is ended. If you start FonaDyn with FonaDyn.run, the
default settings are used. If you start FonaDyn with FonaDyn.rerun, the settings most
recently saved from a previous session are used again.
The state of the Save all settings box itself is not restored on FonaDyn.rerun,
because then you would risk overwriting a good setup by mistake when FonaDyn is
closed. If you want to save a complete set of settings for posterity, copy the above file
to another location and give it another name. To reinstate those settings, copy the file
back. The settings file is in editable text, although its syntax is a bit obscure. With
some knowledge of SClang, you can edit it, to manipulate FonaDyn’s settings
externally.
An alternative way of auto-initializing FonaDyn is described in the next section.
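The copy-and-restore procedure for the settings file described above can also be automated. The following Python sketch assumes that you pass in the directory that SuperCollider reports for Platform.userAppSupportDir; its location differs between systems and is not guessed here. The function names and the labelling scheme are hypothetical.

```python
import shutil
from pathlib import Path

SETTINGS_NAME = "FonaDynSettings.SCarchive"

def backup_settings(support_dir, backup_dir, label):
    """Copy the current settings file to backup_dir under a labelled name."""
    src = Path(support_dir) / SETTINGS_NAME
    dst = Path(backup_dir) / f"{label}_{SETTINGS_NAME}"
    shutil.copy2(src, dst)
    return dst

def restore_settings(support_dir, backup_file):
    """Copy a saved settings file back, for use by the next FonaDyn.rerun."""
    dst = Path(support_dir) / SETTINGS_NAME
    shutil.copy2(backup_file, dst)
    return dst
```

Run restore_settings while FonaDyn is closed, since the file is overwritten at the end of a session when Save all settings is checked.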

3.5.5 Script files
Script files have two main uses: to bring FonaDyn’s many settings into a known
state for an experiment, and to analyze a whole batch of signal files automatically.
You can create a script file with any ‘flat text’ editor, or by using for example a
spreadsheet or Matlab. A ‘script’ is a text file that FonaDyn reads line by line. Each
line can contain the setting of a control variable, an action command, or commenting
text. Settings can be changed between input files. The lines in the script are echoed
to the SCIDE post window as they are executed. An example is shown in the table
below. A more comprehensive list of control settings is given in the file
FonaDyn-v2.4-Script-Syntax.pdf. A script that contains no RUN command will set
FonaDyn into the specified state, without running any analysis. Scripts are executed
from top to bottom. There is no syntax for breaking the order, such as jump or loop
statements.
If you will be using FonaDyn repetitively, for acquiring data from many
participants, for example, it can help to auto-initialize the relevant settings whenever
a new FonaDyn session is started with FonaDyn.run. To do this, you can use an auto-
saved state and FonaDyn.rerun as described in the previous section; but only one
state can be conveniently saved in that way. An alternative is to create a named text
script and have it execute every time FonaDyn is started. To do so, add in the startup
file the statement

FonaDyn.config(runScript: "d:/full/path/to/the/scriptfile.txt");

When FonaDyn is started, it will automatically run the given script. Therefore, do not
include any RUN statement in that script, since it would start an analysis as well.

Script line text
        Meaning

// Example of a simple script file
        Double slashes start comments.

io.keepData=false
        Lines starting in lower case are interpreted as assignment statements.
        Here: set the state of the Keep data check box (→3.1.4).

general.output_directory="C:/Recordings"
        Sets the directory for recordings and log files. Use forward slashes /
        to separate directory names, even on Windows.

io.filePathInput="L:/mystudy/recs/fileA_Voice_EGG.wav"
// Always give the full absolute path (not relative),
// in double quotes. Long paths are OK.
        Set the path to the next file to analyse.

LOAD "L:/mystudy/recs/press-flow_clusters.csv"
// Only files named *_clusters.csv or *_VRP.csv
// can be loaded.
// If there is an error loading a file,
// the script run is aborted.
        Lines starting in upper case are interpreted as action statements:
        LOAD, RUN, HOLD, or SAVE. Here: Load an existing cluster centroids
        file, for classifying EGG wave shapes. This file will determine the
        number of clusters and harmonics to use.

cluster.learn=false // classify instead of learn
        Trailing comments are allowed.

HOLD // Press START to continue
// You can interact with the control panel
// while the script is running – with care
        Pause to let the user check the display panel. When you press ►START,
        the first (or next) file will be processed. When it has completed, the
        script will regain control. If you press ■ STOP, the script run will
        be aborted.

io.keepData=true
io.filePathInput="L:/mystudy/recs/fileB_Voice_EGG.wav"
        Accumulate the data from the next file into the same voice map.

RUN
        Proceed with the next file, without the user having to press ►START.

SAVE "L:/mystudy/maps/AB_VRP.csv"
SAVE "L:/mystudy/maps/AB_clusters.csv"
        Save the finished voice map or cluster centroid data.

// Creating *_Log.aiff files
cluster.learn=false
io.enabledWriteLog=true
io.keepInputName=true // false: time-stamp it
io.writeLogFrameRate=0 // 0 for cycle rate
        How to set up for saving a *_Log.aiff file. Possible frame rates are
        cycle rate, 50, 100 or 200 Hz.
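Since a script is plain text, a batch script for many input files can be generated by a small program rather than typed by hand. The Python sketch below builds one using only statements that appear in the table above (io.keepData, io.filePathInput, LOAD, cluster.learn, RUN and SAVE). The function names are hypothetical, and whether this particular order of statements suits your study is an assumption that you should verify on a small batch first.

```python
def forward_slashes(path):
    """Scripts use forward slashes between directory names, even on Windows."""
    return str(path).replace("\\", "/")

def make_batch_script(input_files, map_path, clusters_path=None):
    """Build script text that analyses all input files into one voice map."""
    lines = ["// Batch script generated automatically"]
    if clusters_path is not None:
        lines.append(f'LOAD "{forward_slashes(clusters_path)}"')
        lines.append("cluster.learn=false // classify instead of learn")
    for index, path in enumerate(input_files):
        # Clear the map before the first file only, then accumulate.
        lines.append("io.keepData=" + ("false" if index == 0 else "true"))
        lines.append(f'io.filePathInput="{forward_slashes(path)}"')
        lines.append("RUN")
    lines.append(f'SAVE "{forward_slashes(map_path)}"')
    return "\n".join(lines) + "\n"

script = make_batch_script(
    ["L:/recs/a_Voice_EGG.wav", "L:\\recs\\b_Voice_EGG.wav"],
    "L:/maps/AB_VRP.csv",
)
print(script)
```

Write the returned text to a file with full absolute paths, and remember that a script intended for FonaDyn.config(runScript: ...) should not contain RUN.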

3.5.6 Diagnostic output files
Buttons to create these files are shown only if you have checked Show
additional diagnostic features in the Settings. More detailed info can be found
in the online help for the class VRPViewMainMenuOutput.
Cycle Detection Log. This is a two-channel, 16-bit file (*_CycleDetection.wav),
containing the conditioned EGG signal, and the corresponding cycle trigger pulses.
This file affords post-inspection of whether or not the cycle triggering was accurate.
Output Points. This file type (*_Points.aiff) contains twice as many tracks as
there are harmonics. It contains the cycle-synchronous level and phase differences,
for accepted cycles only. These are the data that are input to the clustering algorithm
(→2.8). The last delta-level track contains the relative level of the residual high
frequency energy. The last delta-phase track contains twice the phase of the
fundamental, relative to the cycle trigger (for internal use).
Sample Entropy. This is a single-channel, cycle-synchronous file
*_SampEn.aiff. Not scaled.
Frequency and Amplitude. This file type does not have any on/off button, but
such a file is always written when either the SampEn measurement is written, or the
Points are written. The reason is that you often want to see these metrics together.
The file is a cycle-synchronous AIFF file with floats as samples. These files are called
*_FreqAmp.aiff.
Gates Log. This file is called *_Gates.wav. It contains five tracks of audio-rate
16-bit samples, with the raw EGG, the conditioned EGG (delayed by 34.1 ms), and
three trigger tracks. It shows exactly where all conditioned EGG pulses were
segmented and also which of the pulses were associated with phonation regular
enough to be retained for further analysis. This enables close scrutiny of the data
input to the DFT analysis, as well as the extraction (e.g., in Matlab) of cycle-by-cycle
signal data with absolute times. Because it becomes big, this file is written only if you
have checked that option in the Settings… box, and only if some other cycle-rate
output is being written at the same time.
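The Cycle Detection Log described above can be inspected programmatically as well as by eye. The Python sketch below reads a two-channel, 16-bit file and lists the times at which the second channel rises above a threshold. It assumes that the trigger pulses are in the second channel and are positive-going; neither the pulse polarity nor the pulse height is stated in this handbook, so the threshold and the function name trigger_times are assumptions to be checked against a real file.

```python
import struct
import wave

def trigger_times(path, threshold=0.5):
    """Return the times (s) at which channel 2 rises above `threshold`.

    Assumes a two-channel, 16-bit PCM file with the conditioned EGG in
    channel 1 and the cycle trigger pulses in channel 2. The threshold is
    a fraction of full scale.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a two-channel, 16-bit file")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    trigger = samples[1::2]          # interleaved: ch1, ch2, ch1, ch2, ...
    limit = threshold * 32767
    times = []
    previous = 0
    for n, value in enumerate(trigger):
        if value > limit and previous <= limit:   # rising edge
            times.append(n / rate)
        previous = value
    return times
```

The differences between successive trigger times give the cycle periods, which can be compared with the displayed fo to judge whether the cycle triggering was accurate.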

3.6 EGG Shape Analysis
This section applies to the analysis of both live and prerecorded signals.

3.6.1 Pre-clustered sets of waveshapes


< to be supplied >

3.6.2 Parameters for clustering


Ideally, choosing the number of clusters should be as easy as choosing the
number of phonation types that you want FonaDyn to identify. To discriminate
between modal and falsetto singing, for example, two clusters could suffice, in
principle. But there is no way of knowing beforehand if the EGG waveform really does
change in a way that is so cleanly separable. For example, you may find that you get
better discrimination with three clusters, of which two are needed to catch all the
falsetto-mode cycles [5].
Therefore, a good strategy may be first to specify more clusters than you actually
need, and let FonaDyn try. The most common wave-shapes will tend to produce a few
well-populated clusters, while the other clusters will contain far fewer cycles. It
helps to edit your recordings first, so as to eliminate pauses and ambient sounds. If
your signal contains pauses or other non-voice events, these may give rise to one or
two ‘trash’ clusters, holding weird pulse shapes. You can save and edit the cluster data
files to remove such trash clusters. An example of this method is given in [18].
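To decide which clusters are candidates for removal, it can help to compare the cycle counts per cluster. The Python sketch below flags clusters that hold less than a given share of all cycles. The 5% limit is an arbitrary illustration rather than a recommendation from this handbook, the 1-based cluster numbering is an assumption, and the function name is hypothetical. A sparse cluster is not necessarily trash, so inspect its wave shape before deleting it.

```python
def sparse_clusters(cycle_counts, min_share=0.05):
    """Return 1-based numbers of clusters holding less than min_share of all cycles."""
    total = sum(cycle_counts)
    if total == 0:
        return []
    return [k + 1 for k, count in enumerate(cycle_counts)
            if count / total < min_share]

# Hypothetical cycle counts for five clusters.
print(sparse_clusters([5200, 4100, 150, 3900, 60]))
```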
The number of harmonics will depend on your research question. If you want to
see a close representation of the actual EGG pulse shapes, 10 harmonics are usually
needed. If you think that the feature you are investigating is a low-frequency
phenomenon, or if you are mostly interested in the SampEn metric, then fewer
harmonics will usually be enough. The default of 10 harmonics seems to be
appropriate for most cases.

3.6.3 Initializing the clusters


The outcome of the clustering is very sensitive to the first few cycles that are
detected. There is usually some quiet before the subject starts phonating, which may
cause some spurious ‘cycles’ to be detected in the background noise. Then, some of
the initial centroids will be too far apart in their 3N-dimensional space to describe
EGG waveforms, and most subsequent real EGG cycles will tend to be ‘captured’ by
the one or two ‘random’ centroids that happen to best describe EGG signals.
One strategy can be to edit the signal files such that phonation starts immediately
at medium voice amplitude. In this way, the analysis will start out ‘seeded’ with
similar waveshapes in all clusters, which soon diverge as the signal changes.
Alternatively, to improve the consistency of repeated runs, FonaDyn can wait
until some stable phonation is occurring and then automatically initialize the clusters.
To use this feature, press the button Reset Counts before pressing START,
so that it displays Auto Reset instead. When FonaDyn has seen 125 ms of
continuous phonation above the ‘clarity’ threshold, it will clear the cluster data at
that point and start the clustering afresh.
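The Auto Reset criterion can be pictured with the following Python sketch, which finds the first moment at which the clarity has stayed above a threshold for 125 ms. This is only an illustration of the rule as stated above, not FonaDyn’s actual implementation; the frame format, the function name and the threshold value used in the example are assumptions.

```python
def auto_reset_time(frames, clarity_threshold, hold_s=0.125):
    """Return the time at which clarity has stayed above threshold for hold_s.

    `frames` is an iterable of (time_s, clarity) pairs in ascending time
    order. Returns None if no sufficiently long stretch is found.
    """
    run_start = None
    for time_s, clarity in frames:
        if clarity > clarity_threshold:
            if run_start is None:
                run_start = time_s          # a new stretch of phonation begins
            if time_s - run_start >= hold_s:
                return time_s
        else:
            run_start = None                # stretch interrupted, start over
    return None

# Hypothetical input: one frame every 10 ms, unclear for the first 50 ms.
frames = [(i * 0.01, 0.5 if i < 5 else 0.99) for i in range(40)]
print(auto_reset_time(frames, clarity_threshold=0.96))
```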

If the cluster centroids graph looks scattered and/or random, the clustering has
probably degenerated from spurious signals. Wait until phonation is stable and mid-
range, and then press the button Reset counts. From the EGG waveform at that
moment, this will generate equal ‘seed’ centroids that soon drift apart from one
another. The end result will depend slightly on exactly when Reset counts is
pressed. The final outcome will however become quite stable if you record the signals
and make a second or even a third pass over the data, using the setting Init: Pre-
learned (without pressing Reset counts or Unload in between).

3.6.4 Clustering in multiple passes


Because FonaDyn is a real-time program, it cannot know anything at the START
about what the data set as a whole is going to look like. This applies to recordings as
well as to live signals. FonaDyn therefore has to adapt the clusters to the data as it
goes along. This means that, on the first pass, the cluster colours early in the
recording will not correspond to quite the same wave-shapes as late in the recording.
Over time, the clusters will become stable, but initially, it helps if the range of
variation in the early part of the signal is representative of that in the rest. Therefore
it is a good idea to try to include short excerpts from the full range of conditions as
early as possible in a recording, and also to re-run a second or even third pass over
the same signal, using the cluster data from the first pass for initialization, as
explained below.

3.6.5 Rearranging clusters


Another issue of initialization is that the algorithm’s allocation of cluster numbers
to different EGG waveshapes is necessarily arbitrary. Since each cluster number is
assigned a different display colour, this means that the cluster colours are unlikely to
be the same every time, when a recording protocol is repeated in the learning phase.
Once the learning can be turned off, as for classifying signals, this is no longer a
problem.

Figure 20. The sequence of colours is given by the number of clusters. You can
rearrange the cluster mapping to colours by swapping the positions of two clusters
at a time. Since this is a swap, repeating a click in the same place will undo the
change. [Labels in the figure: Ctrl-left click, Ctrl-right click.]

To make your figures consistent, you may wish to re-arrange the cluster order.
This can be done only when FonaDyn has stopped. Ctrl-click left or right on the
vertical cycle-count bars (Figure 20). Two adjacent clusters are then swapped, as
shown in the figure. Repeat this until you have achieved the desired order. The first
and last clusters are also treated as adjacent; the order ‘wraps around’. The cluster
colours in the current voice map are also updated. Cluster data are saved to files of
type *_clusters.csv.
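The swapping rule, including the wrap-around between the first and last clusters, can be expressed in a few lines. The Python sketch below is an illustration only; the function name is hypothetical, and which mouse button corresponds to which direction is not asserted here.

```python
def swap_with_neighbour(order, index, step):
    """Return a copy of `order` with item `index` swapped with its neighbour.

    step is -1 for the left neighbour and +1 for the right neighbour; the
    first and last positions count as adjacent (wrap-around).
    """
    other = (index + step) % len(order)
    result = list(order)
    result[index], result[other] = result[other], result[index]
    return result

order = [1, 2, 3, 4, 5]
once = swap_with_neighbour(order, 0, -1)    # the first swaps with the last
twice = swap_with_neighbour(once, 0, -1)    # the same swap again undoes it
print(once, twice)
```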
Another way to rearrange clusters is to edit the .csv file directly. Use a
spreadsheet program to rearrange the data clusters into the order that suits your
application. A tip: first, colour the text on each row to a sequence of colours similar to
those shown in FonaDyn’s bar graph (Figure 20). It is then somewhat easier to
rearrange manually those rows into the order that you want (for instance, from weak
to strong phonation). With this method, you can also remove a cluster that you don’t
want, simply by deleting that line. When done, save the _clusters.csv file, and
reload it into FonaDyn.
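If you prefer not to use a spreadsheet, the same row rearrangement can be scripted. The Python sketch below reorders or drops whole lines of text. It assumes that each line of the file describes exactly one cluster and that no line needs renumbering afterwards; this handbook does not document the file layout in detail, so inspect your own _clusters.csv file and work on a copy. The function name is hypothetical.

```python
def reorder_rows(text, new_order):
    """Return the text with its rows picked in `new_order` (0-based indices).

    Rows left out of new_order are dropped, which removes those clusters.
    """
    rows = [line for line in text.splitlines() if line.strip()]
    if len(set(new_order)) != len(new_order):
        raise ValueError("each row may be used only once")
    return "\n".join(rows[i] for i in new_order) + "\n"

# Three placeholder rows: put the third first and drop the second.
print(reorder_rows("a;1\nb;2\nc;3\n", [2, 0]))
```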

3.6.6 Detecting specific EGG wave shapes


Editing _clusters.csv files also comes in handy for constructing cluster files that
capture rare but interesting pulse shapes. If you want FonaDyn to look for a specific
EGG wave shape, proceed as follows. Using an audio editor, or some numerical
procedure, create a *_Voice_EGG.wav file that contains mostly pulses of that
waveshape. Train FonaDyn on that waveshape, so that it looks right when resynthesized,
and then save the *_clusters.csv file. Use a spreadsheet application or flat
text editor to copy the row for the corresponding centroid, and append that row into
another existing clusters file. Save the new spreadsheet as a new *_clusters.csv file.
Repeat this for several different waveshapes, if you like. Then load your new clusters
file for classifying signals, with Init: Pre-learned and Learning: off. Adjust the
voice map display to show only the cluster of your rare waveshape. When it occurs in
the input signal, it will appear on the voice map. FonaDyn can manage up to 20
clusters (wave shapes) in one file.

3.6.7 Analyzing from multiple takes or multiple files


By default, FonaDyn clears the current cluster data and the voice map data when
you press START. If you want instead to continue accumulating recordings into the
current voice map, check the box Keep data that is to the left of the START button.
The Keep data option could be appropriate for a long acquisition protocol, during
which a subject needs to pause, or for analysing a batch of multiple input files for
which the results should be combined into one voice map. If you are analyzing a
batch of files, you will want the data to be cleared on START for the first file, but not
for the following files. If so, then defer checking Keep data until after you have
pressed START. Or, control more complex batches using a script file (→3.5.5).
In most cases, you will also want to choose the clustering setting Init: Pre-
learned. Choose Learning: On to continue the same clustering operation across
input takes or files. Choose Learning: Off to classify all the input files using one
fixed clustering (which must first be Loaded). When Init: Pre-learned is chosen,
the Auto Reset function is disabled – otherwise it would clear the learned data.
The Load Map button lets you load a previously saved voice map for inspection.
Then, check the box Keep data before starting, and you can continue accumulating
voice recordings into the loaded map. The current cluster data must be consistent
with the existing map, or the results will be meaningless; use Save Cluster/Load
Cluster to ensure it.

3.6.8 SampEn parameters


The SampEn analysis is still somewhat experimental, so we have given you access
to all the settings, to try out. It has three parameters: Tolerance, Window and Length,
which can be set to different values for the harmonic levels and the phases (→2.7).

3.7 Using FonaDyn with RME audio interfaces


The digital audio interfaces from RME are very versatile and ideal for multi-
channel acquisition. Several tips for using them with FonaDyn are given in the
separate document Using FonaDyn with RME audio interfaces.
The models Fireface 400, UCX and UCX II as well as the Babyface all have their
microphone preamps on inputs 1 and 2. This is what FonaDyn expects. The larger
Fireface 802 and UFX models, however, all have their microphone preamps on
inputs 9-12, while those on the model 800 are on inputs 7-10. You can
easily re-route FonaDyn’s two input channels, using the FonaDyn.config function
(→1.2.8) or the Record inputs list (→3.1.7).

3.8 Known problems
FonaDyn version 2.4.9 has the following known issues.

1) CPU load. The higher the fundamental frequency, the more EGG cycles have to
be handled every second. In the high soprano range, FonaDyn may struggle
noticeably to keep up, unless the computer is very fast. The display may update
less smoothly, in fits and starts. The signal processing parts typically take less
than 30% of the total. Instead, most of FonaDyn’s CPU load concerns the
graphics. You can reduce the load by hiding those graphic displays that are not
needed, especially the Plots window. If your computer has a power-saving speed
setting, make sure that it is set for highest performance, not for power saving. On
Windows laptops, this setting is reached by clicking on the battery icon in the task
bar. Running video and/or screen-sharing software such as Skype or Zoom at the
same time may not be possible.
2) After a successful but longish run, it may take up to a minute for FonaDyn to go
from “Stopping...” to the really stopped state – just be patient. If the ►START
button refuses to re-appear, it may be possible to save cluster data and voice map
data, even when FonaDyn is in this waiting state. Then, close the main window,
and invoke FonaDyn.run once more, perhaps also with action (4) below.
3) Occasionally, with file input and/or output, FonaDyn will fail to start or stop
properly. This can be due to a format error in the input file (did you choose the
right file?), but more likely to inter-process handshaking errors, or network
delays. Working with all files on a local hard disk is usually best.
4) It may happen that the server process is orphaned, that is, SCSYNTH loses its
connection with SCLANG. The solution is to go to the SCIDE window and invoke the
command Kill All Servers from the Server menu (or evaluate
Server.killAll), and then start again.
5) Specifying the highest numbers of clusters and harmonics (up to 20, 20) may
cause SCSYNTH to complain that the connection diagram becomes too complex. (To
appreciate this, you might look at the file FonaDyn-code-diagrams.pdf.) If you
really want that many, this can be allowed by saving into the SC startup file the
line
Server.local.options.numWireBufs = 256; // the default is 64

Now, restart SuperCollider, and you will probably be able to run up to the
maximum. The rainbow graphs are pretty, but remember that with 20 harmonics,
you are limited to fo values of less than 10000/20 = 500 Hz (→2.4.1).
6) Under the Windows Defender anti-virus system, the first time you run FonaDyn
in a session, the server process SCSYNTH in SuperCollider may take a long time to
boot (one or two minutes). You have to wait until the numbers at the bottom right
turn green, which means that SCSYNTH is up and running. On subsequent starts,
there is no delay. To avoid this delay, exclude the processes SCSYNTH.EXE and
SCLANG.EXE from anti-virus monitoring. This will also improve performance.
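The trade-off mentioned in item 5, between the number of harmonics and the highest fo that can be analysed, follows from dividing the 10000 Hz limit quoted there by the number of harmonics. The Python sketch below tabulates it for a few settings; the constant and function names are hypothetical.

```python
ANALYSIS_LIMIT_HZ = 10000.0   # upper frequency limit quoted in item 5

def max_fo_hz(n_harmonics):
    """fo limit in Hz below which all n_harmonics stay under the analysis limit."""
    return ANALYSIS_LIMIT_HZ / n_harmonics

for n in (5, 10, 20):
    print(n, "harmonics: fo below", max_fo_hz(n), "Hz")
```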

Acknowledgments

FonaDyn is written in SuperCollider [3], an interactive real-time audio and music
processing environment, first created by James McCartney. SuperCollider is
cross-platform (Windows, Mac, Linux, iPhone, and more) and open-source freeware. A
great deal of the theory behind FonaDyn was worked out by Andreas Selamtzis, in his
doctoral thesis. The foundation version of FonaDyn was written in 2015 by Dennis
Johansson for his M.Sc. degree project in Computer Science [15]. Isak Nilsson, in the
course of his M.Sc. degree project [18], ran tests on MacOS, recompiled the
PitchDetection UGen for FFTW on MacOS, and contributed supporting Matlab code.
FonaDyn continues to be developed by Ternström and co-workers. The work was
partially funded by the Swedish Research Council (Vetenskapsrådet), project 2010-
4565.

References

[1] Ternström S, Pabon P, Södersten M (2016). The Voice Range Profile: its
function, applications, pitfalls and potential. Acta Acustica united with
Acustica, 102(2), 268-283.
[2] Ruviaro B (2015). A Gentle Introduction to SuperCollider. CCRMA.
https://ccrma.stanford.edu/~ruviaro/texts/A_Gentle_Introduction_To_SuperCollider.pdf
[3] SuperCollider website: http://supercollider.github.io/
[4] Ternström S (2019). Normalized time-domain parameters for electroglottographic
waveforms. J Acoust Soc Am. 146(1):EL65-EL70. doi:10.1121/1.5117174
(included in the folder FonaDynExtras)
[5] Selamtzis A, Ternström S (2014). Analysis of vibratory states in phonation using
spectral features of the electroglottographic signal. J. Acoust. Soc. Am., 136(5),
2773-2783.
[6] McLeod P, Wyvill G (2005). A Smarter Way to Find Pitch. Proc Int’l Computer
Music Conf; ICMC 2005, 138-141. [An implementation of the above, called
“Tartini”, is included with the ‘SC3-plugins’ library of signal function blocks.]
Permalink: http://hdl.handle.net/2027/spo.bbp2372.2005.107.
[7] Pabon JPH (1991): Objective acoustic voice-quality parameters in the computer
phonetogram. J. Voice 5(3) 203-216.
[8] Dolanský LO (1955). An Instantaneous Pitch-Period Indicator. J. Acoust. Soc.
Am. 27, 67-72; http://dx.doi.org/10.1121/1.1907499
[9] Herbst C, Ternström S. A comparison of different methods to measure the EGG
contact quotient. Logopedics Phoniatrics Vocology, 2006; 31(3):126-138.
doi:10.1080/14015430500376580.
[10] McFee B (2012). More like this: machine learning approaches to music
similarity. PhD thesis, University of California at San Diego, 186 p (algorithm
B.1, p. 152), http://bmcfee.github.io/papers/bmcfee_dissertation.pdf.
[An implementation of the above, called “KMeansRT”, is included with the
‘SC3-plugins’ library of signal function blocks. FonaDyn supplements this with
“KMeansRTv2”, which provides the option of continued learning or classifying
with a pre-learned vector of centroids.]
[11] Richman JS, Randall Moorman J (2000). Physiological time-series analysis
using approximate entropy and sample entropy. American Journal of
Physiology, 278 (6), 2039-2049.
[12] Yu-Hsiang Pan, Yung-Hung Wang, Sheng-Fu Liang, Kuo-Tien Lee (2011). Fast
computation of sample entropy and approximate entropy in biomedicine.
Computer Methods and Programs in Biomedicine, 104 (3), 382-396.
[13] Delgado-Bonal A, Marshak A (2019). Approximate Entropy and Sample
Entropy: A Comprehensive Tutorial. Entropy 21, 541; doi:10.3390/e21060541.
[14] Fabris C, De Colle W, Sparacino G (2013). Voice disorders assessed by (cross-)
Sample Entropy of electroglottogram and microphone signals. Biomedical
Signal Processing and Control, 8(6), 920-926, ISSN 1746-8094,
http://dx.doi.org/10.1016/j.bspc.2013.08.010.
(http://www.sciencedirect.com/science/article/pii/S1746809413001237)
[15] Johansson D (2015). Real-time analysis, in SuperCollider, of spectral features
of electroglottographic signals. M.Sc. degree thesis in computer science, KTH
Royal Institute of Technology, Stockholm, Sweden. Available online at this link
(October 2021).
[16] Schutte HK, Seidner W (1983). Recommendation by the Union of European
Phoniatricians (UEP): Standardizing Voice Area Measurement/Phonetography.
Folia Phoniatrica 35:286–288 (DOI:10.1159/000265703)
[17] http://www.expert-sleepers.co.uk/ You will need the modules ES-3, ES-6,
optionally the ES-7, two TosLink optical cables, an extra ‘row power’ supply
module and a rack or case in which to mount the modules. It is possible to use
an ES-9 instead of an RME and the above modules. If so, you will need a high-
quality mic pre-amp as well. Also, the control software for the ES-9 is far less
sophisticated than is RME’s TotalMix.
[18] Nilsson I (2016). Electroglottography in real-time feedback for healthy
singing. M.Sc. degree thesis in computer science and communication, KTH
Royal Institute of Technology, Stockholm, Sweden. Available online at this link
(November 2021).
[19] Fraile R, Godino-Llorente JI. Cepstral peak prominence: A comprehensive
analysis. Biomedical Signal Processing and Control. 2014; 14(1): 42-54.
doi:10.1016/j.bspc.2014.07.001
[20] Awan SN, Solomon NP, Helou LB, Stojadinovic A (2013). Spectral-cepstral
estimation of dysphonia severity: External validation. Ann Otol Rhinol
Laryngol. 122(1):40-48. doi:10.1177/000348941312200108.
[21] Fallgren P, Malisz Z, Edlund J (2019). How to Annotate 100 Hours in 45
Minutes. Proc. Interspeech 2019, 341-345,
doi:10.21437/Interspeech.2019-1648.

Some other relevant sources, including examples of research done with FonaDyn
[22] Selamtzis A (2014). Electroglottographic analysis of phonatory dynamics and
states. Licentiate thesis in Speech and Music Communication, Stockholm: KTH
Royal Institute of Technology, vii, 31 p. Available online at
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-145692
[23] Selamtzis A, Ternström S (2017). Investigation of the relationship between the
electroglottogram waveform, fundamental frequency and sound pressure level
using clustering. J. Voice, 31 (4), July 2017, 393-400, available online at
http://dx.doi.org/10.1016/j.jvoice.2016.11.003.
[24] Roubeau B, Henrich N, Castellengo M (2009). Laryngeal Vibratory
Mechanisms: The Notion of Vocal Register Revisited. J. Voice, 23 (4), July 2009,
425–438.
[25] Matlab © The MathWorks, Inc. www.mathworks.com
[26] Herbst C (2004). MovingEGG. Available online at this link. (Accessed
November 2021).
[27] Švec JG, Granqvist S (2010). Guidelines for selecting microphones for human
voice production research. American Journal of Speech-Language Pathology,
19, 356–368, November 2010.
[28] Ternström S, D’Amario S, Selamtzis A (2018). Effects of the lung volume on the
electroglottographic waveform in trained female singers. J. Voice 34 (3) pp.
485.e1-485.e21. (2020), Open Access.
https://doi.org/10.1016/j.jvoice.2018.09.006
[29] Lã FMB, Ternström S (2020). Flow ball-assisted voice training: Immediate
effects on vocal fold contacting. Biomed. Signal Process. Control,
62:102064. https://doi.org/10.1016/j.bspc.2020.102064
[30] Patel RR, Ternström S (2021). Quantitative and Qualitative Electroglottographic
Wave Shape Differences in Children and Adults Using Voice Map–Based Analysis.
J Speech, Lang Hear Res. 64(8):2977-2995.
https://doi.org/10.1044/2021_jslhr-20-00717
