MPEG-H The New Standard For Spatial Audio Coding
This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least
two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been
reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES
takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio
Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved.
Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio
Engineering Society.
ABSTRACT
Recently, a new generation of spatial audio formats was introduced that includes elevated loudspeakers and surpasses
traditional surround sound formats, such as 5.1, in terms of spatial realism. To facilitate high-quality, bitrate-efficient
distribution and flexible reproduction of 3D sound, the MPEG standardization group recently started the MPEG-H
Audio Coding development for the universal carriage of encoded 3D sound from channel-based, object-based and
HOA-based input. High quality reproduction is supported for many output formats from 22.2 and beyond down to
5.1, stereo and binaural reproduction - independently of the original encoding format, thus overcoming
incompatibility between various 3D formats. The paper describes the current status of the standardization project
and provides an overview of the system architecture, its capabilities and performance.
• How can the sound engineer/Tonmeister make best possible use of 3D loudspeaker setups? The answer to this may very well require a learning process similar to that at the transition from stereo to 5.1.

• In contrast to the traditional 2D surround world, where 5.1 is an established standard for content production, distribution and rendering, there is a plethora of concurrent proposals for 3D loudspeaker setups competing in the market. It currently seems quite unclear whether one predominant format will evolve which eventually can serve – similar to 5.1 for 2D – as a common denominator for content production, digital media and consumer electronics to create a thriving new market.

• How can 3D audio content be distributed efficiently and with the highest quality, such that existing distribution channels (possibly including wireless links) and media can carry the new content?

• How can consumers and consumer electronics manufacturers accept these new formats, given that many consumers may be willing to install just a single 3D loudspeaker setup in their living room with a limited number of speakers? Can they, nonetheless, enjoy content that was produced for, say, 22.2 channels?

Based on such considerations, the ISO/MPEG standardization group has initiated a new work item to address aspects of bitrate-efficient distribution, interoperability and optimal rendering by the new ISO/MPEG-H 3D Audio standard.

2. PREVIOUS MPEG AUDIO MULTI-CHANNEL CODING TECHNOLOGY

The first commercially-used multi-channel audio coder standardized by MPEG in 1997 is MPEG-2 Advanced Audio Coding (AAC) [16,17], delivering EBU broadcast quality at a bitrate of 320 kbit/s for a 5.1 signal. A significant step forward was the definition of MPEG-4 High Efficiency AAC (HE-AAC) [18] in 2002/2004, which combines AAC technology with bandwidth extension and parametric stereo coding, and thus allows for full audio bandwidth also at lower data rates. For carriage of 5.1 content, HE-AAC delivers quality comparable to that of AAC at a bitrate of 160 kbit/s [19]. Later MPEG standardizations provided generalized means for parametric coding of multi-channel spatial sound: MPEG-D MPEG Surround (MPS, 2006) [20,21] and MPEG-D Spatial Audio Object Coding (SAOC, 2010) [22,23] allow for the highly efficient carriage of multi-channel sound and object signals, respectively. Both codecs can be operated at lower rates (e.g. 48 kbit/s for a 5.1 signal). Finally, MPEG-D Unified Speech and Audio Coding (USAC, 2012) [24,25] combined enhanced AAC coding with state-of-the-art full-band speech coding into an extremely efficient system, allowing carriage of e.g. good quality mono signals at bitrates as low as 8 kbit/s. Incorporating advances in joint stereo coding, USAC is capable of delivering further enhanced performance compared to HE-AAC also for multi-channel signals.

For the definition of MPEG-H 3D Audio, it was strongly encouraged to re-use these existing MPEG technology components to address the coding (and, partially, rendering) aspect of the envisioned system. In this way, it was possible to focus the MPEG-H 3D Audio development effort primarily on delivering the missing functionalities rather than on addressing basic coding/compression issues.
This paper describes a snapshot of MPEG-H 3D Audio Reference Model technology [15] as of the writing of this paper, i.e. after the 109th MPEG meeting in July 2014. It is structured as follows: given that coding of multi-channel/surround sound has been present in MPEG Audio for a considerable time, the existing MPEG Audio technology in this field is briefly introduced. Then, the MPEG-H 3D Audio work item is explained and the MPEG-H Reference Model architecture and technology are outlined. Finally, we show results of some recent performance evaluations of the new technology, followed by a number of expected or possible further developments of the Reference Model.

3. THE MPEG-H 3D AUDIO WORK ITEM

Dating back to early 2011, initial discussions on 3D Audio at MPEG were triggered by the investigation of video coding for devices whose capabilities are beyond those of current HD displays, i.e. Ultra-HD (UHD) displays with 4K or 8K horizontal resolution. With such displays a much closer viewing distance is feasible, and the display may fill 55 to 100 degrees of the user's field of view such that there is a greatly enhanced sense of visual envelopment. To complement this technology vision with an appropriate audio component, the notion of 3D audio, including elevated (and possibly lower) speakers, was explored, eventually leading to a 'Call for Proposals' (CfP) for such 3D Audio technologies in January 2013 [26]. The CfP document specified requirements and application scenarios for the new technology together with a development timeline and a number of operating points at which the submitted technologies should demonstrate their performance, ranging from 1.2 Mbit/s down to 256 kbit/s for a 22.2 input. The output was to be rendered on various loudspeaker setups from 22.2 down to 5.1, plus binauralized rendering for virtualized headphone playback. The CfP also specified that evaluation of submissions would be conducted independently for two accepted input content types, i.e. 'channel and object (CO) based input' and 'Higher Order Ambisonics (HOA)'. At the 105th MPEG meeting in July/August 2013, Reference Model technology was selected from the received submissions (4 for CO and 3 for HOA) based on their technical merits to serve as the baseline for further collaborative technical refinement of the specification. Specifically, the winning technology came from Fraunhofer IIS (CO part) and Technicolor/Orange Labs (HOA part). In a next step, both parts were merged into a single harmonized system. Further improvements are on the way, e.g. binaural rendering. The final stage of the specification, i.e. International Standard, is anticipated to be issued at the 111th MPEG meeting in February 2015.

4. THE MPEG-H REFERENCE MODEL

MPEG-H 3D Audio supports three types of input format:

• Channel-based: Traditionally, spatial audio content (starting from simple two-channel stereo) has been delivered as a set of channel signals which are designated to be reproduced by loudspeakers in a precisely defined, fixed target location relative to the listener.

• Object-based: More recently, the merits of object-based representation of a sound scene have been embraced by sound producers, e.g. to convey sound effects like the fly-over of a plane or space ship. Audio objects are signals that are to be reproduced so as to originate from a specific target location that is specified by associated side information. In contrast to channel signals, the actual placement of audio objects can vary over time and is not necessarily pre-defined during the sound production process but determined by rendering to the target loudspeaker setup at the time of reproduction. This may also include user interactivity.

• Higher Order Ambisonics (HOA) is an alternative approach to capture a 3D sound field by transmitting a number of 'coefficient signals' that have no direct relationship to channels or objects.

The following text discusses the role of these format types in the context of MPEG-H 3D Audio.

Audio Objects:

Using audio objects or embedding of objects as additional audio tracks inside channel-based audio productions and broadcast opens up a range of new applications. Inside an MPEG-H 3D audio bitstream, objects can be embedded that can be selected by the user during playback. Objects allow consumers to have personalized playback options ranging from simple adjustments (such as increasing or decreasing the level of announcer's commentary or actor's dialogue relative to the other audio elements) to conceivable future broadcasts where several audio elements may be adjusted in level or position to tailor the audio playback experience to the user's liking, as illustrated in the following Figure.

HOA:

The concept of higher order ambisonics (HOA) provides a way to capture a sound field with a multi-capsule microphone. Manipulating and rendering of such signals requires a simple matrix operation, which will not be discussed in detail in this publication. In addition to channels and objects, HOA content can also be carried in MPEG-H 3D Audio.

4.1.2. Flexibility with regard to reproduction

For audio production and monitoring, the setup of loudspeakers is well defined and established in practice for stereo and 5.1. However, in consumer homes, loudspeaker setups are typically "unconventional" in terms of non-ideal placement and differ in the number of speakers. Within MPEG-H 3D Audio, flexible rendering to different speaker layouts is implemented by a format converter that adapts the content format to the actual real-world speaker setup available on the playback side, in order to provide an optimum user experience under the given conditions. For well-defined formats, specific downmix metadata can be set on the encoder side to ensure downmix quality, e.g. when playing back 9.1 content on a 5.1 or stereo playback system.
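The metadata-driven side of such format conversion can be sketched as a matrix multiplication of downmix gains with the input channel signals. The sketch below uses the common ITU-R BS.775 coefficient convention for a 5.1-to-stereo downmix purely for illustration; the actual MPEG-H format converter derives its gains from transmitted downmix metadata or from its internal rule set:

```python
import numpy as np

# Illustrative static downmix of 5.1 (L, R, C, LFE, Ls, Rs) to stereo.
# Coefficients follow the common ITU-R BS.775 convention; they are an
# assumption for this sketch, not values prescribed by MPEG-H.
DMX = np.array([
    # L    R    C      LFE  Ls     Rs
    [1.0, 0.0, 0.707, 0.0, 0.707, 0.0  ],  # left output
    [0.0, 1.0, 0.707, 0.0, 0.0,   0.707],  # right output
])

def downmix(channels: np.ndarray) -> np.ndarray:
    """channels: (6, num_samples) array -> (2, num_samples) stereo."""
    return DMX @ channels

x = np.random.randn(6, 1024)   # dummy 5.1 signal block
stereo = downmix(x)            # shape (2, 1024)
```

Note that such a purely static ("passive") downmix is exactly the case the active downmix algorithm of Section 4.3 improves upon for correlated input signals.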
A binaural renderer is part of the MPEG-H 3D audio decoder for dedicated rendering on headphones, with the aim of conveying the spatial impression of an immersive audio production also on headphones.

Figure 3 shows an overview of an MPEG-H 3D Audio decoder, illustrating all major building blocks of the system:

• As a first step, all transmitted audio signals, be they channels, objects or HOA components, are decoded by an extended USAC stage (USAC-3D).

• Channel signals are mapped to the target reproduction loudspeaker setup using a format converter.

• Object signals are rendered to the target reproduction loudspeaker setup by the object renderer using the associated object metadata.

• Alternatively, signals coded via an extended Spatial Audio Object Coding (SAOC-3D), i.e. parametrically coded channel signals and audio objects, are rendered to the target reproduction loudspeaker setup using the associated metadata.

• Higher Order Ambisonics content is rendered to the target reproduction loudspeaker setup using the associated HOA metadata.

In the following, the main technical components of the MPEG-H 3D Audio decoder/renderer are described.

4.2. USAC-3D Core Coder and Extensions

The MPEG-H 3D Audio codec architecture is built upon a perceptual codec for compression of the different input signal classes, based on MPEG Unified Speech and Audio Coding (USAC) [24]. USAC is the state-of-the-art MPEG codec for compression of mono to multi-channel audio signals at rates of 8 kbit/s per channel and higher. For the new requirements that arose in the context of 3D audio, this technology has been extended by tools that especially exploit the perceptual effects of 3D reproduction and thereby further enhance the coding efficiency.

The most prominent enhancements are:

• A Quad Channel Element that jointly codes a quadruple of input channels. In a 3D context, inter-channel redundancies and irrelevancies can be exploited in both horizontal and vertical directions. Parametric coding of vertically aligned channel pairs can be carried out while binaural unmasking effects [27] can be avoided in the horizontal plane.

• An enhanced noise filling is provided through Intelligent Gap Filling (IGF). IGF is a tool that parametrically restores portions of the transmitted spectrum using suitable information from spectral tiles that are adjacent in frequency and time. The assignment and the processing of these spectral tiles is controlled by the encoder based on an input signal analysis. Hereby, spectral gaps can be filled with spectral coefficients that are perceptually a better match than the pseudo-random noise sequences of conventional noise filling.

Apart from these enhancements in coding efficiency, the USAC-3D core is equipped with new signaling mechanisms for 3D content/loudspeaker layouts and for the type of signals in the compressed stream (audio channel vs. audio object vs. HOA signal).

Another new aspect in the design of the compressed audio payload is an improved behavior for instantaneous rate switching or fast cue-in as it appears in the context of MPEG Dynamic Adaptive Streaming (DASH) [28]. For this purpose, so-called 'immediate playout frames' have been added to the syntax that enable gapless transitions from one stream to the other. This is particularly advantageous for adaptive streaming over IP networks.
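The tile-copy idea behind IGF can be illustrated with a deliberately simplified sketch. This is not the normative MPEG-H algorithm: in the real system tile assignment and gains are encoder-controlled and adaptive in both frequency and time, whereas here a single source tile is copied with a fixed gain:

```python
import numpy as np

def fill_gap(spectrum, gap, src, gain):
    """Very simplified IGF-style tile fill (illustrative only, not the
    normative MPEG-H tool): copy the source tile into the zeroed gap
    region of the spectrum and scale it by a fixed gain."""
    out = spectrum.copy()
    g0, g1 = gap            # target region with missing coefficients
    s0, s1 = src            # adjacent source tile chosen by the encoder
    tile = spectrum[s0:s1]
    out[g0:g1] = gain * tile[: g1 - g0]
    return out

spec = np.zeros(16)
spec[0:8] = np.arange(1.0, 9.0)   # transmitted low-band coefficients
filled = fill_gap(spec, gap=(8, 16), src=(0, 8), gain=0.5)
```

The point of the sketch is the contrast with conventional noise filling: the gap is populated with structured spectral content taken from a neighboring tile rather than with pseudo-random noise.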
mapping to the loudspeaker channels that are available in the reproduction setup.

The rules have been designed individually for each potential input channel, incorporating expert knowledge, e.g. to avoid excessive use of phantom sources when rendering to the available target loudspeakers. Thus the rules-based generation of downmix coefficients allows for a flexible system that can adapt to different input/output configurations, while at the same time ensuring a high output signal quality by making use of the expert knowledge contained in the mapping rules. Note that the initialization algorithm compensates for non-standard loudspeaker positions of the reproduction setup, aiming at the best reproduction quality even for asymmetric loudspeaker setups.

Active downmix algorithm

Once the downmix coefficients have been derived, they are applied to the input signals in the actual downmix process. MPEG-H 3D Audio uses an advanced active downmix algorithm to avoid downmix artefacts like signal cancellations or comb-filtering that can occur when combining (partially) correlated input signals in a passive downmix, i.e. when linearly combining the input signals weighted with static gains. Note that high signal correlations between 3D audio signals are quite common in practice, since a large portion of 3D content is typically derived from 2D legacy content (or 3D content with smaller loudspeaker setups), e.g. by filling the additional 3D channels with delayed and filtered copies of the original signals.

The active downmix in the MPEG-H 3D Audio decoder adapts to the input signals in two ways to avoid the issues outlined above for passive downmix algorithms: firstly, it measures the correlation properties between input channels that are subsequently combined in the downmix process and aligns the phases of individual input channels if necessary. Secondly, it applies a frequency-dependent energy normalization to the downmix gains that preserves the energy of the input signals weighted by the downmix coefficients. The active downmix algorithm is designed such that it leaves uncorrelated input signals untouched, thus eliminating the artefacts that occur in passive downmixes with only minimal signal adjustments.

4.3.2. Object renderer

In MPEG-H 3D Audio, transmitted metadata allows for rendering audio objects into predefined spatial positions. Time-varying position data enables the rendering of objects on arbitrary trajectories. Additionally, time-varying gains can be signaled individually for each audio object. An overview of MPEG-H audio metadata is provided in [32].

The object renderer applies Vector Base Amplitude Panning (VBAP, [29]) to render the transmitted audio objects to the given output channel configuration. As input the renderer expects:

- Geometry data of the target rendering setup.

- One decoded audio stream per transmitted audio object.

- Decoded object metadata associated with the transmitted objects, e.g. time-varying position data and gains.

As presented in the following, VBAP relies on a triangulation of the 3D surface surrounding the listener. The MPEG-H 3D Audio object renderer thus provides an automatic triangulation algorithm for arbitrary target configurations. Since not all target loudspeaker setups are complete 3D setups, e.g. most setups lack loudspeakers below the horizontal plane, the triangulation introduces imaginary loudspeakers to provide complete 3D triangle meshes for any setup to the VBAP algorithm.

The MPEG-H 3D Audio object rendering algorithm performs the following steps to render the transmitted audio objects to the selected target setup:

• Search for the triangle into which the current object position falls.

• Build a vector base L = [l1, l2, l3] out of the three unit vectors pointing towards the vertices of the selected loudspeaker triangle.

• Compute the panning gain vector G = [g1, g2, g3]^T for the transmitted object position P according to P = L·G, i.e. G = L^-1·P.
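The gain computation in the last step can be sketched as follows. This is a minimal illustration of the VBAP equations above; the loudspeaker directions are assumptions for the example, and the normative renderer additionally handles triangulation, imaginary loudspeakers and gain signaling:

```python
import numpy as np

def sph_unit(azimuth_deg, elevation_deg):
    """Unit direction vector from azimuth/elevation in degrees."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

def vbap_gains(p, l1, l2, l3):
    """VBAP panning gains for object direction p over a loudspeaker
    triangle with vertex unit vectors l1, l2, l3: solve P = L*G,
    i.e. G = L^-1 * P, then power-normalize the gains."""
    L = np.column_stack([l1, l2, l3])   # vector base L = [l1, l2, l3]
    g = np.linalg.solve(L, p)           # G = L^-1 * P
    return g / np.linalg.norm(g)        # sum of squared gains == 1

# Example: object slightly above front center, rendered over a triangle
# of front-left, front-right and an elevated center speaker
# (speaker positions are illustrative, not prescribed by the standard).
obj = sph_unit(0, 10)
g = vbap_gains(obj, sph_unit(30, 0), sph_unit(-30, 0), sph_unit(0, 45))
```

For a position inside the triangle all three gains come out non-negative; a negative gain indicates that the wrong triangle was selected in the search step.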
4.3.3. SAOC-3D decoding and rendering

In order to serve as a technology component for 3D audio coding, the original Spatial Audio Object Coding (SAOC) codec [22, 23] has been enhanced into SAOC-3D with the following extensions:

• While SAOC supports only up to two downmix channels, SAOC-3D supports more (in principle an arbitrary number of) downmix channels.

• While rendering to multi-channel output has been possible with SAOC only by using MPEG Surround (MPS) as a rendering engine, SAOC-3D performs direct decoding/rendering to multichannel/3D output with arbitrary output speaker setups. This includes a revised approach towards decorrelation of output signals.

• Some SAOC tools that have been found unnecessary within the MPEG-H 3D Audio system have been excluded. As an example, residual coding has not been retained, since carriage of signals with very high quality can already be achieved by encoding them as discrete channel or object signals.

4.4. HOA Decoding and Rendering

Higher order ambisonics (HOA) builds on the idea of a field-based representation of an audio scene. More mathematically stated, it is based on a truncated expansion of the wave field into spherical harmonics, which determines the acoustic wave field quantities within a certain source-free region around the listener. The spatial coding block for the HOA representation applies two basic principles: decomposition of the input field and decorrelation of the signals prior to transmission in the core coder, both of which are described in the following.

4.4.1. Decomposition of the sound field in the encoder

In the HOA encoder the sound field determined by the HOA coefficients is decomposed into predominant and ambient sound components. At the same time, parametric side-information is generated that signals the time-varying activity of the different sound-field components to the decoder.

Predominant components mainly contain directional sounds and are coded as plane wave contributions that travel through the wave field of interest in a certain direction. The number of predominant components can vary over time, as can their directions. They are transmitted as audio streams together with the associated time-variant parametric information (direction of the directional components, activity of the directional components in the field).

The remaining part of the HOA input, which has not been captured by the predominant components, is the ambient component of the sound field to code. It mostly contains non-directional sound components. Details of the spatial properties of this part of the field are considered less important. Therefore the spatial resolution of the ambient component is typically reduced by limiting the HOA order to improve the coding efficiency.

The predominant sound components are represented as plane wave signals with associated directions. Thus sound events emanating from uncorrelated sound sources in different directions lead to uncorrelated audio streams to transmit.

However, the HOA representation of the ambient component may exhibit high correlations between the HOA coefficients. This can lead to undesired spatial unmasking of the coding noise, since the quantization noise introduced by the perceptual coder is uncorrelated between the coder channels, thus resulting in different spatial properties of the desired signal and the quantization noise during reproduction. The HOA representation is therefore decorrelated by transforming it into a different spatial domain to avoid the spatial unmasking of the coding noise. Note that this spatial decorrelation step and its inverse operation in the decoder are equivalent to the mid-side coding principle applied to stereo coding of correlated signals, e.g. when coding a phantom source using a stereo audio coder.

4.4.3. MPEG-H 3D Audio decoder HOA rendering

In the MPEG-H 3D Audio decoder, transmitted HOA content is first decoded into an HOA representation by the following processing steps:

• Multichannel USAC-3D core decoding.

• Inverse decorrelation of the ambient sound, i.e. transformation from the decorrelated representation to an HOA coefficients representation.

• Synthesis of an HOA coefficients representation of the predominant sound components.

• HOA composition (superposition of the HOA representations of predominant and ambient components).

The HOA rendering matrix has to be generated at the time of initialization or when the HOA order or the reproduction setup changes. It is a matrix that mixes the contribution of each HOA component to the available loudspeakers using mixing gains that result in the best field approximation of that HOA component in a region around the listener. One main design characteristic of the HOA rendering matrix is energy preservation. This describes the characteristic that the HOA signal's loudness is preserved independent of the speaker setup and that constant-amplitude spatial sweeps are perceived as equally loud after rendering.

4.5. Loudness and Dynamic Range Processing

4.5.1. Loudness normalization

One of the essential features for next-generation audio delivery is proper loudness signaling and normalization. Within MPEG-H 3D Audio, comprehensive loudness-related measures according to ITU-R BS.1770-3 [30] or EBU R128 [31] are embedded into the stream for loudness normalization. The decoder normalizes the audio signal to map the program loudness to the desired target loudness for playback. Since downmixing and dynamic range control may change the loudness of the signal, dedicated program loudness metadata can be included in the MPEG-H bitstream to ensure correct loudness normalization for these cases.

4.5.2. Dynamic range control

Looking at different target playback devices and listening environments, control of the dynamic range is vital. In the framework of dynamic range control (DRC) in MPEG, different DRC gain sequences can be signaled that allow encoder-controlled dynamic range processing in the playback device. Multiple individual DRC gain sequences can be signaled with high resolution for a variety of playback devices and listening conditions, including home and mobile use cases. The MPEG DRC concept also provides improved clipping prevention and peak limiting.
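The loudness normalization of Section 4.5.1 amounts to applying the difference between target loudness and signaled program loudness as a broadband gain. A minimal sketch, where the target values are examples only (EBU R128 recommends -23 LUFS; other delivery specifications use different targets):

```python
def loudness_gain_db(program_loudness_lufs, target_lufs=-23.0):
    """Normalization gain in dB: the decoder applies the difference
    between the desired target loudness and the program loudness
    signaled in the stream. Target value is an example (EBU R128)."""
    return target_lufs - program_loudness_lufs

def db_to_linear(gain_db):
    """Convert a dB gain to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

# A program signaled at -18 LUFS, played back at a -23 LUFS target,
# is attenuated by 5 dB:
gain_db = loudness_gain_db(-18.0, target_lufs=-23.0)   # -5.0 dB
scale = db_to_linear(gain_db)                          # linear factor
```

In the actual system this per-program gain interacts with the signaled DRC gain sequences and the downmix-specific loudness metadata mentioned above.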
Due to the increasing interest in MPEG-H 3D Audio in broadcast application standards like ATSC and DVB, the timeline of Version 1 is designed such that the specification is expected to be International Standard by February 2015.

7. CONCLUSIONS

In order to facilitate high-quality, bitrate-efficient distribution and flexible reproduction of 3D sound, the MPEG standardization group recently started the development effort of MPEG-H Audio Coding, which allows for the universal carriage of encoded 3D sound from channel-based, object-based and HOA-based sound formats. Reproduction is supported for many output setups ranging from 22.2 and beyond down to 5.1, stereo and binaural reproduction. Depending on the available output setup, the encoded material is rendered to yield the highest spatial audio quality, thus overcoming the incompatibility between various 3D (re)production formats. Moreover, MPEG-H Audio is a unified system for carriage of channel-oriented, object-oriented and Higher Order Ambisonics based high-quality content. This paper described the current status of the standardization project and provided an overview of the system architecture, its technology, capabilities and current performance. Further improvements and extensions, such as the ability to operate at very low data rates or the integration into transport systems, are on the way.

[5] Silzle, A. and Bachmann, T.: How to Find Future Audio Formats? VDT-Symposium, Hohenkammer, Germany, 2009.

[6] Gerzon, M.A.: Periphony: With-Height Sound Reproduction. J. Audio Eng. Soc., Vol. 21, No. 1, 1973, pp. 3-10.

[7] Ahrens, J.: Analytic Methods of Sound Field Synthesis. T-Labs Series in Telecommunication Services, Springer, Berlin, Heidelberg, 2012. ISBN 978-3-642-25742-1.

[8] Chabanne, C., McCallus, M., Robinson, C., Tsingos, N.: Surround Sound with Height in Games Using Dolby Pro Logic IIz, 129th AES Convention, Paper 8248, San Francisco, CA, USA, November 2010.

[9] Daele, B. V.: The Immersive Sound Format: Requirements and Challenges for Tools and Workflow, International Conference on Spatial Audio (ICSA), Erlangen, Germany, 2014.

[10] Hamasaki, K., Matsui, K., Sawaya, I., and Okubo, H.: The 22.2 Multichannel Sounds and its Reproduction at Home and Personal Environment, AES 43rd International Conference on Audio for Wirelessly Networked Personal Devices, Pohang, Korea, September 2011.

[11] Silzle, A., et al.: Investigation on the Quality of 3D Sound Reproduction. International Conference on Spatial Audio (ICSA), Detmold, Germany, 2011.

[12] Hiyama, K., Komiyama, S., and Hamasaki, K.: The minimum number of loudspeakers and its arrangement for reproducing the spatial impression of diffuse sound field. 113th AES Convention, Los Angeles, USA, 2002.

[13] Hamasaki, K., et al.: Effectiveness of Height Information for Reproducing Presence and Reality in Multichannel Audio System. 120th AES Convention, Paris, France, 2006.

[14] Kim, S., Lee, Y.W., and Pulkki, V.: New 10.2-channel Vertical Surround System (10.2-VSS); Comparison study of perceived audio quality in various multichannel sound systems with height loudspeakers. 129th AES Convention, San Francisco, USA, 2010.

[15] ISO/IEC JTC1/SC29/WG11 N14747, Text of ISO/MPEG 23008-3/DIS 3D Audio, Sapporo, July 2014.

[16] Bosi, M., Brandenburg, K., Quackenbush, S.: ISO/IEC MPEG-2 Advanced Audio Coding, Journal of the AES, Vol. 45, No. 10, October 1997, pp. 789-814.

[17] ISO/IEC JTC1/SC29/WG11 MPEG, International Standard ISO/IEC 13818-7, Generic Coding of Moving Pictures and Associated Audio: Advanced Audio Coding, 1997.

[18] Herre, J., Dietz, M.: Standards in a Nutshell: MPEG-4 High-Efficiency AAC Coding, IEEE Signal Processing Magazine, Vol. 25, No. 3, 2008, pp. 137-142.

[19] EBU Evaluations of Multichannel Audio Codecs, EBU Tech. 3324, Geneva, September 2007, available at https://tech.ebu.ch/docs/tech/tech3324.pdf.

[20] Hilpert, J., Disch, S.: Standards in a Nutshell: The MPEG Surround Audio Coding Standard, IEEE Signal Processing Magazine, Vol. 26, No. 1, 2009, pp. 148-152.

[21] ISO/IEC 23003-1:2007, MPEG-D (MPEG audio technologies), Part 1: MPEG Surround, 2007.

[22] Herre, J., Purnhagen, H., Koppens, J., Hellmuth, O., Engdegård, J., Hilpert, J., Villemoes, L., Terentiv, L., Falch, C., Hölzer, A., Valero, M.L., Resch, B., Mundt, H., and Oh, H.: MPEG Spatial Audio Object Coding – The ISO/MPEG Standard for Efficient Coding of Interactive Audio Scenes, Journal of the AES, Vol. 60, No. 9, September 2012, pp. 655-673.

[23] ISO/IEC 23003-2:2010, MPEG-D (MPEG audio technologies), Part 2: Spatial Audio Object Coding, 2010.

[24] Neuendorf, M., Multrus, M., Rettelbach, N., et al.: The ISO/MPEG Unified Speech and Audio Coding Standard – Consistent High Quality for All Content Types and at All Bit Rates, Journal of the AES, Vol. 61, No. 12, December 2013, pp. 956-977.

[25] ISO/IEC 23003-3:2012, MPEG-D (MPEG audio technologies), Part 3: Unified Speech and Audio Coding, 2012.

[26] ISO/IEC JTC1/SC29/WG11 N13411: Call for Proposals for 3D Audio, Geneva, January 2013.

[27] Blauert, J.: Spatial Hearing: The Psychophysics of Human Sound Localization, revised edition, MIT Press, 1997.

[28] ISO/IEC 23009-1:2012(E), Information technology – Dynamic adaptive streaming over HTTP (DASH) – Part 1: Media presentation description and segment formats, 2012.

[29] Pulkki, V.: Virtual Sound Source Positioning Using Vector Base Amplitude Panning. Journal of the AES, Vol. 45, No. 6, June 1997, pp. 456-466.

[30] ITU-R, Recommendation BS.1770-3: Algorithms to measure audio programme loudness and true-peak audio level, International Telecommunication Union, Geneva, Switzerland, 2012.

[31] European Broadcasting Union (EBU), Recommendation R128: Loudness normalisation and permitted maximum level of audio signals, Geneva, Switzerland, 2011.

[32] Füg, S., et al.: Design, Coding and Processing of Metadata for Object-Based Interactive Audio, 137th AES Convention, Los Angeles, USA, 2014.