Voice Coding in 3G Networks
Tommi Koistinen
Signal Processing Systems
Nokia Networks
[email protected]
This article discusses voice coding and user plane issues, particularly in 3G networks. The first chapter presents the basic reasons and means for speech coding in general. The second chapter reviews the basic 3G network architecture models. The most important 3G network elements that provide speech-related processing are discussed in chapters four and five. The sixth chapter discusses the issues related to tandeming of speech codecs, and the seventh chapter concludes the presentation.

2 Network Architectures

This chapter presents the basic 3G network evolution according to the 3GPP (Third Generation Partnership Project [5]) reference architectures.

Figure 1. 3GPP Reference Architecture of Release 99.

Release R4

The next step, taken with Release 4 (formerly known as Release 2000), is to separate the signaling and the user data in the Iu-CS interface. The signaling now goes to the MSC Server, and the transcoder is separated out as a standalone media gateway. Figure 2 presents the R4 architecture with a clear separation into a packet side and a circuit-switched side. The media gateway at the PSTN interface converts the AMR-coded speech to G.711. Speech travels in packet mode from the UTRAN to the PSTN interface.
[Figure 2 diagram: MT, UTRAN, SGSN, GGSN, HSS/CSCF, MGW, MSC Server, multimedia IP networks and PSTN/legacy networks; labelled interfaces: Iu-PS, Iu-CS user data, Iu-CS control.]

Figure 2. 3GPP Reference Architecture of Release 4.

The final architecture model, also called the All-IP network [6], moves speech, too, to full end-to-end packet mode. The IP packets generated in a mobile terminal go as such either to another IP terminal or from the GGSN to an MGW. The architecture is presented in Figure 3. A new network entity is also introduced, namely the Multimedia Resource Functions (MRF) unit, which mainly implements conferencing services for IP-based calls.

[Figure 3 diagram: MT, UTRAN, SGSN, GGSN, HSS/CSCF, MRF, MGW, multimedia IP networks and PSTN/legacy networks; labelled interface: Iu-PS.]

and announcement and conferencing services. Control mechanisms for these functionalities have usually been proprietary. In 3G networks, all of these functions must be offered by the Media Gateway, which is controlled by the Media Gateway Controller (MGC) using the standard H.248 control protocol [7].

An example (and fairly complete) set of functions that a Media Gateway could implement is:

• support for several interfaces (A-interface for 2G and Iu-interface for 3G) and for several transmission protocols (ATM, IP, TDM)
• support for several codecs, including the Adaptive Multirate (AMR) codec and forthcoming wideband codecs
• electrical and acoustic echo cancellation
• announcement services
• DTMF and call progress tone generation and detection
• support for fax/modem/data protocols
• support for Tandem Free Operation (TFO) and Transcoder Free Operation (TrFO)
• bad frame handling
• IP protocol handling (RTP/RTCP, encryption, QoS support)

Some functions, especially the conferencing service and possible speech enhancement services, are primarily intended to be provided by the Multimedia Resource Functions (MRF) unit, but they may optionally be added to the Media Gateway's responsibilities.
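As an illustration of one item in the function list above, DTMF tone generation, the following is a minimal sketch rather than a description of any actual Media Gateway implementation. It uses the standard DTMF keypad frequencies and the 8 kHz narrowband sampling rate; the function names, the 100 ms default duration and the amplitude scaling are assumptions made for this example only.

```python
import math

# Standard DTMF keypad frequencies in Hz (one row tone plus one column tone).
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

SAMPLE_RATE_HZ = 8000  # narrowband telephony sampling rate


def dtmf_frequencies(key):
    """Return the (row, column) frequency pair for one keypad symbol."""
    if len(key) != 1:
        raise ValueError(f"expected a single symbol, got {key!r}")
    for row, symbols in enumerate(KEYPAD):
        col = symbols.find(key)
        if col >= 0:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF symbol: {key!r}")


def dtmf_samples(key, duration_ms=100, amplitude=0.5):
    """Generate float samples for one DTMF digit (hypothetical helper).

    The two sinusoids are averaged, so every sample stays within
    [-amplitude, +amplitude].
    """
    low, high = dtmf_frequencies(key)
    count = SAMPLE_RATE_HZ * duration_ms // 1000
    step = 2 * math.pi / SAMPLE_RATE_HZ
    return [
        amplitude * 0.5 * (math.sin(step * low * n) + math.sin(step * high * n))
        for n in range(count)
    ]


if __name__ == "__main__":
    print(dtmf_frequencies("5"), len(dtmf_samples("5")))
```

Detection would work in the opposite direction, for example by measuring the signal energy at the eight frequencies; that part is left out of this sketch.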
[Figure 4 diagram: MT, UTRAN, SGSN, GGSN and multimedia IP networks.]

Figure 4. MRF unit as a network side speech enhancement server.

Calls between mobile IP terminals are transferred in coded format end-to-end. If any speech enhancement services are to be provided on the network side, the MRF entity could perform the necessary operations (as it already has to support all coding formats for the conferencing service). The other option is that all speech enhancement services are provided by the mobile terminals.

A set of speech enhancements that the MRF entity could provide is:

• Noise suppression
• Gain (volume) control
• Acoustic echo cancellation

It should also be mentioned that the Media Gateway and the Multimedia Resource Functions unit are logical entities only; physically they may be co-located in the same device.

5 Tandem Avoidance

5.1 Tandem Free Operation (TFO)

Every time voice is encoded or decoded, the speech quality degrades slightly. Thus, as few conversions as possible are desired. The basic 2G mobile-to-mobile call suffers from tandem coding: separate speech coding happens on both radio interfaces, and between the transcoders the voice travels in 64 kbps G.711 format. In general, two encodings in clear speech conditions are not a problem, but more than two encodings, especially in bad line conditions, cause severe degradation.

To overcome this kind of quality problem, ETSI has specified the so-called Tandem Free Operation (TFO) [8], which establishes a sub-channel (of 16 or 8 kbps) inside the 64 kbps G.711 stream for the encoded speech.

[Figure 5 diagram: MS, BSS, transcoder (64↔16), MSC, PSTN, MSC, transcoder (64↔16), BSS, MS; 64 kbps between the transcoders.]

Figure 5. No Tandem Free Operation.

[Figure 6 diagram: MS, BSS, transcoder (16↔16), MSC, PSTN, MSC, transcoder (16↔16), BSS, MS; 48(16) kbps between the transcoders.]

Figure 6. Tandem Free Operation is utilised.

TFO is based on inband procedures, which means that no outband signaling is used to form a TFO connection. In practice, TFO connection establishment starts with a negotiation phase in which certain TFO protocol messages are exchanged between the transcoders to agree on the codecs to be used. If the other end does not support TFO, it will not acknowledge the negotiation, and the TFO-capable transcoder will then also encode and decode the 64 kbps stream as in Figure 5.

5.2 Transcoder Free Operation (TrFO)

For 3G networks, a slightly different approach to tandem avoidance is taken. Outband signaling is used for codec negotiation, and if the codecs match there is no need for transcoders at all. This operation is called Transcoder Free Operation (TrFO) [9].
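The idea behind outband codec negotiation in Transcoder Free Operation can be illustrated with a small sketch. This is not the TrFO signaling procedure itself: it only shows the outcome, namely that transcoders can be left out when every node on the path shares a codec. The function names and the list-of-lists representation of per-node codec support are hypothetical and chosen for this example.

```python
def negotiate_common_codec(supported_per_node):
    """Return a codec supported by every node on the path, or None.

    supported_per_node is a list with one codec list per node, ordered
    from the originating end. Codecs are tried in the originating node's
    order of preference (an assumption of this sketch).
    """
    if not supported_per_node:
        return None
    first, *rest = supported_per_node
    for codec in first:
        if all(codec in node for node in rest):
            return codec
    return None


def transcoders_needed(supported_per_node):
    """True when no common codec exists, so transcoding must be reserved."""
    return negotiate_common_codec(supported_per_node) is None


if __name__ == "__main__":
    path = [["AMR", "EFR"], ["AMR"], ["EFR", "AMR"]]
    print(negotiate_common_codec(path))              # common codec found
    print(transcoders_needed([["AMR"], ["EFR"]]))    # no common codec
```

In the first call all three nodes support AMR, so no transcoding resources would have to be reserved; in the second call the codec sets are disjoint and a transcoder is still required.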
TrFO is relevant mainly for the MSC Server concept and for intersystem compatibility, as in the final All-IP network calls are by nature of the TrFO type. Figure 7 presents a basic call in which outband signaling travels from one MSC Server to another until the whole link is negotiated. If a common codec can be agreed on, no transcoding resources are reserved from the intermediate media gateways.

[Figure 7 diagram: MT, UTRAN, MGW, MGW and GSM BSS, two MSC Servers and PSTN/legacy networks; codec negotiation labels "AMR?", "AMR?" and "AMR EFR!".]

Figure 7. A basic TrFO call.

4 Adaptive Speech Coding

The traditional GSM speech codecs operate on the radio interface at a fixed source rate with a fixed level of error protection (e.g. the Full Rate codec with framing overhead consumes 16 kbps and error protection adds 6.8 kbps, resulting in a 22.8 kbps gross bit rate over the air). The codec itself does not have the means (except for the bad frame handling mechanism) to adapt to changing radio conditions. For this reason, ETSI (and later 3GPP) asked for new adaptive coding schemes that could select the optimum channel mode (full rate or half rate) and the optimum codec mode (speech rate) based on the radio conditions. As a result, the Adaptive Multirate (AMR) codec [10,11] has now been standardized as an additional codec for the GSM system and as the only mandatory codec (thus far) for the 3G system. The two most important design targets for the AMR codec were:

• improved speech quality in both half-rate and full-rate modes by means of codec mode adaptation, i.e. varying the balance between speech and channel coding for the same gross bit-rate.

during silence only silence description (SID) frames are periodically sent to the other end. All modes operate on a 20 ms frame basis.

Codec mode   Source codec bit-rate   Channel mode
AMR_12.20    12.20 kbit/s            FR
AMR_10.20    10.20 kbit/s            FR
AMR_7.95     7.95 kbit/s             FR / HR
AMR_7.40     7.40 kbit/s             FR / HR
AMR_6.70     6.70 kbit/s             FR / HR
AMR_5.90     5.90 kbit/s             FR / HR
AMR_5.15     5.15 kbit/s             FR / HR
AMR_4.75     4.75 kbit/s             FR / HR
AMR_SID      1.80 kbit/s             FR / HR

Table 1. 8+1 different AMR modes.

The choice between the full rate and the half rate channel mode can be made off-line based on the capacity requirements of the operator. The codec mode is selected continuously by the radio resource management. Basically, when a lower AMR mode is selected, more bits of the gross bit rate are freed for channel coding and error protection. Even though a very low codec bit rate is used, the strong error protection keeps the overall speech quality sufficiently high. Figure 8 shows the reasoning behind the mode selection. To follow the optimum quality curve (MOS = Mean Opinion Score of speech quality) against a decreasing carrier-to-interference ratio (C/I), the AMR mode in use must be changed accordingly.

[Figure 8 diagram: MOS plotted against C/I, with curves for Mode 1, Mode 2 and Mode 3.]
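The trade-off between the codec bit rate and the error protection can be made concrete with a little arithmetic. The sketch below is a simplified illustration: it takes the mode rates from Table 1, the 20 ms frame length and the 22.8 kbps full-rate gross bit rate from the text, and assumes a half-rate gross bit rate of 11.4 kbps, which is not stated in this article. It treats everything that is not speech data as channel-coding budget, which ignores other overheads. All names are hypothetical.

```python
FRAME_MS = 20  # all AMR modes operate on 20 ms frames

# Gross bit rates over the air in bit/s. The FR value is from the text;
# the HR value is an assumption of this sketch.
GROSS_BPS = {"FR": 22800, "HR": 11400}

# Source codec bit rates from Table 1, in bit/s.
AMR_MODES_BPS = {
    "AMR_12.20": 12200, "AMR_10.20": 10200, "AMR_7.95": 7950,
    "AMR_7.40": 7400, "AMR_6.70": 6700, "AMR_5.90": 5900,
    "AMR_5.15": 5150, "AMR_4.75": 4750,
}

# Per Table 1, the two highest modes are available on full rate only.
FR_ONLY = {"AMR_12.20", "AMR_10.20"}


def bits_per_frame(rate_bps):
    """Number of bits carried in one 20 ms frame at the given rate."""
    return rate_bps * FRAME_MS // 1000


def channel_coding_bits(mode, channel="FR"):
    """Bits per frame left for channel coding in this simplified model."""
    if channel == "HR" and mode in FR_ONLY:
        raise ValueError(f"{mode} is not available on the half-rate channel")
    return bits_per_frame(GROSS_BPS[channel]) - bits_per_frame(AMR_MODES_BPS[mode])


if __name__ == "__main__":
    for mode in AMR_MODES_BPS:
        print(mode, bits_per_frame(AMR_MODES_BPS[mode]), channel_coding_bits(mode))
```

Under these assumptions a full-rate frame carries 456 gross bits, so AMR_12.20 (244 speech bits) leaves 212 bits for protection, while AMR_4.75 (95 speech bits) leaves 361 bits.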
[Figure diagram: speech quality (Excellent, Very good, Good, Poor, Unacceptable) of AMR-NB and EFR plotted against carrier-to-interference ratio (dB): error-free, 13, 10, 7, 4.]

services will make speech quality better, even to a level never experienced before.

This article has mainly focused on the application level. Good network conditions (low delay, no packets lost due to congestion) are also a starting point for superior application-level speech quality. Media gateways shall support the network-level QoS mechanisms (like DiffServ) that are used to optimize and prioritise the real-time and the non-real-time traffic (see for example [16]).
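As a small illustration of DiffServ marking for real-time traffic, the sketch below opens a UDP socket whose outgoing packets carry the Expedited Forwarding code point. It is an example under stated assumptions, not a description of how any media gateway does this: the choice of Expedited Forwarding (DSCP 46) for voice is an assumption of this sketch, the function names are hypothetical, and `socket.IP_TOS` is available on typical Unix-like systems but not necessarily on every platform.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding code point, assumed here for voice


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP value into the upper bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp=EF_DSCP):
    """Open a UDP socket whose outgoing IPv4 packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    print(hex(dscp_to_tos(EF_DSCP)))
```

Whether the marking has any effect depends on the routers along the path honouring it, which is a matter of network configuration rather than of the sending application.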