Design Your Own VoIP Solution with a Blackfin® Processor
Add Enhancements Later
By David Katz
Tomasz Lukasiak
Rick Gentile
Wayne Meyer
INTRODUCTION
The age of voice-over-Internet-protocol (VoIP) is here, bringing together telephony and data communications to provide packetized voice and fax data streamed over low-cost Internet links. The transition from circuit-switched to packet-switched networking, continuing right now at breakneck speed, is encouraging applications that go far beyond simple voice transmission, embracing other forms of data and allowing them all to travel over the same infrastructure.
The VoIP challenge to the embedded-system designer is to choose a processing solution that is cost-effective, easy to deploy, and scalable in performance across market spaces. A “sweet-spot” embedded-solution approach is to design with a platform that can implement a low-channel-count basic VoIP solution, yet retain plenty of capacity for value-added capabilities and services, such as video, music, imaging, and system control. The discussion below makes the case that the Blackfin¹ processor family from Analog Devices offers just such an attractive solution.
What Is VoIP?
Today’s voice networks—such as the public switched telephone network (PSTN)—use digital switching technology to establish a dedicated link between the caller and the receiver. While this connection offers only limited bandwidth, it does provide an acceptable quality level without the burden of a complicated encoding algorithm.
The VoIP alternative uses Internet protocol (IP) to send digitized voice traffic over the Internet or private networks. An IP packet consists of a sequence of bits containing a control header and a data payload. The header provides network navigation information for the packet, and the payload contains the compressed voice data.
While circuit-switched telephony deals with the entire message, VoIP-based data transmission is packet-based: chunks of data are packetized (separated into units for transmission), compressed, sent across the network, and eventually reassembled at the designated receiving end. The key point is that there is no need for a dedicated link between transmitter and receiver.
Packetization is a good match for transporting data (for example, a JPEG file or email) across a network, because the delivery falls into a non-time-critical “best-effort” category. The network efficiently moves data from multiple sources across the same medium. For voice applications, however, “best-effort” is not adequate, because variable-length delays as the packets make their way across the network can degrade the quality of the decoded audio signal at the receiving end. For this reason, VoIP protocols, via QoS (quality-of-service) techniques, focus on managing network bandwidth to prevent delays from degrading voice quality.
Packetizing voice data involves adding header and trailer information to the data blocks. Packetization overhead (additional time and data introduced by this process) must be taken into account when choosing a packet size, which is a tradeoff between minimizing transmission delay and using network bandwidth most efficiently. Smaller packets can be sent more often, while larger packets take longer to compose. On the other hand, larger packets amortize the header and trailer information across a bigger chunk of voice data, so they use network bandwidth more efficiently than do smaller packets.
By their nature, networks cause the rate of data transmission to vary quite a bit. This variation, known as jitter, is removed by buffering the packets long enough to ensure that the slowest packets arrive in time to be decoded in the correct sequence. Naturally, a larger jitter buffer contributes to more overall system latency.
Latency represents the time delay through the IP system. A one-way latency is the time from when a word is spoken to when the person on the other end of the call hears it. Round-trip latency is simply the sum of the two one-way latencies. The lower the latency, the more natural a conversation will sound. For the PSTN phone system in North America, the round-trip latency is less than 150 ms.
For VoIP systems, a one-way latency of up to 200 ms is considered acceptable. The largest contributors to latency in a VoIP system are the network and the gateways at either end of the call. The voice codec (coder-decoder) adds some latency, but this is usually small by comparison (<20 ms).
When the delay in a voice network application is large, the main challenges are to cancel echoes and eliminate overlap. Echo cancellation directly affects perceived quality; it becomes important when the round-trip delay exceeds 50 ms. Voice overlap becomes a concern when the one-way latency is more than 200 ms.
Because most of the time elapsed during a voice conversation is “dead time”—during which no speaker is talking—codecs take advantage of this silence by not transmitting any data during these intervals. Such “silence compression” techniques detect voice activity and stop transmitting data when there is none; the receiving end instead generates “comfort” noise to ensure that the line does not appear dead when no one is talking.
In a standard PSTN telephone system, echoes that degrade perceived quality can happen for a variety of reasons. The two most common causes are impedance mismatches in the circuit-switched network (“line echo”) and acoustic coupling between the microphone and speaker in a telephone (“acoustic echo”). Line echoes are common when there is a two-wire-to-four-wire conversion in the network (e.g., where analog signaling is converted into a T1 system).
Because VoIP systems can link to the PSTN, they must be able to deal with line echo, and IP phones can also fall victim to acoustic echo. Echo cancellers can be optimized to operate on line echo, acoustic echo, or both. The effectiveness of the cancellation depends directly on the quality of the algorithm used.
An important parameter for an echo canceller is the length of the time window (the echo “tail”) over which it operates. Put simply, the echo canceller keeps a copy of the signal that was transmitted. For a given time after the signal is sent, it seeks to correlate the transmitted signal with the returning reflected signal (which is, of course, delayed and diminished in amplitude) and subtract it. To achieve effective cancellation, it usually suffices to use a standard correlation window size (e.g., 32 ms, 64 ms, or 128 ms), but larger sizes may be necessary.
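The correlate-and-subtract idea can be sketched with a normalized least-mean-squares (NLMS) adaptive filter, a common textbook technique; the article does not say which algorithm any particular canceller uses. This is a minimal sketch assuming 8 kHz narrowband audio, so a 32 ms window corresponds to 256 filter taps. The function name `nlms_echo_cancel`, the step size `mu`, and the synthetic echo path in the demo are illustrative assumptions. Production line-echo cancellers add features such as double-talk detection that are omitted here.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=256, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end: samples already sent to the far side (the copy the canceller keeps)
    mic:     returning signal containing the delayed, attenuated echo
    taps:    tail length in samples (assumption: 32 ms at 8 kHz = 256 taps)
    Returns the residual (echo-cancelled) signal, one value per input sample.
    """
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history.pop()            # drop the oldest far-end sample
        history.insert(0, x)     # add the newest one at the front
        # Estimated echo: correlate the weights with the recent far-end samples.
        echo_est = sum(w * h for w, h in zip(weights, history))
        e = d - echo_est         # subtract the estimated echo
        residual.append(e)
        # NLMS update: step size normalized by far-end energy in the window.
        energy = sum(h * h for h in history)
        step = mu * e / (energy + eps)
        for i in range(taps):
            weights[i] += step * history[i]
    return residual


if __name__ == "__main__":
    random.seed(0)
    far = [random.gauss(0.0, 1.0) for _ in range(4000)]
    # Hypothetical echo path: 10-sample delay, half amplitude.
    mic = [0.0] * 10 + [0.5 * s for s in far[:-10]]
    res = nlms_echo_cancel(far, mic, taps=64)
    echo_power = sum(v * v for v in mic[-1000:])
    res_power = sum(v * v for v in res[-1000:])
    print(f"residual power is {res_power / echo_power:.2e} of the echo power")
```

Normalizing the step by the far-end energy keeps the adaptation rate roughly independent of signal level, which is why NLMS is a usual starting point; the window length simply has to exceed the longest echo delay the canceller must cover.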
[Figure: (a) VoIP network linking analog phones through the PSTN and gateways, and IP phones and PCs through IP switches. (b) Signaling and data flow between endpoint 1 and endpoint 2 through a media encoder/decoder (e.g., G.723.1), a media transport protocol (e.g., RTP), a transport-layer protocol (e.g., UDP), and the network layer (Internet Protocol, IP) across the Internet.]
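The packet-size tradeoff described earlier can be made concrete with a little arithmetic. This sketch assumes a G.711 stream (64 kbit/s) carried in RTP over UDP over IPv4 with no header compression, so every packet carries 12 + 8 + 20 = 40 bytes of headers; link-layer framing is ignored. The function name `ip_bandwidth_kbps` is a hypothetical illustration.

```python
RTP_HEADER_BYTES = 12   # fixed RTP header, no CSRC list or extension
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # IPv4 header without options
HEADER_BYTES = RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES


def ip_bandwidth_kbps(codec_kbps, packet_ms):
    """IP-layer bit rate of one voice stream for a given packetization interval.

    Link-layer framing (e.g., Ethernet) is deliberately ignored.
    """
    payload_bytes = codec_kbps * packet_ms / 8   # kbit/s x ms = bits; /8 = bytes
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for ms in (10, 20, 30, 40):
        overhead = 100 * HEADER_BYTES / (HEADER_BYTES + 64 * ms / 8)
        print(f"{ms:2d} ms packets: {ip_bandwidth_kbps(64, ms):5.1f} kbit/s, "
              f"headers are {overhead:.0f}% of each packet")
```

Under these assumptions, 20 ms packets need 80 kbit/s at the IP layer, while 10 ms packets need 96 kbit/s because the 40 bytes of headers are sent twice as often; in exchange, each packet waits only 10 ms to be filled.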
The OSI (Open Systems Interconnection) seven-layer model (Figure 2) specifies a framework for networking. If there are two parties to a communication session, data generated by each starts at the top, undergoes any required configuration and processing through the layers, and is finally delivered to the physical layer for transmission across the medium. At the destination, processing occurs in the reverse direction, until the packets are finally reassembled and the data is provided to the second user.
Session Control: H.323 vs. SIP
The first requirement in a VoIP system is a session-control protocol to establish presence and locate users, as well as to set up, modify, and terminate sessions. There are two protocols in wide use today. Historically, the first of these protocols was H.323*, but SIP (Session Initiation Protocol) is rapidly becoming the main standard. Let’s take a look at the role played by each.
International Telecommunication Union (ITU) H.323⁴
H.323 is an ITU standard originally developed for real-time multimedia (voice and video) conferencing and supplementary data transfer. It has rapidly evolved to meet the requirements of VoIP networks. It is technically a container for a number of required and optional network and media codec standards. The connection-signaling part of H.323 is handled by the H.225 protocol, while feature negotiation is supported by H.245.
SIP (Session Initiation Protocol)
SIP⁵ is defined by the IETF⁶ (Internet Engineering Task Force) under RFC 3261. It was developed specifically for IP telephony and other Internet services—and though it overlaps H.323 in many ways, it is usually considered a more streamlined solution.
SIP, used together with SDP⁷ (Session Description Protocol), provides user discovery, feature negotiation, and call management. SDP is essentially a format for describing initialization parameters for streaming media during session announcement and invitation. The SIP/SDP pair is somewhat analogous to the H.225/H.245 protocol set in the H.323 standard.
SIP can be used in a system with only two endpoints and no server infrastructure. However, in a public network, special proxy and registrar servers are used for establishing connections. In such a setup, each client registers itself with a server so that callers can find it from anywhere on the Internet.
* To be exact, the task of session control and initiation lies in the domain of H.225.0 and H.245, which are part of the H.323 umbrella protocol.
TRANSPORT LAYER PROTOCOLS
The signaling protocols above are responsible for configuring multimedia sessions across a network. Once the connection is set up, media flows between network nodes are established by using one or more data-transport protocols, such as UDP or TCP.
UDP (User Datagram Protocol)
UDP⁸ is a connectionless protocol: packets are simply sent out, with no acknowledgment that a packet has been received at the other end. Since delivery is not guaranteed, voice transmission will not work very well with UDP alone when there are peak loads on a network. That is why a media transport protocol, such as RTP,⁹ usually runs on top of UDP.
TCP (Transmission Control Protocol)
TCP¹⁰ uses a client/server communication model. The client requests (and is provided) a service by another computer (a server) in the network. Each client request is handled individually, unrelated to any previous one. This ensures that “free” network paths are available for other channels to use.
TCP divides a message into smaller packets that can be transmitted over the Internet and received by a TCP layer at the other end of the call, where the packets are “reassembled” into the original message. The IP layer interprets the address field of each packet so that it arrives at the correct destination.
Unlike UDP, TCP does guarantee complete receipt of packets at the receiving end. However, it does this by allowing packet retransmission, which adds latencies that are not helpful for real-time data. For voice, a packet that arrives late because of retransmission is as bad as a lost packet. Because of this characteristic, TCP is usually not considered an appropriate transport for real-time streaming media.
Figure 2 shows how the TCP/IP Internet model, and its associated protocols, compares with and uses various layers of the OSI model.
Media Transport
As noted above, sending media data directly over a transport protocol is not very efficient for real-time communication. Because of this, a media transport layer is usually responsible for handling this data in an efficient manner.
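RTP is the media transport protocol named in this article. To show what this layer adds to each voice packet, the sketch below packs and unpacks the fixed 12-byte RTP header defined in RFC 3550 (version 2, with no padding, extension, or CSRC entries). The function names and example values are hypothetical; payload type 0 is the static RTP/AVP assignment for G.711 μ-law (PCMU), and 8 is G.711 A-law (PCMA).

```python
import struct


def pack_rtp_header(seq, timestamp, ssrc, payload_type=0, marker=False):
    """Build the fixed 12-byte RTP header (RFC 3550): V=2, P=0, X=0, CC=0."""
    first = 2 << 6                                    # version 2 in the top two bits
    second = (0x80 if marker else 0) | (payload_type & 0x7F)
    # Network byte order: two bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    return struct.pack("!BBHII", first, second,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Decode the fixed header fields from the first 12 bytes of an RTP packet."""
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": first >> 6, "marker": bool(second & 0x80),
            "payload_type": second & 0x7F, "seq": seq,
            "timestamp": ts, "ssrc": ssrc}


if __name__ == "__main__":
    # Consecutive 20 ms G.711 packets: sequence +1, timestamp +160 samples at 8 kHz.
    for n in range(3):
        hdr = pack_rtp_header(seq=n, timestamp=160 * n, ssrc=0x12345678)
        print(unpack_rtp_header(hdr))
```

The sequence number lets the receiver's jitter buffer restore packet order and detect losses, while the timestamp tells it when each packet's audio should be played.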
[Figure 4. Performance vs. reliability: RTP and TCP compared on a reliability axis.]
RTCP (RTP Control Protocol)
RTCP¹¹ is a complementary protocol used to communicate control information, such as the number of packets sent and lost, jitter, delay, and endpoint descriptions. It is most useful for managing session time bases and for analyzing the QoS of an RTP stream. It can also provide a backchannel for limited retransmission of RTP packets.
MEDIA CODECS
At the top of the VoIP stack are protocols to handle the actual media being transported. There are potentially quite a few audio and video codecs that can feed into the media transport layer. A sampling of the most common ones can be found in the sidebar on the last page of this article.
A number of factors help determine how desirable a codec is, including how efficiently it uses available system bandwidth, how it handles packet loss, and what costs, including intellectual-property royalties, are associated with it.
BLACKFIN VoIP COLLATERAL
Unlike traditional embedded VoIP solutions that use two processor cores to provide VoIP functionality, Blackfin processors provide a convergent solution in a unified core architecture that allows voice and video signal processing concurrent with RISC MCU processing to handle network and user-interface demands. This unique ability to offer full VoIP functionality on a single convergent processor provides a unified software development environment, faster system debugging and deployment, and lower overall system cost.
Blackfin/Linphone
A Blackfin VoIP system can be designed using open-source software¹³ based on μClinux, the embedded version of the popular GNU/Linux OS. One such General Public License (GPL-licensed) IP-phone package, called Linphone¹⁴—based on the SIP suite—has been ported to μClinux for Blackfin processors, allowing the Blackfin reference design to communicate with any SIP-compatible endpoint. In a public network with the proper SIP servers and gateway infrastructure, this system can even be used to connect to phones on a PSTN node. For voice encoding and decoding, the current Blackfin implementation of Linphone supports G.711 (A-law and μ-law), GSM (Global System for Mobile Communications), and the Speex audio compression format.
The main components used in the Blackfin Linphone reference design are:
Linux TCP/IP networking stack: includes the necessary transport and control protocols, such as TCP and UDP.
Linphone: the main VoIP application, which includes Blackfin-based G.711 and GSM codec implementations. It comprises both a graphical user interface (GUI) for desktop PCs and a simple command-line application for nongraphical embedded systems.
oRTP: an implementation of an RTP stack developed for Linphone and released under the LGPL license.
oSIP: a thread-safe implementation of the SIP protocol released under the LGPL license.
Speex: the open-source reference implementation of the Speex codec. Blackfin-specific optimizations to the fixed-point Speex implementation have been contributed back to the mainline code branch.
Unicoi Systems Blackfin-Based Fusion Voice Gateway
The Fusion Voice Gateway (Figure 5) is a complete Voice Gateway/Terminal Adapter Reference Design¹⁵ from Unicoi Systems.¹⁶ With router functionality and full-featured SIP telephony running on a single-core Blackfin processor, the Fusion Voice Gateway allows quick time-to-market for terminal adapters.
The Fusion Voice Gateway features robust functionality, including G.168 echo cancellation and multiple G.7xx voice codecs. The Fusion reference design also includes full-featured telephony and router functionality by combining an Internet router, a 4-port Ethernet switch, and VoIP gateway functionality.
Unicoi Systems Blackfin-Based Fusion IP Phone
The Fusion IP Phone from Unicoi Systems is a complete software/silicon solution that offers a full-featured platform supporting current and emerging IP phone standards, and has expansion capabilities for product differentiation.
The Fusion IP Phone Reference Design reduces BOM cost as well as the time and complexity often associated with developing an IP phone. Designed around the ADSP-BF536, the reference design software delivers the critical processing (e.g., real-time operating system, call manager, voice algorithms, acoustic echo cancellation for full-duplex speakerphone), communication protocols (TCP/IPv4/v6, SIP, RTP, etc.), and peripheral functions (LCD and keypad controllers, etc.) required to build a basic or advanced IP phone.
Blackfin BRAVO VoIP Reference Designs
The Analog Devices Blackfin BRAVO™ VoIP¹⁷ and Videophone reference designs are complete system solutions for OEMs building feature-rich, high-performance, low-cost VoIP desktop phones, videophones, and telephone adapters. The designs include the complete suite of software for VoIP applications, all controlled by a comprehensive set of application program interfaces (APIs) for customization and control of core system functions.
For audio, the designs support multiple G.7xx audio codecs, G.168-compliant network echo cancellation, and acoustic echo cancellation for enhanced audio clarity. Optionally, RF transceivers can be included in the design to provide wireless audio capability. The designs support both H.323- and SIP-compliant software stacks.
On the video front, the BRAVO Broadband Audio/Video Communications¹⁸ reference design (Figure 6) provides up to 30 frames per second of common intermediate format (CIF) color video, including support for ITU-standard H.263 and H.264 video codecs, picture-in-picture, high-resolution graphics with overlay, alpha and chroma keying, and antiflicker filtering.
CONCLUSION
Clearly, VoIP technology has the potential to revolutionize the way people communicate—whether they’re at home or at work, plugged-in or untethered, video-enabled or just plain audio-minded. The power and versatility of Blackfin processors, working with a wide variety of standards, will make VoIP increasingly pervasive in embedded environments, creating value-added features in many markets that are not yet experiencing the benefits of this exciting technology.
[Figure: digital camera (CCIR656 interface), Ethernet MAC + PHY with RJ-45 connector, TV video encoder (NTSC/PAL), QVGA LCD, remote control, and keypads.]
Figure 6. Blackfin BRAVO Broadband Audio/Video Communications reference design, functional diagram.