Interworking Between SIP/SDP and H.323: June 2000
Columbia University
All content following this page was uploaded by Kundan Singh on 23 October 2012.
May 8, 2000
Abstract
There are currently two standards for signaling and control of Internet
telephone calls, namely ITU-T Recommendation H.323 and the IETF Ses-
sion Initiation Protocol (SIP). We describe how an interworking function
(IWF) can allow SIP user agents to call H.323 terminals and vice versa. Our
solution addresses user registration, call sequence mapping and session de-
scription. We also describe and compare various approaches for multi-party
conferencing and call transfer.
1 Introduction
It appears likely that both the Session Initiation Protocol (SIP) [1, 2], together with
the Session Description Protocol (SDP) [3], and the ITU-T recommendation H.323
in its various versions [4, 5] will be used for setting up Internet multimedia con-
ferences and telephone calls. For example, currently H.323 is the most widely
used protocol for PC-based conferences, due to the widespread availability of Mi-
crosoft’s NetMeeting tool, while carrier networks using so-called soft switches and
IP telephones seem to be built based on SIP. Thus, in order to achieve universal
connectivity, interworking between the two protocols is desirable. This paper de-
scribes approaches to achieving this.
The ITU-T Recommendation H.323 [4] defines packet-based multimedia com-
munication systems and is based heavily on previous ITU-T multimedia protocols.
This work was supported by a grant from Sylantro Corp. An earlier version of this paper ap-
peared in the 1st IP-Telephony Workshop (IPTel’2000), Berlin, April 2000.
In particular, H.323 call signaling is inspired by H.320 [6] for ISDN, and call con-
trol by H.324 [7] for GSTN terminals. SIP [1], developed in the IETF, builds on a
simple text-based request-response architecture similar to other Internet protocols
such as HTTP [8] and RTSP [9]. With the exception of conference control, SIP
provides a set of basic services similar to that of H.323 [10, 11].
Interworking between the protocols is made simpler since both operate over IP
(Internet Protocol) and use RTP (Real time Transport Protocol [12]) for transferring
realtime audio/video data, reducing the task of interworking between these proto-
cols to merely translating the signaling protocols and session description. Since no
media data needs to be translated, a single gateway can likely serve thousands of
end systems.
Interworking between SIP and H.323 requires transparent support of signaling
and session descriptions between the SIP and H.323 entities. We call the server
providing this translation a SIP-H.323 interworking function (IWF). We refer to
the set of terminals speaking H.323 and SIP as the H.323 and SIP networks, re-
spectively, even though they are likely to be intermingled on the same IP network.
We use the term native network to refer to the network used by a particular termi-
nal, while the foreign network is the network whose access is mediated by the IWF.
For an H.323 terminal, a SIP terminal is in a foreign network.
When addressing a terminal using another signaling protocol, there are two ap-
proaches. First, the user can explicitly identify the protocol as part of the address,
for example, by inventing some form of H.323 URL¹ such as h323:[email protected].
If, for example, an H.323 URL is used by a SIP terminal, it would then be the re-
sponsibility of the SIP terminal to find the appropriate IWF.
Alternatively, a terminal using a particular signaling protocol sees all other ter-
minals as being native, and does not know or care that a particular address refers to
a terminal in the foreign network. Indeed, an address could well change between
being native and foreign, depending on what equipment the owner of the address
happens to be using. This approach is preferable, but requires that user registra-
tions are exported into the foreign network. Depending on the type of information
sharing between H.323 or SIP elements and the IWF, different architectures are
possible to provide the transparent address resolution and call establishment, as we
will discuss below.
different approaches to address user registration. In Section 4, we describe a mech-
anism to map SIP addresses to H.323 addresses. Call sequence mapping between
SIP and H.323 is described in Section 5. Section 6 discusses how multi-party
conferencing and call transfer can be translated. Finally, we describe our current
implementation and future work in Section 8.
2 Background
2.1 Protocol overview
H.323 includes several subprotocols: H.225.0 [14] for connection setup, media
transport (RTP), resource access and address translation; H.245 [15] for call
control and capability negotiation; H.332 [16] for large conferences; H.235 [17]
for security; H.246 [18] for interoperability with the PSTN; and H.450.x [19, 20, 21]
for supplementary services such as call transfer.
In H.323, a simple call is established as follows. If a user (say Alice) wants to
talk to another user (Bob), Alice first sends an admission request to her gatekeeper.
The gatekeeper acts as a management entity in H.323, which grants access to re-
sources, controls bandwidth and maps user names to IP addresses, among other
things. The gatekeeper finds out the IP addresses at which Bob can be reached and
informs Alice. After that, Alice establishes a TCP connection to the IP address
of Bob. This is followed by an ISDN-like call signaling procedure. Alice sends a
Q.931 [22] SETUP message and Bob responds with a Q.931 CONNECT mes-
sage. Once the first stage of Q.931 signaling is complete, H.245 takes over. H.245
messages are used to negotiate terminal capabilities, i.e., the support for various
audio/video algorithms. The H.245 OpenLogicalChannel procedure is used for
opening different unidirectional media channels. A media channel is defined as a
pair of UDP channels, one for RTP and the other for RTCP. Audio and video pack-
ets are encapsulated in RTP and sent from one end system to the other. Depending
on the version of H.323, Q.931 and H.245 steps can be combined in various ways.
SIP sets up calls with an INVITE message and a response from the called party.
Both INVITE and the response contain a session description indicating terminal
capabilities, typically, but not necessarily, encoded using SDP. Proxy and redirect
servers are responsible for translating between user names and the called party’s IP
address.
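As a rough illustration of how one SDP body carries both the address at which an endpoint receives media and the codecs it accepts, the following Python sketch builds and re-parses a minimal audio-only description. The user name, the documentation address 192.0.2.10 and port 49170 are placeholders, only the `c=` and `m=` lines are interpreted, and payload types 0 and 8 are the static RTP/AVP assignments for PCMU and PCMA.

```python
from typing import List, Tuple


def build_sdp(user: str, ip: str, audio_port: int, payload_types: List[int],
              session_id: int = 1) -> str:
    """Build a minimal SDP body describing one audio stream (RTP/AVP)."""
    lines = [
        "v=0",
        f"o={user} {session_id} {session_id} IN IP4 {ip}",  # origin
        "s=-",                                               # session name
        f"c=IN IP4 {ip}",   # address at which this end receives RTP
        "t=0 0",            # unbounded session time
        # receive port plus the payload types (codecs) this end accepts
        f"m=audio {audio_port} RTP/AVP " + " ".join(str(pt) for pt in payload_types),
    ]
    return "\r\n".join(lines) + "\r\n"


def parse_sdp(body: str) -> Tuple[str, int, List[int]]:
    """Return (connection address, audio port, payload types) from an SDP body."""
    ip, port, payload_types = "", 0, []
    for line in body.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line.split()[-1]
        elif line.startswith("m=audio "):
            fields = line.split()
            port = int(fields[1])
            payload_types = [int(f) for f in fields[3:]]
    return ip, port, payload_types


offer = build_sdp("alice", "192.0.2.10", 49170, [0, 8])  # PCMU, PCMA
print(parse_sdp(offer))  # ('192.0.2.10', 49170, [0, 8])
```

Everything the callee needs to start sending media is recoverable from this one body, which is the property that makes the SIP-to-H.323 direction comparatively easy.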
bilities, and local and remote media transport addresses at which the endpoint can
receive the media packets. In H.323, this information is spread over different stages
of the call setup, while SIP conveys it in an INVITE message and its response.
Translating a SIP call to an H.323 call is straightforward. The IWF gets all three
pieces of information in the SIP INVITE message and can split them across multiple
stages of the H.323 call establishment. However, in the reverse direction, from
H.323 to SIP, the different stages of H.323 call establishment have to be merged
into a single SIP INVITE message. We describe and compare various approaches
in Section 5. The H.323v2 (version 2.0) Fast Connect procedure is a step towards
simplifying the multi-stage signaling of H.323. However, it is optional and an
H.323v2 entity is required to support the traditional multi-stage signaling. Thus,
we describe call setup both with and without Fast Connect.
tive capability sets: [a1, a2], [v1, v2], and [d1]. It indicates that the terminal can
support audio, video and data simultaneously. Audio can use either codec a1 or a2,
video codec v1 or v2, and data format d1.
SIP can, in principle, use any session description format. In practice, however,
SDP is used exclusively. SDP lists media types and the supported encodings for
each. Unlike H.245, SDP cannot express intra-media or inter-media constraints.
For example, SDP cannot indicate that for a particular media type, the
other side can only choose subset A or subset B of the listed codecs, but not codecs
from both subsets. Similarly, SDP cannot express that certain audio codecs can
only be used in conjunction with certain video codecs.
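The inter-media limitation can be made concrete with a small sketch. It models an H.245 capability set as a list of descriptors, each pairing a media type with its alternative codecs, and compares it with the per-media codec lists that a single SDP description can carry. The data layout and function names are illustrative assumptions, not H.245 ASN.1 structures; the capability set is the H.323 example used in Section 5.

```python
from typing import Dict, List, Tuple

# An H.245-style capability descriptor: media that may run simultaneously,
# each with its set of alternative codecs (simplified model).
Descriptor = List[Tuple[str, List[str]]]


def flatten_to_sdp(descriptors: List[Descriptor]) -> Dict[str, List[str]]:
    """Merge all descriptors into one codec list per media type, as one SDP must."""
    media: Dict[str, List[str]] = {}
    for desc in descriptors:
        for media_type, codecs in desc:
            bucket = media.setdefault(media_type, [])
            for codec in codecs:
                if codec not in bucket:
                    bucket.append(codec)
    return media


def allowed_by_h245(descriptors: List[Descriptor], choice: Dict[str, str]) -> bool:
    """True if a single descriptor offers every chosen codec at the same time."""
    return any(
        all(any(mt == media_type and codec in codecs for mt, codecs in desc)
            for media_type, codec in choice.items())
        for desc in descriptors)


def allowed_by_sdp(media: Dict[str, List[str]], choice: Dict[str, str]) -> bool:
    """A flattened SDP can only check each media type in isolation."""
    return all(codec in media.get(mt, []) for mt, codec in choice.items())


h323_caps: List[Descriptor] = [
    [("audio", ["PCMU", "PCMA", "G.729"]), ("video", ["H.261"])],
    [("audio", ["G.723.1"]), ("video", ["H.263"])],
]
flat = flatten_to_sdp(h323_caps)
choice = {"audio": "G.723.1", "video": "H.261"}
print(allowed_by_sdp(flat, choice), allowed_by_h245(h323_caps, choice))  # True False
```

The flattened description wrongly admits G.723.1 audio with H.261 video, a combination the H.245 descriptors exclude.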
Thus, a SIP media capability can easily be described in H.245, but the
reverse is more complicated. One approach is to carry multiple SDP messages in
the message body of SIP INVITE requests and responses, using the “multipart”
content type. Each SDP message then represents one capability descriptor of the
H.245 capability set. In Section 5 we describe how sending multiple SDP messages
can be avoided.
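The multipart approach might look roughly as follows: one SDP body per H.245 capability descriptor, wrapped in a MIME multipart body. This is a sketch under stated assumptions: the `multipart/mixed` subtype, the boundary string, the port numbers and the helper names are hypothetical (the text only says "multipart"), and the payload type numbers are the static RTP/AVP values listed in RFC 3551.

```python
from typing import Dict, List, Tuple

# Static RTP/AVP payload types (RFC 3551); the ports are placeholders.
STATIC_PT = {"PCMU": 0, "G.723.1": 4, "PCMA": 8, "G.729": 18, "H.261": 31, "H.263": 34}
PORTS = {"audio": 49170, "video": 51372}


def sdp_for_descriptor(ip: str, descriptor: Dict[str, List[str]]) -> str:
    """One SDP body for one H.245 capability descriptor."""
    lines = ["v=0", f"o=iwf 1 1 IN IP4 {ip}", "s=-", f"c=IN IP4 {ip}", "t=0 0"]
    for media, codecs in descriptor.items():
        pts = " ".join(str(STATIC_PT[c]) for c in codecs)
        lines.append(f"m={media} {PORTS[media]} RTP/AVP {pts}")
    return "\r\n".join(lines) + "\r\n"


def multipart_body(sdp_parts: List[str],
                   boundary: str = "h245-descriptor") -> Tuple[str, str]:
    """Wrap several SDP bodies in one MIME multipart body.

    Returns (Content-Type header value, message body).
    """
    chunks = [f"--{boundary}\r\nContent-Type: application/sdp\r\n\r\n{part}"
              for part in sdp_parts]
    body = "\r\n".join(chunks) + f"\r\n--{boundary}--\r\n"
    return f'multipart/mixed; boundary="{boundary}"', body


h323_caps = [
    {"audio": ["PCMU", "PCMA", "G.729"], "video": ["H.261"]},
    {"audio": ["G.723.1"], "video": ["H.263"]},
]
content_type, body = multipart_body(
    [sdp_for_descriptor("192.0.2.1", d) for d in h323_caps])
print(content_type)
print(body)
```

The receiving SIP user agent would have to understand that each part is one alternative, which is why avoiding multiple SDP bodies is attractive.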
assumptions, basic conferences can be set up, as described in Section 6.
[Figure 1 (not reproduced): architectures for SIP-H.323 user registration; labeled elements include the SIP-H.323 IWF, H.323 terminal and SIP user agent.]
(LRQ) to the H.323 gatekeepers. The gatekeeper to which the H.323 user is regis-
tered responds with the IP address of the H.323 user. Once the SIP server knows
that the address belongs to the H.323 world, it can route the call to the destination.
One drawback of this approach is that the H.323 gatekeepers are burdened with
all the registrations in the SIP network.
This approach only makes those SIP addresses handled by the registrar avail-
able to the H.323 zone. Typically, a registrar is responsible for a single domain,
e.g., columbia.edu. Thus, each H.323 zone would have to have an IWF. If an H.323
user wants to call a SIP terminal, the H.323 terminal first uses DNS TXT records
[25, p. 57] to locate the appropriate gatekeeper², which in turn uses the registration
information conveyed by the IWF to discover that this address is actually located
in the SIP network.
² It is not clear how widely this approach is implemented.
3.2 IWF contains an H.323 gatekeeper
This architecture, shown in Fig. 1(b), is similar to the previous approach except
that the SIP proxy server maintains the user registration information from both
networks. Any H.323 registration request received by the H.323 gatekeeper is
forwarded to the appropriate SIP registrar, which thus stores the user registration
information of both the SIP and H.323 entities.
To the SIP terminal, H.323 terminals simply appear as SIP URLs within the
same domain. (See Section 4 on how H.323 addresses are translated to SIP URLs.)
If an H.323 entity wants to talk to a user who happens to reside in the SIP network,
it sends an admission request (ARQ) to its gatekeeper. The gatekeeper multicasts
the location request (LRQ) to all the other gatekeepers. The GK-IWF server cap-
tures the request and tries to find out if the address belongs to a SIP user. It does
so by sending a SIP OPTIONS request, which does not set up any call state. If the
address is valid in the SIP network and the user is currently available to be called,
the IWF responds with the location confirmation (LCF), letting the H.323 terminal
know that the destination is reachable.
This approach has a drawback similar to that of the previous approach (Section 3.1):
the proxy has to store all H.323 registration information.
However, this approach has the advantage that even if some H.323 gatekeepers
are not equipped with an IWF, the address resolution works: If an H.323 gatekeeper
cannot resolve a called address, it multicasts a location request (LRQ) to the other
gatekeepers in the network. As long as at least one H.323 gatekeeper exists with
the SIP-H.323 signaling translation capability, the SIP user can be located from the
H.323 network. Note that the previous approach (Section 3.1) required all the
SIP registrars/proxy servers to be equipped with IWFs.
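The GK-IWF's handling of a captured LRQ could be sketched as below. The `send_options` callable stands in for a real SIP stack and is hypothetical, as is the IWF address; the sketch assumes the called alias has already been translated to a SIP URI (Section 4) and that the GK-IWF simply stays silent when the OPTIONS probe fails, leaving other gatekeepers free to answer.

```python
from typing import Callable, Optional, Tuple

Address = Tuple[str, int]

# Hypothetical call signaling address of the IWF (1720 is the well-known
# H.225.0 call signaling TCP port).
IWF_CALL_SIGNALING: Address = ("192.0.2.1", 1720)


def handle_lrq(sip_uri: str,
               send_options: Callable[[str], int]) -> Optional[Tuple[str, Address]]:
    """Answer a captured LRQ on behalf of the SIP network.

    Returns ("LCF", address) if a SIP OPTIONS probe for sip_uri succeeds.
    Returns None otherwise, meaning the GK-IWF does not answer at all.
    """
    status = send_options(sip_uri)  # OPTIONS creates no call state
    if 200 <= status < 300:
        # The H.323 caller will set up its call to the IWF, which then
        # places the SIP call to sip_uri.
        return ("LCF", IWF_CALL_SIGNALING)
    return None


# Stand-in for a SIP stack: pretend only Sam is registered and reachable.
REACHABLE = {"sip:sam@example.com"}


def fake_options(uri: str) -> int:
    return 200 if uri in REACHABLE else 404


print(handle_lrq("sip:sam@example.com", fake_options))
print(handle_lrq("sip:nobody@example.com", fake_options))
```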
is in the H.323 network. The IWF, in turn, multicasts the location request (LRQ)
for Henry to all gatekeepers. If there is no positive response from the gatekeepers
of the H.323 network within a timeout period, the IWF concludes that the address
is not valid in the H.323 network and the branch fails.
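The timeout rule in the SIP-to-H.323 direction maps naturally onto an awaited operation with a deadline. In this asyncio sketch, `multicast_lrq` is a hypothetical coroutine that resolves with the address from the first LCF; the 2-second default timeout is an arbitrary assumption, not a value from the text.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

Address = Tuple[str, int]


async def locate_in_h323(alias: str,
                         multicast_lrq: Callable[[str], Awaitable[Address]],
                         timeout: float = 2.0) -> Optional[Address]:
    """Return the address from the first LCF, or None if none arrives in time.

    None means the IWF treats the alias as unknown in the H.323 network,
    so this branch of the SIP request fails.
    """
    try:
        return await asyncio.wait_for(multicast_lrq(alias), timeout)
    except asyncio.TimeoutError:
        return None


# Hypothetical stand-ins for the LRQ multicast.
async def gatekeeper_answers(alias: str) -> Address:
    await asyncio.sleep(0.01)
    return ("192.0.2.20", 1720)


async def nobody_answers(alias: str) -> Address:
    await asyncio.sleep(10)  # cancelled by wait_for long before this
    return ("0.0.0.0", 0)


print(asyncio.run(locate_in_h323("henry", gatekeeper_answers, timeout=0.5)))
print(asyncio.run(locate_in_h323("henry", nobody_answers, timeout=0.05)))
```

The timeout trades call setup delay against the risk of failing a branch whose gatekeeper is merely slow to answer.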
In the other direction, Henry sends an admission request (ARQ) to its gate-
keeper. Since this gatekeeper does not have the address mapping for Sam, it multi-
casts the location request (LRQ) for Sam to the other gatekeepers in the network.
The IWF also receives this multicast LRQ. It then uses the SIP OPTIONS
request (as in Section 3.2) to find out if Sam is available in the SIP network
and informs the GK if the request succeeds. This is followed by H.323 call estab-
lishment between Henry and the IWF and a SIP call between the IWF and Sam.
The IWF should support direct H.323 connections. For instance, a SIP user
(Sam) should be able to call an H.323 user (Henry) through the IWF (say sip323.columbia.edu)
by placing a call to sip:[email protected]. Similarly, the H.323 user
should be able to reach a SIP user (sip:[email protected]) by establishing a
Q.931 TCP connection to the IWF and providing the destination address or the re-
mote extension address in the Q.931 SETUP message as sip:[email protected].
The direct connection does not involve user registration and the caller is expected
to know that the destination is reachable via the IWF.
4 Address translation
While user registration exports identities into the foreign network, address transla-
tion is performed by the IWF to create valid SIP addresses from H.323 addresses
and vice versa. In SIP, addresses are typically SIP URIs of the form sip:user@host,
where user names can also be telephone numbers. However, SIP terminals can also
support other URL schemes, for example “tel:” URLs for telephone numbers [26]
or H.323 URLs [13]. Generally, SIP terminals proxy calls to their local server if
they do not understand the particular URL scheme, in the hope that the server can
translate it.
In H.323, addresses (ASN.1 AliasAddress) can take many forms, including
unstructured identifiers (h323-ID), E.164 (global) telephone numbers, URLs of
various types, host names or IP addresses, and email addresses (email-ID). Local user
names and host names appear to be most common. For compatibility with H.323
version 1.0 entities, the h323-ID field of H.323 AliasAddress must be present.
For SIP-H.323 interoperability, there should be a consistent and unique way of
mapping a SIP URI to an H.323 address and vice-versa. Translating a SIP URI
to an H.323 AliasAddress is easy: We simply copy the SIP URI verbatim into
the h323-ID. The user and host parts of the SIP URI are used to generate an email
identifier, “user@host”, which is stored in the email-ID field of AliasAddress.
The transport-ID parameter is copied from the host part of the SIP URI if the latter
is given numerically. The e164 field is extracted from the user part of the SIP address
if it is marked as a telephone number.
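The SIP-to-H.323 rules above can be sketched in Python. The `AliasAddress` class is a simplified stand-in for the ASN.1 type, the parser handles only plain `sip:user@host[:port][;params]` URIs with IPv4 or host-name hosts, and treating the `user=phone` URI parameter as the telephone-number marker is an assumption about how "marked" is realized.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AliasAddress:
    """Simplified stand-in for the H.323 AliasAddress fields used here."""
    h323_id: str
    email_id: Optional[str] = None
    transport_id: Optional[str] = None
    e164: Optional[str] = None


def sip_to_alias(uri: str) -> AliasAddress:
    if not uri.lower().startswith("sip:"):
        raise ValueError(f"not a SIP URI: {uri}")
    addr, *raw_params = uri[4:].split(";")
    params: Dict[str, str] = {}
    for p in raw_params:
        name, _, value = p.partition("=")
        params[name.lower()] = value
    user, _, hostport = addr.rpartition("@")
    host = hostport.split(":", 1)[0]

    alias = AliasAddress(h323_id=uri)            # the URI verbatim
    if user:
        alias.email_id = f"{user}@{host}"        # "user@host"
    try:
        ipaddress.ip_address(host)               # numeric host only
        alias.transport_id = host
    except ValueError:
        pass
    if user and params.get("user") == "phone":   # assumed phone-number marker
        digits = "".join(ch for ch in user if ch.isdigit())
        if digits:
            alias.e164 = digits
    return alias


print(sip_to_alias("sip:alice@192.0.2.5"))
print(sip_to_alias("sip:+12125551212@gw.example.com;user=phone"))
```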
Translating an H.323 AliasAddress to a SIP address is more difficult since
multiple representations (e.g., e164, url-ID, transport-ID) need to be merged into
a single SIP address. In the simplest case, the alias contains a url-ID with a SIP
URI, in which case it is simply copied into the SIP message. Otherwise, if the
h323-ID can be parsed as a valid SIP address (e.g., “Alice <sip:alice@host>” or
“alice@host”) it is used. Next, if the transport-ID is present and it does not point
to the IWF itself, then it forms the host and port portions of the SIP URI. Finally,
if the H.323 alias has an email-ID, it is used in the SIP URI prefixed with “sip:”
URI scheme.
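The reverse direction is an ordered fallback, which a short function can express. The `H323Alias` class and the regular expressions are simplified assumptions; in particular, when only a transport-ID is usable the sketch emits `sip:host:port` with no user part, a detail the text leaves open.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class H323Alias:
    """Simplified stand-in for the AliasAddress fields consulted below."""
    url_id: Optional[str] = None
    h323_id: Optional[str] = None
    transport_id: Optional[Tuple[str, int]] = None
    email_id: Optional[str] = None


_ANGLE_SIP = re.compile(r"<(sip:[^<>\s]+)>", re.IGNORECASE)   # "Alice <sip:...>"
_BARE_SIP = re.compile(r"^sip:[^\s@]+@[^\s@]+$", re.IGNORECASE)
_USER_AT_HOST = re.compile(r"^[^\s@<>]+@[^\s@<>]+$")          # "alice@host"


def h323_id_as_sip(h323_id: str) -> Optional[str]:
    match = _ANGLE_SIP.search(h323_id)
    if match:
        return match.group(1)
    text = h323_id.strip()
    if _BARE_SIP.match(text):
        return text
    if _USER_AT_HOST.match(text):
        return "sip:" + text
    return None


def alias_to_sip(alias: H323Alias, iwf_addr: Tuple[str, int]) -> Optional[str]:
    # 1. A url-ID that already holds a SIP URI is copied unchanged.
    if alias.url_id and alias.url_id.lower().startswith("sip:"):
        return alias.url_id
    # 2. An h323-ID that parses as a SIP address.
    if alias.h323_id:
        uri = h323_id_as_sip(alias.h323_id)
        if uri:
            return uri
    # 3. A transport-ID that does not point back at the IWF.
    if alias.transport_id and alias.transport_id != iwf_addr:
        host, port = alias.transport_id
        return f"sip:{host}:{port}"
    # 4. An email-ID, prefixed with the "sip:" scheme.
    if alias.email_id:
        return "sip:" + alias.email_id
    return None  # no usable representation


IWF = ("192.0.2.1", 1720)  # hypothetical IWF address
print(alias_to_sip(H323Alias(h323_id="Alice <sip:alice@example.com>"), IWF))
print(alias_to_sip(H323Alias(h323_id="Henry", transport_id=IWF,
                             email_id="henry@example.org"), IWF))
```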
Note that the translated address is not necessarily valid. On the H.323
side, it may be desirable to configure a gatekeeper to route all calls that are not
resolvable within the H.323 network to the IWF, which would then attempt a trans-
lation to a SIP URI. This would allow H.323 terminals to reach any SIP terminal,
even those not cross-registered.
5 Connection establishment
Once the user knows that the destination is reachable via the IWF, the connection
is established. A point-to-point call from Alice to Bob needs three crucial
pieces of information, namely the logical destination address (A) of Bob, the media
transport address (T) at which each of the users is ready to receive media packets
(RTP/RTCP) and a description of the media capabilities (M) of the parties. Alice
should know A, T and M of Bob, and Bob needs to know Alice’s T and M. The
difficulty in translating between SIP and H.323 arises because A, M, and T are all
contained in the SIP INVITE request and its response, while H.323 may spread this
information among several messages.
required components (M and T of the call destination).
Since Fast Connect is optional in H.323v2, an H.323 entity must be able to
handle calls without the Fast Connect feature for backward compatibility. In par-
ticular, the IWF must accept a non-Fast Connect call from the H.323 side. In the
other direction, the IWF should try to use H.323v2 Fast Connect, but must be pre-
pared to switch to the multi-stage call establishment procedure if the response from
the H.323 entity indicates that this is not supported.
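The IWF's fallback decision in the outgoing direction might be sketched as follows. It assumes, per H.323v2, that a callee accepting Fast Connect returns fastStart elements in some Q.931 message up to and including CONNECT, so a CONNECT without them means the IWF must continue with the multi-stage H.245 procedure of Fig. 2. The `Q931Message` record is a hypothetical simplification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Q931Message:
    """Hypothetical, simplified view of a Q.931 message from the callee."""
    kind: str                               # e.g. "ALERTING", "CONNECT"
    fast_start: Optional[List[str]] = None  # accepted channel proposals, if any


def choose_procedure(responses: List[Q931Message]) -> Tuple[str, Optional[List[str]]]:
    """Decide between Fast Connect and the multi-stage H.245 procedure.

    Scans callee messages in order; fastStart seen up to and including
    CONNECT means Fast Connect was accepted.
    """
    for msg in responses:
        if msg.fast_start:
            return "fast-connect", msg.fast_start
        if msg.kind == "CONNECT":
            break  # CONNECT without fastStart: callee declined Fast Connect
    return "h245", None  # fall back to TerminalCapabilitySet / OpenLogicalChannel


accepted = [Q931Message("CALL PROCEEDING"), Q931Message("CONNECT", ["audio PCMU"])]
declined = [Q931Message("ALERTING"), Q931Message("CONNECT")]
print(choose_procedure(accepted))
print(choose_procedure(declined))
```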
[Diagram not reproduced. Messages exchanged among the SIP user agent, the IWF and the H.323 terminal: INVITE, SETUP, CONNECT, TerminalCapabilitySet/Ack in both directions (capability sets C1 and C2), OpenLogicalChannel/Ack (if present in C1), ACK.]
Figure 2: Call from SIP terminal to H.323 terminal without Fast Connect
and G.723.1 means that the sender can switch between these algorithms at any time
during a call without explicitly informing the receiver beyond changing the RTP
payload type. However, in H.245, the sender chooses an algorithm from the capa-
bility set of the receiver and explicitly opens a logical channel for that algorithm.
The sender cannot switch dynamically to another algorithm without informing the
receiver. The sender has to close the previous logical channel and re-open it with
the new algorithm. Alternatively, the receiver can use the H.245 RequestMode message to request
the sender to use a different algorithm.
This problem can be addressed by having the IWF intercept the RTP/RTCP packets
sent from the SIP side to the H.323 side. If the IWF detects a change in coding
algorithm, it initiates the required H.245 procedures. However, this approach is not
advisable: the IWF would have to inspect every media packet rather than only the
signaling messages, so it scales poorly.
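To see why interception scales poorly, consider what the IWF would have to run for every media packet. The sketch below extracts the 7-bit payload type from the second byte of each RTP fixed header (RFC 1889) and flags a change; the `PayloadTypeWatcher` class and the test packets are illustrative only.

```python
import struct
from typing import Optional


def rtp_payload_type(packet: bytes) -> int:
    """Payload type: low 7 bits of byte 1 of the 12-byte RTP fixed header."""
    if len(packet) < 12:
        raise ValueError("shorter than the RTP fixed header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    return packet[1] & 0x7F


class PayloadTypeWatcher:
    """Flags each packet whose payload type differs from the previous packet's."""

    def __init__(self) -> None:
        self.current: Optional[int] = None

    def observe(self, packet: bytes) -> bool:
        pt = rtp_payload_type(packet)
        changed = self.current is not None and pt != self.current
        self.current = pt
        return changed  # True would trigger H.245 channel changes at the IWF


def make_rtp(pt: int, seq: int, timestamp: int = 0, ssrc: int = 0x1234) -> bytes:
    """Build a bare RTP header (V=2, no padding/extension/CSRC, marker 0)."""
    return struct.pack("!BBHII", 0x80, pt & 0x7F, seq, timestamp, ssrc)


watcher = PayloadTypeWatcher()
print([watcher.observe(make_rtp(pt, seq)) for seq, pt in enumerate([0, 0, 4, 4])])
# [False, False, True, False]
```

The check itself is cheap, but it must run on every packet of every call, which defeats the point of an IWF that otherwise touches only signaling.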
Another approach limits the media description sent to the SIP side to only one
algorithm per media (or per alternative capability set). This can be achieved by
maintaining a maximal intersection of the SIP and H.323 terminal capability sets.
A maximal intersection of two capability sets is a capability set that is a subset
of both capability sets and such that no proper superset of it is also a subset of both.
The operating mode, that is, the selected algorithms for the call, is derived from
the intersection of the two capability sets by selecting one algorithm per alternative
capability set. If the SIP side sends additional INVITE requests during the call to
change media parameters, the IWF simply recalculates the operating modes.
Finding the maximal intersection of capability sets is described in [27]. As an
example, let the SIP capability set be {[PCMU,PCMA,G.723.1][H.261]} and the H.323
capability set be {[PCMU,PCMA,G.729][H.261]} {[G.723.1][H.263]} (i.e., the
SIP user can support PCMU, PCMA or G.723.1 audio and H.261 video, whereas
the H.323 user can support either one of PCMU, PCMA or G.729 audio with
H.261 video, or G.723.1 audio with H.263 video). The maximal intersection as
calculated by the IWF is {[PCMU,PCMA][H.261]} {[G.723.1]}. The IWF derives an
operating mode by selecting a capability descriptor from the maximal intersection
and selecting one algorithm per alternative capability set (e.g., {PCMU,H.261}).
The IWF conveys only the PCMU audio and H.261 video to the SIP user agent. If
the SIP side sends an additional INVITE with a different capability set ({[G.729,G.723.1][H.261]}),
the new maximal intersection becomes {[G.729][H.261]} {[G.723.1]}. The IWF
derives a new operating mode ({G.729,H.261}) and initiates the H.245 procedure
to change the PCMU audio to G.729.
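The example above can be reproduced with a simplified sketch of the maximal intersection. The algorithm in [27] is more general; this version assumes at most one alternative set per media type in each descriptor, keeps the SIP side's codec order, and picks the first codec of the first descriptor as the operating mode, which matches the choices made in the example.

```python
from typing import Dict, List

Descriptor = Dict[str, List[str]]   # media type -> alternative codecs
CapabilitySet = List[Descriptor]    # alternative descriptors


def intersect_descriptors(a: Descriptor, b: Descriptor) -> Descriptor:
    """Per-media codec intersection in a's order; empty media are dropped."""
    result: Descriptor = {}
    for media, codecs in a.items():
        common = [c for c in codecs if c in b.get(media, [])]
        if common:
            result[media] = common
    return result


def is_subset(a: Descriptor, b: Descriptor) -> bool:
    return all(m in b and set(codecs) <= set(b[m]) for m, codecs in a.items())


def maximal_intersection(x: CapabilitySet, y: CapabilitySet) -> CapabilitySet:
    candidates = []
    for da in x:
        for db in y:
            d = intersect_descriptors(da, db)
            if d:
                candidates.append(d)
    # Keep descriptors not contained in another one (first copy of duplicates).
    result = []
    for i, d in enumerate(candidates):
        dominated = any(
            j != i and is_subset(d, other) and (not is_subset(other, d) or j < i)
            for j, other in enumerate(candidates))
        if not dominated:
            result.append(d)
    return result


def operating_mode(caps: CapabilitySet) -> Dict[str, str]:
    """One codec per media: first codec of the first descriptor."""
    if not caps:
        raise ValueError("no common capabilities")
    return {media: codecs[0] for media, codecs in caps[0].items()}


sip = [{"audio": ["PCMU", "PCMA", "G.723.1"], "video": ["H.261"]}]
h323 = [{"audio": ["PCMU", "PCMA", "G.729"], "video": ["H.261"]},
        {"audio": ["G.723.1"], "video": ["H.263"]}]
common = maximal_intersection(sip, h323)
print(common)                  # [{'audio': ['PCMU', 'PCMA'], 'video': ['H.261']}, {'audio': ['G.723.1']}]
print(operating_mode(common))  # {'audio': 'PCMU', 'video': 'H.261'}

# Re-INVITE with a new SIP capability set: recompute the operating mode.
sip2 = [{"audio": ["G.729", "G.723.1"], "video": ["H.261"]}]
print(operating_mode(maximal_intersection(sip2, h323)))  # {'audio': 'G.729', 'video': 'H.261'}
```

Comparing the old and new operating modes tells the IWF which logical channel to close and reopen; here only the audio channel changes.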
[Diagram not reproduced. Messages exchanged among the H.323 terminal, the IWF and the SIP user agent: SETUP, INVITE with no session description, 200 OK, CONNECT, TerminalCapabilitySet/Ack in both directions (capability sets C1 and C2), OpenLogicalChannel/Ack for all of C1 ∩ C2 = M, where M is the operating mode, and ACK carrying session description M.]
Figure 3: Call from H.323 terminal to SIP terminal without Fast Connect
H.323 terminals (H1 and H2) and two SIP user agents (S1 and S2) are involved in
a conference. From the H.323 side, the interworking function (IWF1) looks like a
single H.323 terminal. From the SIP side, the IWF acts as a single SIP user agent.
This approach fails if S1 invites another H.323 user H3 via a different inter-
working function (IWF2). How will the other participants such as H2 know that
H3 has joined the conference? Similarly, if H1 invites a SIP user S3, S2 will
not know of the presence of S3. One way for the participants to know about the
[Figure (not reproduced): conference with H.323 terminals H1, H2, H3, SIP user agents S1, S2, S3, a multipoint controller (MC) and interworking functions IWF1, IWF2, IWF3.]
existence of the other participants is to rely on the RTP/RTCP packets. This goes
against the idea of H.323 conferencing where H.245 messages are used to convey
the existence of new participants.
We can solve this problem by forcing all invitations to pass through the IWF.
Fig. 5(a) shows a conference managed by an MC where H.323 terminals are di-
rectly connected to the MC and SIP user agents are connected through IWFs. A
SIP user agent is allowed to invite other SIP UAs only through the IWF, so that the
IWF can update the MC state. In a SIP-centric architecture, Fig. 5(b), the H.323
terminals take part in the conference through the IWFs.
We recommend a SIP-centered architecture because the SIP conferencing model
is more general, allowing full mesh with distributed control or centralized bridged
conferences. In general, translating services is greatly simplified if an operator
adopts a primary signaling protocol, with services offered only in that protocol.
Terminals using another protocol are restricted to making calls through the IWF.
Supporting H.332 loosely coupled conferences is straightforward, since SDP
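[Figure 5 (not reproduced): (a) conference managed by an MC in the H.323 cloud, with SIP user agents S1, S2, S3 joining through IWFs; (b) conference in the SIP cloud, with H.323 terminals H1, H2, H3 joining through IWFs.]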
[Figure (not reproduced): call transfer among A, B and C, each panel showing the original call and the new call. H.323 messages: FACILITY (Invoke Call Transfer Initiate), SETUP (Invoke Call Transfer Setup), CONNECT, Return Result, RELEASE COMPLETE. SIP messages: BYE with Also: C, 200 OK, INVITE, 200 OK, ACK.]
7 Related work
The problem of interworking between SIP and H.323 has only recently started to
attract attention, with ETSI TIPHON and ITU now likely to get involved.
Details of the SIP-H.323 interworking described here can be found in [27].
Agboh [28] and Kausar and Crowcroft [29] address the problem of interworking,
but do not solve the issues of registration and media capability translation.
9 Acknowledgments
We would like to thank the members of the sip-h323 mailing list ([email protected])
for their comments.
References
[1] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, “SIP: session ini-
tiation protocol,” Request for Comments (Proposed Standard) 2543, Internet
Engineering Task Force, Mar. 1999.
[2] H. Schulzrinne and J. Rosenberg, “Internet telephony: Architecture and
protocols – an IETF perspective,” Computer Networks and ISDN Systems,
vol. 31, pp. 237–255, Feb. 1999.
[10] H. Schulzrinne and J. Rosenberg, “A comparison of SIP and H.323 for inter-
net telephony,” in Proc. International Workshop on Network and Operating
System Support for Digital Audio and Video (NOSSDAV), (Cambridge, Eng-
land), pp. 83–86, July 1998.
[11] I. Dalgic and H. Fang, “Comparison of H.323 and SIP for IP telephony signal-
ing,” in Proc. of Photonics East, (Boston, Massachusetts), SPIE, Sept. 1999.
[12] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: a transport
protocol for real-time applications,” Request for Comments (Proposed Stan-
dard) 1889, Internet Engineering Task Force, Jan. 1996.
[13] P. Cordell, “Conversational multimedia URLs,” Internet Draft, Internet Engi-
neering Task Force, Dec. 1997. Work in progress.
[23] H. Schulzrinne and J. Rosenberg, “SIP call control services,” Internet Draft,
Internet Engineering Task Force, June 1999. Work in progress.
[26] A. Vähä-Sipilä, “URLs for telephone calls,” Internet Draft, Internet Engineer-
ing Task Force, Dec. 1999. Work in progress.
[28] C. Agboh, “A study of two main IP telephony signaling protocols: H.323 sig-
naling and SIP; a comparison and a signaling gateway specification,” Master’s
thesis, Université Libre de Bruxelles (ULB), Faculté des Sciences, Département
d’Informatique, Brussels, Belgium, 1999. Supervised by Eric Manie.