
Multimedia

The words "multi" and "media" combine to form the word
multimedia. "Multi" signifies "multiple," and "media"
signifies "means of communication." Multimedia is a
combination of multiple forms of media used for
communication, including:
1. Text
2. Audio
3. Video
4. Graphics
5. Animation
29.1
Figure 29.1 Internet audio/video

29.2
Note

Streaming stored audio/video refers to on-demand
requests for compressed audio/video files.

29.3
Note

Streaming live audio/video refers to the broadcasting
of radio and TV programs through the Internet.

29.4
Note

Interactive audio/video refers to the use of the
Internet for interactive audio/video applications.

29.5
29-1 DIGITIZING AUDIO AND VIDEO

Before audio or video signals can be sent on the
Internet, they need to be digitized. We discuss audio
and video separately.

Topics discussed in this section:
Digitizing Audio
Digitizing Video

29.6
Note

Compression is needed to send video over the Internet.

29.7
29-2 AUDIO AND VIDEO COMPRESSION

To send audio or video over the Internet requires
compression. In this section, we discuss audio
compression first and then video compression.

Topics discussed in this section:
Audio Compression
Video Compression

29.8
Audio Compression

• Audio compression can be used for speech or music.
• For speech, we need to compress a 64-kbps digitized
signal;
• for music, we need to compress a 1.411-Mbps digitized
signal.
Two categories of techniques are used for audio
compression:
1. predictive encoding
2. perceptual encoding

29.9
1. Predictive Encoding

In predictive encoding, the differences between the
samples are encoded instead of encoding all the sampled
values. This type of compression is normally used for
speech. Several standards have been defined, such as
GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 or 5.3
kbps).

29.10
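To make the idea concrete, below is a minimal sketch of
predictive (delta) encoding in Python. It illustrates
only the core principle of encoding differences between
samples; real speech codecs such as GSM and G.729 use
far more sophisticated prediction.

# Minimal sketch of predictive (delta) encoding for
# integer PCM samples. Illustrative only; not any
# particular standard.
def predictive_encode(samples):
    """Encode each sample as the difference from the previous one."""
    encoded = [samples[0]]          # first sample sent as-is
    for i in range(1, len(samples)):
        encoded.append(samples[i] - samples[i - 1])
    return encoded

def predictive_decode(encoded):
    """Rebuild the original samples by accumulating the differences."""
    samples = [encoded[0]]
    for diff in encoded[1:]:
        samples.append(samples[-1] + diff)
    return samples

samples = [100, 102, 101, 105, 110, 112]
deltas = predictive_encode(samples)   # [100, 2, -1, 4, 5, 2]
assert predictive_decode(deltas) == samples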
2. Perceptual Encoding: MP3

• The most common compression technique used to create
CD-quality audio is based on the perceptual encoding
technique.
• As mentioned before, this type of audio needs at
least 1.411 Mbps, which cannot be sent over the
Internet without compression.
• MP3 (MPEG audio layer 3), a part of the MPEG
standard, uses this technique.

29.11
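The 1.411-Mbps figure follows directly from the CD
audio parameters (44,100 samples per second, 16 bits
per sample, two stereo channels), as this quick Python
check shows:

# CD-quality audio bit rate before compression.
sample_rate = 44_100       # samples per second
bits_per_sample = 16
channels = 2
bit_rate = sample_rate * bits_per_sample * channels
print(bit_rate)            # 1411200 bits/s, about 1.411 Mbps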
Video Compression

• Video is composed of multiple frames.
• Each frame is one image.
• We can compress video by first compressing images.

Two standards are prevalent in the market:
1. Joint Photographic Experts Group (JPEG) is used to
compress images.
2. Moving Picture Experts Group (MPEG) is used to
compress video.

29.12
Image Compression: JPEG

• If the picture is not in color (grayscale), each
pixel can be represented by an 8-bit integer (256
levels).
• If the picture is in color, each pixel can be
represented by 24 bits (3 x 8 bits), with each 8 bits
representing red, green, or blue (RGB).
• To simplify the discussion, we concentrate on a
grayscale picture.
• In JPEG, a grayscale picture is divided into blocks
of 8 x 8 pixels.

29.13
Figure 29.2 JPEG gray scale

29.14
Figure 29.3 JPEG process

29.15
Discrete Cosine Transform (DCT):

• In this step, each block of 64 pixels goes through a
transformation called the Discrete Cosine Transform
(DCT).
• The transformation changes the 64 values so that the
relative relationships between pixels are kept but the
redundancies are revealed.
• The results of the transformation are shown in three
different cases.

29.16
Case 1:
In this case, we have a block of uniform gray, and the
value of each pixel is 20. When we do the
transformation, we get a nonzero value for the first
element (upper left corner); the rest of the pixels
have a value of 0. The value of T(0,0) is the average
(multiplied by a constant) of the P(x,y) values and is
called the dc value (direct current, borrowed from
electrical engineering). The rest of the values in
T(m,n), called ac values, represent changes in the
pixel values. But because there are no changes, the
rest of the values are 0s.

29.17
Figure 29.4 Case 1: uniform gray scale

29.18
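A minimal numeric check of Case 1, assuming NumPy is
available: a 2D DCT applied to a uniform 8 x 8 block of
20s yields a single nonzero dc value and sixty-three
zero ac values.

import numpy as np

def dct2(block):
    """2D DCT-II of a square block (orthonormal form),
    a sketch of the JPEG transform step."""
    n = block.shape[0]
    c = np.array([[np.cos((2 * x + 1) * m * np.pi / (2 * n))
                   for x in range(n)] for m in range(n)])
    alpha = np.full(n, np.sqrt(2 / n))
    alpha[0] = np.sqrt(1 / n)
    basis = c * alpha[:, None]         # orthonormal DCT matrix
    return basis @ block @ basis.T

block = np.full((8, 8), 20.0)          # Case 1: uniform gray
t = dct2(block)
print(round(t[0, 0], 1))                # 160.0, the dc value
print(np.count_nonzero(np.round(t, 6))) # 1: all ac values are 0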
Case 2:
In the second case, we have a block with two different
uniform grayscale sections. There is a sharp change in
the values of the pixels (from 20 to 50). When we do
the transformation, we get a dc value as well as
nonzero ac values. However, there are only a few
nonzero values clustered around the dc value. Most of
the values are 0.
29.19
Figure 29.5 Case 2: two sections

29.20
Case 3:
In the third case, we have a block that changes
gradually. That is, there is no sharp change between
the values of neighboring pixels. When we do the
transformation, we get a dc value, with many nonzero ac
values also.

29.21
Figure 29.6 Case 3: gradient gray scale

29.22
From the above three cases, we can state the following:

• The transformation creates table T from table P.
• The dc value is the average value (multiplied by a
constant) of the pixels.
• The ac values are the changes.
• Lack of changes in neighboring pixels creates 0s.

29.23
Quantization:

After the T table is created, the values are quantized
to reduce the number of bits needed for encoding.
Quantization divides each value by a constant and
discards the fractional part. An 8 x 8 quantizing table
usually specifies how to quantize each value, with the
divisor depending on the position of the value in the
table, so that the number of bits is optimized and more
values become 0. The quantization phase is
irreversible; the resulting loss of information is why
JPEG is called lossy compression.

29.24
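A minimal sketch of this step follows. The divisor
table here is made up for illustration; real JPEG uses
standard quantization tables.

import numpy as np

# Quantization: divide each transformed value by a
# position-dependent divisor and discard the fractional
# part. Irreversible, hence lossy.
q_table = np.fromfunction(lambda m, n: 1 + 4 * (m + n), (8, 8))

def quantize(t):
    return np.trunc(t / q_table).astype(int)

t = np.zeros((8, 8))
t[0, 0], t[0, 1] = 160.0, 23.5
print(quantize(t)[0, :2])   # [160 4]: small values shrink toward 0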
Compression:

After quantization, the values are read from the table,
and redundant 0s are removed. However, to cluster the
0s together, the table is read diagonally in a zigzag
fashion rather than row by row or column by column. The
reason is that if the picture changes smoothly, the
bottom right corner of the T table is all 0s. Figure
29.7 shows the process.

29.25
Figure 29.7 Reading the table

29.26
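A minimal sketch of the zigzag read: traverse the table
by anti-diagonals, alternating direction, so that the
trailing 0s in the bottom right cluster at the end of
the output.

def zigzag(table):
    """Read a square table in JPEG zigzag order."""
    n = len(table)
    order = []
    for d in range(2 * n - 1):     # one pass per anti-diagonal
        cells = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        if d % 2 == 0:
            cells.reverse()        # even diagonals run upward
        order.extend(cells)
    return [table[i][j] for i, j in order]

table = [[0] * 8 for _ in range(8)]
table[0][0], table[0][1], table[1][0] = 160, 4, 3
print(zigzag(table)[:5])           # [160, 4, 3, 0, 0]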
Video Compression: MPEG
The Moving Picture Experts Group method is used
to compress video.
In principle, a motion picture is a rapid flow of a
set of frames, where each frame is an image.
In other words, a frame is a spatial combination of
pixels, and a video is a temporal combination of
frames that are sent one after another.
Compressing video, then, means spatially compressing
each frame and temporally compressing a set of frames.

29.27
Spatial Compression:
The spatial compression of each frame is done with
JPEG (or a modification of it). Each frame is a picture
that can be independently compressed.

Temporal Compression:
In temporal compression, redundant frames are
removed. When we watch television, we receive 50 frames
per second. However, most of the consecutive frames are
almost the same. For example, when someone is talking,
most of the frame is the same as the previous one except
for the segment of the frame around the lips, which
changes from one frame to another. To temporally
compress data, the MPEG method first divides frames into
three categories: I-frames, P-frames, and B-frames.
29.28
Figure 29.8 MPEG frames

29.29
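A minimal sketch of the temporal idea, with the frame
types simplified to "send only what changed." Real MPEG
uses motion-compensated blocks, not raw pixel
differences.

import numpy as np

def p_frame(prev, curr):
    """Return positions and values of pixels that changed
    since the previous frame; everything else is omitted."""
    changed = curr != prev
    return np.argwhere(changed).tolist(), curr[changed].tolist()

prev = np.zeros((4, 4), dtype=int)     # previous (I-) frame
curr = prev.copy()
curr[2, 3] = 9                         # one pixel changed
positions, values = p_frame(prev, curr)
print(positions, values)               # [[2, 3]] [9], not 16 pixels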
Intracoded frame (I-frame)
• Independence: I-frames are self-contained and do not
rely on other frames for decoding.
• Regular Intervals: They appear at regular intervals in a
video stream (e.g., every ninth frame).
• Handling Changes: I-frames handle sudden changes in
the video content that other frame types might not
capture effectively.
• Broadcast Scenarios: Essential for broadcast scenarios
as they allow viewers who tune in late to see a complete
picture from the latest I-frame.
• No Dependencies: They cannot be constructed from
previous or future frames; they are complete on their
own.
29.30
Predicted frame (P-frame)
• Dependency: P-frames are related to and depend on
preceding I-frames or P-frames.
• Change Representation: They contain only the
changes from the previous frame, rather than the entire
image.
• Segment Limitation: They may struggle to capture
significant changes, such as those involving fast-
moving objects.
• Construction: P-frames can only be constructed from
previous I- or P-frames.
• Data Efficiency: They carry less information
compared to other frame types and are more
compressed.
29.31
Bidirectional frame (B-frame)

A bidirectional frame (B-frame) is related to the
preceding and following I-frame or P-frame. In other
words, each B-frame is relative to the past and the
future. Note that a B-frame is never related to another
B-frame.

Figure 29.8 shows a sample sequence of frames.

29.32
Figure 29.9 MPEG frame construction
This shows how I-, P-, and B-frames are constructed from a series of seven
frames.

29.33
29-3 STREAMING STORED AUDIO/VIDEO

Now that we have discussed digitizing and compressing
audio/video, we turn our attention to specific
applications. The first is streaming stored audio and
video.

Topics discussed in this section:
First Approach: Using a Web Server
Second Approach: Using a Web Server with a Metafile
Third Approach: Using a Media Server
Fourth Approach: Using a Media Server and RTSP

29.34
First Approach: Using a Web Server

A compressed audio/video file can be downloaded as a
text file. The client (browser) can use the services of
HTTP and send a GET message to download the file. The
Web server can send the compressed file to the browser.
The browser can then use a helper application, normally
called a media player, to play the file. The file needs
to be downloaded completely before it can be played.

29.35
Figure 29.10 Using a Web server

29.36
Second Approach: Using a Web Server
with Metafile
The media player is directly connected to the Web server for
downloading the audio/video file. The Web server stores two files: the
actual audio/video file and a metafile that holds information about the
audio/video file.
1. The HTTP client accesses the Web server using the
GET message.
2. The information about the metafile comes in the
response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to access
the audio/video file.
5. The Web server responds.

29.37
Figure 29.11 Using a Web server with a metafile

29.38
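A minimal sketch of this flow in Python. The URL and
the one-line metafile format are hypothetical; real
metafiles such as .m3u or .ram files have their own
formats.

from urllib.request import urlopen

META_URL = "http://example.com/song.meta"  # hypothetical URL

# Steps 1-2: the browser GETs the metafile from the Web server.
with urlopen(META_URL) as resp:
    media_url = resp.read().decode().strip()  # metafile holds the URL

# Steps 3-5: the media player uses that URL to fetch the file.
with urlopen(media_url) as resp:
    media_bytes = resp.read()
print(len(media_bytes), "bytes downloaded")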
Third Approach: Using a Media Server
The problem with the second approach is that the browser
and the media player both use the services of HTTP.
HTTP is designed to run over TCP.
This is appropriate for retrieving the metafile, but not for
retrieving the audio/video file.
The reason is that TCP retransmits a lost or damaged
segment, which is counter to the philosophy of streaming.
We need to dismiss TCP and its error control; we need to use
UDP.
However, HTTP, which accesses the Web server, and the
Web server itself are designed for TCP; we need another
server, a media server.
29.39
Steps
1. The HTTP client accesses the Web server using a
GET message.
2. The information about the metafile comes in the
response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to
access the media server to download the file.
Downloading can take place by any protocol that uses
UDP.
5. The media server responds.

29.40
Figure 29.12 Using a media server

29.41
Fourth Approach: Using a Media Server
and RTSP

The Real-Time Streaming Protocol (RTSP) is a control
protocol designed to add more functionality to the
streaming process. Using RTSP, we can control the
playing of audio/video. Figure 29.13 shows a media
server and RTSP.

29.42
1. The HTTP client accesses the Web server using a GET
message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player sends a SETUP message to create a
connection with the media server.
5. The media server responds.
6. The media player sends a PLAY message to start playing
(downloading).
7. The audio/video file is downloaded using another
protocol that runs over UDP.
8. The connection is broken using the TEARDOWN
message.
9. The media server responds.
29.43
Figure 29.13 Using a media server and RTSP

29.44
29-4 STREAMING LIVE AUDIO/VIDEO

Streaming live audio/video is similar to the
broadcasting of audio and video by radio and TV
stations. Instead of broadcasting to the air, the
stations broadcast through the Internet. There are
several similarities between streaming stored
audio/video and streaming live audio/video. They are
both sensitive to delay; neither can accept
retransmission. However, there is a difference. In the
first application, the communication is unicast and
on-demand. In the second, the communication is
multicast and live.

29.45
29-5 REAL-TIME INTERACTIVE
AUDIO/VIDEO
In real-time interactive audio/video, people
communicate with one another in real time. The
Internet phone or voice over IP is an example of this
type of application. Video conferencing is another
example that allows people to communicate visually
and orally.

Topics discussed in this section:
Characteristics

29.46
Figure 29.14 Time relationship

29.47
Note

Jitter is introduced in real-time data by the delay
between packets.

29.48
Figure 29.15 Jitter

29.49
Figure 29.16 Timestamp

29.50
Note

To prevent jitter, we can time-stamp the packets and
separate the arrival time from the playback time.

29.51
Figure 29.17 Playback buffer

29.52
Note

A playback buffer is required for real-time traffic.

29.53
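A minimal sketch of the timestamp-plus-playback-buffer
idea in Python. The buffer delay and the arrival times
are illustrative.

BUFFER_DELAY = 0.5   # seconds to hold the first packet

def playback_schedule(arrivals, timestamps):
    """arrivals[i] is the arrival time of packet i;
    timestamps[i] is its media timestamp. Playback starts
    BUFFER_DELAY after the first arrival and then follows
    the timestamps, hiding the jitter."""
    base = arrivals[0] + BUFFER_DELAY
    return [base + ts for ts in timestamps]

arrivals = [1.00, 1.13, 1.29, 1.32]   # jittery arrival times
timestamps = [0.0, 0.1, 0.2, 0.3]     # evenly spaced media times
print(playback_schedule(arrivals, timestamps))
# [1.5, 1.6, 1.7, 1.8]: even playback despite the jitter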
Note

A sequence number on each packet is required for
real-time traffic.
29.54
Note

Real-time traffic needs the support of multicasting.

29.55
Note

Translation means changing the encoding of a payload to
a lower quality to match the bandwidth of the receiving
network.

29.56
Note

Mixing means combining several streams of traffic into
one stream.

29.57
Note

TCP, with all its sophistication, is not suitable for
interactive multimedia traffic because we cannot allow
retransmission of packets.

29.58
Note

UDP is more suitable than TCP for interactive traffic.
However, we need the services of RTP, another
transport-layer protocol, to make up for the
deficiencies of UDP.

29.59
29-6 RTP

Real-time Transport Protocol (RTP) is the protocol
designed to handle real-time traffic on the Internet.
RTP does not have a delivery mechanism; it must be used
with UDP. RTP stands between UDP and the application
program. The main contributions of RTP are
time-stamping, sequencing, and mixing facilities.

Topics discussed in this section:
RTP Packet Format
UDP Port

29.60
Figure 29.18 RTP

29.61
RTP packet header format
• version (2 bits) - RTP version number, always 2;
• padding (1 bit) - if set, the packet contains padding
bytes at the end of the payload; the last byte of padding
contains how many padding bytes should be ignored;
• extension (1 bit) - if set, the RTP header is followed by
an extension header;
• CSRC count (4 bits) - number of CSRCs (contributing
sources) following the fixed header;
• marker (1 bit) - the interpretation is defined by a profile;
• payload type (7 bits) - specifies the format of the
payload and is defined by an RTP profile;

29.62
Figure 29.19 RTP packet header format

29.63
• sequence number (16 bits) - the sequence number of the
packet; the sequence number is incremented with each
packet and it can be used by the receiver to detect packet
losses;
• timestamp (32 bits) - reflects the sampling instance of
the first byte of the RTP data packet; the timestamp must
be generated by a monotonically and linearly increasing
clock;
• synchronization source (SSRC) (32 bits) - identifies the
source of the real-time data carried by the packet;
• contributing sources (CSRC) (32 bits) - identifies a
maximum of 15 additional contributing sources for the
payload of this RTP packet.

29.64
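A minimal sketch of packing the 12-byte fixed RTP
header described above, using Python's struct module;
the field values are illustrative.

import struct

def rtp_header(seq, timestamp, ssrc, payload_type, marker=0):
    """Pack a fixed RTP header with no CSRC list: V=2, P=0,
    X=0, CC=0, then marker/payload type, sequence number,
    timestamp, and SSRC, all in network byte order."""
    byte0 = (2 << 6) | (0 << 5) | (0 << 4) | 0
    byte1 = (marker << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

hdr = rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD,
                 payload_type=0)
print(hdr.hex())   # 80000001000000a01234abcd (12 bytes)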
Table 29.1 Payload types

29.65
Note

RTP uses a temporary even-numbered UDP port.

29.66
29-7 RTCP

RTP allows only one type of message, one that carries
data from the source to the destination. In many cases,
there is a need for other messages in a session. These
messages control the flow and quality of data and allow
the recipient to send feedback to the source or
sources. Real-time Transport Control Protocol (RTCP) is
a protocol designed for this purpose.

Topics discussed in this section:
Sender Report and Receiver Report Messages
UDP Port

29.67
Figure 29.20 RTCP message types

29.68
Note

RTCP uses an odd-numbered UDP port number that follows
the port number selected for RTP.

29.69
29-8 VOICE OVER IP

Let us concentrate on one real-time interactive
audio/video application: voice over IP, or Internet
telephony. The idea is to use the Internet as a
telephone network with some additional capabilities.
Two protocols have been designed to handle this type of
communication: SIP and H.323.

Topics discussed in this section:
SIP
H.323

29.70
Session Initiation Protocol (SIP)

• The Session Initiation Protocol (SIP) was designed by
the IETF.
• It is an application-layer protocol that establishes,
manages, and terminates a multimedia session (call).
• It can be used to create two-party, multiparty, or
multicast sessions.
• SIP is designed to be independent of the underlying
transport layer; it can run on UDP, TCP, or SCTP.
• SIP is a text-based protocol, as is HTTP.
• SIP, like HTTP, uses messages.

29.71
Figure 29.21 SIP messages

29.72
Each message has a header and a body. The header
consists of several lines that describe the structure
of the message, the caller's capability, the media
type, and so on.
• The caller initializes a session with the INVITE
message.
• After the callee answers the call, the caller sends
an ACK message for confirmation.
• The BYE message terminates a session.
• The OPTIONS message queries a machine about its
capabilities.
• The CANCEL message cancels an already started
initialization process.
• The REGISTER message makes a connection when the
callee is not available.
29.73
Figure 29.22 SIP formats

29.74
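As an illustration of the text-based format, a hedged
sketch of what a minimal INVITE request might look like
(the addresses, Call-ID, tag, and branch values are
made up):

INVITE sip:bob@example.com SIP/2.0
Via: SIP/2.0/UDP caller.example.com;branch=z9hG4bK776asdhds
From: Alice <sip:alice@example.com>;tag=1928301774
To: Bob <sip:bob@example.com>
Call-ID: a84b4c76e66710
CSeq: 1 INVITE
Contact: <sip:alice@caller.example.com>
Content-Type: application/sdp
Content-Length: 0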
SIP Addresses

In a regular telephone communication, a telephone
number identifies the sender, and another telephone
number identifies the receiver. SIP is very flexible.
In SIP, an e-mail address, an IP address, a telephone
number, and other types of addresses can be used to
identify the sender and receiver. However, the address
needs to be in SIP format (also called a scheme).

29.75
Figure 29.23 SIP simple session

29.76
• Establishing a Session:
Establishing a session in SIP requires a three-way
handshake. The caller sends an INVITE message, using
UDP, TCP, or SCTP, to begin the communication. If the
callee is willing to start the session, he or she first
sends a reply message. To confirm that a reply code has
been received, the caller sends an ACK message.
• Communicating:
After the session has been established, the caller and
the callee can communicate by using two temporary
ports.
• Terminating the Session:
The session can be terminated with a BYE message sent
by either party.

29.77
• Tracking the Callee
What happens if the callee is not sitting at his or her
terminal? He or she may be away from the system or at
another terminal, and may not even have a fixed IP
address if DHCP is being used.
SIP has a mechanism (similar to the one in DNS) that
finds the IP address of the terminal at which the
callee is sitting.
To perform this tracking, SIP uses the concept of
registration. SIP defines some servers as registrars.
At any moment a user is registered with at least one
registrar server; this server knows the IP address of
the callee.
When a caller needs to communicate with the callee, the
caller can use the e-mail address instead of the IP
address in the INVITE message.
The message goes to a proxy server.
The proxy server sends a lookup message (not part of
SIP) to some registrar server that has registered the
callee.
When the proxy server receives a reply message from the
registrar server, it takes the caller's INVITE message,
inserts the newly discovered IP address of the callee,
and sends the message to the callee.
29.78
Figure 29.24 Tracking the callee

29.79
H.323
H.323 is a standard designed by the ITU to allow
telephones on the public telephone network to talk to
computers (called terminals in H.323) connected to the
Internet.

A gateway connects the Internet to the telephone
network. In general, a gateway is a five-layer device
that can translate a message from one protocol stack to
another. The gateway here does exactly that: it
transforms a telephone network message into an Internet
message.

The gatekeeper server on the local area network plays
the role of the registrar server, as discussed for SIP.
29.80
Figure 29.25 H.323 architecture

29.81
Protocols

• H.323 uses a number of protocols to establish and
maintain voice (or video) communication.
• H.323 uses G.711 or G.723.1 for compression.
• It uses a protocol named H.245, which allows the
parties to negotiate the compression method.
• Protocol Q.931 is used for establishing and
terminating connections.
• Another protocol, called H.225 or RAS
(Registration/Admission/Status), is used for
registration with the gatekeeper.

29.82
Figure 29.26 H.323 protocols

29.83
Operation:

Let us show the operation of a telephone communication
using H.323 with a simple example.
1. The terminal sends a broadcast message to the
gatekeeper. The gatekeeper responds with its IP
address.
2. The terminal and gatekeeper communicate, using H.225
to negotiate bandwidth.
3. The terminal, gatekeeper, gateway, and telephone
communicate by using Q.931 to set up a connection.
4. The terminal, gatekeeper, gateway, and telephone
communicate by using H.245 to negotiate the compression
method.
5. The terminal, gateway, and telephone exchange audio
by using RTP under the management of RTCP.
6. The terminal, gatekeeper, gateway, and telephone
communicate by using Q.931 to terminate the
communication.

29.84
Figure 29.27 H.323 example

29.85
