Multimedia
29-1 DIGITIZING AUDIO AND VIDEO
29-2 AUDIO AND VIDEO COMPRESSION
Audio Compression
1. Predictive Encoding
2. Perceptual Encoding: MP3
Video Compression
Image Compression: JPEG
Figure 29.2 JPEG gray scale
Figure 29.3 JPEG process
Discrete Cosine Transform (DCT):
Case 1:
In this case, we have a block of uniform gray, and the
value of each pixel is 20.
When we do the transformations, we get a nonzero value
for the first element (upper left corner); the rest of the
pixels have a value of 0.
The value of T(0, 0) is the average (multiplied by a
constant) of the P(x,y) values and is called the dc value
(direct current, borrowed from electrical engineering).
The rest of the values, called ac values, in T(m,n)
represent changes in the pixel values.
But because there are no changes, the rest of the values
are 0s.
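The transformation in Case 1 can be checked with a short sketch. The 8 × 8 block size and the orthonormal DCT-II scaling below are standard JPEG choices, but the code is an illustration, not a JPEG implementation:

```python
import math

def dct2(block):
    """2-D DCT-II of an N x N block, with orthonormal (JPEG-style) scaling."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for m in range(n):
        for k in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * m * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * k * math.pi / (2 * n)))
            cm = math.sqrt(1 / n) if m == 0 else math.sqrt(2 / n)
            ck = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            out[m][k] = cm * ck * s
    return out

# Case 1: a uniform 8 x 8 block where every pixel is 20.
uniform = [[20] * 8 for _ in range(8)]
t = dct2(uniform)

# T(0, 0) is the dc value (the average times a constant); every ac value is 0.
print(round(t[0][0], 6))   # 160.0
```

Running it confirms the slide's claim: only the upper-left element is nonzero, because a block with no pixel-to-pixel change has no ac components.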
Figure 29.4 Case 1: uniform gray scale
Case 2:
In the second case, we have a block with two different
uniform gray scale sections.
Case 3:
Figure 29.6 Case 3: gradient gray scale
From the above three cases, we can state the
following:
Quantization:
Compression:
Figure 29.7 Reading the table
Video Compression: MPEG
The Moving Picture Experts Group method is used
to compress video.
In principle, a motion picture is a rapid flow of a
set of frames, where each frame is an image.
In other words, a frame is a spatial combination of
pixels, and a video is a temporal combination of
frames that are sent one after another.
Compressing video, then, means spatially
compressing each frame and temporally
compressing a set of frames.
Spatial Compression:
The spatial compression of each frame is done with
JPEG (or a modification of it). Each frame is a picture
that can be independently compressed.
Temporal Compression:
In temporal compression, redundant frames are
removed. When we watch television, we receive 50 frames
per second. However, most of the consecutive frames are
almost the same. For example, when someone is talking,
most of the frame is the same as the previous one except
for the segment of the frame around the lips, which
changes from one frame to another. To temporally
compress data, the MPEG method first divides frames into
three categories: I-frames, P-frames, and B-frames.
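The temporal redundancy that this exploits can be illustrated with a toy one-dimensional "frame" (the pixel values below are invented for illustration):

```python
# Two consecutive frames differ in only a few pixels (the "lips" region),
# so sending the difference is far cheaper than sending the whole frame.
frame1 = [10, 10, 10, 10, 50, 50, 10, 10]
frame2 = [10, 10, 10, 10, 55, 52, 10, 10]   # only positions 4 and 5 changed

# Record (position, change) only where the frames differ.
delta = [(i, b - a) for i, (a, b) in enumerate(zip(frame1, frame2)) if a != b]
print(delta)   # [(4, 5), (5, 2)] - two values instead of eight
```

The receiver reconstructs frame2 by applying the two stored changes to frame1, which is the essence of a P-frame.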
Figure 29.8 MPEG frames
Intracoded frame (I-frame)
• Independence: I-frames are self-contained and do not
rely on other frames for decoding.
• Regular Intervals: They appear at regular intervals in a
video stream (e.g., every ninth frame).
• Handling Changes: I-frames handle sudden changes in
the video content that other frame types might not
capture effectively.
• Broadcast Scenarios: Essential for broadcast scenarios
as they allow viewers who tune in late to see a complete
picture from the latest I-frame.
• No Dependencies: They cannot be constructed from
previous or future frames; they are complete on their
own.
Predicted frame (P-frame)
• Dependency: P-frames are related to and depend on
preceding I-frames or P-frames.
• Change Representation: They contain only the
changes from the previous frame, rather than the entire
image.
• Segment Limitation: They may struggle to capture
significant changes, such as those involving fast-
moving objects.
• Construction: P-frames can only be constructed from
previous I- or P-frames.
• Data Efficiency: They carry less information
compared to other frame types and are more
compressed.
Bidirectional frame (B-frame)
Figure 29.9 MPEG frame construction
This shows how I-, P-, and B-frames are constructed from a series of seven
frames.
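A sketch of the dependencies, assuming a display-order pattern like the seven frames of Figure 29.9 (real encoders choose their own group-of-pictures structure):

```python
# Display-order frame types; the exact pattern is an assumption.
frames = ["I", "B", "B", "P", "B", "B", "I"]

def references(frames, i):
    """Return the indices of the frames that frame i is constructed from."""
    kind = frames[i]
    if kind == "I":                     # self-contained, no dependencies
        return []
    anchors = [j for j, k in enumerate(frames) if k in ("I", "P")]
    if kind == "P":                     # previous I- or P-frame only
        return [max(j for j in anchors if j < i)]
    # B-frame: nearest anchor frame on each side (previous and future)
    return [max(j for j in anchors if j < i),
            min(j for j in anchors if j > i)]

for i, kind in enumerate(frames):
    print(i, kind, references(frames, i))
```

The output shows why B-frames compress best (two references to predict from) and why an I-frame must periodically reset the chain.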
29-3 STREAMING STORED AUDIO/VIDEO
First Approach: Using a Web Server
Figure 29.10 Using a Web server
Second Approach: Using a Web Server
with Metafile
The media player is directly connected to the Web server for
downloading the audio/video file. The Web server stores two files: the
actual audio/video file and a metafile that holds information about the
audio/video file.
1. The HTTP client accesses the Web server using the
GET message.
2. The information about the metafile comes in the
response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to access
the audio/video file.
5. The Web server responds.
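Steps 3 and 4 can be sketched as follows; the one-line metafile and the URL are hypothetical stand-ins for real metafile formats such as .ram or .asx:

```python
# The Web server returns the metafile to the browser, which passes it to
# the media player; the player extracts the media URL and fetches the file.
metafile_body = "http://media.example.com/song.rm\n"   # hypothetical contents

def media_player_open(metafile):
    """Step 4: read the audio/video URL out of the metafile."""
    return metafile.strip().splitlines()[0]

print(media_player_open(metafile_body))   # the player now GETs this URL
```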
Figure 29.11 Using a Web server with a metafile
Third Approach: Using a Media Server
The problem with the second approach is that the browser
and the media player both use the services of HTTP.
HTTP is designed to run over TCP.
This is appropriate for retrieving the metafile, but not for
retrieving the audio/video file.
The reason is that TCP retransmits a lost or damaged
segment, which is counter to the philosophy of streaming.
We need to dismiss TCP and its error control; we need to use
UDP.
However, HTTP, which accesses the Web server, and the
Web server itself are designed for TCP; we need another
server, a media server.
Steps
1. The HTTP client accesses the Web server using a
GET message.
2. The information about the metafile comes in the
response.
3. The metafile is passed to the media player.
4. The media player uses the URL in the metafile to
access the media server to download the file.
Downloading can take place by any protocol that uses
UDP.
5. The media server responds.
Figure 29.12 Using a media server
Fourth Approach: Using a Media Server
and RTSP
1. The HTTP client accesses the Web server using a GET
message.
2. The information about the metafile comes in the response.
3. The metafile is passed to the media player.
4. The media player sends a SETUP message to create a
connection with the media server.
5. The media server responds.
6. The media player sends a PLAY message to start playing
(downloading).
7. The audio/video file is downloaded using another
protocol that runs over UDP.
8. The connection is broken using the TEARDOWN
message.
9. The media server responds.
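The control messages in steps 4-8 can be sketched as plain RTSP/1.0 request lines; the URL and session identifier below are assumptions:

```python
# Minimal RTSP requests for SETUP, PLAY, and TEARDOWN.
def rtsp_request(method, url, cseq, session=None):
    """Build the request line and minimal headers of an RTSP message."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        lines.append(f"Session: {session}")
    return "\r\n".join(lines) + "\r\n\r\n"

url = "rtsp://media.example.com/clip"                  # hypothetical media URL
setup = rtsp_request("SETUP", url, 1)                  # step 4: create connection
play = rtsp_request("PLAY", url, 2, "12345")           # step 6: start playing
teardown = rtsp_request("TEARDOWN", url, 3, "12345")   # step 8: break connection

print(setup + play + teardown)
```

Note that these messages only control the session; the media itself flows separately over a UDP-based protocol (step 7).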
Figure 29.13 Using a media server and RTSP
29-4 STREAMING LIVE AUDIO/VIDEO
29-5 REAL-TIME INTERACTIVE
AUDIO/VIDEO
In real-time interactive audio/video, people
communicate with one another in real time. The
Internet phone or voice over IP is an example of this
type of application. Video conferencing is another
example that allows people to communicate visually
and orally.
Figure 29.14 Time relationship
Figure 29.15 Jitter
Figure 29.16 Timestamp
Figure 29.17 Playback buffer
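The idea behind Figures 29.15-29.17, using timestamps plus a playback buffer to absorb jitter, can be sketched as follows (the timestamps, arrival times, and delay D are invented):

```python
# Each packet carries a timestamp ts (its generation time). The receiver
# holds packets in a playback buffer and plays packet ts at time ts + D.
arrivals = [(0, 0.2), (1, 1.4), (2, 2.1), (3, 4.3)]  # (ts, arrival), invented
D = 1.0  # playout delay added by the playback buffer

statuses = []
for ts, arrival in arrivals:
    playout = ts + D
    status = "on time" if arrival <= playout else "late (discarded)"
    statuses.append(status)
    print(f"packet ts={ts}: arrived {arrival}, playout at {playout}: {status}")
```

The first three packets arrive with different amounts of jitter but still play smoothly; the last one arrives after its playout time and must be discarded, which is the trade-off a larger D would avoid at the cost of more delay.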
29-6 RTP
Figure 29.18 RTP
RTP packet header format
• version (2 bits) - RTP version number, always 2;
• padding (1 bit) - if set, the packet contains padding
bytes at the end of the payload; the last byte of padding
contains how many padding bytes should be ignored;
• extension (1 bit) - if set, the RTP header is followed by
an extension header;
• CSRC count (4 bits) - number of CSRCs (contributing
sources) following the fixed header;
• marker (1 bit) - the interpretation is defined by a profile;
• payload type (7 bits) - specifies the format of the
payload and is defined by an RTP profile;
Figure 29.19 RTP packet header format
• sequence number (16 bits) - the sequence number of the
packet; the sequence number is incremented with each
packet and it can be used by the receiver to detect packet
losses;
• timestamp (32 bits) - reflects the sampling instance of
the first byte of the RTP data packet; the timestamp must
be generated by a monotonically and linearly increasing
clock;
• synchronization source (SSRC) (32 bits) - identifies the
source of the real-time data carried by the packet;
• contributing sources (CSRC) (32 bits each) - identifies a
maximum of 15 additional contributing sources for the
payload of this RTP packet.
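A minimal sketch of packing the 12-byte fixed header described above (the field values are arbitrary examples; the CSRC list is omitted):

```python
import struct

def rtp_header(payload_type, seq, timestamp, ssrc,
               version=2, padding=0, extension=0, csrc_count=0, marker=0):
    """Pack the 12-byte fixed RTP header in network byte order."""
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = (marker << 7) | payload_type
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

# Example values: payload type 14 is MPEG audio in the standard table.
hdr = rtp_header(payload_type=14, seq=7, timestamp=160, ssrc=0x1234ABCD)
print(len(hdr), hdr[0] >> 6)   # 12 bytes, version 2
```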
Table 29.1 Payload types
29-7 RTCP
Figure 29.20 RTCP message types
29-8 VOICE OVER IP
Session Initiation Protocol (SIP)
Figure 29.21 SIP messages
Each message has a header and a body. The header
consists of several lines that describe the structure of the
message, caller’s capability, media type, and so on.
• The caller initializes a session with the INVITE
message.
• After the callee answers the call, the caller sends an
ACK message for confirmation.
• The BYE message terminates a session.
• The OPTIONS message queries a machine about
its capabilities.
• The CANCEL message cancels an already started
initialization process.
• The REGISTER message makes a connection when
the callee is not available.
Figure 29.22 SIP formats
SIP Addresses
Figure 29.23 SIP simple session
• Establishing a Session:
Establishing a session in SIP requires a three-way
handshake. The caller sends an INVITE message, using
UDP, TCP, or SCTP to begin the communication. If the
callee is willing to start the session, it first sends a reply
message. To confirm that a reply code has been
received, the caller sends an ACK message.
• Communicating:
After the session has been established, the caller and the
callee can communicate by using two temporary ports.
• Terminating the Session:
The session can be terminated with a BYE message sent by
either party.
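The three-way handshake can be sketched with the request lines involved; the addresses are hypothetical:

```python
# SIP session establishment: INVITE, reply code, ACK.
callee = "sip:bob@example.com"

def request_line(method, target):
    """First line of a SIP request."""
    return f"{method} {target} SIP/2.0"

invite = request_line("INVITE", callee)   # 1. caller begins the session
reply = "SIP/2.0 200 OK"                  # 2. callee sends a reply code
ack = request_line("ACK", callee)         # 3. caller confirms the reply

for line in (invite, reply, ack):
    print(line)
```

Only after the ACK do the two parties exchange media on their temporary ports; a BYE from either side ends the session.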
• Tracking the Callee
What happens if the callee is not sitting at his/her terminal? He/she may be away from
his/her system or at another terminal. He/she may not even have a fixed IP address if
DHCP is being used.
SIP has a mechanism (similar to one in DNS) that finds the IP address of the terminal at
which the callee is sitting.
To perform this tracking, SIP uses the concept of registration. SIP defines some servers
as registrars.
At any moment a user is registered with at least one registrar server; this server knows
the IP address of the callee.
When a caller needs to communicate with the callee, the caller can use the e-mail
address instead of the IP address in the INVITE message.
The message goes to a proxy server.
The proxy server sends a lookup message (not part of SIP) to some registrar server that
has registered the callee.
When the proxy server receives a reply message from the registrar server, the proxy
server takes the caller's INVITE message and inserts the newly discovered IP address
of the callee.
This message is then sent to the callee.
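A minimal sketch of the lookup the proxy server performs; the address, registrar table, and lookup call are all assumptions (the lookup itself is not part of SIP):

```python
# The registrar knows where the callee is currently reachable; the proxy
# consults it before forwarding the INVITE. All names/addresses are invented.
registrar = {"sip:bob@example.com": "192.0.2.45"}   # callee's current IP

def proxy_forward(invite_to):
    """Look up the callee and rewrite the INVITE toward the discovered IP."""
    ip = registrar[invite_to]   # lookup message to the registrar (not SIP)
    return f"INVITE {invite_to} SIP/2.0 via {ip}"

print(proxy_forward("sip:bob@example.com"))
```

This is why the caller can address the INVITE to an e-mail-style SIP address instead of an IP address: the proxy/registrar pair resolves it, much as DNS resolves a hostname.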
Figure 29.24 Tracking the callee
H.323
H.323 is a standard designed by ITU to allow telephones
on the public telephone network to talk to computers
(called terminals in H.323) connected to the Internet.
Protocols
Figure 29.26 H.323 protocols
Operation:
Figure 29.27 H.323 example