Ai Notes2
Contents
UNIT-I: DATA COMMUNICATION COMPONENTS.................................................................................7
1. REPRESENTATION OF DATA: ANALOG VS. DIGITAL SIGNALS ....................................................................................... 7
1.1 Data Communication and Signals ....................................................................................................... 7
1.2 Analog Signals: Characteristics and Examples ..................................................................................... 7
1.3 Digital Signals: Characteristics and Examples ...................................................................................... 8
1.4 Analog-to-Digital Conversion .............................................................................................................. 8
1.5 Digital-to-Analog Conversion .............................................................................................................. 9
2. DATA ENCODING ........................................................................................................................................... 9
2.1 Importance of Data Encoding.............................................................................................................. 9
2.2 ASCII (American Standard Code for Information Interchange) .............................................................. 9
2.3 Unicode and Multibyte Character Encoding ....................................................................................... 10
3. DATA FLOW NETWORKS ................................................................................................................................ 11
3.1 Simplex Communication ................................................................................................................... 11
3.2 Half-duplex Communication .............................................................................................................. 11
3.3 Full-duplex Communication............................................................................................................... 12
3.4 Comparison and Use Cases ............................................................................................................... 12
4. NETWORKS ................................................................................................................................................ 13
4.1 LAN (Local Area Network) ................................................................................................................. 13
4.2 MAN (Metropolitan Area Network) ................................................................................................... 13
4.3 WAN (Wide Area Network) ............................................................................................................... 14
4. CONNECTION TOPOLOGY ............................................................................................................................... 15
4.1 Bus Topology .................................................................................................................................... 15
4.2 Star Topology ................................................................................................................................... 15
4.3 Ring Topology ................................................................................................................................... 16
4.4 Mesh Topology ................................................................................................................................. 16
4.5 Hybrid Topology ............................................................................................................................... 17
5. PROTOCOLS AND STANDARDS ......................................................................................................................... 18
5.1 Importance of Protocols in Data Communication ............................................................................... 18
5.2 Overview of TCP/IP Protocol Suite ..................................................................................................... 18
5.3 Introduction to Ethernet and IEEE 802.3 ............................................................................................ 18
5.4 Other Networking Protocols.............................................................................................................. 19
6. OSI MODEL ............................................................................................................................................ 20
6.1 Seven Layers of the OSI Model and Their Functions ........................................................................... 20
6.2 Explanation of each layer: Physical, Data Link, Network, Transport, Session, Presentation, and
Application............................................................................................................................................. 20
6.3 Encapsulation and De-encapsulation................................................................................................. 23
7. TRANSMISSION MEDIA ............................................................................................................................ 24
7.1 Twisted Pair Cable: UTP vs. STP, Categories ....................................................................................... 24
FORWARDING .............................................................................................................................................. 70
UNICAST ROUTING PROTOCOLS ................................................................................................................... 71
UNIT-4 TRANSPORT LAYER ............................................................................................................................. 73
PROCESS TO PROCESS COMMUNICATION ...................................................................................................... 73
USER DATAGRAM PROTOCOL (UDP) ............................................................................................................ 74
TRANSMISSION CONTROL PROTOCOL (TCP)................................................................................................. 75
THREE-WAY HANDSHAKE AND CONNECTION TERMINATION ......................................................................... 77
STREAM CONTROL TRANSMISSION PROTOCOL (SCTP) ................................................................................. 79
CONGESTION CONTROL ............................................................................................................................... 80
EXPLICIT CONGESTION NOTIFICATION (ECN) .............................................................................................. 81
QUALITY OF SERVICE (QOS) ........................................................................................................................ 82
QOS IMPROVING TECHNIQUES ..................................................................................................................... 83
UNIT-V: APPLICATION LAYER .......................................................................................................................... 85
DOMAIN NAME SPACE (DNS) ...................................................................................................................... 85
DYNAMIC DNS (DDNS) ............................................................................................................................. 87
TELNET .................................................................................................................................................... 89
FILE TRANSFER PROTOCOL (FTP) ................................................................................................................ 90
WORLD WIDE WEB (WWW) ....................................................................................................................... 91
SIMPLE NETWORK MANAGEMENT PROTOCOL (SNMP)................................................................................. 92
BLUETOOTH ............................................................................................................................................... 93
FIREWALLS .................................................................................................................................................... 94
UNIT-I: INTRODUCTION TO MACHINE LEARNING........................................................................... 96
1. DEFINITION OF MACHINE LEARNING ........................................................................................................ 96
2. TYPES OF MACHINE LEARNING ................................................................................................................ 96
3. APPLICATIONS OF MACHINE LEARNING .................................................................................................... 98
4. CHALLENGES AND ISSUES IN MACHINE LEARNING .................................................................................... 99
5. UNDERSTANDING THE MACHINE LEARNING WORKFLOW ........................................................................ 100
Computer Networks
UNIT-I: Data Communication Components
1. Representation of Data: Analog vs. Digital Signals
1.1 Data Communication and Signals
Data communication is the process of transferring information from one point to another us-
ing a medium such as a wire, cable, or wireless channel.
A signal is a variation of a physical quantity that conveys information. Signals can be classi-
fied into two types: analog and digital.
Analog signals are continuous signals that can have any value within a range. For example,
sound waves, light waves, and temperature are analog signals.
Digital signals are discrete signals that can have only two values: 0 or 1. For example, binary
numbers, Morse code, and on-off switches are digital signals.
Analog signals have four main characteristics: amplitude, frequency, phase, and bandwidth.
Amplitude is the height or strength of the signal. It is measured in volts (V) or decibels (dB).
Frequency is the number of cycles or oscillations of the signal per second. It is measured in
hertz (Hz) or kilohertz (kHz).
Phase is the position or angle of the signal relative to a reference point. It is measured in de-
grees (°) or radians (rad).
Bandwidth is the range of frequencies that the signal occupies. It is measured in hertz (Hz) or
kilohertz (kHz).
Some examples of analog signals are:
o Voice: Voice is an analog signal that varies in amplitude and frequency according to
the sound produced by the speaker. The human voice has a frequency range of about
300 Hz to 3400 Hz.
o Music: Music is an analog signal that consists of multiple frequencies and amplitudes
that create different sounds and melodies. The musical notes have a frequency range
of about 20 Hz to 20 kHz.
o Video: Video is an analog signal that consists of multiple frames or images that
change rapidly to create motion. Each frame has a different amplitude and frequency
that represent the color and brightness of the pixels. The video signal has a bandwidth
of about 4.2 MHz.
Digital signals have two main characteristics: bit rate and bit interval.
Bit rate is the number of bits (0 or 1) transmitted per second. It is measured in bits per second
(bps) or megabits per second (Mbps).
Bit interval is the time required to transmit one bit. It is measured in seconds (s) or millisec-
onds (ms).
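The bit interval is simply the reciprocal of the bit rate: at 1 Mbps each bit lasts 1/1,000,000 s = 1 microsecond, and at 1 Gbps only 1 nanosecond.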
Some examples of digital signals are:
o Text: Text is a digital signal that consists of a sequence of characters encoded using a
standard code such as ASCII or Unicode. Each character has a fixed number of bits
that represent its value. For example, ASCII uses 7 bits to encode 128 characters,
while Unicode uses 8, 16, or 32 bits to encode over a million characters.
o Image: Image is a digital signal that consists of a matrix of pixels that represent the
color and brightness of the picture. Each pixel has a fixed number of bits that repre-
sent its value. For example, a black-and-white image uses 1 bit per pixel, while a
color image uses 8, 16, or 24 bits per pixel.
o Audio: Audio is a digital signal that consists of a sequence of samples that represent
the amplitude and frequency of the sound wave. Each sample has a fixed number of
bits that represent its value. For example, CD-quality audio uses 16 bits per sample at
a sampling rate of 44.1 kHz.
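At that rate, a stereo CD stream needs 2 channels × 16 bits × 44,100 samples per second = 1,411,200 bits per second, or about 1.4 Mbps.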
Analog-to-digital conversion (ADC) is the process of converting an analog signal into a digi-
tal signal.
ADC involves two steps: sampling and quantization.
Sampling is the process of taking snapshots or measurements of the analog signal at regular
intervals called sampling rate. The sampling rate determines how often the analog signal is
sampled and how accurate the digital signal is. The higher the sampling rate, the more sam-
ples are taken and the more accurate the digital signal is. However, higher sampling rates also
require more bandwidth and storage space.
Quantization is the process of assigning a discrete value or level to each sample based on its
amplitude. The quantization level determines how many bits are used to represent each sam-
ple and how precise the digital signal is. The higher the quantization level, the more bits are
used and the more precise the digital signal is. However, higher quantization levels also in-
crease the noise and distortion in the digital signal.
ADC is used for various applications such as digital audio recording, digital photography, dig-
ital video, and digital communication.
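To make the sampling and quantization steps concrete, here is a minimal Python sketch; the 1 kHz test tone, 8 kHz sampling rate, and 8-level quantizer are arbitrary illustrative values:
Python
import math

def adc(signal, duration, sampling_rate, levels):
    # Sampling: take one measurement of the signal every 1/sampling_rate seconds
    samples = [signal(n / sampling_rate) for n in range(int(duration * sampling_rate))]
    # Quantization: map each sample in the range -1.0 .. +1.0 to one of `levels` discrete values
    step = 2.0 / (levels - 1)
    return [round((s + 1.0) / step) for s in samples]

# One millisecond of a 1 kHz tone, sampled at 8 kHz with 8 levels (3 bits per sample)
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
print(adc(tone, 0.001, 8000, 8))   # eight quantized sample values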
Digital-to-analog conversion (DAC) is the process of converting a digital signal into an ana-
log signal.
DAC involves two steps: reconstruction and filtering.
Reconstruction is the process of generating a continuous signal from the discrete samples us-
ing a technique called interpolation. Interpolation is the process of filling in the gaps between
the samples using a mathematical function or a curve. The interpolation function determines
how smooth and accurate the analog signal is. The most common interpolation function is the
linear interpolation, which connects the samples with straight lines.
Filtering is the process of removing the unwanted frequencies or noise from the reconstructed
signal using a device called a filter. A filter is a circuit that allows only certain frequencies to
pass through and blocks others. The filter determines how clean and clear the analog signal is.
The most common filter is the low-pass filter, which allows only the low frequencies to pass
through and blocks the high frequencies.
DAC is used for various applications such as digital audio playback, digital video display, and
analog communication.
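The reconstruction step can be sketched in the same way, assuming evenly spaced samples and the linear interpolation described above (the filtering step is omitted):
Python
def reconstruct(samples, sampling_rate, t):
    # Linear interpolation: join neighbouring samples with a straight line
    pos = t * sampling_rate                    # fractional sample index at time t
    i = min(int(pos), len(samples) - 2)
    frac = pos - i
    return samples[i] * (1 - frac) + samples[i + 1] * frac

# Value halfway between the 2nd and 3rd samples of a short sample sequence
print(reconstruct([0.0, 0.7, 1.0, 0.7], 8000, 1.5 / 8000))   # -> 0.85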
2. Data Encoding
2.1 Importance of Data Encoding
Data encoding is the process of transforming data into a format that can be transmitted,
stored, or processed by a system.
Data encoding is important for several reasons:
o It enables data compression, which reduces the size of data and saves bandwidth and
storage space.
o It enables data encryption, which protects data from unauthorized access and modifi-
cation.
o It enables data error detection and correction, which ensures data integrity and relia-
bility.
o It enables data modulation, which adapts data to the characteristics of the transmis-
sion medium.
2.2 ASCII (American Standard Code for Information Interchange)
ASCII is a standard code that assigns a 7-bit binary number to each character in the English
alphabet, digits, punctuation marks, and some control characters.
ASCII can encode 128 characters in total, with values ranging from 0 to 127.
ASCII is widely used for text-based communication and data processing.
Some examples of ASCII characters and their binary codes are:
A 01000001
B 01000010
C 01000011
1 00110001
2 00110010
3 00110011
! 00100001
? 00111111
\n 00001010
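These codes can be reproduced directly in Python with the built-in ord() and chr() functions (values are shown padded to 8 bits, as in the table above):
Python
for ch in "ABC123!?\n":
    print(repr(ch), format(ord(ch), "08b"))   # each character and its binary ASCII code
print(chr(0b01000001))                        # -> A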
2.3 Unicode and Multibyte Character Encoding
Unicode is a standard code that assigns a unique number to each character in almost every
language and writing system in the world.
Unicode can encode over a million characters in total, with values ranging from 0 to 1114111
(hexadecimal 10FFFF).
Unicode uses different formats to represent characters using different numbers of bytes. These
formats are called UTF (Unicode Transformation Format).
Some examples of UTF formats are:
o UTF-8: This format uses 1 to 4 bytes to encode each character. It is compatible with
ASCII for the first 128 characters. It is widely used for web pages and email mes-
sages.
o UTF-16: This format uses 2 or 4 bytes to encode each character. It is commonly used
for text files and software applications.
o UTF-32: This format uses 4 bytes to encode each character. It is rarely used because it
requires more space than other formats.
Multibyte character encoding is a general term that refers to any encoding scheme that uses
more than one byte to encode each character. Unicode is an example of multibyte character
encoding, but there are other schemes as well.
Some examples of non-Unicode multibyte character encoding are:
o GB2312: This scheme uses 1 or 2 bytes to encode each character in simplified Chi-
nese. It can encode about 7400 characters in total.
o Shift-JIS: This scheme uses 1 or 2 bytes to encode each character in Japanese. It can
encode about 7000 characters in total.
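The differing byte counts can be checked with Python's standard codecs; the characters below (the letter A, the euro sign, a simplified Chinese character, and a Japanese katakana) are chosen only as examples:
Python
for ch in ("A", "€", "中", "カ"):
    sizes = [len(ch.encode(c)) for c in ("utf-8", "utf-16-le", "utf-32-le")]
    print(ch, "UTF-8/16/32 bytes:", sizes)
print(len("中".encode("gb2312")), "bytes in GB2312")       # 2 bytes (simplified Chinese)
print(len("カ".encode("shift_jis")), "bytes in Shift-JIS")  # 2 bytes (Japanese)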
3. Data Flow Networks
3.1 Simplex Communication
Simplex communication is a type of data flow network that allows data to flow in only one
direction, from the sender to the receiver.
Simplex communication is simple and cheap, but it has limited functionality and efficiency.
Some examples of simplex communication are:
o Radio broadcast: The radio station transmits audio signals to the listeners, but the lis-
teners cannot send any feedback to the station.
o Keyboard: The keyboard sends keystrokes to the computer, but the computer does not
send any signals back to the keyboard.
o Printer: The computer sends print commands to the printer, but the printer does not
send any status information back to the computer.
3.2 Half-duplex Communication
Half-duplex communication is a type of data flow network that allows data to flow in both
directions, but not at the same time. The sender and the receiver have to take turns to transmit
and receive data.
Half-duplex communication is more flexible and efficient than simplex communication, but it
still has some drawbacks such as delay and collision.
Some examples of half-duplex communication are:
o Walkie-talkie: The users can talk and listen to each other, but they have to press a but-
ton to switch between transmitting and receiving modes.
o Ethernet: The devices can send and receive data over a shared cable, but they have to
use a protocol called CSMA/CD (Carrier Sense Multiple Access with Collision De-
tection) to avoid interfering with each other’s transmissions.
o Bluetooth: The devices can exchange data wirelessly, but they have to use a protocol
called TDD (Time Division Duplexing) to divide the time into slots for transmitting
and receiving.
3.3 Full-duplex Communication
Full-duplex communication is a type of data flow network that allows data to flow in both directions simultaneously. The sender and the receiver can transmit and receive data at the same
time without any interruption or interference.
Full-duplex communication is the most advanced and efficient type of data flow network, but
it also requires more complex and expensive hardware and software.
Some examples of full-duplex communication are:
o Telephone: The users can talk and listen to each other at the same time, without having to take turns.
o Fiber-optic cable: The devices can send and receive data over a pair of light signals that travel in opposite directions, so the two streams never collide.
o Wi-Fi: The devices can exchange data wirelessly using a protocol called OFDM (Or-
thogonal Frequency Division Multiplexing) that splits the frequency into subcarriers
for transmitting and receiving.
3.4 Comparison and Use Cases
The table below summarizes the main features and advantages of each type of data flow network:
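Simplex: data flows in one direction only; simple and cheap; examples: radio broadcast, keyboard, printer.
Half-duplex: data flows in both directions, but only one way at a time; more flexible, at the cost of turnaround delay; examples: walkie-talkie, shared Ethernet, Bluetooth.
Full-duplex: data flows in both directions simultaneously; the most efficient, but needs more capable hardware; examples: telephone, fiber-optic links, Wi-Fi.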
The choice of data flow network depends on the requirements and constraints of the applica-
tion. Some factors that influence the decision are:
o Data rate: How fast does the data need to be transmitted or received?
o Data volume: How much data needs to be transmitted or received?
o Data quality: How important is the accuracy and reliability of the data?
o Data security: How sensitive is the data and how vulnerable is it to unauthorized ac-
cess or modification?
o Cost: How much budget is available for the hardware and software components?
o Availability: How easy is it to obtain and maintain the hardware and software compo-
nents?
4. Networks
4.1 LAN (Local Area Network)
A LAN is a network that connects devices within a small geographic area, such as a home,
office, or campus.
A LAN typically uses wired or wireless technologies, such as Ethernet, Wi-Fi, or Bluetooth,
to transmit data over short distances at high speeds.
A LAN usually has a single owner or administrator who controls the network configuration
and security policies.
A LAN can support various applications, such as file sharing, printer sharing, email, web
browsing, video conferencing, gaming, etc.
4.2 MAN (Metropolitan Area Network)
A MAN is a network that connects devices within a large geographic area, such as a city or a
region.
A MAN typically uses fiber-optic cables, microwave links, or satellite links to transmit data
over long distances at moderate speeds.
A MAN usually has multiple owners or operators who cooperate or compete with each other
to provide network services and resources.
A MAN can support various applications, such as telephony, cable TV, internet access, public
safety, etc.
4.3 WAN (Wide Area Network)
A WAN is a network that connects devices across a vast geographic area, such as a country or
a continent.
A WAN typically uses a combination of wired and wireless technologies, such as copper
wires, fiber-optic cables, microwave links, satellite links, cellular networks, etc., to transmit
data over very long distances at low speeds.
A WAN usually has many owners or operators who follow international standards and proto-
cols to ensure interoperability and compatibility among different networks.
A WAN can support various applications, such as email, web browsing, online shopping, so-
cial media, etc.
4. Connection Topology
4.1 Bus Topology
A bus topology is a type of connection topology that connects all the devices on a network
using a single cable called a bus.
A bus topology has the following advantages and disadvantages:
o Advantages:
It is simple and cheap to install and maintain.
It can easily accommodate new devices by adding more taps or connectors
to the bus.
It does not require any central device or switch to control the data flow.
o Disadvantages:
It has low performance and reliability due to signal degradation and interfer-
ence.
It has low security and privacy due to data broadcast and snooping.
It has low scalability and fault tolerance due to limited cable length and sin-
gle point of failure.
4.2 Star Topology
A star topology is a type of connection topology that connects all the devices on a network
to a central device called a hub or a switch.
A star topology has the following advantages and disadvantages:
o Advantages:
It has high performance and reliability due to dedicated links and isolation.
It has high security and privacy due to data transmission and filtering.
It has high scalability and fault tolerance due to easy addition and removal of
devices and multiple paths.
o Disadvantages:
It is complex and expensive to install and maintain.
It requires a central device or switch to control the data flow.
It depends on the functionality and availability of the central device or
switch.
4.3 Ring Topology
A ring topology is a type of connection topology that connects all the devices on a network in
a circular fashion using a single cable.
A ring topology has the following advantages and disadvantages:
o Advantages:
It is simple and cheap to install and maintain.
It does not require any central device or switch to control the data flow.
It can support high data rates and long distances due to signal regeneration.
o Disadvantages:
It has lower performance and reliability because every frame must pass through each node in turn, adding delay.
It has low security and privacy due to data circulation and snooping.
It has low scalability and fault tolerance due to limited cable length and sin-
gle point of failure.
4.4 Mesh Topology
A mesh topology is a type of connection topology that connects all the devices on a network
directly or indirectly using multiple cables.
A mesh topology has the following advantages and disadvantages:
o Advantages:
It has high performance and reliability due to dedicated links and redun-
dancy.
It has high security and privacy due to data encryption and routing.
It has high scalability and fault tolerance due to easy addition and removal of
devices and multiple paths.
o Disadvantages:
It is complex and expensive to install and maintain.
It requires a lot of cables, ports, and switches to connect all the devices.
It may cause network congestion and overhead due to excessive routing.
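The cabling requirement grows quickly: a full mesh of n devices needs n(n-1)/2 point-to-point links, so connecting just 10 devices already takes 10 × 9 / 2 = 45 links and 9 ports on every device.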
4.5 Hybrid Topology
A hybrid topology is a type of connection topology that combines two or more different topologies to form a network.
A hybrid topology has the following advantages and disadvantages:
o Advantages:
It can leverage the benefits of different topologies according to the needs of
the network.
It can provide flexibility, diversity, and compatibility among different net-
works.
It can enhance the performance, reliability, security, scalability, and fault tol-
erance of the network by using appropriate topologies for different seg-
ments or layers of the network.
o Disadvantages:
It is complex and expensive to design, install, maintain, and troubleshoot.
It requires careful planning, coordination, integration, and management
among different topologies, devices, protocols, standards, etc.
5. Protocols and Standards
5.1 Importance of Protocols in Data Communication
A protocol is a set of rules or conventions that governs how data is exchanged between two
or more entities on a network.
A protocol defines the format, structure, content, meaning, timing, sequence, order, direc-
tion, error handling, etc., of data transmission or reception.
A protocol ensures that data communication is consistent, reliable, efficient, secure, interop-
erable, compatible, etc., among different entities on a network.
Some examples of protocols are HTTP (Hypertext Transfer Protocol), FTP (File Transfer Proto-
col), SMTP (Simple Mail Transfer Protocol), TCP (Transmission Control Protocol), IP (Internet
Protocol), etc.
5.2 Overview of TCP/IP Protocol Suite
TCP/IP protocol suite is a collection of protocols that enables data communication over the
internet or any other network that follows the internet standards.
TCP/IP protocol suite consists of four layers: application layer, transport layer, internet layer,
and network access layer.
Each layer performs a specific function and interacts with the adjacent layers using well-de-
fined interfaces.
Each layer uses one or more protocols to perform its function and provides services to the
upper layer or receives services from the lower layer.
Typical protocols at each layer include HTTP, FTP, and SMTP at the application layer; TCP and UDP at the transport layer; IP and ICMP at the internet layer; and Ethernet and Wi-Fi at the network access layer.
5.3 Introduction to Ethernet and IEEE 802.3
Ethernet is a family of protocols that defines how data is transmitted and received over a
LAN using a bus or a star topology.
Ethernet is based on the IEEE 802.3 standard, which specifies the physical and data link lay-
ers of the TCP/IP protocol suite.
Ethernet uses a technique called CSMA/CD (Carrier Sense Multiple Access with Collision De-
tection) to share the medium and avoid collisions among multiple devices on a network.
Ethernet supports various data rates, such as 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps, etc., de-
pending on the type and quality of the cable, the length of the cable, the number of devices,
etc.
Ethernet uses a 48-bit address called MAC (Media Access Control) address to identify each
device on a network.
Ethernet uses a frame format to encapsulate data packets from the upper layer and add
header and trailer information for transmission and reception.
An Ethernet frame consists of, in order: a preamble, a start frame delimiter, the destination MAC address, the source MAC address, a length/type field, the data (payload), and a frame check sequence.
5.4 Other Networking Protocols
Besides TCP/IP and Ethernet, there are many other networking protocols that are used for
different purposes, such as:
o IPX/SPX: This protocol suite is used for data communication over Novell NetWare
networks. It provides connectionless and connection-oriented services at the net-
work and transport layers, respectively.
o NetBEUI: This protocol is used for data communication over Microsoft Windows net-
works. It provides connectionless service at the transport layer and relies on MAC
addresses for addressing.
o ATM (Asynchronous Transfer Mode): This protocol is used for data communication
over high-speed networks that use fiber-optic cables or satellite links. It provides
connection-oriented service at the network layer and uses fixed-length cells for
transmission and switching.
o MPLS (Multiprotocol Label Switching): This protocol is used for data communication
over WANs that use different underlying technologies, such as IP, ATM, Frame Relay,
etc. It provides connection-oriented service at the network layer and uses labels for
routing and forwarding.
6. OSI Model
6.1 Seven Layers of the OSI Model and Their Functions
OSI (Open Systems Interconnection) model is a conceptual framework that defines how data
communication occurs between different systems or devices on a network.
OSI model consists of seven layers, each of which performs a specific function and interacts
with the adjacent layers using well-defined interfaces.
The seven layers of the OSI model are:
7 Application: Provides services and interfaces to the user applications, such as email, web browsing, file transfer, etc.
6 Presentation: Translates, encrypts, compresses, and formats the data for transmission or reception.
5 Session: Establishes, maintains, and terminates the connection between the communicating entities.
4 Transport: Ensures reliable and efficient delivery of data between the source and destination.
3 Network: Determines the best path and routes the data packets across different networks.
2 Data Link: Transfers data frames between adjacent nodes on the same network.
1 Physical: Transmits and receives raw bits over the physical medium.
6.2 Explanation of each layer: Physical, Data Link, Network, Transport, Session, Presenta-
tion, and Application
Physical layer: This layer is responsible for converting the digital data into electrical, optical,
or radio signals and vice versa. It also defines the characteristics of the physical medium, such
as voltage levels, frequency range, modulation scheme, connector type, cable type, etc. Some
examples of physical layer protocols are RS-232, V.35, Ethernet, Wi-Fi, etc.
Data Link layer: This layer is responsible for transferring data frames between adjacent nodes
on the same network. It also provides error detection and correction, flow control, and me-
dium access control. It consists of two sublayers: logical link control (LLC) and media access
control (MAC). LLC provides services to the upper layer and controls the frame synchroniza-
tion and sequencing. MAC provides services to the lower layer and controls the access to the
shared medium. Some examples of data link layer protocols are Ethernet, HDLC (High-Level
Data Link Control), PPP (Point-to-Point Protocol), etc.
Network layer: This layer is responsible for determining the best path and routing the data
packets across different networks. It also provides logical addressing, fragmentation and reas-
sembly, congestion control, and network management. Some examples of network layer pro-
tocols are IP (Internet Protocol), ICMP (Internet Control Message Protocol), ARP (Address
Resolution Protocol), RIP (Routing Information Protocol), OSPF (Open Shortest Path First),
etc.
Transport layer: This layer is responsible for ensuring reliable and efficient delivery of data
between the source and destination. It also provides port addressing, segmentation and reas-
sembly, flow control, error control, and connection management. Some examples of transport
layer protocols are TCP (Transmission Control Protocol), UDP (User Datagram Protocol),
SCTP (Stream Control Transmission Protocol), etc.
Session layer: This layer is responsible for establishing, maintaining, and terminating the con-
nection between the communicating entities. It also provides synchronization, dialogue con-
trol, session recovery, and authentication. Some examples of session layer protocols are RPC
(Remote Procedure Call), NFS (Network File System), SQL (Structured Query Language),
etc.
Presentation layer: This layer is responsible for translating, encrypting, compressing, and for-
matting the data for transmission or reception. It also provides data representation, conver-
sion, encryption, compression, and decompression. It also provides character set conversion,
data encryption, data compression, and data formatting. Some examples of presentation layer
protocols are ASCII, Unicode, JPEG, MPEG, SSL (Secure Sockets Layer), etc.
Application layer: This layer is responsible for providing services and interfaces to the user
applications, such as email, web browsing, file transfer, etc. It also provides network access,
resource sharing, remote access, directory services, and network management. Some exam-
ples of application layer protocols are HTTP (Hypertext Transfer Protocol), FTP (File Trans-
fer Protocol), SMTP (Simple Mail Transfer Protocol), DNS (Domain Name System), SNMP
(Simple Network Management Protocol), etc.
6.3 Encapsulation and De-encapsulation
Encapsulation is the process of adding header and trailer information to the data as it moves
down the layers of the OSI model. Each layer adds its own header and trailer to the data re-
ceived from the upper layer. The header contains information such as source and destination
addresses, sequence numbers, error codes, etc. The trailer contains information such as check-
sums, end-of-frame markers, etc.
De-encapsulation is the process of removing header and trailer information from the data as it
moves up the layers of the OSI model. Each layer removes its own header and trailer from the
data received from the lower layer. The header and trailer are used to verify, interpret, and
process the data before passing it to the upper layer.
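The idea can be illustrated with a toy Python sketch; the header strings below are invented placeholders rather than real protocol formats (the example needs Python 3.9+ for removeprefix/removesuffix):
Python
def encapsulate(data):
    segment = "TCP[dst_port=80]" + data            # transport layer adds its header
    packet = "IP[dst=10.0.0.2]" + segment          # network layer adds its header
    frame = "ETH[dst=aa:bb]" + packet + "[FCS]"    # data link layer adds header and trailer
    return frame

def de_encapsulate(frame):
    packet = frame.removeprefix("ETH[dst=aa:bb]").removesuffix("[FCS]")
    segment = packet.removeprefix("IP[dst=10.0.0.2]")
    return segment.removeprefix("TCP[dst_port=80]")

print(de_encapsulate(encapsulate("GET /index.html")))   # -> GET /index.html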
7. Transmission Media
7.1 Twisted Pair Cable: UTP vs. STP, Categories
A twisted pair cable is a type of transmission media that consists of two insulated copper
wires twisted together to reduce electromagnetic interference and crosstalk.
A twisted pair cable can be classified into two types: unshielded twisted pair (UTP) and
shielded twisted pair (STP).
o UTP is a twisted pair cable that does not have any additional shielding or protection.
It is cheaper and easier to install, but it is more susceptible to noise and interference.
o STP is a twisted pair cable that has a metallic shield or foil around each pair or the
entire cable. It is more expensive and difficult to install, but it provides better noise
and interference immunity.
A twisted pair cable can also be classified into different categories based on its data rate,
bandwidth, and quality. The table below shows some of the common categories of twisted
pair cables:
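Category 3: up to 10 Mbps over 16 MHz of bandwidth (older telephone wiring and 10Base-T)
Category 5: up to 100 Mbps over 100 MHz (Fast Ethernet)
Category 5e: up to 1 Gbps over 100 MHz (Gigabit Ethernet)
Category 6: up to 1 Gbps, or 10 Gbps over short runs, over 250 MHz
Category 6a: up to 10 Gbps over 500 MHz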
A coaxial cable is a type of transmission media that consists of a central copper core sur-
rounded by an insulating layer, a braided metal shield, and an outer cover.
A coaxial cable can be classified into two types: thick coaxial cable and thin coaxial cable.
o Thick coaxial cable, also known as 10Base5 or RG-8, is a thick and rigid coaxial ca-
ble that can support data rates up to 10 Mbps and distances up to 500 meters. It is
used for backbone networks and long-distance connections.
o Thin coaxial cable, also known as 10Base2 or RG-58, is a thin and flexible coaxial
cable that can support data rates up to 10 Mbps and distances up to 185 meters. It is
used for local area networks and short-distance connections.
A coaxial cable has the following advantages and disadvantages:
o Advantages:
It has high bandwidth and data rate compared to twisted pair cables.
It has low attenuation and signal loss compared to twisted pair cables.
It has high noise and interference immunity compared to twisted pair cables.
o Disadvantages:
It is more expensive and difficult to install and maintain compared to twisted
pair cables.
It is less flexible and scalable compared to twisted pair cables.
It is more prone to security breaches due to tapping compared to twisted pair
cables.
A fiber-optic cable is a type of transmission media that consists of one or more thin strands of
glass or plastic that carry light signals.
A fiber-optic cable has the following advantages over other types of transmission media:
o It has very high bandwidth and data rate compared to copper cables.
o It has very low attenuation and signal loss compared to copper cables.
o It has very high noise and interference immunity compared to copper cables.
o It has very high security and privacy compared to copper cables.
A fiber-optic cable can be classified into two types based on the mode of light propagation:
single-mode fiber and multi-mode fiber.
o Single-mode fiber, also known as SMF, is a fiber-optic cable that allows only one
mode or path of light to travel through it. It has a very thin core diameter of about 8 to
10 micrometers. It can support very high data rates and long distances up to several
kilometers. It is used for backbone networks and long-distance connections.
o Multi-mode fiber, also known as MMF, is a fiber-optic cable that allows multiple
modes or paths of light to travel through it. It has a larger core diameter of about 50 to
62.5 micrometers. It can support moderate data rates and short distances up to several
hundred meters. It is used for local area networks and short-distance connections.
Wireless transmission is a type of transmission media that uses electromagnetic waves or sig-
nals to transmit data through air or space without any physical medium or cable.
Wireless transmission can use different types of electromagnetic waves or signals depending
on the frequency, wavelength, range, and application. Some of the common types of wireless
transmission are:
o Radio waves: These are electromagnetic waves that have frequencies ranging from 3
kHz to 300 GHz and wavelengths ranging from 1 mm to 100 km. They can penetrate
through walls and obstacles and travel long distances. They are used for various ap-
plications, such as AM/FM radio, TV, cellular phones, Wi-Fi, etc.
o Microwaves: These are electromagnetic waves that have frequencies ranging from
300 MHz to 300 GHz and wavelengths ranging from 1 mm to 1 m. They can travel in
straight lines and require line-of-sight communication. They are used for various ap-
plications, such as satellite communication, radar, microwave ovens, etc.
o Infrared: These are electromagnetic waves that have frequencies ranging from 300
GHz to 400 THz and wavelengths ranging from 1 mm to 700 nm. They can be
blocked by solid objects and have a short range. They are used for various applica-
tions, such as remote controls, optical communication, thermal imaging, etc.
o Bluetooth: This is a wireless technology that uses radio waves in the 2.4 GHz fre-
quency band to transmit data over short distances up to 10 meters. It is used for vari-
ous applications, such as wireless headphones, keyboards, mice, printers, etc.
A LAN (Local Area Network) is a network that connects devices within a small geographic
area, such as a home, office, or campus.
A LAN typically uses wired or wireless technologies, such as Ethernet, Wi-Fi, or Bluetooth,
to transmit data over short distances at high speeds.
A LAN usually has a single owner or administrator who controls the network configuration
and security policies.
A LAN can support various applications, such as file sharing, printer sharing, email, web
browsing, video conferencing, gaming, etc.
A LAN consists of various components that perform different functions and roles on the net-
work. Some of the common components of a LAN are:
o Computers: These are the devices that generate, process, store, and consume data on
the network. They can be desktops, laptops, tablets, smartphones, etc. They can run
various operating systems, such as Windows, Linux, MacOS, Android, iOS, etc. They
can use various applications, such as browsers, email clients, word processors, games,
etc.
o Switches: These are the devices that connect multiple computers on the same network
using cables or ports. They can forward data frames between the computers based on
their MAC addresses. They can also divide the network into smaller segments or sub-
nets to reduce traffic and improve performance.
o Routers: These are the devices that connect multiple networks using different proto-
cols or technologies. They can route data packets between the networks based on their
IP addresses. They can also perform various functions, such as NAT (Network Ad-
dress Translation), DHCP (Dynamic Host Configuration Protocol), firewall, VPN
(Virtual Private Network), etc.
o Access Points: These are the devices that provide wireless connectivity to the com-
puters on the network using radio waves or signals. They can broadcast a wireless
network name or SSID (Service Set Identifier) and a password or key to authenticate
the computers. They can also support various wireless standards or protocols, such as
Wi-Fi 802.11a/b/g/n/ac/ax, Bluetooth (IEEE 802.15.1), etc.
A LAN (Local Area Network) has many benefits and limitations for the users and the network
administrators. Some of the benefits and limitations are:
Benefits
A LAN provides fast and reliable data communication within a small geographic area, such as
a home, office, or campus.
A LAN allows the users to share resources, such as files, printers, scanners, cameras, etc.,
among the devices on the network.
A LAN enables the users to access various applications, such as email, web browsing, video
conferencing, gaming, etc., on the network or the internet.
A LAN reduces the cost and complexity of data communication by using common hardware
and software components and protocols.
A LAN increases the security and privacy of data communication by using encryption, au-
thentication, firewall, VPN, etc., on the network or the internet.
A LAN improves the performance and efficiency of data communication by using switches,
routers, access points, etc., to optimize the data flow and reduce traffic and congestion on the
network.
Limitations
A LAN has a limited geographic scope and cannot connect devices over long distances or
across different networks.
A LAN requires a lot of maintenance and management by the network administrator to ensure
the proper functioning and security of the network.
A LAN may face various challenges and issues, such as network failure, device malfunction,
data loss, data corruption, data theft, data breach, etc., on the network or the internet.
A LAN may have compatibility and interoperability problems with other networks or devices
that use different hardware and software components and protocols.
9. Wired LAN
9.1 Introduction to Ethernet LAN
A wired LAN (Local Area Network) is a network that connects devices using cables or wires
as the transmission medium.
A wired LAN typically uses Ethernet as the protocol to define how data is transmitted and re-
ceived over the network.
Ethernet is a family of protocols that defines the physical and data link layers of the TCP/IP
protocol suite.
Ethernet uses a technique called CSMA/CD (Carrier Sense Multiple Access with Collision
Detection) to share the medium and avoid collisions among multiple devices on the network.
Ethernet supports various data rates, such as 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps, etc., de-
pending on the type and quality of the cable, the length of the cable, the number of devices,
etc.
Ethernet uses a 48-bit address called MAC (Media Access Control) address to identify each
device on the network.
Ethernet uses a frame format to encapsulate data packets from the upper layer and add header
and trailer information for transmission and reception.
9.2 Variations of Ethernet: Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps), 10 Gigabit
Ethernet (10 Gbps)
Ethernet has evolved over time to meet the increasing demands of data communication. Some
of the variations of Ethernet are:
o Fast Ethernet: This is a variation of Ethernet that supports data rates up to 100 Mbps.
It uses the same frame format as the original Ethernet, but it requires higher quality
cables, such as Category 5 twisted pair or fiber-optic cables. It is also known as
100Base-T or 100Base-FX.
o Gigabit Ethernet: This is a variation of Ethernet that supports data rates up to 1 Gbps.
It uses a slightly modified frame format than the original Ethernet, but it is compati-
ble with Fast Ethernet. It requires higher quality cables, such as Category 5e twisted
pair or fiber-optic cables. It is also known as 1000Base-T or 1000Base-X.
o 10 Gigabit Ethernet: This is a variation of Ethernet that supports data rates up to 10
Gbps. It uses a different frame format than the original Ethernet, but it is compatible
with Gigabit Ethernet. It requires higher quality cables, such as Category 6 twisted
pair or fiber-optic cables. It is also known as 10GBase-T or 10GBase-X.
An Ethernet frame is a unit of data that is transmitted and received over an Ethernet network.
It consists of various fields that contain information such as source and destination addresses,
type of data, error detection, etc.
An Ethernet frame has two main parts: header and payload. The header contains information
that is used by the data link layer to deliver the frame to the correct destination. The payload
contains information that is used by the upper layers to process the data.
An Ethernet frame has different formats depending on the variation of Ethernet. The table below shows some of the common formats of Ethernet frames:
Original: Preamble (7 bytes), Start Frame Delimiter (1 byte), Destination MAC Address (6 bytes), Source MAC Address (6 bytes), Length/Type (2 bytes), Data (46 to 1500 bytes), Frame Check Sequence (4 bytes)
Fast Ethernet: Preamble (7 bytes), Start Frame Delimiter (1 byte), Destination MAC Address (6 bytes), Source MAC Address (6 bytes), Length/Type (2 bytes), Data (46 to 1500 bytes), Frame Check Sequence (4 bytes)
Gigabit Ethernet: Preamble (7 bytes), Start Frame Delimiter (1 byte), Destination MAC Address (6 bytes), Source MAC Address (6 bytes), Length/Type (2 bytes), Data (46 to 1500 bytes), Pad (0 to 42 bytes), Frame Check Sequence (4 bytes)
10 Gigabit Ethernet: Preamble (7 bytes), Start Frame Delimiter (1 byte), Destination MAC Address (6 bytes), Source MAC Address (6 bytes), Length/Type (2 bytes), Data (64 to 1500 bytes), Pad (0 to 36 bytes), Frame Check Sequence (4 bytes)
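As a small illustration of this layout, the sketch below unpacks the destination MAC address, source MAC address, and length/type field from the first 14 bytes of a frame; the preamble and start frame delimiter are handled by the hardware and omitted, and the byte values are invented for the example:
Python
import struct

def parse_ethernet_header(frame):
    # Destination MAC (6 bytes), source MAC (6 bytes), length/type (2 bytes)
    dst, src, eth_type = struct.unpack("!6s6sH", frame[:14])
    mac = lambda b: ":".join(f"{x:02x}" for x in b)
    return mac(dst), mac(src), hex(eth_type)

frame = bytes.fromhex("ffffffffffff" "0a1b2c3d4e5f" "0800") + b"payload..."
print(parse_ethernet_header(frame))
# -> ('ff:ff:ff:ff:ff:ff', '0a:1b:2c:3d:4e:5f', '0x800')   # 0x0800 means an IPv4 payload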
A wireless LAN (Local Area Network) is a network that connects devices using radio waves
or signals as the transmission medium.
A wireless LAN typically uses Wi-Fi as the protocol to define how data is transmitted and re-
ceived over the network.
Wi-Fi is a family of protocols that is based on the IEEE 802.11 standard, which specifies the
physical and data link layers of the TCP/IP protocol suite.
Wi-Fi uses a technique called CSMA/CA (Carrier Sense Multiple Access with Collision
Avoidance) to share the medium and avoid collisions among multiple devices on the network.
Wi-Fi supports various data rates, such as 11 Mbps, 54 Mbps, 600 Mbps, 1.3 Gbps, etc., de-
pending on the type and quality of the signal, the distance between the devices, the number of
devices, etc.
Wi-Fi uses a 48-bit address called MAC (Media Access Control) address to identify each de-
vice on the network.
Wi-Fi uses a frame format to encapsulate data packets from the upper layer and add header
and trailer information for transmission and reception.
Wi-Fi has evolved over time to meet the increasing demands of data communication. Some of
the variations of Wi-Fi are:
o 802.11a: This is a variation of Wi-Fi that operates in the 5 GHz frequency band and
supports data rates up to 54 Mbps. It has less interference and more channels than
802.11b, but it has shorter range and higher cost.
o 802.11b: This is a variation of Wi-Fi that operates in the 2.4 GHz frequency band and
supports data rates up to 11 Mbps. It has longer range and lower cost than 802.11a,
but it has more interference and fewer channels than 802.11a.
o 802.11g: This is a variation of Wi-Fi that operates in the 2.4 GHz frequency band and
supports data rates up to 54 Mbps. It is compatible with 802.11b, but it has higher
performance and security than 802.11b.
o 802.11n: This is a variation of Wi-Fi that operates in both the 2.4 GHz and 5 GHz fre-
quency bands and supports data rates up to 600 Mbps. It uses a technique called
MIMO (Multiple Input Multiple Output) to increase the number of antennas and
streams for transmission and reception. It also uses a technique called OFDM (Or-
thogonal Frequency Division Multiplexing) to split the frequency into subcarriers for
transmission and reception.
o 802.11ac: This is a variation of Wi-Fi that operates in the 5 GHz frequency band and
supports data rates up to 1.3 Gbps. It uses a technique called MU-MIMO (Multi-User
Multiple Input Multiple Output) to allow multiple devices to transmit and receive
data simultaneously. It also uses a technique called Wider Channel Bandwidth to in-
crease the channel width from 20 MHz to 80 MHz or 160 MHz for transmission and
reception.
o 802.11ax: This is a variation of Wi-Fi that operates in both the 2.4 GHz and 5 GHz
frequency bands and supports data rates up to about 10 Gbps. It uses a technique called
OFDMA (Orthogonal Frequency Division Multiple Access) to divide the subcarriers
into smaller units called resource units for transmission and reception. It also uses a
technique called BSS Coloring to reduce the interference from neighboring networks.
Wireless LANs have many advantages and challenges for the users and the network adminis-
trators. Some of the advantages and challenges are:
Advantages
Wireless LANs provide mobility and flexibility to the users, as they can access the network
from anywhere within the coverage area without any cables or wires.
Wireless LANs reduce the cost and complexity of data communication, as they do not require
any physical infrastructure or installation of cables or wires.
Wireless LANs enable the users to connect various devices, such as laptops, tablets,
smartphones, etc., to the network using different wireless standards or protocols, such as Wi-
Fi, Bluetooth, etc.
Wireless LANs support various applications, such as email, web browsing, video conferenc-
ing, gaming, etc., on the network or the internet.
Challenges
Wireless LANs have a limited range and capacity, as they depend on the signal strength and
quality, the distance between the devices, the number of devices, etc.
Wireless LANs require a lot of security and management by the network administrator to en-
sure the proper functioning and protection of the network.
Wireless LANs may face various challenges and issues, such as signal interference, noise, at-
tenuation, multipath fading, hidden node problem, etc., on the network or the internet.
Wireless LANs may have compatibility and interoperability problems with other networks or
devices that use different wireless standards or protocols.
A network bridge is a device that connects two or more LANs that use the same protocol or
technology, such as Ethernet.
A network bridge operates at the data link layer of the OSI model and forwards data frames
between the LANs based on their MAC addresses.
A network bridge has the following roles and functions:
o It extends the range and capacity of a LAN by connecting multiple segments or sub-
nets.
o It reduces the traffic and congestion on a LAN by filtering and forwarding only the
relevant frames to the destination segment or subnet.
o It improves the performance and reliability of a LAN by dividing it into smaller colli-
sion domains.
o It maintains a table of MAC addresses and ports to keep track of the devices on each
segment or subnet.
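A much simplified Python sketch of this learn-and-forward behaviour (the MAC addresses and port numbers are invented for the example):
Python
class LearningBridge:
    def __init__(self):
        self.table = {}                       # MAC address -> port

    def receive(self, src_mac, dst_mac, in_port):
        self.table[src_mac] = in_port         # learn which port the sender is on
        if dst_mac in self.table:
            return self.table[dst_mac]        # forward only to the port of the destination
        return "flood"                        # unknown destination: forward to all other ports

bridge = LearningBridge()
print(bridge.receive("aa:aa", "bb:bb", 1))    # -> flood (bb:bb not learned yet)
print(bridge.receive("bb:bb", "aa:aa", 2))    # -> 1 (aa:aa was learned on port 1)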
A router is a device that connects two or more networks that use different protocols or tech-
nologies, such as Ethernet, Wi-Fi, ATM, etc.
A router operates at the network layer of the OSI model and routes data packets between the
networks based on their IP addresses.
A router has the following roles and functions:
o It enables data communication across different networks or internetworks, such as
LANs, WANs, or the internet.
o It determines the best path and routes the data packets using various algorithms, such
as shortest path, least cost, etc.
o It performs various functions, such as NAT (Network Address Translation), DHCP
(Dynamic Host Configuration Protocol), firewall, VPN (Virtual Private Network),
etc.
o It maintains a table of IP addresses and interfaces to keep track of the networks and
devices connected to it.
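Routing table lookups are commonly done by longest-prefix match, where the most specific matching network wins; a rough sketch with an invented table:
Python
import ipaddress

routes = {                                        # prefix -> outgoing interface (example values)
    ipaddress.ip_network("10.0.0.0/8"): "eth0",
    ipaddress.ip_network("10.1.0.0/16"): "eth1",
    ipaddress.ip_network("0.0.0.0/0"): "eth2",    # default route
}

def lookup(destination):
    matching = [net for net in routes if ipaddress.ip_address(destination) in net]
    return routes[max(matching, key=lambda net: net.prefixlen)]

print(lookup("10.1.2.3"))   # -> eth1 (the /16 is more specific than the /8)
print(lookup("8.8.8.8"))    # -> eth2 (only the default route matches)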
A VLAN (Virtual LAN) is a logical grouping of devices on a physical LAN that share com-
mon characteristics or requirements, such as location, department, function, security, etc.
A VLAN operates at the data link layer of the OSI model and segments a physical LAN into
multiple logical LANs using software configuration rather than hardware wiring.
A VLAN has the following concepts, benefits, and implementation methods:
o Logical segmentation: This is the process of dividing a physical LAN into multiple
logical LANs based on various criteria, such as IP address, MAC address, port num-
ber, protocol type, etc. Each logical LAN is assigned a unique identifier called VLAN
ID or VID. The devices on each logical LAN can communicate with each other as if
they are on the same physical LAN, but they cannot communicate with the devices on
other logical LANs unless they use a router or a switch that supports inter-VLAN
routing.
o Benefits: This is the advantages of using VLANs over traditional LANs. Some of the
benefits are:
It increases the security and privacy of data communication by isolating dif-
ferent groups of devices from each other and preventing unauthorized access
or eavesdropping.
Error Detection and Correction
Introduction
Data communication is the process of transmitting and receiving data over a communication
channel, such as a wired or wireless network
Data communication can be affected by various types of errors, such as noise, interference,
distortion, or corruption, that can alter or damage the transmitted data
Error detection and correction are techniques that enable the sender and the receiver to detect
and correct errors in the transmitted data, ensuring reliable and accurate data communication
Error detection is the process of identifying errors in the received data by using some extra
information or redundancy added by the sender
Error correction is the process of recovering the original data from the received data by using
some additional information or redundancy added by the sender or by requesting retransmis-
sion
Block Coding
Block coding is a technique that divides the data into fixed-size blocks of bits and adds some
extra bits to each block to form a code word
The extra bits are called parity bits or check bits, and they are calculated based on some
rules or algorithms applied to the data bits
The parity bits provide redundancy that can be used to detect and correct errors in the code
words
The ratio of data bits to code bits is called the code rate, and it indicates the efficiency of the
block coding scheme
A higher code rate means less redundancy and more efficiency, but also less error detection
and correction capability
A lower code rate means more redundancy and less efficiency, but also more error detection
and correction capability
Hamming Codes
Hamming codes are a type of block coding scheme that can detect and correct single-
bit errors in code words
Hamming codes use even parity for calculating the parity bits, which means that each parity bit is set so that the group of bits it checks contains an even number of 1s
Hamming codes follow these steps to generate code words from data bits:
o Determine the number of parity bits (p) needed for a given number of data bits (d) by
solving the equation: 2^p >= d + p + 1
o Assign the parity bits to positions that are powers of 2 in the code word, such as 1, 2,
4, 8, etc.
o Fill in the remaining positions with the data bits
o Calculate the value of each parity bit by using an exclusive OR (XOR) operation on all the bits whose position number has a 1 in the same place as that parity bit (for example, p1 checks positions 3, 5, 7, ...)
o For example, to generate a Hamming code for d = 4 data bits (1011), we need p = 3
parity bits, and we get the following code word: p1 p2 d1 p3 d2 d3 d4 = 0110011
o The value of each parity bit is calculated as follows:
p1 = d1 XOR d2 XOR d4 = 1 XOR 0 XOR 1 = 0
p2 = d1 XOR d3 XOR d4 = 1 XOR 1 XOR 1 = 1
p3 = d2 XOR d3 XOR d4 = 0 XOR 1 XOR 1 = 0
Hamming codes follow these steps to detect and correct errors in code words:
o Recalculate the value of each parity bit by using an exclusive OR (XOR) operation on
all the data bits whose position numbers have a 1 in the parity bit's position when
written in binary
o Compare the recalculated parity bits with the received parity bits
o If all the parity bits match, then there is no error in the code word
o If some parity bits do not match, then there is an error in the code word
o To locate the error bit, add up all the positions that have a mismatched parity bit
o The sum gives the position of the error bit in the code word
o To correct the error bit, flip its value from 0 to 1 or from 1 to 0
o For example, suppose the code word 0110011 generated above is received as 0110111,
i.e. data bit d2 (position 5) has been flipped from 0 to 1. Detection and correction
proceed as follows:
Recalculate the value of each parity bit from the received bits as follows:
p1’ = d1 XOR d2 XOR d4 = 1 XOR 1 XOR 1 = 1
p2’ = d1 XOR d3 XOR d4 = 1 XOR 1 XOR 1 = 1
p3’ = d2 XOR d3 XOR d4 = 1 XOR 1 XOR 1 = 1
Compare the recalculated parity bits with the received parity bits as follows:
p1’ != p1 (mismatch, position 1)
p2’ = p2 (no mismatch)
p3’ != p3 (mismatch, position 4)
Locate the error bit by adding up the positions that have a mismatched parity
bit as follows:
Error bit position = 1 + 4 = 5 (the position of d2)
Correct the error bit by flipping its value as follows:
Error bit value = d2 = 1
Corrected bit value = d2’ = 0
Corrected code word = 0110011
Cyclic Redundancy Check (CRC)
CRC is a block coding technique in which the sender appends an n-bit checksum to the data
bits; the checksum is the remainder obtained by dividing the data bits (followed by n zeros)
by a generator polynomial using binary (modulo-2) division, and the receiver repeats the
division on the received code word to check for errors
o For example, to detect an error in a CRC code with d = 7 data bits and n = 3
checksum bits (received code word 1101011 001) using the generator polynomial
x^3 + x + 1 (1011), we get the following steps:
Perform a binary (modulo-2) division of the received code word 1101011001 by the
generator 1011, using XOR instead of subtraction and aligning the generator under
the leftmost remaining 1 at each step:
1101011001 -> 0110011001 -> 0011111001 -> 0001001001 -> 0000010001 -> 0000000111
The last three bits are the remainder, 111
The remainder is non-zero (111), which means there is an error in the code
word; an error-free code word would leave a remainder of 000
Python Implementation
Here is a Python program that implements Hamming codes and CRC for error detection and
correction
```python
def hamming_code(data_bits):
    # Determine the number of parity bits (p) needed for d data bits: 2^p >= d + p + 1
    d = len(data_bits)
    p = 0
    while 2 ** p < d + p + 1:
        p += 1

    # Assign the parity bits to positions that are powers of 2 in the code word
    # and fill in the remaining positions with the data bits
    code_word = []
    data_index = 0
    for pos in range(1, d + p + 1):
        if (pos & (pos - 1)) == 0:        # pos is a power of 2 -> parity bit placeholder
            code_word.append(0)
        else:
            code_word.append(data_bits[data_index])
            data_index += 1

    # Calculate the value of each parity bit by using an XOR operation on all the
    # bits whose position numbers have a 1 in the parity bit's position
    for i in range(p):
        pos = 2 ** i                      # actual position of the parity bit in the code word
        val = 0
        for j in range(1, len(code_word) + 1):    # loop through each bit in the code word
            if (j & pos) == pos and j != pos:      # bit j is covered by this parity bit
                val ^= code_word[j - 1]
        code_word[pos - 1] = val          # assign the calculated value to the parity bit position
    return code_word


def hamming_error(code_word):
    # Determine the number of parity bits by counting the powers of 2 that are
    # less than or equal to the length of the code word
    code_word = list(code_word)
    p = 0
    while 2 ** p <= len(code_word):
        p += 1

    # Recalculate the value of each parity bit, compare it with the received parity
    # bit, and store the positions that have a mismatched parity bit
    mismatched = []
    for i in range(p):
        pos = 2 ** i                      # actual position of the parity bit in the code word
        val = 0
        for j in range(1, len(code_word) + 1):    # loop through each bit in the code word
            if (j & pos) == pos and j != pos:
                val ^= code_word[j - 1]
        if val != code_word[pos - 1]:     # recalculated value differs from the received value
            mismatched.append(pos)

    # Locate the error bit by adding up all the positions that have a mismatched parity bit
    error_bit = sum(mismatched)
    if error_bit:
        code_word[error_bit - 1] ^= 1     # flip the value of the error bit using XOR
    # Return the corrected code word and the error bit position (0 means no error) as a tuple
    return code_word, error_bit


def _mod2_div(bits, generator):
    # Perform a binary (modulo-2) division using XOR operations instead of subtraction;
    # the remainder has as many bits as the degree of the generator polynomial
    bits = list(bits)
    n = len(generator) - 1
    for i in range(len(bits) - n):        # iterate through each bit, starting from the leftmost
        if bits[i] == 1:
            for j in range(len(generator)):
                bits[i + j] ^= generator[j]
    return bits[-n:]


def crc_encode(data_bits, generator):
    # Calculate the CRC checksum for the data bits using a generator polynomial:
    # append n zero bits, divide by the generator, and append the remainder to the
    # right of the original data bits to form the code word
    n = len(generator) - 1                # degree of the generator polynomial
    remainder = _mod2_div(list(data_bits) + [0] * n, generator)
    return list(data_bits) + remainder


def crc_check(code_word, generator):
    # Perform a binary division of the received code word by the same generator and
    # check if there is an error by comparing the remainder with zero
    remainder = _mod2_div(code_word, generator)
    error = any(remainder)                # any non-zero remainder bit means an error
    return error
```
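As a quick check, these functions can be run against the worked examples from this section
(the calls below assume the function names defined in the listing above):
```python
# Hamming code for the data bits 1011 from the example above
print(hamming_code([1, 0, 1, 1]))                 # [0, 1, 1, 0, 0, 1, 1] -> 0110011

# Flip d2 (position 5) to simulate a single-bit error, then detect and correct it
print(hamming_error([0, 1, 1, 0, 1, 1, 1]))       # ([0, 1, 1, 0, 0, 1, 1], 5)

# CRC with the generator x^3 + x + 1 (1011): encode, then check a corrupted word
generator = [1, 0, 1, 1]
print(crc_encode([1, 1, 0, 1, 0, 1, 1], generator))            # data bits + checksum 110
print(crc_check([1, 1, 0, 1, 0, 1, 1, 0, 0, 1], generator))    # True -> error detected
```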
Introduction
Flow control is a technique that regulates the amount and rate of data transmission between a
sender and a receiver
Flow control ensures that the sender does not overwhelm the receiver with more data than it
can process or store
Flow control also prevents data loss or corruption due to buffer overflow, congestion, or er-
rors in the communication channel
Flow control can be implemented at different layers of the network architecture, such as the
data link layer or the transport layer
Flow control can be classified into two types: feedback-based and rate-based
o Feedback-based flow control relies on the receiver to send feedback messages to the
sender, indicating its readiness or capacity to receive more data
o Rate-based flow control relies on the sender to estimate the available bandwidth or
congestion level of the channel, and adjust its transmission rate accordingly
Stop-and-Wait
Stop-and-wait is a simple and basic feedback-based flow control mechanism that works as
follows:
o The sender sends one frame of data to the receiver and waits for an acknowledgment
(ACK) from the receiver before sending the next frame
o The receiver sends an ACK to the sender after receiving and processing a frame suc-
cessfully
o If the sender does not receive an ACK within a specified time interval, called
the timeout, it assumes that the frame was lost or corrupted, and retransmits the same
frame
o If the receiver receives a duplicate frame, it discards it and sends an ACK for the pre-
vious frame
o To distinguish between original and duplicate frames, each frame is assigned a se-
quence number (0 or 1) that alternates between successive frames
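The following is a minimal Python sketch of stop-and-wait, not a real protocol implementation:
losses of the data frame and of its ACK are each modelled with a single probability, and a lost
transmission simply triggers a timeout and a retransmission (the function name and parameters
are illustrative):
```python
import random

def stop_and_wait(frames, loss_probability=0.2, seed=42):
    # Toy stop-and-wait simulation: the sender alternates sequence numbers 0/1 and
    # retransmits after a "timeout" whenever the frame or its ACK is lost; the
    # receiver discards duplicate frames but still acknowledges them
    random.seed(seed)
    expected = 0                          # sequence number the receiver expects next
    delivered, transmissions = [], 0
    for i, frame in enumerate(frames):
        seq = i % 2                       # alternating 0/1 sequence number
        acked = False
        while not acked:
            transmissions += 1
            if random.random() < loss_probability:
                continue                  # data frame lost -> timeout -> retransmit
            if seq == expected:           # new in-order frame: deliver and flip expectation
                delivered.append(frame)
                expected ^= 1
            # a duplicate frame (seq != expected) is discarded but re-acknowledged
            if random.random() < loss_probability:
                continue                  # ACK lost -> timeout -> retransmit a duplicate
            acked = True
    return delivered, transmissions

data, count = stop_and_wait(["F0", "F1", "F2", "F3"])
print(data, count)   # all four frames delivered in order; count includes retransmissions
```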
Sliding Window
Sliding window is a more advanced and efficient feedback-based flow control mechanism that
works as follows:
o The sender maintains a window of frames that it can send without waiting for an
ACK from the receiver
o The window size is determined by the receiver’s buffer capacity, which is communi-
cated to the sender through feedback messages
o The sender slides its window forward as it receives ACKs from the receiver, allowing
it to send new frames
o The receiver also maintains a window of frames that it can receive and process
o The receiver slides its window forward as it sends ACKs to the sender, indicating its
readiness to receive new frames
o If a frame is lost or corrupted, the sender retransmits all the frames in its window after
a timeout, or after receiving a negative acknowledgment (NAK) from the receiver
o To distinguish between different frames, each frame is assigned a sequence number
that ranges from 0 to 2^n - 1, where n is the number of bits used for sequence num-
bers
Learning Objectives
Understand the difference between error detection and error control in data communication
Learn the basic concepts and principles of error control protocols, such as stop-and-wait
ARQ, go-back-N ARQ, and selective repeat ARQ
Compare the performance and trade-offs of different error control protocols
Introduction
Error control is a technique that ensures reliable and accurate data transmission between a
sender and a receiver
Error control involves both error detection and error correction, as well as retransmission and
acknowledgment of data frames
Error control protocols are algorithms that define the rules and procedures for error detection,
correction, retransmission, and acknowledgment
Error control protocols can be classified into two types: automatic repeat request
(ARQ) and forward error correction (FEC)
o ARQ protocols rely on the receiver to send feedback messages to the sender, indicat-
ing whether a frame was received correctly or not
o If a frame was received incorrectly, the sender retransmits the frame until it is re-
ceived correctly
o ARQ protocols can be further divided into three types: stop-and-wait ARQ, go-back-
N ARQ, and selective repeat ARQ
o FEC protocols rely on the sender to add extra information or redundancy to the data
frames, enabling the receiver to correct errors without requesting retransmission
o FEC protocols use techniques such as block coding and Hamming codes for error
correction (CRC, by contrast, is used only for error detection)
Stop-and-Wait ARQ
Stop-and-wait ARQ is a simple and basic ARQ protocol that works as follows:
o The sender sends one frame of data to the receiver and waits for an acknowledgment
(ACK) from the receiver before sending the next frame
o The receiver sends an ACK to the sender after receiving and processing a frame suc-
cessfully
o If the sender does not receive an ACK within a specified time interval, called
the timeout, it assumes that the frame was lost or corrupted, and retransmits the same
frame
o If the receiver receives a duplicate frame, it discards it and sends an ACK for the pre-
vious frame
o To distinguish between original and duplicate frames, each frame is assigned a se-
quence number (0 or 1) that alternates between successive frames
Stop-and-wait ARQ has some advantages and limitations, such as:
o Advantages:
It is easy to implement and understand
It guarantees reliable delivery of data frames
It avoids buffer overflow at the receiver side
o Limitations:
It has low efficiency and utilization of the channel bandwidth, as the sender
has to wait for an ACK after each frame
It suffers from long delays due to propagation time, processing time, and
timeout intervals
It cannot handle multiple senders or receivers simultaneously
Go-Back-N ARQ
Go-back-N ARQ is a more advanced and efficient ARQ protocol that works as follows:
o The sender maintains a window of frames that it can send without waiting for an
ACK from the receiver
o The window size is the maximum number of frames that the sender can transmit
before it must stop and wait for an ACK; with n-bit sequence numbers it can be at
most 2^n - 1
o The sender slides its window forward as it receives ACKs from the receiver, allowing
it to send new frames
o The receiver sends a cumulative ACK to the sender after receiving a frame success-
fully, indicating the next expected sequence number
o If a frame is lost or corrupted, the receiver discards all subsequent frames until it re-
ceives the missing frame
o The sender retransmits all the frames in its window after a timeout, or after receiving
a duplicate ACK from the receiver
o To distinguish between different frames, each frame is assigned a sequence number
that ranges from 0 to 2^n - 1, where n is the number of bits used for sequence num-
bers
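A short Python sketch can make this behaviour concrete. It is only a toy trace under
simplifying assumptions (frames listed in lose_once are lost on their first transmission only,
and ACKs are cumulative, instantaneous, and never lost); the function name and parameters are
illustrative:
```python
def go_back_n_trace(num_frames, window_size, lose_once):
    # Trace every frame a go-back-N sender puts on the wire under the toy
    # assumptions described above
    transmissions = []
    lose_once = set(lose_once)
    base = 0                              # oldest unacknowledged frame
    while base < num_frames:
        lost_seen = False
        # send one full window of frames starting at the window base
        for seq in range(base, min(base + window_size, num_frames)):
            transmissions.append(seq)
            if seq in lose_once:
                lose_once.discard(seq)
                lost_seen = True          # this frame is lost; later frames arrive out of order
            elif not lost_seen and seq == base:
                base += 1                 # in-order frame received -> cumulative ACK slides window
            # frames that follow a loss are discarded by the receiver
        # after a timeout the sender goes back and resends from the window base
    return transmissions

print(go_back_n_trace(num_frames=7, window_size=4, lose_once={2}))
# [0, 1, 2, 3, 2, 3, 4, 5, 6] -> frames 2 and 3 are transmitted twice
```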
Learning Objectives
Understand the concepts and principles of sliding window technique for data transmission
Learn how to determine the optimal window size and its impact on data transmission effi-
ciency and reliability
Compare and contrast the two window management strategies: selective repeat and go-back-
N
Introduction
Sliding window protocol is a technique that regulates the amount and rate of data transmis-
sion between a sender and a receiver
Sliding window protocol uses a window of frames that can be sent or received at a time, with-
out waiting for an acknowledgment (ACK) or a negative acknowledgment (NAK)
Sliding window protocol ensures reliable and efficient data transmission by using feedback
messages, sequence numbers, timers, and retransmission mechanisms
Sliding window protocol can be implemented at different layers of the network architecture,
such as the data link layer or the transport layer
Sliding window protocol can be classified into two types: sender-initiated and receiver-initi-
ated
o Sender-initiated sliding window protocol relies on the sender to determine the win-
dow size and slide the window forward as it receives ACKs from the receiver
o Receiver-initiated sliding window protocol relies on the receiver to determine the
window size and slide the window forward as it sends ACKs to the sender
Window size is a parameter that determines how many frames can be sent or received at a
time, without waiting for an acknowledgment or a negative acknowledgment
Window size affects the efficiency and reliability of data transmission, as well as the com-
plexity and overhead of sliding window protocol
Window size can be determined by various factors, such as:
o The capacity of the sender’s and receiver’s buffers
o The bandwidth of the communication channel
o The propagation delay of the communication channel
o The error rate of the communication channel
o The feedback mechanism used by the sender and receiver
Window size can be calculated by using various formulas, such as (a short calculation sketch
follows this list):
o Window size (in frames) = 1 + 2a, where a = (propagation delay) / (frame transmission
time); this is the smallest window that keeps the channel fully utilized
o Equivalently, Window size = 1 + 2 × (bandwidth × one-way propagation delay) / (frame
size), where bandwidth is the data rate of the channel in bits per second, the delay is in
seconds, frame size is the number of bits in a frame, and 2 × bandwidth × delay is the
round-trip bandwidth-delay product
o Window size = min(sender buffer, receiver buffer), where sender buffer and receiver
buffer are the number of frames that can be stored at each end
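As a sketch of the first formula above (the function name and the example link parameters
are illustrative):
```python
def optimal_window_size(bandwidth_bps, one_way_delay_s, frame_size_bits):
    # Smallest window (in frames) that keeps the channel fully utilized: 1 + 2a,
    # where a = propagation delay / frame transmission time
    transmission_time = frame_size_bits / bandwidth_bps
    a = one_way_delay_s / transmission_time
    return 1 + 2 * a

# Example: a 1 Mbps link, 20 ms one-way propagation delay, 1000-bit frames:
# transmission time = 1 ms, a = 20, so the window should be 1 + 2*20 = 41 frames
print(optimal_window_size(1_000_000, 0.020, 1000))   # 41.0
```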
Window size has some advantages and limitations, such as:
o Advantages:
A larger window size can improve efficiency and utilization of the channel
bandwidth, as more frames can be sent or received without waiting for feed-
back messages
A larger window size can reduce delays due to propagation time, processing
time, and timeout intervals, as fewer feedback messages are needed
A smaller window size can improve reliability and accuracy of data transmis-
sion, as fewer frames are affected by errors or losses
A smaller window size can avoid buffer overflow or congestion at either end,
as fewer frames are stored in memory
o Limitations:
A larger window size can increase complexity and overhead of sliding win-
dow protocol, as more sequence numbers, timers, and retransmission mecha-
nisms are needed
A larger window size can cause unnecessary retransmissions of frames that
were received correctly but discarded due to a missing frame
A smaller window size can decrease efficiency and utilization of the channel
bandwidth, as fewer frames can be sent or received without waiting for feed-
back messages
A smaller window size can increase delays due to propagation time, pro-
cessing time, and timeout intervals, as more feedback messages are needed
Piggybacking
Piggybacking reduces the overhead and improves the efficiency of data transmission
by avoiding sending separate acknowledgment frames.
o Suppose there are two stations, A and B, that want to exchange data frames in both
directions.
o Station A sends a data frame to station B.
o Station B receives the data frame and, instead of sending an acknowledgment immedi-
ately, stores it in a buffer.
o When station B has a data frame of its own to send to station A, it attaches (piggybacks)
the stored acknowledgment to that data frame.
o Station A receives the data frame and extracts the acknowledgment from it.
o If station B does not have a data frame to send to station A, it waits for a timeout pe-
riod before sending a separate acknowledgment frame to station A.
The advantage of piggybacking is that it reduces the number of frames sent on the channel
and saves bandwidth.
Random Access
Random access protocols are a class of medium access control (MAC) protocols that allow
multiple stations to share a common channel without any coordination or reservation.
Random access protocols are suitable for bursty and unpredictable traffic, where stations
have variable and independent data rates.
Random access protocols are based on the principle of contention, where stations compete
for the channel and resolve any collisions that may occur.
Random access protocols can be classified into two types: unslotted and slotted.
o Unslotted protocols do not divide the channel into fixed time slots, and stations can
transmit at any time.
o Slotted protocols divide the channel into fixed time slots, and stations can transmit
only at the beginning of a slot.
Pure ALOHA and Slotted ALOHA are two examples of random access protocols that were de-
veloped for wireless networks.
o Pure ALOHA: Stations transmit their frames whenever they have data to send, without
checking the channel status.
o If a station receives an acknowledgment from the receiver, it knows that the trans-
mission was successful; otherwise it assumes that a collision occurred and retransmits
the frame after a random back-off time.
o Slotted ALOHA: Stations synchronize their clocks and transmit their frames only at the
beginning of a slot, whose duration is equal to the frame transmission time.
o If a station receives an acknowledgment from the receiver, it knows that the trans-
mission was successful.
o If a station does not receive an acknowledgment within a slot, it assumes that a colli-
sion occurred and retransmits the frame after a random delay.
The advantage of Slotted ALOHA over Pure ALOHA is that it reduces the collision probabil-
ity by avoiding overlapping transmissions.
The disadvantage of Slotted ALOHA over Pure ALOHA is that it requires clock synchroniza-
tion among stations.
The maximum efficiency of Pure ALOHA is about 18% (more precisely 1/(2e) ≈ 18.4%), which
means that at best about 18% of the channel capacity can be utilized by successful transmissions.
The maximum efficiency of Slotted ALOHA is about 37% (more precisely 1/e ≈ 36.8%), which
means that at best about 37% of the channel capacity can be utilized by successful transmissions.
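These maxima follow from the classical throughput formulas S = G·e^(-2G) for Pure ALOHA and
S = G·e^(-G) for Slotted ALOHA, where G is the offered load; a short Python check (the
function names are illustrative):
```python
import math

def pure_aloha_throughput(G):
    # S = G * e^(-2G); maximum 1/(2e) ≈ 0.184 at G = 0.5
    return G * math.exp(-2 * G)

def slotted_aloha_throughput(G):
    # S = G * e^(-G); maximum 1/e ≈ 0.368 at G = 1
    return G * math.exp(-G)

print(round(pure_aloha_throughput(0.5), 3))     # 0.184
print(round(slotted_aloha_throughput(1.0), 3))  # 0.368
```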
Carrier Sense Multiple Access (CSMA) is another example of random access protocol that
was developed for wired networks.
o Stations sense the channel before transmitting their frames, and transmit only if the
channel is idle.
o If a station detects a collision while transmitting, it aborts the transmission and re-
transmits the frame after a random delay.
CSMA can be further classified into three types: 1-persistent CSMA, non-persistent CSMA,
and p-persistent CSMA.
o 1-persistent CSMA: Stations sense the channel continuously, and transmit as soon as
the channel becomes idle.
o Non-persistent CSMA: Stations sense the channel at discrete intervals, and transmit if
the channel is idle. If the channel is busy, stations wait for a random delay and sense
the channel again.
o p-persistent CSMA: Stations sense the channel at discrete intervals, and transmit with
a probability p if the channel is idle. If the channel is busy or the station decides not
to transmit, stations wait for the next slot and repeat the process.
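A toy Python sketch of the p-persistent decision rule for a single slot (purely illustrative;
a real MAC layer would also handle backoff and collision recovery):
```python
import random

def p_persistent_decision(channel_idle, p):
    # One slot of p-persistent CSMA: if the channel is busy, keep waiting;
    # if it is idle, transmit with probability p, otherwise defer to the next slot
    if not channel_idle:
        return "wait: channel busy"
    return "transmit" if random.random() < p else "defer to next slot"

print([p_persistent_decision(True, 0.3) for _ in range(5)])   # mostly defers, sometimes transmits
```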
The advantage of CSMA over ALOHA is that it reduces the collision probability by sensing
the channel before transmitting.
The disadvantage of CSMA over ALOHA is that it requires carrier sensing capability among
stations.
Summary
In this lecture, we have learned about two techniques for data transmission: piggybacking
and random access.
Random access protocols are a class of medium access control protocols that allow multiple
stations to share a common channel without any coordination or reservation.
We have discussed two examples of random access protocols for wireless networks: Pure
ALOHA and Slotted ALOHA, which are based on contention and retransmission.
We have also discussed another example of random access protocol for wired networks: Car-
rier Sense Multiple Access (CSMA), which is based on sensing and transmission.
Multiple access protocols are a class of medium access control (MAC) protocols that allow
multiple stations to share a common channel without interfering with each other.
Multiple access protocols are essential for efficient and fair utilization of the channel re-
sources and for reliable and secure communication among stations.
Multiple access protocols can be classified into three categories: channel partitioning, ran-
dom access, and demand access.
o Channel partitioning protocols divide the channel into smaller units, such as time
slots, frequency bands, or codes, and assign them to different stations.
o Random access protocols allow stations to transmit their frames whenever they have
data to send, without any reservation or coordination, and resolve any collisions that
may occur.
o Demand access protocols allow stations to request the channel before transmitting
their frames, and grant the channel to one or more stations based on some criteria.
CSMA/CD (Carrier Sense Multiple Access with Collision Detection) is a type of random ac-
cess protocol that is used in Ethernet networks.
CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) is another type of ran-
dom access protocol that is used in Wi-Fi networks.
CSMA/CA works as follows:
o Stations sense the channel before transmitting their frames, and transmit only if the
channel is idle for a certain period, called Distributed Interframe Space (DIFS).
o Stations perform a random delay, called backoff time, before transmitting their
frames, to reduce the collision probability.
o Stations send a short frame, called Request to Send (RTS), to the receiver, and wait
for a short frame, called Clear to Send (CTS), from the receiver.
o Stations transmit their frames only after receiving the CTS from the receiver, and wait
for an acknowledgment from the receiver.
o Stations defer their transmissions for a certain period, called Network Allocation
Vector (NAV), after hearing an RTS or CTS from other stations.
The advantage of CSMA/CA is that it avoids collisions by using RTS/CTS handshake and
NAV mechanism.
CSMA/CD and CSMA/CA are two different approaches to deal with the problem of collisions
in random access protocols.
CSMA/CD relies on detecting collisions and retransmitting frames after a random delay. It is
suitable for wired networks where collision detection is feasible and propagation delay is
small. It is used in Ethernet networks that use coaxial cables or twisted pair cables as the
physical medium.
CSMA/CA relies on avoiding collisions by using RTS/CTS handshake and NAV mechanism.
It is suitable for wireless networks where collision detection is difficult and propagation delay
is large. It is used in Wi-Fi networks that use radio waves as the physical medium.
Summary
In this lecture, we have learned about multiple access protocols, which are a class of medium
access control protocols that allow multiple stations to share a common channel without inter-
fering with each other.
We have discussed two examples of random access protocols: CSMA/CD and CSMA/CA,
which are used in Ethernet and Wi-Fi networks respectively.
We have compared the advantages and disadvantages of CSMA/CD and CSMA/CA, and ex-
plained their use cases.
Switching is a process that connects multiple devices in a network and transfers data from
one device to another.
Switching can be classified into two types: circuit switching and packet switching.
o Circuit switching is a type of switching that establishes a dedicated physical
path between the source and the destination devices for the duration of the communi-
cation.
o Packet switching is a type of switching that divides the data into smaller
units called packets and sends them independently over the network.
Circuit switching and packet switching can be further classified into two types: virtual cir-
cuits and datagram networks.
o Virtual circuits are a type of circuit switching or packet switching that maintains a
logical connection between the source and the destination devices for the duration of
the communication.
o Datagram networks are a type of packet switching that does not maintain any con-
nection between the source and the destination devices and treats each packet sepa-
rately.
The advantages and disadvantages of circuit switching and packet switching are as follows:
o Circuit switching has the advantages of guaranteed quality of service, no conges-
tion, and simple routing, but the disadvantages of low utilization, high setup time,
and lack of flexibility.
o Packet switching has the advantages of high utilization, low setup time, and flexi-
bility, but the disadvantages of variable quality of service, congestion, and complex
routing.
Logical Addressing
Logical addressing is a process that assigns a unique identifier to each device in a network
and enables communication among devices across different networks.
Logical addressing can be classified into two types: IPv4 addressing and IPv6 addressing.
o IPv4 addressing is a type of logical addressing that uses a 32-bit binary number to
identify each device in an IPv4 network. An IPv4 address consists of two parts: net-
work ID and host ID. The network ID identifies the network to which the device be-
longs, and the host ID identifies the device within the network. An IPv4 address is
usually written in dotted decimal notation, where each byte is converted to its decimal
equivalent and separated by dots. For example, 192.168.1.100 is an IPv4 address.
o IPv6 addressing is a type of logical addressing that uses a 128-bit binary number to
identify each device in an IPv6 network. An IPv6 address consists of eight groups of
four hexadecimal digits, separated by colons. Each group represents 16 bits of the ad-
dress. An IPv6 address can be abbreviated by omitting leading zeros within each
group and replacing consecutive groups of zeros with a double colon. For example,
2001:0db8:0000:0000:0000:ff00:0042:8329 is an IPv6 address, which can be abbrevi-
ated as 2001:db8::ff00:42:8329.
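Python's standard ipaddress module can be used to illustrate both notations with the same
example addresses used above:
```python
import ipaddress

addr = ipaddress.ip_address("2001:0db8:0000:0000:0000:ff00:0042:8329")
print(addr)            # 2001:db8::ff00:42:8329 -> abbreviated (compressed) form
print(addr.exploded)   # 2001:0db8:0000:0000:0000:ff00:0042:8329 -> full form

iface = ipaddress.ip_interface("192.168.1.100/24")
print(iface.network)   # 192.168.1.0/24 -> the network the host belongs to
print(iface.ip)        # 192.168.1.100  -> the host address itself
```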
IPv6 addressing has some features and advantages over IPv4 addressing, such as:
o Larger address space: IPv6 can support 2^128 (approximately 3.4 x 10^38) addresses,
compared to IPv4’s 2^32 (approximately 4.3 x 10^9) addresses, which allows more
devices to connect to the Internet and avoids address exhaustion.
o Hierarchical structure: IPv6 has a hierarchical structure that allows efficient routing
and aggregation of addresses. An IPv6 address consists of three parts: global routing
prefix, subnet ID, and interface ID. The global routing prefix identifies the network
provider, the subnet ID identifies the subnet within the provider’s network, and the
interface ID identifies the device within the subnet.
o Stateless address autoconfiguration: IPv6 allows devices to automatically configure
their own addresses without relying on a server, such as DHCP. A device can generate
its own interface ID based on its MAC address or a random number, and append it to
the subnet prefix obtained from a router advertisement message.
o Improved security: IPv6 supports IPsec, which is a protocol suite that provides au-
thentication, encryption, and integrity protection for IP packets. IPsec can be applied
to all IPv6 communications, unlike IPv4, where it is optional and requires additional
configuration.
Subnetting and supernetting are two techniques that allow efficient allocation and manage-
ment of IP addresses in a network.
o Subnetting is a technique that divides a large network into smaller subnetworks, or
subnets, by extending the network ID portion of an IP address. Subnetting allows bet-
ter utilization of the address space, reduces network congestion, and enhances secu-
rity and administration.
o Supernetting is a technique that combines multiple contiguous networks into a larger
network, or supernetwork, by reducing the network ID portion of an IP address. Su-
pernetting allows better aggregation of the address space, reduces routing table size,
and improves routing performance.
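A small illustration of both techniques using Python's standard ipaddress module (the prefixes
are arbitrary example values):
```python
import ipaddress

# Subnetting: divide one /24 network into four /26 subnets
network = ipaddress.ip_network("192.168.1.0/24")
for subnet in network.subnets(prefixlen_diff=2):
    print(subnet)      # 192.168.1.0/26, 192.168.1.64/26, 192.168.1.128/26, 192.168.1.192/26

# Supernetting: aggregate four contiguous /24 networks into one /22 supernetwork
nets = [ipaddress.ip_network(n) for n in
        ("192.168.0.0/24", "192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24")]
print(list(ipaddress.collapse_addresses(nets)))   # [IPv4Network('192.168.0.0/22')]
```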
Address Mapping
Address mapping is a process that converts one type of address to another in a network.
Address mapping can be classified into four types: Address Resolution Protocol (ARP), Re-
verse Address Resolution Protocol (RARP), Bootstrap Protocol (BOOTP), and Dynamic
Host Configuration Protocol (DHCP).
o Address Resolution Protocol (ARP) is a protocol that maps an IPv4 address to a
MAC address in a local area network (LAN). ARP allows devices to communicate
with each other within the same LAN without knowing their MAC addresses before-
hand. ARP works as follows:
A device that wants to communicate with another device sends an ARP re-
quest message, which contains the sender’s IPv4 and MAC addresses and the
target’s IPv4 address, to the broadcast MAC address (FF:FF:FF:FF:FF:FF).
All devices in the LAN receive the ARP request message, but only the device
that has the target IPv4 address responds with an ARP reply message, which
contains the sender’s and target’s IPv4 and MAC addresses.
The device that sent the ARP request message receives the ARP reply mes-
sage and updates its ARP cache, which is a table that stores the mappings of
IPv4 and MAC addresses. The device then uses the target MAC address to
send data frames to the target device.
o Reverse Address Resolution Protocol (RARP) is a protocol that maps a MAC address
to an IPv4 address in a LAN. RARP allows devices that do not have an IPv4 address,
such as diskless workstations, to obtain one from a RARP server. RARP works as fol-
lows:
A device that wants to obtain an IPv4 address sends a RARP request mes-
sage, which contains the sender’s MAC address and a placeholder for the
sender’s IPv4 address, to the broadcast MAC address (FF:FF:FF:FF:FF:FF).
A RARP server in the LAN receives the RARP request message and searches
its RARP table, which is a table that stores the mappings of MAC and IPv4
addresses. If it finds a match, it responds with a RARP reply message, which
contains the sender’s MAC and IPv4 addresses.
The device that sent the RARP request message receives the RARP reply
message and configures its IPv4 address accordingly.
o Bootstrap Protocol (BOOTP) is a protocol that maps a MAC address to an IPv4 ad-
dress and other configuration parameters in a LAN or a wide area network (WAN).
BOOTP allows devices that do not have an IPv4 address or other configuration pa-
rameters, such as diskless workstations or routers, to obtain them from a BOOTP
server. BOOTP works as follows:
A device that wants to obtain an IPv4 address and other configuration param-
eters sends a BOOTP request message, which contains the sender’s MAC ad-
dress and other information, such as requested parameters and relay agent in-
formation, to the broadcast IP address (255.255.255.255).
A BOOTP server in the LAN or WAN receives the BOOTP request message
and searches its BOOTP table, which is a table that stores the mappings of
MAC addresses and configuration parameters. If it finds a match, it responds
with a BOOTP reply message, which contains the sender’s MAC address and
configuration parameters.
The device that sent the BOOTP request message receives the BOOTP reply
message and configures its IPv4 address and other parameters accordingly.
o Dynamic Host Configuration Protocol (DHCP) is a protocol that dynamically assigns
an IPv4 address and other configuration parameters to devices in a LAN or WAN.
DHCP allows devices to obtain an IPv4 address and other configuration parameters
without manual intervention or preconfiguration. DHCP works as follows:
A device that wants to obtain an IPv4 address and other configuration param-
eters sends a DHCP discover message, which contains the sender’s MAC ad-
dress and other information, such as requested parameters and relay agent in-
formation, to the broadcast IP address (255.255.255.255).
One or more DHCP servers in the LAN or WAN receive the DHCP discover
message and respond with a DHCP offer message, which contains an availa-
ble IPv4 address and other configuration parameters for the sender.
The device that sent the DHCP discover message receives one or more DHCP offer
messages and selects one of them based on some criteria. It then sends a DHCP re-
quest message, which contains the selected IPv4 address and other configuration pa-
rameters, to the broadcast IP address (255.255.255.255).
The DHCP server that offered the selected IPv4 address receives the DHCP request
message and verifies that the IPv4 address is still available. It then sends a DHCP
acknowledge message, which confirms the assignment of the IPv4 address and other
configuration parameters to the sender.
The device that sent the DHCP request message receives the DHCP acknowledge
message and configures its IPv4 address and other parameters accordingly. It also up-
dates its lease time, which is the duration for which the IPv4 address is valid.
Delivery
Delivery is a process that transfers packets from the source device to the destination device
in a network.
Delivery can be classified into two types: direct delivery and indirect delivery.
o Direct delivery is a type of delivery that occurs when the source and the destination
devices are in the same network. In direct delivery, the source device sends the
packet to the destination device using its MAC address as the destination address.
o Indirect delivery is a type of delivery that occurs when the source and the destination
devices are in different networks. In indirect delivery, the source device sends the
packet to a router, which forwards the packet to another router or the destination de-
vice using its MAC address as the destination address.
Forwarding
Forwarding is a process that determines the next hop for a packet in a network and sends
the packet to that hop.
Forwarding is performed by routers, which are devices that connect multiple networks and
forward packets between them.
Forwarding relies on a data structure called a forwarding table, which stores the mappings of
destination network IDs and next hop addresses.
A forwarding table can be created and updated by using static or dynamic methods.
o Static methods involve manually configuring the forwarding table entries by an ad-
ministrator. Static methods are simple, secure, and consistent, but they are also inflex-
ible, error-prone, and inefficient.
o Dynamic methods involve automatically updating the forwarding table entries by us-
ing routing protocols. Dynamic methods are flexible, adaptive, and efficient, but they
are also complex, insecure, and inconsistent.
An example of a forwarding table is shown in the following table:
Destination network    Next hop
192.168.1.0/24         192.168.1.1
192.168.2.0/24         192.168.2.1
192.168.3.0/24         192.168.3.1
0.0.0.0/0 (default)    192.168.4.1
o The router extracts the destination IP address from the packet header and matches it
with the destination network ID in the forwarding table.
o If there is an exact match, the router sends the packet to the next hop address corre-
sponding to that destination network ID.
o If there is no exact match, but there is a default entry (0.0.0.0/0), the router sends the
packet to the next hop address corresponding to that default entry.
o If there is no match at all, the router drops the packet and sends an error message to
the source device.
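The lookup described above can be sketched in a few lines of Python with the standard
ipaddress module; this toy version scans the whole table and picks the longest matching
prefix, whereas real routers use much faster data structures:
```python
import ipaddress

# Forwarding table from the example above: destination network -> next hop
forwarding_table = {
    ipaddress.ip_network("192.168.1.0/24"): "192.168.1.1",
    ipaddress.ip_network("192.168.2.0/24"): "192.168.2.1",
    ipaddress.ip_network("192.168.3.0/24"): "192.168.3.1",
    ipaddress.ip_network("0.0.0.0/0"): "192.168.4.1",     # default entry
}

def next_hop(destination_ip):
    # Longest-prefix match: among all entries that contain the destination,
    # choose the most specific one; 0.0.0.0/0 matches everything (default route)
    dest = ipaddress.ip_address(destination_ip)
    matches = [net for net in forwarding_table if dest in net]
    if not matches:
        return None                                       # no match: drop the packet
    best = max(matches, key=lambda net: net.prefixlen)
    return forwarding_table[best]

print(next_hop("192.168.2.57"))   # 192.168.2.1 (exact network match)
print(next_hop("10.0.0.5"))       # 192.168.4.1 (default entry)
```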
Unicast routing protocols are protocols that exchange routing information among routers
and build forwarding tables for unicast packets, which are packets that have a single desti-
nation device.
Unicast routing protocols can be classified into two types: interior gateway protocols
(IGPs) and exterior gateway protocols (EGPs).
o Interior gateway protocols (IGPs) are protocols that exchange routing information
within an autonomous system (AS), which is a group of networks under a single ad-
ministrative authority. IGPs can be further classified into two types: distance vector
protocols and link state protocols.
Distance vector protocols are protocols that exchange routing information
based on the distance (or cost) and direction (or vector) to each destination
network. Distance vector protocols use a distributed algorithm called Bell-
man-Ford algorithm to calculate the shortest paths to each destination net-
work. Distance vector protocols are simple, easy to implement, and scalable,
but they are also slow to converge, prone to loops, and inefficient in band-
width usage.
Link state protocols are protocols that exchange routing information based on
the state (or status) of each link (or connection) in the network. Link state
protocols use a centralized algorithm called Dijkstra’s algorithm to calcu-
late the shortest paths to each destination network based on a complete map
of the network topology. Link state protocols are fast to converge, loop-free,
and efficient in bandwidth usage, but they are also complex, difficult to im-
plement, and resource-intensive.
o Exterior gateway protocols (EGPs) are protocols that exchange routing information
between autonomous systems (ASes). EGPs can be further classified into two
types: path vector protocols and policy-based routing protocols.
Path vector protocols are protocols that exchange routing information based
on the path (or sequence) of ASes to each destination network. Path vector
protocols use an extension of the distance vector protocol called Border
Gateway Protocol (BGP) to calculate the best paths to each destination net-
work based on various attributes, such as AS path length, origin, local prefer-
ence, and MED. Path vector protocols are robust, flexible, and scalable, but
they are also complex, slow to converge, and prone to instability.
Policy-based routing protocols are protocols that exchange routing infor-
mation based on the policies (or rules) of each AS. Policy-based routing pro-
tocols use a mechanism called route filtering to select or reject routes based
on various criteria, such as source, destination, protocol, port, or traffic type.
Policy-based routing protocols are secure, customizable, and efficient, but
they are also subjective, inconsistent, and unpredictable.
An example of a distance vector protocol is Routing Information Protocol (RIP), which is
an IGP that uses hop count as the distance metric and exchanges routing information every 30
seconds. RIP has a maximum hop count of 15, which limits its scalability. RIP also uses vari-
ous techniques to prevent loops and speed up convergence, such as split horizon, poison re-
verse, triggered updates, and hold-down timers.
An example of a link state protocol is Open Shortest Path First (OSPF), which is an IGP
that uses cost as the link state metric and exchanges routing information based on events.
OSPF has no maximum hop count, which enhances its scalability. OSPF also uses various
features to improve performance and reliability, such as hierarchical structure, designated
routers, equal-cost multipath, authentication, and multicast.
An example of a path vector protocol is Border Gateway Protocol (BGP), which is an EGP
that uses AS path as the path vector attribute and exchanges routing information based on
events. BGP has no maximum AS path length, which enhances its scalability. BGP also uses
various features to improve performance and stability, such as route aggregation, route reflec-
tors, confederations, communities, and route dampening.
An example of a policy-based routing protocol is Cisco IOS Policy-Based Routing (PBR),
which is a mechanism that allows network administrators to define policies for routing pack-
ets based on various criteria. PBR can be configured using access lists, route maps, and set
commands. PBR can be used to implement various functions, such as load balancing, traffic
engineering, quality of service, and security.
Process to process communication is a process that enables data exchange between two or
more processes running on the same or different devices in a network.
Process to process communication can be achieved by using ports and sockets.
o Ports are logical identifiers that distinguish different processes or applications on a
device. Ports are associated with the transport layer protocols, such as TCP or UDP,
and are represented by 16-bit numbers. For example, port 80 is used for HTTP, port
25 is used for SMTP, and port 53 is used for DNS.
o Sockets are endpoints of communication between processes or applications on differ-
ent devices. Sockets are created by the operating system and are identified by a com-
bination of IP address and port number. For example, a socket can be represented as
192.168.1.100:80, which means the process or application running on the device with
IP address 192.168.1.100 and using port 80.
Socket APIs are application programming interfaces that provide functions and data struc-
tures for creating, managing, and using sockets in various programming languages. Socket
APIs allow programmers to implement process to process communication without worrying
about the low-level details of the network protocols. Some examples of socket APIs are:
o Berkeley sockets: A socket API that was developed at the University of California,
Berkeley, and is widely used in Unix-like operating systems, such as Linux, macOS,
and FreeBSD. Berkeley sockets support both TCP and UDP protocols, as well as
other protocols, such as ICMP and IGMP. Berkeley sockets use functions such as
socket(), bind(), listen(), accept(), connect(), send(), recv(), close(), and so on.
o Winsock: A socket API that was developed by Microsoft and is used in Windows op-
erating systems. Winsock is based on Berkeley sockets, but has some differences and
extensions, such as support for overlapped I/O, asynchronous notification, and lay-
ered service providers. Winsock uses functions such as WSAStartup(), WSACleanup(),
WSASocket(), WSAConnect(), WSASend(), WSARecv(), closesocket(), and so on.
o Java sockets: A socket API that was developed by Sun Microsystems and is used in
Java programming language. Java sockets are part of the java.net package and pro-
vide an object-oriented approach to socket programming. Java sockets support both
TCP and UDP protocols, as well as multicast and secure sockets. Java sockets use
classes such as Socket, ServerSocket, DatagramSocket, MulticastSocket, SSLSocket,
and so on.
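As a small illustration of the Berkeley-style socket API, here is a minimal TCP echo server and
client in Python (the loopback address 127.0.0.1 and port 50007 are arbitrary example values):
```python
import socket

def echo_server(host="127.0.0.1", port=50007):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, port))            # bind the socket to an IP address and port
        srv.listen(1)                     # wait for an incoming connection
        conn, addr = srv.accept()         # accept the connection from a client
        with conn:
            data = conn.recv(1024)        # read up to 1024 bytes from the client
            conn.sendall(data)            # echo the same bytes back

def echo_client(host="127.0.0.1", port=50007):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((host, port))         # TCP three-way handshake with the server
        cli.sendall(b"hello")             # send data over the connection
        print(cli.recv(1024))             # b'hello' echoed back by the server

# Run echo_server() in one process and echo_client() in another to see the exchange
```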
User Datagram Protocol (UDP) is a transport layer protocol that provides unrelia-
ble and connectionless data transmission between processes or applications in a network.
Features and characteristics of UDP are as follows:
o UDP is unreliable, which means that it does not guarantee the delivery, order, or in-
tegrity of the data packets. UDP does not perform any error detection, correction, or
retransmission of the data packets. UDP leaves these tasks to the application layer or
the user.
o UDP is connectionless, which means that it does not establish or maintain any logical
connection between the source and the destination processes or applications. UDP
does not perform any handshake, synchronization, or termination of the communica-
tion. UDP treats each packet independently and statelessly.
o UDP is simple, which means that it has a minimal overhead and complexity. UDP has
a fixed header size of 8 bytes, which contains only four fields: source port, destina-
tion port, length, and checksum. UDP does not have any options or flags in its header.
o UDP is fast, which means that it has a low latency and high throughput. UDP does
not have any congestion control or flow control mechanisms that may slow down or
block the data transmission. UDP can send data packets as fast as the network allows.
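For comparison with the TCP example earlier, a minimal UDP sender and receiver in Python look
like this (the address and port are arbitrary example values); note that there is no connection
setup and no delivery guarantee:
```python
import socket

def udp_receiver(host="127.0.0.1", port=50008):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        data, addr = sock.recvfrom(1024)      # blocks until a single datagram arrives
        print(data, "from", addr)

def udp_sender(host="127.0.0.1", port=50008):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"hello over UDP", (host, port))   # fire and forget: no handshake, no ACK
```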
Advantages and disadvantages of UDP are as follows:
o UDP has the advantages of simplicity, efficiency, and flexibility, but the disad-
vantages of unreliability, lack of security, and lack of quality of service.
o Simplicity: UDP is easy to implement and understand, as it has a minimal overhead
and complexity. UDP does not require any connection establishment or maintenance,
which reduces the processing time and resource consumption.
o Efficiency: UDP is fast and scalable, as it has a low latency and high throughput.
UDP does not have any congestion control or flow control mechanisms that may slow
down or block the data transmission. UDP can handle bursty and unpredictable traffic
better than TCP.
o Flexibility: UDP is adaptable and customizable, as it leaves the reliability and quality
of service tasks to the application layer or the user. UDP can support various types of
applications that have different requirements and preferences, such as real-time, mul-
timedia, or interactive applications.
o Unreliability: UDP does not guarantee the delivery, order, or integrity of the data
packets. UDP does not perform any error detection, correction, or retransmission of
the data packets. UDP may cause data loss, duplication, corruption, or reordering,
which may affect the performance and functionality of the applications.
o Lack of security: UDP does not provide any security features, such as authentication,
encryption, or integrity protection for the data packets. UDP is vulnerable to various
attacks, such as spoofing, modification, or replaying of the data packets. UDP may
compromise the confidentiality, integrity, or availability of the communication.
o Lack of quality of service: UDP does not provide any quality of service features, such
as bandwidth allocation, delay control, jitter control, or packet loss control for the
data packets. UDP does not differentiate between different types of data or applica-
tions that may have different levels of priority or importance. UDP may cause poor
quality of service for some applications that are sensitive to delay, jitter, or packet
loss.
Use cases of UDP are as follows:
o UDP is suitable for applications that require speed, simplicity, and flexibility over
reliability, security, and quality of service. Some examples of such applications are:
Real-time applications: These are applications that require timely and contin-
uous delivery of data, such as voice over IP (VoIP), video conferencing,
online gaming, or live streaming. These applications can tolerate some data
loss, but not delay or jitter.
Multimedia applications: These are applications that require efficient and
scalable delivery of data, such as audio and video streaming, file sharing, or
web browsing. These applications can adapt to the network conditions and
use error correction or compression techniques to improve the quality of ser-
vice.
Interactive applications: These are applications that require responsive and
user-friendly delivery of data, such as online chat, instant messaging, or re-
mote desktop. These applications can handle some data loss or duplication,
but not reordering or corruption.
Transmission Control Protocol (TCP) is a transport layer protocol that provides relia-
ble and connection-oriented data transmission between processes or applications in a net-
work.
Features and characteristics of TCP are as follows:
o TCP is reliable, which means that it guarantees the delivery, order, and integrity of
the data packets. TCP performs various mechanisms to ensure the reliability of the
data transmission, such as error detection, correction, retransmission, acknowledg-
ment, sequencing, and checksum.
o TCP is connection-oriented, which means that it establishes and maintains a logical
connection between the source and the destination processes or applications for the
duration of the communication. TCP performs various mechanisms to ensure the con-
nection-oriented nature of the data transmission, such as handshake, synchronization,
termination, state management, and window management.
o TCP is complex, which means that it has a high overhead and complexity. TCP has a
variable header size of 20 to 60 bytes, which contains many fields and options. TCP
also has many flags and states in its header and operation.
o TCP is slow, which means that it has a high latency and low throughput. TCP has var-
ious congestion control and flow control mechanisms that may slow down or block
the data transmission. TCP also has a slow start and a three-way handshake that may
delay the data transmission.
TCP header format and flags are as follows:
o TCP header format is shown in the following diagram:
ACK: Acknowledgment flag. It indicates that the packet contains an acknowledgment number that
confirms the receipt of the previous data segments from the sender. The acknowledgment number
field specifies the sequence number of the next expected byte from the sender.
PSH: Push flag. It indicates that the packet contains data that must be delivered to the applica-
tion layer as soon as possible by the receiver. The push function is used to avoid buffering de-
lays at the receiver side.
RST: Reset flag. It indicates that the packet contains a request or a response to reset the con-
nection due to an error or a refusal. The reset function is used to abort the connection and re-
lease the resources.
SYN: Synchronize flag. It indicates that the packet contains a request or a response to estab-
lish a connection between the sender and the receiver. The synchronize function is used to ini-
tiate and negotiate the connection parameters, such as sequence numbers and window sizes.
FIN: Finish flag. It indicates that the packet contains a request or a response to terminate the
connection between the sender and the receiver. The finish function is used to gracefully close
the connection and release the resources.
Window size: A 16-bit field that specifies the size of the receive window in bytes. The receive
window is the amount of data that the receiver can accept at a time. The window size is used
to implement flow control, which is a mechanism that prevents the sender from overwhelm-
ing the receiver with too much data.
Checksum: A 16-bit field that contains a value that is calculated based on the contents of the
TCP header and data segment. The checksum is used to detect any errors or corruption in the
packet during transmission.
Urgent pointer: A 16-bit field that specifies the offset of the last byte of urgent data in the
packet. The urgent pointer is used in conjunction with the URG flag to indicate and locate ur-
gent data in the packet.
Options: A variable-length field that contains optional parameters for TCP operation, such as
maximum segment size, window scaling, selective acknowledgment, timestamp, and so on.
The options field is used to enhance the performance and functionality of TCP.
Padding: A variable-length field that contains zeros to make the TCP header a multiple of 32
bits. The padding field is used to align the TCP header with the data segment.
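The control flags described above occupy fixed bit positions within the flags field of the TCP
header; a short Python snippet can decode them (the helper name is illustrative, the bit values
are the standard ones):
```python
# Standard bit positions of the TCP control flags within the header's flags field
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def decode_flags(flags_byte):
    # Return the names of all flags set in a TCP header's flags field
    return [name for name, bit in TCP_FLAGS.items() if flags_byte & bit]

print(decode_flags(0x12))   # ['SYN', 'ACK'] -> second segment of the three-way handshake
print(decode_flags(0x18))   # ['PSH', 'ACK'] -> a typical data-carrying segment
```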
Connection termination is a process that closes a connection between two processes or appli-
cations using TCP in a network.
Connection termination works as follows:
o The process or application that wants to terminate the connection sends a FIN segment
to the other process or application, which indicates that it has no more data to send,
and enters the FIN-WAIT-1 state.
o The process or application that receives the FIN segment responds with an ACK seg-
ment, which acknowledges the receipt of the FIN segment, and enters the CLOSE-WAIT
state; it may still send any remaining data of its own.
o The process or application that receives this ACK segment enters the FIN-WAIT-2
state, which lasts until it receives a FIN segment from the other side, indicating that
the other side also wants to close the connection.
o When the other process or application has no more data to send, it sends its own FIN
segment and enters the LAST-ACK state.
o The process or application that receives this FIN segment responds with an ACK seg-
ment and enters the TIME-WAIT state, which lasts for twice the maximum segment
lifetime (MSL), to ensure that the ACK segment has reached the other side and to
handle any delayed or duplicated segments; it then enters the CLOSED state and re-
leases its resources.
o The process or application that receives the final ACK segment enters the CLOSED
state, which means that the connection has been terminated and the resources released
on both sides.
Stream Control Transmission Protocol (SCTP) is a transport layer protocol that provides re-
liable and connection-oriented data transmission between processes or applications in a net-
work, similar to TCP, but with some additional features and advantages.
Features and advantages of SCTP over TCP are as follows:
o SCTP supports multihoming, which means that it allows a process or application to
have multiple IP addresses and establish multiple paths to the other process or appli-
cation. Multihoming enhances the availability and reliability of the communication,
as it can switch to an alternate path in case of a failure or congestion in the primary
path.
o SCTP supports multistreaming, which means that it allows a process or application
to send multiple streams of data within a single connection. Multistreaming improves
the performance and efficiency of the communication, as it can avoid head-of-line
blocking, which is a problem that occurs when a lost or delayed packet in one stream
blocks the delivery of packets in other streams.
o SCTP supports partial reliability, which means that it allows a process or application
to specify a lifetime for each data chunk. Partial reliability enables the communica-
tion to discard outdated or irrelevant data chunks, which can save bandwidth and re-
duce latency.
o SCTP supports message-oriented delivery, which means that it preserves the bound-
aries and order of the data chunks sent by the process or application. Message-ori-
ented delivery simplifies the data processing and reduces the overhead at the applica-
tion layer.
Congestion Control
Explicit Congestion Notification (ECN) is a mechanism that signals the occurrence of con-
gestion in the network to the sender and the receiver of the data packets, without dropping the
packets.
ECN works as follows:
o ECN requires the support of both the transport layer protocols, such as TCP or SCTP,
and the network layer protocols, such as IPv4 or IPv6, to implement its functionality.
o ECN uses two bits in the IP header, called the ECN field, to indicate the congestion
status of the packet. The ECN field can have four values: 00 (Not-ECT), 01 (ECT(1)),
10 (ECT(0)), and 11 (CE).
Not-ECT means that the packet is not ECN-capable and should be dropped if
congestion occurs.
ECT(1) and ECT(0) mean that the packet is ECN-capable and can be marked
if congestion occurs. The difference between ECT(1) and ECT(0) is used for
experimental purposes.
CE means that the packet has been marked by a router as experiencing con-
gestion.
o ECN also uses two bits in the TCP header, called the ECE and CWR flags, to com-
municate the congestion status between the sender and the receiver. The ECE flag is
used by the receiver to notify the sender that it has received a packet with CE mark.
The CWR flag is used by the sender to notify the receiver that it has reduced its send-
ing rate in response to the ECE flag.
o The sender and the receiver negotiate the use of ECN during the connection establish-
ment phase, using the ECE and CWR flags in the SYN and SYN+ACK segments. If both
sides agree to use ECN, they mark subsequent data segments as ECN-capable by setting
the IP ECN field to ECT(0) or ECT(1).
o When a router detects congestion in its queue, it randomly selects one or more ECN-
capable packets and sets their ECN field to CE, instead of dropping them. The router
forwards the marked packets to their destination.
o When the receiver receives a packet with CE mark, it sets its ECE flag to 1 in the next
ACK segment that it sends to the sender. The receiver also echoes back the sequence
number of the marked packet in the ACK segment.
o When the sender receives an ACK segment with ECE flag set to 1, it infers that there
is congestion in the network and reduces its sending rate accordingly, by using algo-
rithms such as AIMD or Slow Start. The sender also sets its CWR flag to 1 in the next
data segment that it sends to the receiver, to acknowledge that it has received the con-
gestion notification.
o When the receiver receives a data segment with CWR flag set to 1, it clears its ECE
flag to 0 in the next ACK segment that it sends to the sender. This completes one
round of ECN feedback loop.
The advantages of ECN over traditional congestion control mechanisms are as follows:
o ECN reduces packet loss and retransmission, which improves throughput and effi-
ciency.
o ECN reduces delay and jitter, which improves quality of service and user experience.
o ECN reduces network oscillation and instability, which improves fairness and robust-
ness.
Quality of Service (QoS) is a process that ensures different levels of service for different
types of data or applications in a network, according to their requirements and preferences.
QoS can be achieved by using various techniques, such as:
o Classification: This is a technique that identifies and categorizes different types of
data or applications based on various criteria, such as source, destination, protocol,
port, or traffic type.
Classification allows the network to apply different policies and mechanisms for different
classes of data or applications, such as prioritization, scheduling, or shaping.
o Prioritization: This is a technique that assigns different levels of priority or im-
portance to different classes of data or applications, based on their requirements and
preferences. Prioritization allows the network to allocate more resources and provide
better service to higher-priority classes, while reducing or limiting the service to
lower-priority classes.
o Scheduling: This is a technique that determines the order and timing of transmitting
different classes of data or applications, based on their priority and characteristics.
Scheduling allows the network to optimize the utilization and performance of the net-
work resources and avoid congestion and starvation.
o Shaping: This is a technique that regulates the rate and volume of transmitting differ-
ent classes of data or applications, based on their characteristics and network condi-
tions. Shaping allows the network to smooth out the traffic fluctuations and match the
traffic profile with the network capacity and policies.
QoS can be measured by using various parameters, such as:
o Bandwidth: This is a parameter that measures the amount of data that can be trans-
mitted or received per unit time in a network. Bandwidth is usually expressed in bits
per second (bps) or multiples thereof, such as kilobits per second (Kbps), megabits
per second (Mbps), or gigabits per second (Gbps). Bandwidth affects the throughput
and efficiency of the communication.
o Delay: This is a parameter that measures the time it takes for a data packet to travel
from the source to the destination in a network. Delay is usually expressed in milli-
seconds (ms) or microseconds (µs). Delay affects the latency and responsiveness of
the communication.
o Jitter: This is a parameter that measures the variation or deviation of delay for differ-
ent data packets in a network. Jitter is usually expressed in milliseconds (ms) or mi-
croseconds (µs). Jitter affects the quality and smoothness of the communication, espe-
cially for real-time applications such as voice or video.
o Packet loss: This is a parameter that measures the percentage or ratio of data packets
that are lost or dropped during transmission in a network. Packet loss is usually ex-
pressed as a percentage (%) or a fraction. Packet loss affects the reliability and integ-
rity of the communication.
QoS improving techniques are techniques that aim to enhance the quality of service for differ-
ent types of data or applications in a network, by using various mechanisms such as traffic
shaping, traffic policing, or leaky bucket.
Traffic shaping is a technique that regulates the rate and volume of transmitting data packets
in a network, by using a mechanism called token bucket.
o Token bucket is a mechanism that uses a virtual bucket that holds tokens, which rep-
resent the permission to send data packets. The bucket has a certain capacity and a
certain rate of token generation. The token bucket works as follows:
When a source wants to send a data packet, it checks if there are enough to-
kens in the bucket to cover the size of the packet. If there are enough tokens,
it removes them from the bucket and sends the packet. If there are not enough
tokens, it either waits until there are enough tokens or drops the packet, de-
pending on the policy.
The bucket generates tokens at a constant rate, up to its capacity. If the bucket
is full, any new tokens are discarded. The rate and capacity of the bucket de-
termine the maximum and average sending rate of the source.
o The advantage of token bucket is that it allows bursty traffic to be smoothed
out and shaped according to the network capacity and policy. The disad-
vantage of token bucket is that it may cause delay or packet loss if the traffic
exceeds the bucket capacity or rate.
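A minimal Python sketch of the token bucket described above (the class and parameter names are illustrative, not taken from any standard):

import time

class TokenBucket:
    """Illustrative token bucket: tokens are generated at `rate` per second, up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, packet_size):
        now = time.monotonic()
        # generate tokens at a constant rate, discarding any that exceed the capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_size:
            self.tokens -= packet_size        # enough tokens: remove them and send the packet
            return True
        return False                          # not enough tokens: wait or drop, depending on policy

bucket = TokenBucket(rate=1000, capacity=5000)   # 1000 tokens per second, bursts up to 5000
print(bucket.allow(1200))                        # True, because the bucket starts full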
Traffic policing is a technique that regulates the rate and volume of receiving data
packets in a network, by using a mechanism called leaky bucket.
o Leaky bucket is a mechanism that uses a virtual bucket that holds data packets,
which represent the incoming traffic. The bucket has a certain capacity and a
certain rate of packet leakage. The leaky bucket works as follows:
When a destination receives a data packet, it checks if there is enough
space in the bucket to store the packet. If there is enough space, it adds
the packet to the bucket. If there is not enough space, it drops the
packet.
The bucket leaks packets at a constant rate, up to its capacity. If the
bucket is empty, no packets are leaked. The rate and capacity of the
bucket determine the maximum and average receiving rate of the desti-
nation.
o The advantage of leaky bucket is that it prevents congestion and ensures fair-
ness by limiting the receiving rate of the destination. The disadvantage of
leaky bucket is that it may cause delay or packet loss if the traffic exceeds the
bucket capacity or rate.
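A matching Python sketch of the leaky bucket used for traffic policing (again, the names and numbers are illustrative):

from collections import deque

class LeakyBucket:
    """Illustrative leaky bucket: packets queue up to `capacity` and leak at a fixed rate."""
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.leak_rate = leak_rate            # packets forwarded per tick
        self.queue = deque()

    def arrive(self, packet):
        if len(self.queue) < self.capacity:
            self.queue.append(packet)         # enough space: store the packet
            return True
        return False                          # bucket full: drop the packet

    def leak(self):
        # forward packets at a constant rate, smoothing out bursty arrivals
        return [self.queue.popleft() for _ in range(min(self.leak_rate, len(self.queue)))]

bucket = LeakyBucket(capacity=10, leak_rate=2)
accepted = [bucket.arrive(f"pkt-{i}") for i in range(12)]   # the last two packets are dropped
print(accepted.count(False), bucket.leak())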
An example of traffic shaping using token bucket and traffic policing using leaky
bucket is shown in the following diagram:
The Domain Name System (DNS) is a system that maps domain names to IP addresses and vice versa in a network, such as the Internet.
DNS works as follows:
o DNS uses a hierarchical structure that divides the domain name space into various
levels, such as top-level domains (TLDs), second-level domains, third-level domains,
and so on. Each level is separated by a dot (.) in the domain name. For example,
www.example.com is a domain name that consists of three levels: www is the third-
level domain, example is the second-level domain, and com is the TLD.
o DNS uses a distributed database that stores the mappings of domain names and IP ad-
dresses in various servers, called name servers. Each name server is responsible for a
certain portion of the domain name space, called a zone. For example, a name server
for the com zone can store the mappings of all the second-level domains under the
com TLD, such as example.com, google.com, or amazon.com.
o DNS uses a client-server model that involves two types of entities: DNS clients, also
called resolvers, and DNS servers, also called name servers. A DNS client is an appli-
cation or a device that requests the IP address of a domain name or vice versa from a
DNS server. A DNS server is an application or a device that responds to the DNS que-
ries from the DNS clients by providing the requested information or forwarding the
query to another DNS server.
o DNS uses a resolution process that involves two types of queries: recursive queries
and iterative queries.
Recursive query is a type of query that requires the DNS server to provide a
definitive answer to the DNS client, either by providing the requested infor-
mation or by returning an error message. If the DNS server does not have the
requested information in its cache or zone, it has to contact other DNS serv-
ers until it finds the answer or an error. For example, if a DNS client asks a
DNS server for the IP address of www.example.com using a recursive query,
the DNS server has to either provide the IP address or return an error message
to the DNS client.
Iterative query is a type of query that allows the DNS server to provide either
a definitive answer or a referral to another DNS server to the DNS client. If
the DNS server does not have the requested information in its cache or zone,
it can refer the DNS client to another DNS server that may have the answer
or another referral. For example, if a DNS client asks a DNS server for the IP
address of www.example.com using an iterative query, the DNS server can
either provide the IP address or refer the DNS client to another DNS server,
such as the name server for the com zone or the example.com zone.
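From an application's point of view, all of this recursion is hidden behind the local stub resolver; a minimal Python sketch using only the standard library (www.example.com taken from the text as the placeholder name):

import socket

# The stub resolver sends a recursive query to its configured DNS server, which
# performs any iterative queries on our behalf and returns the final answer.
for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        "www.example.com", 80, proto=socket.IPPROTO_TCP):
    print(family.name, sockaddr[0])           # address family and resolved IP address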
An example of DNS hierarchy is shown in the following diagram:
An example of DNS resolution process using recursive and iterative queries is shown in the
following diagram:
Dynamic DNS (DDNS) is a system that dynamically updates the DNS records of a domain
name in response to the changes in its IP address or other parameters in a network, such as the
Internet.
DDNS works as follows:
o DDNS requires the support of three components: a DDNS client, a DDNS server, and
a DDNS provider. A DDNS client is an application or a device that monitors the IP
address or other parameters of a domain name and sends updates to a DDNS server. A
DDNS server is an application or a device that receives the updates from the DDNS
client and modifies the DNS records of the domain name accordingly. A DDNS pro-
vider is an entity that offers the DDNS service and maintains the DDNS server and
the domain name registration.
o The DDNS client and the DDNS server communicate using various protocols, such as
HTTP, HTTPS, or DNS update. The DDNS client and the DDNS server authenticate
each other using various methods, such as username and password, digital signature,
or IP address verification.
o The DDNS client periodically checks the IP address or other parameters of the do-
main name and compares them with the previous values. If there is any change, the
DDNS client sends an update request to the DDNS server, which contains the new
values and other information, such as hostname, TTL, or record type.
o The DDNS server receives the update request from the DDNS client and verifies its
validity and authenticity. If the update request is valid and authentic, the DDNS
server modifies the DNS records of the domain name accordingly and sends an up-
date response to the DDNS client, which contains the status and result of the update.
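Many DDNS providers accept updates over a simple authenticated HTTP request; a hedged sketch using only the standard library (the URL, query parameters, and credentials below are placeholders, not a real provider's API):

import base64
import urllib.parse
import urllib.request

def ddns_update(hostname, new_ip, username, password):
    # Hypothetical update endpoint; a real provider documents its own URL and parameters.
    query = urllib.parse.urlencode({"hostname": hostname, "myip": new_ip})
    request = urllib.request.Request("https://ddns.example.net/update?" + query)
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode()       # status and result of the update

# ddns_update("home.example.com", "203.0.113.7", "user", "secret")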
The use cases and benefits of DDNS are as follows:
o DDNS is useful for devices or applications that have dynamic or changing IP ad-
dresses, such as home networks, mobile devices, or cloud services. DDNS allows
these devices or applications to be accessible by using a consistent and memorable
domain name, instead of a variable and complex IP address.
o DDNS is beneficial for users who want to host their own websites, servers, or ser-
vices on their devices or applications, without relying on a third-party hosting pro-
vider. DDNS allows these users to have more control and flexibility over their online
presence and identity, as well as save money and resources.
TELNET
TELNET is a remote terminal access protocol that allows a user to access and control an-
other device or system over a network, such as the Internet.
Because TELNET transmits all data, including usernames and passwords, in plain text, it is commonly replaced by Secure Shell (SSH), which encrypts the session between the SSH client and the
SSH server. SSH also provides additional features, such as port forwarding, file transfer, or tunneling.
File Transfer Protocol (FTP) is a protocol for transferring files between systems over a net-
work, such as the Internet.
FTP works as follows:
o FTP uses a client-server model that involves two types of entities: FTP client and FTP
server. An FTP client is an application or a device that initiates a connection to an
FTP server and requests to upload or download files. An FTP server is an application
or a device that accepts and responds to the connection requests from the FTP clients
and provides access to the files stored on the server.
o FTP uses two connections for transferring files: control connection and data connec-
tion. The control connection is used for exchanging commands and responses be-
tween the FTP client and the FTP server. The data connection is used for transferring
the actual files between the FTP client and the FTP server.
o FTP uses various commands for transferring files, such as USER, PASS, LIST,
RETR, STOR, QUIT, and so on. The FTP client sends these commands to the FTP
server using the control connection. The FTP server responds to these commands with
numerical codes and messages using the control connection. The FTP client and the
FTP server use the data connection to transfer the files according to the commands.
o FTP uses two modes for transferring files: active mode and passive mode. The active
mode and the passive mode differ in how they establish the data connection between
the FTP client and the FTP server.
Active mode is a mode where the FTP client initiates the data connection to
the FTP server. The FTP client sends its IP address and port number to the
FTP server using the PORT command. The FTP server connects to the IP ad-
dress and port number provided by the FTP client using the data connection.
Passive mode is a mode where the FTP server initiates the data connection to
the FTP client. The FTP client requests the IP address and port number from
the FTP server using the PASV command. The FTP server responds with its
IP address and port number using the control connection. The FTP client con-
nects to the IP address and port number provided by the FTP server using the
data connection.
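Python's standard ftplib module implements the FTP client side described above; a short hedged sketch (the server name, credentials, and file names are placeholders):

from ftplib import FTP

ftp = FTP("ftp.example.com")                      # open the control connection (port 21)
ftp.login("user", "password")                     # USER and PASS commands
ftp.set_pasv(True)                                # use passive mode (PASV) for data connections
ftp.retrlines("LIST")                             # directory listing over a data connection
with open("report.pdf", "wb") as local_file:
    ftp.retrbinary("RETR report.pdf", local_file.write)   # download a file
with open("notes.txt", "rb") as local_file:
    ftp.storbinary("STOR notes.txt", local_file)          # upload a file
ftp.quit()                                        # QUIT command closes the control connection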
An example of FTP operation using active mode is shown in the following diagram:
World Wide Web (WWW) is a system that allows users to access and share infor-
mation over a network, such as the Internet, using web browsers and web servers.
WWW works as follows:
o WWW uses a client-server model that involves two types of entities: web browsers
and web servers. A web browser is an application or a device that allows the user to
request and view web pages from a web server. A web server is an application or a
device that stores and delivers web pages to the web browsers.
o WWW uses a protocol called Hypertext Transfer Protocol (HTTP) to communicate
between the web browsers and the web servers. HTTP is a stateless and request-re-
sponse protocol that transfers hypertext documents, which are documents that contain
text, images, audio, video, or other multimedia elements, as well as links to other doc-
uments, called hyperlinks.
o WWW uses a standard format for the web pages, called Hypertext Markup Language
(HTML), which defines the structure and content of the web pages using tags and at-
tributes. HTML also allows the inclusion of other formats, such as Cascading Style
Sheets (CSS), which define the style and appearance of the web pages, or JavaScript,
which define the behavior and interactivity of the web pages.
HTTP request and response format are as follows:
o HTTP request is a message that is sent by the web browser to the web server to re-
quest a web page or a resource. HTTP request consists of three parts: request line, re-
quest header, and request body. The request line contains the method, the URL, and
the version of HTTP. The request header contains various fields that provide infor-
mation about the request, such as Host, User-Agent, Accept, Cookie, and so on. The
request body contains optional data that is sent to the server, such as form data or file
upload.
o HTTP response is a message that is sent by the web server to the web browser to re-
spond to a request. HTTP response consists of three parts: status line, response
header, and response body. The status line contains the version of HTTP, the status
code, and the status message. The status code is a three-digit number that indicates
the outcome of the request, such as 200 (OK), 404 (Not Found), or 500 (Internal
Server Error). The status message is a short phrase that describes the status code. The
response header contains various fields that provide information about the response,
such as Content-Type, Content-Length, Server, Set-Cookie, and so on. The response
body contains optional data that is sent to the browser, such as HTML document or
image file.
An example of HTTP request and response format is shown in the following diagram:
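The same request and response structure can also be observed with a minimal sketch using Python's standard http.client module (www.example.com as a placeholder host):

import http.client

conn = http.client.HTTPSConnection("www.example.com")
# Request line (GET / HTTP/1.1) plus request headers
conn.request("GET", "/", headers={"User-Agent": "notes-demo", "Accept": "text/html"})
response = conn.getresponse()
print(response.status, response.reason)           # status line: code and message, e.g. 200 OK
print(response.getheaders())                      # response headers such as Content-Type
body = response.read()                            # response body (here, an HTML document)
conn.close()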
Simple Network Management Protocol (SNMP) is a protocol that allows network manage-
ment and monitoring of various devices and systems in a network, such as routers, switches,
servers, printers, or computers.
SNMP works as follows:
o SNMP uses a client-server model that involves two types of entities: SNMP managers
and SNMP agents. An SNMP manager is an application or a device that requests and
receives information from the SNMP agents. An SNMP agent is an application or a
device that provides and sends information to the SNMP managers.
o SNMP uses a protocol called User Datagram Protocol (UDP) to communicate be-
tween the SNMP managers and the SNMP agents. UDP is a connectionless and unre-
liable protocol that transfers data packets without guaranteeing their delivery, order,
or integrity. SNMP agents listen on UDP port 161 for requests from managers, while managers listen on UDP port 162 for traps and notifications sent by agents.
o SNMP uses a data structure called Management Information Base (MIB) to store and
organize the information about the devices and systems in the network. MIB consists
of various objects that represent different aspects or attributes of the devices and sys-
tems, such as name, status, performance, or configuration. MIB uses a hierarchical
structure that divides the objects into various branches, such as system, interfaces, ip,
tcp, udp, and so on. Each object has a unique identifier, called an object identifier
(OID), which consists of a sequence of numbers separated by dots (.). For example,
the OID for the system name object is 1.3.6.1.2.1.1.5.
o SNMP uses various operations for network management and monitoring, such as
GET, SET, GETNEXT, GETBULK, TRAP, INFORM, and so on. The SNMP man-
ager sends these operations to the SNMP agent using a message format called Proto-
col Data Unit (PDU), which contains the operation type, the OID, the value, and other
information. The SNMP agent responds to these operations with another PDU mes-
sage that contains the result or status of the operation.
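The MIB and OID idea can be illustrated with a toy in-memory lookup; this is plain Python for illustration only, not a real SNMP implementation, and the stored values are made up:

# Toy MIB: OIDs (with the .0 instance suffix) mapped to values.
mib = {
    "1.3.6.1.2.1.1.5.0": "router-01",        # sysName
    "1.3.6.1.2.1.1.3.0": 123456,             # sysUpTime (made-up value)
    "1.3.6.1.2.1.2.1.0": 4,                  # ifNumber (made-up value)
}

def snmp_get(oid):
    """Toy GET operation: return the value bound to an OID, if any."""
    return mib.get(oid)

def snmp_getnext(oid):
    """Toy GETNEXT operation: return the next known OID and its value (string ordering only)."""
    following = sorted(k for k in mib if k > oid)
    return (following[0], mib[following[0]]) if following else None

print(snmp_get("1.3.6.1.2.1.1.5.0"))         # router-01
print(snmp_getnext("1.3.6.1.2.1.1.3.0"))     # the next object in the toy MIB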
Bluetooth
Firewalls
Firewalls are network security mechanisms that monitor and control the incoming and out-
going traffic between different networks or devices, based on predefined rules and policies.
Firewalls have the following roles:
o Firewalls protect the network or device from unauthorized or malicious access, such
as hackers, viruses, worms, or denial-of-service attacks.
o Firewalls filter the network or device from unwanted or harmful traffic, such as spam,
phishing, or malware.
o Firewalls enforce the network or device policies and regulations, such as authentica-
tion, authorization, encryption, or logging.
Firewalls can be classified into three types: packet filtering, proxy, and stateful inspection.
o Packet filtering is a type of firewall that examines each packet that passes through the
firewall and decides whether to allow or block it based on its source address, destina-
tion address, protocol, port number, or other criteria. Packet filtering is simple and
fast, but it does not inspect the content or the state of the packets.
o Proxy is a type of firewall that acts as an intermediary between the network or device
and the external network or device. Proxy intercepts and modifies the packets that
pass through the firewall and creates a new connection with the external network or
device. Proxy can inspect the content and the state of the packets, as well as provide
caching and authentication services. Proxy is complex and slow, but it provides more
security and functionality than packet filtering.
o Stateful inspection is a type of firewall that combines the features of packet filtering
and proxy. Stateful inspection examines each packet that passes through the firewall
and decides whether to allow or block it based on its source address, destination ad-
dress, protocol, port number, content, state, or other criteria. Stateful inspection is
more intelligent and flexible than packet filtering and proxy, but it requires more
memory and processing power.
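A toy packet-filtering check in Python; the rule format and fields are invented for illustration, and real firewalls have their own rule languages:

# Each rule: (source prefix, destination port, protocol, action); the first match wins.
RULES = [
    ("10.0.0.", 22, "tcp", "allow"),          # allow SSH from the internal network
    ("", 80, "tcp", "allow"),                 # allow HTTP from anywhere
    ("", None, None, "block"),                # default rule: block everything else
]

def filter_packet(src_ip, dst_port, protocol):
    for prefix, port, proto, action in RULES:
        if (src_ip.startswith(prefix)
                and (port is None or port == dst_port)
                and (proto is None or proto == protocol)):
            return action
    return "block"

print(filter_packet("10.0.0.5", 22, "tcp"))      # allow
print(filter_packet("203.0.113.9", 25, "tcp"))   # block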
An example of firewalls is shown in the following diagram:
Machine Learning
UNIT-I: Introduction to Machine Learning
1. Definition of Machine Learning
Machine learning is the study of computer algorithms that can learn from data and make pre-
dictions or decisions based on the data.
Machine learning is important because it enables computers to perform tasks that are difficult
or impossible to program explicitly, such as recognizing faces, understanding natural lan-
guage, playing games, etc.
Machine learning can also help us discover new knowledge and insights from large and com-
plex data sets, such as genomic data, social media data, etc.
Some examples of machine learning in real-world applications are:
o Spam filtering: Machine learning algorithms can learn to classify email messages as
spam or not spam based on the content and metadata of the messages.
o Recommendation systems: Machine learning algorithms can learn to recommend
products, movies, music, etc. to users based on their preferences and behavior.
o Self-driving cars: Machine learning algorithms can learn to control a car autono-
mously by sensing the environment and making decisions accordingly.
o Speech recognition: Machine learning algorithms can learn to recognize and tran-
scribe spoken words into text.
Machine learning can be broadly classified into three types based on the type and amount of
data available for learning:
o Supervised learning: Learning with labeled data
o Unsupervised learning: Discovering patterns in unlabeled data
o Reinforcement learning: Learning through trial and error
Supervised learning is the most common type of machine learning, where the algorithm is
given a set of input-output pairs (called training data) and learns to find a function that maps
the input to the output. The goal of supervised learning is to generalize the learned function to
new and unseen inputs (called test data) and make accurate predictions. Some examples of
supervised learning tasks are:
o Classification: Predicting a discrete label for an input, such as whether an email is
spam or not, whether a tumor is benign or malignant, etc.
o Regression: Predicting a continuous value for an input, such as the price of a house,
the age of a person, etc.
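A minimal supervised classification sketch using scikit-learn's built-in iris data set (assuming scikit-learn is installed):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                 # labeled input-output pairs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                       # learn a mapping from inputs to labels
print(model.score(X_test, y_test))                # accuracy on unseen test data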
Unsupervised learning is the type of machine learning where the algorithm is given a set of
inputs without any labels or outputs and learns to discover patterns or structure in the data.
The goal of unsupervised learning is to find hidden or latent variables that explain the varia-
tion in the data. Some examples of unsupervised learning tasks are:
o Clustering: Grouping similar inputs into clusters, such as customers based on their
purchase behavior, documents based on their topics, etc.
o Dimensionality reduction: Reducing the number of features or variables in the data
while preserving the essential information, such as principal component analysis
(PCA), singular value decomposition (SVD), etc.
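A short sketch of clustering and dimensionality reduction on synthetic, unlabeled data (assuming scikit-learn and NumPy are installed):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(200, 5))     # 200 unlabeled points, 5 features

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])                                     # cluster assigned to each point

X_2d = PCA(n_components=2).fit_transform(X)            # keep the 2 directions of largest variance
print(X_2d.shape)                                      # (200, 2)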
Reinforcement learning is the type of machine learning where the algorithm learns from its
own actions and feedback from the environment. The algorithm (called an agent) interacts
with the environment (which can be stochastic or deterministic) and observes the state and
reward (which can be positive or negative) of the environment. The goal of reinforcement
learning is to find an optimal policy that maximizes the expected cumulative reward over
time. Some examples of reinforcement learning tasks are:
o Control: Learning to control a system or a device, such as a robot arm, a helicopter,
etc.
o Games: Learning to play games against human or computer opponents, such as chess,
Go, Atari games, etc.
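A tiny illustration of learning from reward feedback: an epsilon-greedy agent for a 3-armed bandit, written in plain Python with made-up reward probabilities:

import random

true_means = [0.2, 0.5, 0.8]                 # reward probabilities, unknown to the agent
estimates, counts = [0.0, 0.0, 0.0], [0, 0, 0]
epsilon = 0.1

for _ in range(5000):
    # explore with probability epsilon, otherwise exploit the best estimate so far
    if random.random() < epsilon:
        arm = random.randrange(3)
    else:
        arm = estimates.index(max(estimates))
    reward = 1 if random.random() < true_means[arm] else 0     # feedback from the environment
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental average update

print(estimates)    # the estimates should approach the true means over time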
Machine learning has been applied to various fields and domains, such as healthcare, finance,
robotics, etc., to solve complex and challenging problems that require human-like intelligence
and decision making. Some examples of machine learning applications in different fields are:
o Healthcare: Machine learning can help diagnose diseases, predict outcomes, recom-
mend treatments, analyze medical images, discover new drugs, etc. For instance,
DeepMind’s AlphaFold system can predict the three-dimensional structure of
proteins from their amino acid sequences using deep neural networks.
IBM’s Watson system can assist doctors in diagnosing cancer and suggesting
personalized treatments based on natural language processing and knowledge
representation.
Google’s DeepMind Health system can detect eye diseases from retinal scans
using convolutional neural networks.
o Finance: Machine learning can help detect fraud, optimize portfolios, predict market
movements, automate trading, analyze customer behavior, etc. For instance,
PayPal’s fraud detection system can identify fraudulent transactions and flag
them for review using supervised learning and anomaly detection.
BlackRock’s Aladdin system can help investors manage their portfolios and
risks using optimization and simulation techniques.
Renaissance Technologies’ Medallion fund can generate high returns by using
quantitative and statistical methods to trade in the financial markets.
o Robotics: Machine learning can help robots learn to perform tasks that require per-
ception, manipulation, navigation, coordination, etc. For instance,
Boston Dynamics’ Atlas robot can perform acrobatic feats such as backflips,
somersaults, etc. using reinforcement learning and control theory.
OpenAI’s Dactyl system can manipulate objects with a robotic hand using
reinforcement learning and computer vision.
Amazon’s Kiva system can optimize the warehouse operations by using au-
tonomous mobile robots that can move shelves and products.
Machine learning is not a magic bullet that can solve any problem without any difficulties or
drawbacks. There are many challenges and issues that need to be addressed when applying
machine learning to real-world problems, such as data quality and quantity, bias and fairness,
ethical considerations, etc.
Data quality and quantity: Machine learning algorithms depend on the data that they are given
to learn from. Therefore, the quality and quantity of the data are crucial for the performance
and reliability of the algorithms. Some of the common problems with data are:
o Missing data: Some data points may have missing values for some features or varia-
bles, which can affect the accuracy and completeness of the analysis.
o Outliers: Some data points may have extreme or abnormal values that deviate from
the rest of the data, which can affect the robustness and stability of the analysis.
o Noise: Some data points may have errors or inaccuracies due to measurement errors,
human errors, transmission errors, etc., which can affect the validity and reliability of
the analysis.
o Inconsistencies: Some data points may have conflicting or contradictory values due to
different sources, formats, standards, etc., which can affect the consistency and com-
parability of the analysis.
Bias and fairness: Machine learning algorithms may inherit or amplify the biases that exist in
the data, the models, or the users, which can lead to unfair or discriminatory outcomes or de-
cisions. Some of the common sources of bias are:
o Data bias: The data may not represent the true population or distribution of interest,
due to sampling bias, selection bias, reporting bias, etc., which can affect the generali-
zability and representativeness of the analysis.
o Model bias: The model may not capture the true relationship or function of interest,
due to overfitting, underfitting, confounding, etc., which can affect the accuracy and
interpretability of the analysis.
o User bias: The user may not use or evaluate the model in an objective or impartial
manner, due to confirmation bias, anchoring bias, hindsight bias, etc., which can af-
fect the trustworthiness and accountability of the analysis.
Ethical considerations: Machine learning algorithms may have ethical implications or conse-
quences that need to be considered when designing, deploying, or using them. Some of the
common ethical issues are:
o Privacy: The data may contain sensitive or personal information that may be exposed
or misused by unauthorized parties, which can affect the confidentiality and security
of the analysis.
o Transparency: The model may not be transparent or explainable enough to justify or
understand its predictions or decisions, which can affect the interpretability and ac-
countability of the analysis.
o Responsibility: The user may not be responsible or liable for the outcomes or actions
of the model, which can affect the trustworthiness and accountability of the analysis.
Machine learning is not a one-shot process that can be done in a single step. It is a complex
and iterative process that involves multiple steps from data collection to model deployment.
The machine learning workflow can be summarized as follows:
o Data collection: The first step is to collect or acquire the data that is relevant and suf-
ficient for the problem at hand. The data can come from various sources such as data-
bases, web pages, sensors, surveys, etc.
o Data exploration: The next step is to explore or analyze the data to understand its
characteristics and structure. This can involve descriptive statistics, data visualization,
feature engineering, etc.
o Data pre-processing: The next step is to pre-process or transform the data to make it
suitable and ready for machine learning. This can involve scaling, normalization, en-
coding, imputation, etc.
o Model training: The next step is to train or fit the machine learning model to the data using a suitable algorithm and technique. This can involve choosing a model architecture, setting hyperparameters, optimizing a loss function, etc.
o Model evaluation: The next step is to evaluate or test the machine learning model on new and unseen data to measure its performance and generalization. This can involve choosing a metric, splitting the data into train, validation, and test sets, cross-validation, etc.
o Model fine-tuning: The next step is to fine-tune or improve the machine learning model by adjusting its parameters or features based on the evaluation results. This can involve regularization, feature selection, grid search, etc.
o Model deployment: The final step is to deploy or use the machine learning model in a real-world setting or application. This can involve saving, loading, updating, and monitoring the model.
Data is the raw material or input for machine learning. Data can be of different types depend-
ing on the nature and format of the information it contains. The basic types of data in machine
learning are:
o Numerical data: Data that consists of numbers or quantities that can be measured or
calculated. Numerical data can be further divided into two subtypes:
Continuous data: Data that can take any value within a range or interval, such
as height, weight, temperature, etc.
Discrete data: Data that can take only certain separate values, usually counts, such as number of children, number of rooms, number of visits, etc.
o Categorical data: Data that consists of labels or names that represent groups or classes
of items or entities. Categorical data can be further divided into two subtypes:
Nominal data: Data that has no inherent order or ranking among the catego-
ries, such as color, nationality, animal, etc.
Ordinal data: Data that has a meaningful order or ranking among the categories, such as education level, satisfaction rating, clothing size, etc.
o Text data: Data that consists of words or sentences that convey meaning or infor-
mation. Text data can be in the form of natural language (such as English, French,
etc.) or artificial language (such as HTML, SQL, etc.). Text data requires special pre-
processing techniques for machine learning, such as:
Tokenization: Splitting the text into smaller units or tokens, such as words,
characters, symbols, etc.
Stemming: Reducing the tokens to their root or base form, such as playing ->
play, cats -> cat, etc.
Lemmatization: Converting the tokens to their canonical or dictionary form,
such as played -> play, mice -> mouse, etc.
Stopword removal: Removing the tokens that are very common or irrelevant
for the analysis, such as the, a, and, etc.
Vectorization: Converting the tokens into numerical vectors that represent
their frequency or importance in the text, such as bag-of-words (BOW), term
frequency-inverse document frequency (TF-IDF), word embeddings
(Word2Vec), etc.
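A short sketch of tokenization, stopword removal, and TF-IDF vectorization with scikit-learn (assuming a recent scikit-learn is installed; the sentences are made up):

from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "the cat sat on the mat",
    "the dog played in the park",
    "cats and dogs are popular pets",
]

vectorizer = TfidfVectorizer(stop_words="english")   # tokenization, stopword removal, TF-IDF weighting
X = vectorizer.fit_transform(documents)              # sparse document-term matrix
print(vectorizer.get_feature_names_out())            # the remaining vocabulary
print(X.shape)                                       # (number of documents, vocabulary size)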
Before applying machine learning algorithms to the data, it is important to explore or under-
stand the structure and characteristics of the data. This can help to identify patterns, trends,
outliers, relationships, etc. in the data and gain insights for further analysis. Some of the tech-
niques for exploring the structure of data are:
o Descriptive statistics: Calculating numerical summaries or measures that describe the
central tendency, variability, and distribution of the data. Some common descriptive
statistics are:
Mean: The average value of the data points.
Median: The middle value of the data points when sorted in ascending or de-
scending order.
Mode: The most frequent value of the data points.
Variance: The measure of how much the data points deviate from the mean.
Standard deviation: The square root of the variance.
Range: The difference between the maximum and minimum values of the
data points.
Percentile: The value below which a certain percentage of the data points fall.
Quartile: The value that divides the data points into four equal groups (Q1 =
25th percentile, Q2 = 50th percentile = median, Q3 = 75th percentile).
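These summaries can be computed directly with NumPy (assuming NumPy is installed; the numbers are made up):

import numpy as np

values = np.array([12, 15, 15, 18, 21, 24, 30, 95])    # 95 is a suspiciously extreme value

print(np.mean(values), np.median(values))               # central tendency
print(np.var(values), np.std(values))                   # variability
print(values.max() - values.min())                      # range
q1, q2, q3 = np.percentile(values, [25, 50, 75])        # quartiles (Q2 is the median)
print(q1, q2, q3)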
Data quality is an essential factor for machine learning success. Poor quality data can lead to
poor quality results and unreliable predictions or decisions. Therefore, it is important to check
and improve the quality of the data before applying machine learning algorithms. Some of the
common problems with data quality and their remediation techniques are:
o Missing data: Some data points may have missing values for some features or varia-
bles due to various reasons such as incomplete records, data entry errors, etc. Missing
data can affect the accuracy and completeness of the analysis and introduce bias or
uncertainty in the results. Some of the techniques for dealing with missing data are:
Imputation: Replacing the missing values with some estimated or plausible
values based on the available data. Some common imputation methods are:
Mean imputation: Replacing the missing values with the mean value
of the feature.
Median imputation: Replacing the missing values with the median
value of the feature.
Mode imputation: Replacing the missing values with the mode value
of the feature.
K-nearest neighbors (KNN) imputation: Replacing the missing val-
ues with the average value of the k most similar data points based on
some distance metric.
Deletion: Removing the data points or features that have missing values. This
can be done in two ways:
Listwise deletion: Removing the entire data point if any of its fea-
tures have missing values.
Pairwise deletion: Removing only the feature that has missing values
and keeping the rest of the data point.
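A hedged sketch of imputation and deletion with pandas and scikit-learn (assuming both are installed; the data is made up):

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "income": [50000, 60000, np.nan, 52000]})

mean_filled = df.fillna(df.mean())                          # mean imputation, column by column
listwise = df.dropna()                                      # listwise deletion of incomplete rows
knn_filled = KNNImputer(n_neighbors=2).fit_transform(df)    # KNN imputation

print(mean_filled, listwise, knn_filled, sep="\n\n")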
o Outliers: Some data points may have extreme or abnormal values that deviate from
the rest of the data due to various reasons such as measurement errors, data entry er-
rors, natural variation, etc. Outliers can affect the robustness and stability of the anal-
ysis and introduce noise or distortion in the results. Some of the techniques for detect-
ing and handling outliers are:
Detection: Identifying the data points that are outliers based on some criteria
or threshold. Some common detection methods are:
Box plot: Using a box plot to visualize the distribution of a numerical feature and flagging the data points that lie beyond the whiskers (typically 1.5 × IQR below Q1 or above Q3) as outliers.
Z-score: Calculating the standardized score or z-score for each data
point as (value - mean) / standard deviation and finding the data
points that have z-scores greater than 3 or less than -3 as outliers.
Interquartile range (IQR): Calculating the interquartile range or IQR
for each feature as Q3 - Q1 and finding the data points that have val-
ues greater than Q3 + 1.5 * IQR or less than Q1 - 1.5 * IQR as outli-
ers.
Handling: Dealing with the data points that are outliers based on some strat-
egy or action. Some common handling methods are:
Removal: Removing the outliers from the data set.
Capping: Replacing the outliers with some maximum or minimum
value within a reasonable range.
Transformation: Applying some mathematical function to reduce or
eliminate the effect of outliers, such as log, square root, etc.
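The z-score and IQR rules in a few lines of NumPy (made-up numbers):

import numpy as np

x = np.array([12, 15, 14, 13, 16, 15, 14, 120])          # 120 looks like an outlier

z = (x - x.mean()) / x.std()                              # z-score rule
print(x[np.abs(z) > 3])                                   # points more than 3 standard deviations away

q1, q3 = np.percentile(x, [25, 75])                       # IQR rule
iqr = q3 - q1
print(x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)])     # points beyond the whiskers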
Data cleaning: Removing or correcting the errors or inconsistencies in the data due to various
reasons such as human errors, data entry errors, formatting errors, etc. Data cleaning can im-
prove the validity and reliability of the analysis and reduce the noise or distortion in the re-
sults. Some of the techniques for data cleaning are:
o Validation: Checking the data for errors or inconsistencies using some rules or crite-
ria, such as data type, data range, data format, etc.
o Correction: Fixing the errors or inconsistencies in the data using some methods or
tools, such as manual editing, automatic correction, etc.
o Standardization: Converting the data to a common or consistent format or standard,
such as date format, currency format, unit of measurement, etc.
9. Data Pre-Processing
Data pre-processing is the process of transforming or modifying the data to make it suitable
and ready for machine learning. Data pre-processing can improve the performance and gener-
alization of the machine learning algorithms and enhance the quality and usability of the re-
sults. Some of the techniques for data pre-processing are:
o Scaling and normalization: Changing the range or scale of the numerical features to
make them comparable or compatible with each other and with the machine learning
algorithms. Some common scaling and normalization methods are:
Min-max scaling: Scaling the features to a fixed range between 0 and 1 by
using the formula (value - min) / (max - min).
Standardization: Scaling the features to have zero mean and unit variance by
using the formula (value - mean) / standard deviation.
Normalization: Scaling the features to have unit norm (length) by dividing
each value by the square root of the sum of squares of all values.
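The three methods above as implemented in scikit-learn (assuming scikit-learn is installed; the matrix is made up):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, Normalizer

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

print(MinMaxScaler().fit_transform(X))      # each feature rescaled to the range [0, 1]
print(StandardScaler().fit_transform(X))    # each feature with zero mean and unit variance
print(Normalizer().fit_transform(X))        # each row rescaled to unit length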
o One-hot encoding: Converting the categorical features into numerical features by cre-
ating dummy variables for each category. For example, if a feature has three catego-
ries A, B, and C, then one-hot encoding will create three new features A’, B’, and C’
such that A’ = 1 if A is present and 0 otherwise, B’ = 1 if B is present and 0 otherwise,
and C’ = 1 if C is present and 0 otherwise.
o Handling imbalanced data: Dealing with the data that has unequal or disproportionate
distribution of classes or labels. For example, if a classification problem has two clas-
ses positive and negative, and 90% of the data points belong to the negative class and
only 10% belong to the positive class, then the data is imbalanced. Imbalanced data
can affect the accuracy and fairness of the machine learning algorithms and introduce
bias or skewness in the results. Some of the techniques for handling imbalanced data
are:
Oversampling: Increasing the number of data points in the minority class by
duplicating or generating new data points based on some methods or tech-
niques, such as random oversampling, synthetic minority oversampling tech-
nique (SMOTE), etc.
Model selection and evaluation are important steps in the machine learning workflow, where
the goal is to choose the best model among a set of candidate models and measure its perfor-
mance and generalization on new and unseen data.
Model evaluation metrics are quantitative measures that assess how well a model performs on
a given task or problem. Different metrics may be suitable for different types of tasks or prob-
lems, such as classification, regression, clustering, etc. Some common model evaluation met-
rics are:
o Accuracy: The proportion of correct predictions among the total number of predic-
tions. Accuracy is a simple and intuitive metric for classification problems, but it may
not be appropriate for imbalanced data or multiclass problems.
o Precision: The proportion of correct positive predictions among the total number of
positive predictions. Precision measures how precise or exact a model is in identify-
ing the positive class, but it may not account for the negative class or the false nega-
tives.
o Recall: The proportion of correct positive predictions among the total number of ac-
tual positives. Recall measures how complete or exhaustive a model is in identifying
the positive class, but it may not account for the negative class or the false positives.
o F1-score: The harmonic mean of precision and recall. F1-score balances both preci-
sion and recall and gives a single score that reflects the overall performance of a
model on the positive class.
o Mean squared error (MSE): The average of the squared differences between the ac-
tual and predicted values. MSE measures how close a model is to the true values, but
it may be sensitive to outliers or large errors.
o Root mean squared error (RMSE): The square root of the mean squared error. RMSE
measures how close a model is to the true values, but it may be easier to interpret than
MSE as it has the same unit as the values.
o R-squared: The proportion of the variance in the actual values that is explained by the
model. R-squared measures how well a model fits the data, but it may not indicate
how well a model predicts new data.
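The same metrics as implemented in scikit-learn (made-up labels and predictions):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error, r2_score)

y_true, y_pred = [1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1]           # classification example
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))

actual, predicted = [3.0, 5.0, 2.5], [2.8, 5.4, 2.1]               # regression example
mse = mean_squared_error(actual, predicted)
print(mse, mse ** 0.5, r2_score(actual, predicted))                # MSE, RMSE, R-squared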
Cross-validation is a technique for validating or testing a model on different subsets of the
data to reduce overfitting or underfitting and improve generalization. Cross-validation in-
volves splitting the data into k folds or groups, using k-1 folds for training and one fold for
testing, and repeating this process k times with different folds for testing. Some common
cross-validation methods are:
o k-fold cross-validation: Splitting the data into k equal-sized folds and using each fold
as a test set once and as a training set k-1 times.
o Holdout method: Splitting the data into two parts, one for training and one for testing,
and using them only once. This is a simple and fast method, but it may not use all the
data or reflect all the variability in the data.
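k-fold cross-validation and the holdout method with scikit-learn (iris data set used as a stand-in):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)               # 5-fold cross-validation
print(scores.mean(), scores.std())                        # average score and its spread

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)   # holdout split
print(model.fit(X_tr, y_tr).score(X_te, y_te))            # single train/test evaluation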
Selecting a model is the process of choosing the best model among a set of candidate models
based on some criteria or objectives. Selecting a model is important because different models
may have different strengths and weaknesses, advantages and disadvantages, and suitability
and applicability for different tasks or problems.
Criteria for choosing appropriate models are quantitative or qualitative measures that evaluate
or compare the performance and quality of different models. Different criteria may be rele-
vant for different types of tasks or problems, such as classification, regression, clustering, etc.
Some common criteria for choosing appropriate models are:
o Accuracy: The degree to which a model makes correct predictions or decisions on
new and unseen data. Accuracy is a simple and intuitive criterion for selecting a
model, but it may not account for other factors such as complexity, interpretability,
robustness, etc.
o Complexity: The degree to which a model has many parameters or features that affect
its behavior or output. Complexity is a trade-off criterion for selecting a model, as it
may affect both the accuracy and the interpretability of the model. A complex model
may have high accuracy but low interpretability, while a simple model may have low
accuracy but high interpretability.
o Interpretability: The degree to which a model can be understood or explained by hu-
mans. Interpretability is an important criterion for selecting a model, especially for
applications that require trust, transparency, or accountability. An interpretable model
can provide insights into the logic or reasoning behind its predictions or decisions,
while an uninterpretable model can be seen as a black box that produces outputs with-
out explanations.
o Robustness: The degree to which a model can handle noise, uncertainty, or variability
in the data or the environment. Robustness is a desirable criterion for selecting a
model, as it indicates how well a model can cope with real-world situations that may
not be ideal or perfect. A robust model can maintain its performance and quality un-
der different conditions or scenarios, while a non-robust model can be sensitive or un-
stable under different conditions or scenarios.
Bias-variance tradeoff is a fundamental concept in machine learning that describes the rela-
tionship between the complexity and the accuracy of a model. Bias-variance tradeoff states
that:
o Bias is the error or difference between the expected or average prediction of a model
and the true value. Bias measures how accurate a model is on average across different
data sets.
o Variance is the error or difference between the actual prediction of a model and the
expected prediction of a model. Variance measures how consistent a model is across
different data sets.
o There is a trade-off between bias and variance: as the complexity of a model increases, bias tends to decrease while variance tends to increase, and vice versa, so reducing both at the same time is difficult or impossible.
o A high-bias model is a simple or underfitting model that has low complexity and high
error on both training and test data. A high-bias model does not capture the true rela-
tionship or function of interest and makes systematic errors.
o A high-variance model is a complex or overfitting model that has high complexity
and low error on training data but high error on test data. A high-variance model cap-
tures the noise or randomness in the data and makes random errors.
Training a model is the process of building or fitting a machine learning model to the data us-
ing a suitable algorithm and technique. Training a model is an essential step in supervised
learning, where the goal is to find a function that maps the input to the output based on la-
beled data.
Using training data to build a model involves providing input-output pairs (called training ex-
amples) to the machine learning algorithm and adjusting the parameters or weights of the
model to minimize the error or loss between the actual and predicted outputs. The training
data should be representative and sufficient for the problem at hand, as it determines how well
the model can learn from the data and generalize to new data.
Model representation and interpretability are important aspects of training a machine
learning model, as they determine how the model can be understood or explained by
humans. Different models may have different representations and interpretability, de-
pending on their complexity and structure. Some common types of model representa-
tion and interpretability are:
o Linear models: Models that have a linear or additive relationship between the
input and the output, such as linear regression, logistic regression, etc. Linear
models are easy to represent and interpret, as they can be expressed by a sim-
ple equation or formula that shows the contribution or effect of each input fea-
ture on the output.
o Tree-based models: Models that have a hierarchical or branching structure that
splits the input space into smaller regions based on some criteria or rules, such
as decision trees, random forests, etc. Tree-based models are moderately easy
to represent and interpret, as they can be visualized by a tree diagram or graph
that shows the path or sequence of decisions that lead to the output.
o Neural network models: Models that have a layered or networked structure
that consists of interconnected nodes or units that perform some computation
or transformation on the input, such as artificial neural networks, deep neural
networks, etc. Neural network models are difficult to represent and interpret,
as they can have many layers and nodes with complex and nonlinear interac-
tions that are hard to explain or understand.
Feature engineering is the process of creating or modifying the features or variables that are
used as input for machine learning models. Feature engineering can enhance the performance
and quality of the machine learning models by improving the representation and suitability of
the data for the problem at hand.
Techniques to improve feature representations are methods or strategies that can transform,
construct, or extract new or better features from the existing data. Different techniques may be
suitable for different types of data or problems, such as numerical, categorical, text, etc. Some
common techniques to improve feature representations are:
o Feature transformation: Changing the scale, shape, or distribution of the features to
make them more compatible or appropriate for the machine learning models. For ex-
ample, scaling, normalization, log, square root, etc.
o Feature construction: Creating new features from existing ones by using some logic,
computation, or combination. For example, adding, subtracting, multiplying, divid-
ing, etc.
o Feature extraction: Reducing the dimensionality or number of features by using some
technique that preserves the essential information or structure of the data. For exam-
ple, principal component analysis (PCA), singular value decomposition (SVD), latent
Dirichlet allocation (LDA), etc.
o Feature selection: Choosing a subset of features that are relevant or important for the
problem at hand by using some criterion or method. For example, correlation, mutual
information, chi-square test, etc.
Feature transformation is a technique for feature engineering that involves changing the scale,
shape, or distribution of the features to make them more compatible or appropriate for the ma-
chine learning models. Feature transformation can improve the model fit and performance by
reducing skewness, outliers, heteroscedasticity, multicollinearity, etc.
Transforming features to improve model fit involves applying some mathematical function or
operation to the features to change their values or properties. Different functions or operations
may have different effects or benefits on the features and the models. Some common func-
tions or operations for feature transformation are:
o Scaling: Changing the range or magnitude of the features to make them comparable
or consistent with each other and with the models. Scaling can help to avoid numeri-
cal issues or errors due to large or small values and improve the convergence or sta-
bility of the models. Some common scaling methods are:
Min-max scaling: Scaling the features to a fixed range between 0 and 1 by
using the formula (value - min) / (max - min).
Standardization: Scaling the features to have zero mean and unit variance by
using the formula (value - mean) / standard deviation.
Normalization: Scaling the features to have unit norm (length) by dividing
each value by the square root of the sum of squares of all values.
o Logarithm: Applying the logarithm function to the features to reduce their range or
magnitude and make them more symmetric or normal. Logarithm can help to handle
skewed or exponential features and reduce the effect of outliers or extreme values.
The logarithm function can be expressed as log(value).
o Square root: Applying the square root function to the features to reduce their range or magnitude and make them more symmetric or normal. Square root can help to handle skewed or quadratic features and reduce the effect of outliers or extreme values. The square root function can be expressed as sqrt(value).
o Power: Applying the power function to the features to increase their range or
magnitude and make them more skewed or exponential. Power can help to
handle symmetric or normal features and increase the effect of outliers or ex-
treme values. The power function can be expressed as value^p, where p is a
positive or negative exponent.
Feature construction is a technique for feature engineering that involves creating new features
from existing ones by using some logic, computation, or combination. Feature construction
can enhance the performance and quality of the machine learning models by adding more in-
formation or structure to the data and capturing more complex or nonlinear relationships be-
tween the features and the output.
Creating new features from existing ones involves applying some operation or function to the
existing features to generate new features that are more relevant or useful for the problem at
hand. Different operations or functions may have different effects or benefits on the features
and the models. Some common operations or functions for feature construction are:
o Arithmetic: Performing arithmetic operations such as addition, subtraction, multipli-
cation, division, etc. on the existing features to create new features that represent
some meaningful or interesting quantity or ratio. For example, creating a new feature
that represents the body mass index (BMI) by dividing the weight by the square of the
height.
o Logical: Performing logical operations such as and, or, not, etc. on the existing fea-
tures to create new features that represent some condition or rule. For example, creat-
ing a new feature that represents whether a person is obese or not by using a logical
expression such as BMI > 30.
o Polynomial: Performing polynomial operations such as raising to a power, taking a
root, etc. on the existing features to create new features that represent some higher-
order or lower-order term. For example, creating a new feature that represents the
square of the age by using a polynomial expression such as age^2.
o Trigonometric: Performing trigonometric operations such as sine, cosine, tangent, etc. on the existing features to create new features that represent some periodic or cyclic pattern. For example, creating a new feature that captures the daily cycle of the hour of day by using a trigonometric expression such as sin(2π × hour / 24).
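The four kinds of constructed features above, sketched with pandas (made-up data):

import numpy as np
import pandas as pd

df = pd.DataFrame({"weight_kg": [70, 95], "height_m": [1.75, 1.80],
                   "age": [30, 45], "hour": [9, 21]})

df["bmi"] = df["weight_kg"] / df["height_m"] ** 2          # arithmetic: body mass index
df["is_obese"] = df["bmi"] > 30                            # logical: condition on BMI
df["age_squared"] = df["age"] ** 2                         # polynomial: higher-order term
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)       # trigonometric: daily cycle
print(df)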
Feature engineering for specific applications involves creating new features that are tailored
or customized for a particular domain or field that requires some domain knowledge or exper-
tise. For example, creating new features for text data may involve using natural language pro-
cessing techniques such as tokenization, stemming, lemmatization, etc., while creating new
features for image data may involve using computer vision techniques such as edge detection,
segmentation, feature extraction, etc.
Feature extraction is a technique for feature engineering that involves reducing the dimen-
sionality or number of features by using some technique that preserves the essential infor-
mation or structure of the data. Feature extraction can improve the performance and quality of
the machine learning models by removing noise, redundancy, or irrelevance from the data and
simplifying or optimizing the computation or storage of the models.
Dimensionality reduction techniques are methods or algorithms that can transform high-di-
mensional data into low-dimensional data by using some principle or criterion. Different tech-
niques may have different objectives or assumptions for dimensionality reduction, such as
variance, distance, correlation, etc. Some common dimensionality reduction techniques are:
o Principal component analysis (PCA): A technique that can reduce the dimensionality
of numerical data by finding linear combinations of the original features that capture
the maximum variance in the data. PCA can produce orthogonal and uncorrelated fea-
tures called principal components (PCs) that can be ranked by their importance or ex-
plained variance.
o Singular value decomposition (SVD): A technique that can decompose any matrix
into three matrices called U, S, and V, where U and V are orthogonal matrices and S
is a diagonal matrix. SVD can be used to reduce the dimensionality of numerical data
by selecting only the largest singular values in S and their corresponding columns in
U and V.
o Latent Dirichlet allocation (LDA): A technique that can reduce the dimensionality of
text data by finding probabilistic distributions of topics over words and documents
over topics. LDA can produce latent and interpretable features called topics that can
capture the main themes or concepts in the text data.
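A minimal scikit-learn sketch of PCA and truncated SVD on synthetic numerical data; the random data and the number of components are assumptions for illustration:

import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 samples, 10 features

# PCA: keep the 3 components that capture the most variance
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)
print(X_pca.shape, pca.explained_variance_ratio_)

# Truncated SVD: keeps only the largest singular values (also works on sparse data)
svd = TruncatedSVD(n_components=3)
X_svd = svd.fit_transform(X)
print(X_svd.shape)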
Feature selection methods are methods or algorithms that can select a subset of features that
are relevant or important for the problem at hand by using some criterion or method. Different
methods may have different approaches or strategies for feature selection, such as filter, wrap-
per, embedded, etc. Some common feature selection methods are listed below; a brief code sketch follows the list:
o Correlation: A method that can measure how strongly two variables are related to
each other by using some statistic such as Pearson’s correlation coefficient, Spear-
man’s rank correlation coefficient, etc. Correlation can be used to select features that
have high correlation with the output and low correlation with each other.
o Mutual information: A method that can measure how much information two variables
share with each other by using some metric such as entropy, conditional entropy, etc.
Mutual information can be used to select features that have high mutual information
with the output and low mutual information with each other.
o Chi-square test: A method that can test how likely two categorical variables are inde-
pendent of each other by using some statistic such as chi-square statistic, p-value, etc.
Chi-square test can be used to select features that have high chi-square statistic or low
p-value with the output.
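A minimal scikit-learn sketch of filter-style feature selection using correlation, mutual information, and the chi-square test; the iris dataset and the choice of k are assumptions for illustration:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Correlation of each feature with the target (Pearson, via pandas)
corr = pd.DataFrame(X).corrwith(pd.Series(y))
print("correlation with target:\n", corr)

# Mutual information between each feature and the target
print("mutual information:", mutual_info_classif(X, y, random_state=0))

# Chi-square test (requires non-negative features); keep the 2 best features
X_best = SelectKBest(chi2, k=2).fit_transform(X, y)
print("selected shape:", X_best.shape)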
UNIT-III: Regression
17. Introduction to Regression Analysis
Regression analysis is a type of supervised learning that involves finding a function that best
fits the relationship between one or more input variables (called predictors or independent
variables) and one output variable (called response or dependent variable).
Regression analysis is useful for understanding how the output variable changes with respect
to the input variables, predicting the output variable for new or unseen input values, testing
hypotheses or theories about the relationship between the variables, etc.
Linear regression is the simplest and most common type of regression analysis, where the
function that models the relationship between the input and output variables is a linear or
straight line equation of the form:
y = β0 + β1x + ϵ
where:
o y is the output (dependent) variable and x is the input (independent) variable
o β0 is the intercept and β1 is the slope of the line
o ϵ is the error term that captures the variation in y not explained by the line
The coefficients β0 and β1 are typically estimated with methods such as ordinary least squares, which minimize the sum of squared errors (SSE) or the mean squared error (MSE) between the actual and predicted output values. A short sketch follows.
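A minimal NumPy sketch of fitting a simple linear regression by least squares; the synthetic data and the true coefficient values are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=50)   # true intercept 2, slope 3

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of (beta0, beta1), minimizing the SSE
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta0, beta1:", beta)

# Mean squared error of the fit
mse = np.mean((y - X @ beta) ** 2)
print("MSE:", mse)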
Multiple linear regression is a type of linear regression that involves modeling one output
variable with two or more input variables. By using several inputs, multiple linear regression can capture more complex relationships and explain more variation in the output variable.
Multiple linear regression can be expressed by a linear equation of the form:
y = β0 + β1x1 + β2x2 + ... + βnxn + ϵ
where y is the output variable, x1, x2, ..., xn are the input variables, β0, β1, ..., βn are the coefficients, and ϵ is the error term.
Multiple linear regression relies on several assumptions about the data and the error terms:
o Linearity: The output variable is a linear function of the input variables plus the error term.
o Independence: The error terms are independent of each other and of the input vari-
ables.
o Homoscedasticity: The error terms have constant or equal variance across different
values of the input variables.
o Normality: The error terms follow a normal or Gaussian distribution with zero mean
and constant variance.
o No multicollinearity: The input variables are not correlated or related to each other.
Regression analysis is not a perfect or flawless technique that can solve any problem without
any difficulties or drawbacks. There are some main problems or challenges that need to be
addressed when applying regression analysis to real-world problems, such as overfitting and
underfitting, non-linearity in data, etc.
Overfitting and underfitting are two common problems that affect the accuracy and generali-
zation of regression models. Overfitting and underfitting are related to the bias-variance
tradeoff, which states that bias and variance tend to move in opposite directions: reducing one usually increases the other, so a good model must balance the two.
o Overfitting is the problem where the model fits the training data too well and captures the noise or randomness in the data, but fails to generalize to new or unseen data. Overfitting results in high variance and low bias: the model is highly sensitive to the particular training set it sees, even though its error on that training data is low.
o Underfitting is the problem where the model fits the training data too poorly and misses the true relationship or function of interest, so it performs poorly on both the training data and new or unseen data. Underfitting results in low variance and high bias: the model is stable across different data sets, but its predictions are systematically far from the actual output values.
o An optimal model is a balanced or well-fitting model that fits the training data rea-
sonably well and generalizes to new or unseen data. An optimal model has moderate
variance and bias, which means that the model has moderate sensitivity or instabil-
ity across different data sets and low error or difference between the expected and
actual output values.
Polynomial regression is a type of regression analysis that involves modeling a non-linear re-
lationship between the input and output variables by using a polynomial function of the
form:
y = β0 + β1x + β2x^2 + ... + βnx^n + ϵ
where x^k denotes the k-th power of the input variable, β0, β1, ..., βn are the coefficients, and ϵ is the error term. The degree n controls the flexibility of the curve: too low a degree can underfit, while too high a degree can overfit. A short sketch follows.
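A minimal scikit-learn sketch of polynomial regression; the synthetic data and the degree value are assumptions for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, size=60)).reshape(-1, 1)
y = 0.5 * x.ravel() ** 3 - x.ravel() + rng.normal(scale=1.0, size=60)

# Expand x into [x, x^2, x^3] and fit an ordinary linear model on the expanded features
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)

print("R^2 on the training data:", model.score(x, y))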
Logistic regression is a type of regression analysis that involves modeling a binary classifica-
tion problem using a regression function. Binary classification is a supervised learning prob-
lem where the goal is to predict whether an input belongs to one of two classes or catego-
ries, such as yes or no, positive or negative, etc.
Logistic regression can model a binary classification problem by using a logistic function or
sigmoid function of the form:
p = 1 / (1 + e^-(β0 + β1x))
where:
o p is the estimated probability that the input belongs to the positive class
o x is the input variable, and β0 and β1 are the model parameters
The parameters β0 and β1 can be estimated with methods such as maximum likelihood estimation, gradient descent, etc. The goal of these methods is to find the values of β0 and β1 that maximize the likelihood or probability of observing the actual output values given the input values.
Probability estimation and decision threshold are important aspects of logistic regression, as they determine how to make predictions or decisions based on the logistic function. They involve estimating the probability p that an input belongs to the positive class, and then converting p into a class label by comparing it with a threshold (commonly 0.5): the positive class is predicted if p is above the threshold, the negative class otherwise (a short sketch follows the equation below).
Regression models can also be regularized to prevent overfitting by adding penalty terms to the least-squares objective. For example, elastic net regression combines the L1 (lasso) and L2 (ridge) penalties:
minimize over β: sum_i (yi − (β0 + β1xi))^2 + λ1 * sum_j |βj| + λ2 * sum_j βj^2
where λ1 controls the strength of the L1 penalty (which can shrink some coefficients to exactly zero) and λ2 controls the strength of the L2 penalty (which shrinks all coefficients towards zero).
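A minimal scikit-learn sketch of logistic regression with an explicit decision threshold; the synthetic data and the 0.5 threshold are assumptions for illustration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression()
model.fit(X, y)

# Estimated probability of the positive class for each input
p = model.predict_proba(X)[:, 1]

# Convert probabilities into class labels with a decision threshold
threshold = 0.5
y_pred = (p >= threshold).astype(int)
print("accuracy:", np.mean(y_pred == y))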
Classification is a type of supervised learning, where the goal is to predict the class label of
an input based on some features.
A class label is a categorical variable that represents the category or group that the input be-
longs to, such as “spam” or “not spam” for an email, or “positive” or “negative” for a senti-
ment analysis.
Features are numerical or categorical variables that describe some characteristics or proper-
ties of the input, such as the number of words, the presence of certain keywords, or the tone
of the text.
A classification model is a function that maps an input to a class label, based on some param-
eters that are learned from a set of training data.
Training data is a collection of inputs and their corresponding class labels, which are used
to train or fit the classification model.
A decision boundary is a surface that separates the input space into different regions, each
corresponding to a class label. For example, in a two-dimensional space, a decision boundary
can be a line, a curve, or a complex shape.
A classification model can be evaluated based on its accuracy, which is the proportion of in-
puts that are correctly classified by the model. Other metrics such as precision, recall,
and F1-score can also be used to measure the performance of a classification model.
One of the applications of classification is email spam detection, where the goal is to classify
an email as either “spam” or “not spam” based on its content.
The features for this task can be the frequency of certain words, the length of the email, the
sender’s address, etc.
The class label for this task is either “spam” or “not spam”, which can be represented by 1 or
0 respectively.
A classification model for this task can be a logistic regression model, which learns a set of
parameters that define a linear decision boundary in the feature space.
The training data for this task can be a collection of emails and their labels, which can be ob-
tained from public datasets or from user feedback.
The accuracy of the classification model can be measured by comparing its predictions with
the true labels on a separate set of test data, which is not used for training.
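A minimal scikit-learn sketch of a spam classifier along these lines, using a bag-of-words representation and logistic regression; the tiny example emails and their labels are made up purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training emails; 1 = spam, 0 = not spam
emails = [
    "win a free prize now", "cheap loans click here",
    "meeting agenda for monday", "lunch tomorrow?",
    "free money claim your reward", "project report attached",
]
labels = [1, 1, 0, 0, 1, 0]

# Bag-of-words features + a linear decision boundary
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(emails, labels)

print(model.predict(["claim your free prize", "see you at the meeting"]))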
Data preparation is the process of transforming raw data into a suitable format for classifica-
tion learning. It involves the following steps:
o Data collection: obtaining relevant and sufficient data from various sources
o Data cleaning: removing or correcting missing, noisy, or inconsistent data
o Data exploration: analyzing and visualizing data to understand its characteristics and
distribution
o Data preprocessing: applying techniques such as feature extraction, feature selection,
feature scaling, feature encoding, etc. to enhance the quality and usability of data
o Data splitting: dividing data into training, validation, and test sets
Training a classification model is the process of finding the optimal parameters that minimize
a loss function or maximize an objective function on the training data. It involves the fol-
lowing steps:
o Model selection: choosing an appropriate classification algorithm and its hyperparam-
eters
o Model fitting: applying an optimization algorithm such as gradient descent, stochastic
gradient descent, etc. to update the parameters iteratively until convergence
o Model validation: using the validation data to tune the hyperparameters and select the
best model
Model Evaluation
Model evaluation is the process of assessing the performance and generalization ability of a
classification model on unseen data. It involves the following steps:
o Model testing: using the test data to measure the accuracy and other metrics of the
model
o Model comparison: comparing different models based on their metrics and trade-offs
o Model interpretation: explaining how the model works and why it makes certain pre-
dictions
Model Deployment
Model deployment is the process of making a classification model available for practical use
in real-world scenarios. It involves the following steps:
o Model integration: integrating the model with other systems or applications
o Model monitoring: tracking and updating the model based on new data and feedback
o Model maintenance: fixing and improving the model based on errors and changes
There are many classification algorithms that can be used for different tasks and data types.
Some of the common ones are:
kNN is a simple and intuitive classification algorithm that predicts the class label of an input
based on the k most similar inputs in the training data.
The similarity between inputs is measured by a distance metric, such as Euclidean distance,
Manhattan distance, etc.
The class label of an input is determined by a majority vote of its k nearest neighbors, or by
a weighted vote based on the inverse of the distances.
kNN is a lazy learner, which means it does not learn any parameters from the training data,
but rather stores the entire data and performs the computation at the time of prediction.
kNN is easy to implement and understand, but it can be slow and memory-intensive for large
datasets. It can also be sensitive to noise, outliers, and irrelevant features.
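A minimal NumPy sketch of kNN classification with Euclidean distance and a majority vote; the toy points and the value of k are assumptions for illustration:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # expected 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.0])))  # expected 1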
SVM is a powerful and flexible classification algorithm that learns a linear or non-linear de-
cision boundary that maximizes the margin between different classes.
The margin is the distance between the decision boundary and the closest points from each
class, which are called support vectors.
SVM can learn non-linear decision boundaries by using a kernel function, which transforms
the original feature space into a higher-dimensional space where the data becomes more sepa-
rable.
SVM can also handle multi-class classification by using strategies such as one-vs-one, one-
vs-all, etc.
SVM is effective and robust for high-dimensional and complex data, but it can be computa-
tionally expensive and sensitive to hyperparameters and kernel choice.
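A minimal scikit-learn sketch of an SVM with an RBF kernel; the dataset and the hyperparameter values are assumptions for illustration:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel lets the SVM learn a non-linear decision boundary;
# C and gamma are the main hyperparameters to tune.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print("support vectors per class:", clf.n_support_)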
Random Forest
Ensemble learning is a technique that combines multiple base learners or weak learners to
create a strong learner or an ensemble that has better performance and generalization ability
than any individual learner.
A base learner is a simple and basic machine learning algorithm, such as a decision tree, a lo-
gistic regression, a kNN, etc.
A strong learner is a complex and powerful machine learning algorithm, such as a random for-
est, a boosted tree, a neural network, etc.
The motivation for ensemble learning is based on the wisdom of crowds principle, which
states that the collective opinion of a group of individuals is more accurate and reliable than
the opinion of any single individual.
There are three main techniques for ensemble learning: bagging, boosting, and stacking.
Bagging
Bagging stands for bootstrap aggregating, which is a technique that creates multiple base
learners by using bootstrap sampling and then aggregates their predictions by taking
the majority vote or the average probability.
Bootstrap sampling is a technique that creates different subsets of the training data by sam-
pling with replacement, which means that some samples may appear more than once or not at
all in each subset.
Bagging reduces the variance of the base learners, which means that it makes them less sensitive to small changes in the data. It also reduces the risk of overfitting, which means that the ensemble generalizes better to unseen data.
An example of bagging is random forest, which is an ensemble of decision trees that are
trained on different bootstrap samples and different feature subsets.
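A minimal scikit-learn random forest sketch; the dataset and the hyperparameter values are assumptions for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample and considers a random
# subset of features at every split; predictions are aggregated by voting.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))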
Boosting
Boosting is a technique that creates multiple base learners by using sequential learning,
which means that each base learner is trained on a modified version of the training data that
depends on the performance of the previous base learners. The final prediction is obtained by
taking a weighted vote or a weighted sum of all the base learners.
Boosting increases the accuracy of the base learners, which means that it makes them more
capable of capturing complex patterns in the data. It also reduces the risk of underfitting,
which means that it makes them more expressive and flexible.
An example of boosting is AdaBoost, which is an adaptive boosting algorithm that assigns
higher weights to the samples that are misclassified by the previous base learners and lower
weights to the samples that are correctly classified.
Stacking
Stacking is a technique that creates multiple base learners by using any machine learning al-
gorithm and then trains another machine learning algorithm, called a meta learner or
a blender, on the predictions of the base learners. The final prediction is obtained by applying
the meta learner on the predictions of the base learners.
Stacking improves the diversity of the base learners, which means that it makes them more
complementary and independent from each other. It also improves the robustness of the en-
semble, which means that it makes it less prone to errors and biases.
An example of stacking is stacked generalization, which is a general framework for stacking
that can use any machine learning algorithm as a base learner or a meta learner.
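A minimal scikit-learn stacking sketch with two base learners and a logistic regression meta learner; the dataset and the choice of learners are assumptions for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# The meta learner (blender) is trained on the base learners' predictions
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))

print("cross-validated accuracy:", cross_val_score(stack, X, y, cv=5).mean())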
26. AdaBoost
AdaBoost stands for adaptive boosting, which is one of the most popular and influential
boosting algorithms. It was proposed by Yoav Freund and Robert Schapire in 1996.
AdaBoost is based on the idea of creating a strong learner by combining multiple weak learn-
ers, where each weak learner is a binary classifier that has an accuracy slightly better than
random guessing.
AdaBoost works by iteratively adding weak learners to the ensemble, where each weak
learner is trained on a weighted version of the training data, where the weights are updated
based on the errors made by the previous weak learners. The final prediction is obtained by
taking a weighted vote of all the weak learners in the ensemble.
The concept and workflow of AdaBoost can be summarized as follows (a short code sketch follows the list):
o Initialize all the sample weights to be equal and normalized, such that they sum up to
one
o For each iteration:
Train a weak learner on the weighted training data
Calculate the error rate and the weight coefficient of the weak learner
Update the sample weights by increasing the weights of the misclassified
samples and decreasing the weights of the correctly classified samples
Normalize the sample weights to make them sum up to one
o Output the final strong learner as a weighted combination of all the weak learners
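A minimal scikit-learn AdaBoost sketch; by default each weak learner is a depth-1 decision tree (a stump). The dataset and the number of estimators are assumptions for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Misclassified samples receive larger weights in the next iteration;
# the final prediction is a weighted vote over all weak learners.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
ada.fit(X_train, y_train)

print("test accuracy:", ada.score(X_test, y_test))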
Gradient boosting machines (GBM) are a type of boosting algorithm that use gradient de-
scent to optimize the loss function of the ensemble. They were proposed by Jerome Friedman
in 2001.
GBM are based on the idea of creating a strong learner by adding multiple base learners to the
ensemble, where each base learner is a regression tree that fits the negative gradient of the
loss function at each iteration. The final prediction is obtained by taking a weighted sum of all
the base learners in the ensemble.
A regression tree is a type of decision tree that predicts a continuous value instead of a class
label. It splits the feature space into regions based on a series of if-then rules that are learned
from the training data. The prediction of a regression tree is the average value of the samples
in each region.
The negative gradient of the loss function is the direction that points to the steepest descent of
the loss function. It indicates how much and in what direction the prediction should be ad-
justed to reduce the loss.
The concept and workflow of GBM can be summarized as follows:
o Initialize the prediction to be a constant value that minimizes the loss function
o For each iteration:
Calculate the negative gradient of the loss function for each sample
Train a regression tree on the negative gradient as the target value
Calculate the weight coefficient of the regression tree by using line search or
shrinkage
Update the prediction by adding the weighted regression tree
o Output the final strong learner as the sum of all the weighted regression trees
For the common case of squared-error loss, these steps take the following concrete form (a short code sketch follows):
o Calculate the negative gradient of the loss function for each sample, which is simply the residual: ri = yi - Fm-1(xi)
o Train a regression tree hm(x) on (xi, ri), using the residuals as the target values
o Calculate the weight coefficient of hm(x) by using line search or shrinkage: am = argmin_c sum((ri - c * hm(xi))^2 / 2), or am = v * argmin_c sum((ri - c * hm(xi))^2 / 2), where v is a small positive constant (the learning rate)
o Update the prediction by adding the weighted regression tree: Fm(x) = Fm-1(x) + am * hm(x)
o Output the final strong learner: F(x) = F0(x) + sum(am * hm(x)) for m = 1, 2, …, M
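A minimal from-scratch sketch of gradient boosting for squared-error loss, fitting each tree to the current residuals; the tree depth, learning rate, and synthetic data are assumptions for illustration:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

M, learning_rate = 100, 0.1
F = np.full_like(y, y.mean())        # F0: constant prediction minimizing squared loss
trees = []

for m in range(M):
    residuals = y - F                # negative gradient of the squared-error loss
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)           # fit the next tree to the residuals
    F += learning_rate * tree.predict(X)   # shrinkage-scaled update
    trees.append(tree)

def predict(X_new):
    pred = np.full(len(X_new), y.mean())
    for tree in trees:
        pred += learning_rate * tree.predict(X_new)
    return pred

print("training MSE:", np.mean((y - predict(X)) ** 2))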
XGBoost
XGBoost stands for extreme gradient boosting, which is an efficient and scalable implemen-
tation of GBM.
It was developed by Tianqi Chen and his team at the University of Washington in 2014.
XGBoost improves the performance and speed of GBM by using several techniques, such as:
o Regularization: adding a penalty term to the loss function to prevent overfitting and
improve generalization
o Sparsity-awareness: handling missing values and sparse features efficiently and au-
tomatically
o Weighted quantile sketch: using a novel data structure to find the optimal split
points for the regression trees
o Out-of-core computation: using external memory to handle large-scale data that
cannot fit into the main memory
o Parallel and distributed learning: using multiple CPU cores or machines to speed
up the training process
o Cache optimization: using hardware-aware design to optimize the memory access
and reduce the computation cost
XGBoost has become one of the most popular and widely used machine learning algorithms,
especially for Kaggle competitions, where it has won several awards and prizes. It has also
been adopted by many companies and organizations for various applications, such as web
search, recommendation systems, fraud detection, etc.
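A minimal usage sketch, assuming the xgboost Python package is installed; the dataset and the hyperparameter values are assumptions for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Regularization (reg_lambda) and shrinkage (learning_rate) are among the
# knobs XGBoost adds on top of plain gradient boosting.
model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                      reg_lambda=1.0, random_state=0)
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))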
Reinforcement learning is a type of machine learning that deals with learning from ac-
tions and rewards, rather than from features and labels. It is inspired by the way humans and
animals learn from trial and error and feedback.
Reinforcement learning is based on the idea of creating an agent that can interact with an en-
vironment and learn an optimal policy that maximizes a long-term reward or value. The
agent does not have any prior knowledge or supervision about the environment or the task,
but rather learns from its own experience and exploration.
The concept and workflow of reinforcement learning can be summarized as follows:
o Define the agent, the environment, the state space, the action space, and the reward
function
o Initialize the agent’s policy or value function randomly or heuristically
o For each episode or iteration:
Observe the current state of the environment
Select an action based on the current policy or value function, possibly with
some exploration
Execute the action and observe the next state and the reward
Update the policy or value function based on the observed transition and re-
ward
o Output the final optimal policy or value function
One of the classic examples of reinforcement learning is the game of chess, where the goal is
to create an agent that can play chess against human or computer opponents.
The agent is the chess player, who can control the pieces on the board.
The environment is the chess board, which has 64 squares and 32 pieces of two colors: white
and black.
The state space is the set of all possible configurations of the board, which is astronomically large (roughly 10^44 legal positions, while the number of possible games is often estimated at around 10^120).
The action space is the set of all possible moves that can be made by the agent at each state,
which depends on the rules of chess and the position of the pieces.
The reward function is a scalar value that indicates how desirable a state or an action is for the
agent. For example, a simple reward function can be +1 for winning, -1 for losing, and 0 for
drawing or continuing. A more sophisticated reward function can take into account other fac-
tors, such as material advantage, positional advantage, checkmate threat, etc.
Q-learning
Q-learning is a type of reinforcement learning, which is a branch of machine learning that
deals with learning from actions and rewards. Q-learning is a model-free, off-policy algo-
rithm that learns the value of an action in a given state, without requiring a model of the envi-
ronment or the transition probabilities. Q-learning can find an optimal policy for any finite Markov decision process, given enough exploration time and a partly random exploration policy.
The basic idea of Q-learning is to create a table, called a Q-table, that stores the expected fu-
ture reward, or Q-value, for each state-action pair. The Q-value represents how good it is to
take a certain action in a certain state. The Q-learning algorithm updates the Q-table itera-
tively, based on the observed rewards and the next states. The final Q-table can be used to se-
lect the best action in each state, by choosing the action with the highest Q-value.
Q-learning follows these steps: initialize the Q-table (for example, with zeros); then, repeatedly, observe the current state, select an action (for example, with an epsilon-greedy strategy that mostly exploits the best known action but sometimes explores), execute it, observe the reward and the next state, and update the corresponding Q-value using the Bellman equation below.
The Bellman equation is a recursive formula that relates the Q-value of a state-action pair to
the Q-values of the next state-action pairs. It is given by:
Q(state, action) = reward + gamma * max Q(next_state, all_actions)
where gamma is a discount factor that controls how much the future rewards are valued.
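A minimal Q-learning sketch on a tiny made-up chain environment; the environment, learning rate, and exploration settings are assumptions for illustration. The update moves each Q-value towards the Bellman target above, scaled by a learning rate alpha:

import numpy as np

# Toy environment: 5 states in a row, actions 0 = left, 1 = right,
# reward +1 only when reaching the rightmost state (state 4).
n_states, n_actions = 5, 2

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    state = int(rng.integers(n_states - 1))      # random non-goal start state
    for t in range(100):                         # cap the episode length
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q towards the Bellman target
        target = reward + gamma * np.max(Q[next_state])
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        if done:
            break

print(Q)   # in each non-goal state, moving right should end up with the higher Q-value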
Q-learning is a simple and powerful algorithm that can learn to solve complex problems with
stochastic and dynamic environments. However, it also has some limitations, such as:
It requires a large amount of memory and computation to store and update the Q-table for
large state and action spaces
It can be slow to converge or even diverge in some cases
It can be sensitive to the choice of parameters, such as learning rate, discount factor, and ex-
ploration strategy
To overcome these limitations, some extensions and variations of Q-learning have been pro-
posed, such as deep Q-learning, which uses a neural network to approximate the Q-function;
double Q-learning, which reduces the overestimation bias of Q-learning; and dueling Q-learn-
ing, which separates the estimation of state value and action advantage.
Artificial Intelligence (AI) is the study of how to create machines and systems that can per-
form tasks that normally require human intelligence, such as reasoning, learning, perception,
decision making, and problem solving.
AI is also the field that aims to understand the nature of intelligence and its underlying mech-
anisms.
AI is a broad and interdisciplinary field that draws from computer science, mathematics, psy-
chology, linguistics, philosophy, neuroscience, and other disciplines.
AI has achieved several landmark results over the years (for example, IBM’s Deep Blue beating the world chess champion in 1997, Google’s AlphaGo beating the world Go champion in 2016, and OpenAI’s GPT-3 generating natural language texts in 2020).
There are different ways to classify AI systems based on their capabilities and goals:
o Narrow AI: Also known as weak AI or applied AI. It refers to AI systems that are de-
signed to perform specific tasks or domains that require a limited amount of intelli-
gence, such as face recognition, speech recognition, spam filtering, self-driving cars,
etc. Most of the current AI systems fall into this category.
o General AI: Also known as strong AI or artificial general intelligence (AGI). It refers
to AI systems that can achieve human-level intelligence or beyond across a wide
range of tasks or domains that require general intelligence, such as common sense
reasoning, natural language understanding, creativity, planning, etc. This is the ulti-
mate goal of many AI researchers, but it is still far from being realized.
o Super AI: Also known as artificial superintelligence (ASI). It refers to hypothetical AI systems that would surpass human intelligence across virtually all tasks or domains. No such system exists today.
2. Intelligent Agents
2.1 Agents and Rationality: What is an Agent, Rational Behavior
An agent is anything that can perceive its environment through sensors and act upon it
through actuators.
An agent can be a physical entity (such as a robot, a car, or a human) or a software entity
(such as a chatbot, a game, or a program).
An agent can be simple (such as a thermostat or a calculator) or complex (such as a chess
player or a self-driving car).
An agent can be autonomous (such as a Mars rover or a Roomba) or interactive (such as a
personal assistant or a social media platform).
Rationality is the quality of doing the right thing, given what the agent knows and what the
agent wants.
Rationality depends on four factors: the agent’s performance measure, the agent’s prior
knowledge, the agent’s perceptual capabilities, and the agent’s actions.
A rational agent is an agent that always acts to achieve the best outcome or, when there is un-
certainty, the best expected outcome, according to its performance measure.
The interaction between an agent and its environment can be characterized by several proper-
ties:
o Fully observable vs. partially observable: An environment is fully observable if the
agent can access the complete state of the environment at each point in time. An envi-
ronment is partially observable if the agent can access only some aspects of the envi-
ronment or if its sensors are noisy or inaccurate.
o Deterministic vs. stochastic: An environment is deterministic if the next state of the
environment is completely determined by the current state and the action executed by
the agent. An environment is stochastic if there is some uncertainty about the next
state of the environment.
o Episodic vs. sequential: An environment is episodic if the agent’s experience is di-
vided into atomic episodes, where each episode consists of one perception and one
action, and the outcome of each action does not depend on previous actions. An envi-
ronment is sequential if the agent’s current decision affects all future decisions.
o Static vs. dynamic: An environment is static if the environment does not change while
the agent is deliberating. An environment is dynamic if the environment can change
while the agent is deliberating.
o Discrete vs. continuous: An environment is discrete if there are a finite number of dis-
tinct and clearly defined states, actions, and percepts. An environment is continuous if
there are an infinite number of possible states, actions, and percepts.
o Single-agent vs. multi-agent: An environment is single-agent if the agent is the only
entity that affects the environment. An environment is multi-agent if there are other
agents that affect the environment.
2.4 Types of Agents: Simple Reflex Agents, Model-Based Agents, Goal-Based Agents, Utility-Based Agents, Learning Agents
There are different types of agents that differ in their decision-making methods:
o Simple reflex agents: These are agents that act based on their current percept
only, without any memory or knowledge of the past or future. They use condi-
tion-action rules to map percepts to actions. For example, a simple reflex agent
for driving a car might have rules like:
If car-in-front-is-braking then initiate-braking
If light-is-green then accelerate
If light-is-red then stop
Simple reflex agents are easy to implement but they are limited by their lack
of memory and knowledge. They can only handle fully observable environ-
ments and they cannot plan ahead or learn from their experience.
o Model-based agents: These are agents that maintain an internal state that rep-
resents some aspects of the environment that are not directly observable. They
use a model of how the world works to update their state based on their per-
cepts and actions. They also use condition-action rules to map states to actions. For example, a model-based agent for driving a car might keep track of where other cars are even when they are temporarily out of sight (for instance, when deciding whether it is safe to change lanes), and use rules that refer to this internal state.
Model-based agents are more flexible and robust than simple reflex agents, as they can han-
dle partially observable environments and adapt to changes. However, they still cannot plan
ahead or pursue long-term goals.
Goal-based agents: These are agents that have some explicit goals that they want to
achieve, and they act based on their current state and their goal state. They use a
search or planning algorithm to find a sequence of actions that leads from the current
state to the goal state. For example, a goal-based agent for driving a car might have a
goal like:
Reach destination X in the shortest time possible
Goal-based agents are more intelligent and rational than model-based agents, as they can plan
ahead and optimize their actions. However, they still cannot handle uncertain or complex en-
vironments or trade-off between conflicting goals.
Utility-based agents: These are agents that have some preferences or values that they
want to maximize, and they act based on their current state and their utility function.
They use a decision theory or reinforcement learning algorithm to find the best action
that maximizes their expected utility. For example, a utility-based agent for driving a
car might have a utility function like:
Utility = - (travel time) - (fuel cost) - (traffic violations) + (comfort) + (safety)
Utility-based agents are more flexible and realistic than goal-based agents, as they can handle
uncertain or complex environments and trade-off between conflicting goals. However, they
still cannot learn from their experience or improve their performance over time.
Learning agents: These are agents that can learn from their experience and improve
their performance over time. They have four components: a learning element, a per-
formance element, a critic, and a problem generator. The learning element uses feed-
back from the critic to improve the agent’s knowledge or behavior. The performance
element uses the agent’s knowledge or behavior to select actions. The critic evaluates
the agent’s actions and provides feedback. The problem generator suggests actions
that lead to new and informative experiences. For example, a learning agent for driv-
ing a car might use the following components:
Learning element: A neural network that learns to map states to actions based on re-
wards and penalties
Performance element: A controller that executes the actions suggested by the neural
network
Critic: A sensor that measures the distance to the destination, the fuel level, the traffic
rules, the comfort level, and the safety level
Problem generator: A randomizer that occasionally chooses different routes or speeds
Learning agents are the most advanced and adaptive type of agents, as they can learn from
their experience and improve their performance over time. They can also discover new
knowledge or behavior that was not programmed by the designer.
3. Problem Solving
3.1 Problems in AI: Well-defined Problems and Goal States
A problem is a situation that requires an agent to find a solution or a course of action that sat-
isfies some criteria or constraints.
A problem can be formalized as a tuple (S, A, T, G, C), where:
o S is the set of possible states of the world
o A is the set of possible actions that the agent can perform
o T is the transition function that maps a state and an action to a new state
o G is the goal test function that determines whether a state is a goal state or not
o C is the cost function that assigns a numeric value to each action or path
A well-defined problem is a problem that has a clear and complete specification of all the
components of the tuple. A well-defined problem has the following properties:
o The initial state is known and unique
o The goal state or states are known and well-defined
o The actions and their effects are known and deterministic
o The cost of each action or path is known and consistent
A well-defined problem can be solved by finding a sequence of actions that leads from the in-
itial state to a goal state, while minimizing the total cost.
Examples of well-defined problems in AI are:
o The 8-puzzle: The state is the configuration of eight tiles and a blank space on a 3x3
grid. The actions are moving the blank space up, down, left, or right. The transition
function swaps the blank space with the adjacent tile. The goal state is the configura-
tion where the tiles are ordered from 1 to 8. The cost of each action is 1.
o The Tower of Hanoi: The state is the configuration of n disks of different sizes on
three pegs. The actions are moving a disk from one peg to another. The transition
function moves the disk to the top of another peg. The goal state is the configuration
where all the disks are on the rightmost peg, with the largest disk at the bottom and
the smallest disk at the top. The cost of each action is 1.
o The Traveling Salesperson Problem: The state is the location of a salesperson who
has to visit n cities. The actions are traveling from one city to another. The transition
function changes the location of the salesperson to another city. The goal state is the
location where the salesperson has visited all the cities exactly once and returned to
the starting city. The cost of each action is the distance between the two cities.
A search space is a representation of all the possible states and actions that are relevant to
solving a problem.
A search space can be visualized as a graph, where:
o The nodes are the states
o The edges are the actions that lead from one state to another
Problems can have different characteristics that affect the difficulty and the methods of solv-
ing them. Some of these characteristics are:
o Deterministic vs. stochastic: A problem is deterministic if the outcome of each action
is fully determined by the current state and the action. A problem is stochastic if there
is some uncertainty or randomness in the outcome of each action.
o Single agent vs. multi-agent: A problem is single agent if there is only one agent that
affects the environment. A problem is multi-agent if there are other agents that affect
the environment, either cooperatively or competitively.
Deterministic problems can be solved by using search algorithms that find a sequence of ac-
tions that leads to a goal state with certainty. Stochastic problems can be solved by using deci-
sion-making algorithms that find a policy or a strategy that maximizes the expected utility or
reward over time.
Single agent problems can be solved by assuming that the agent has full control over the envi-
ronment and its actions. Multi-agent problems can be solved by taking into account the ac-
tions and reactions of other agents, either by using game theory or negotiation techniques.
3.5 Issues in Designing Search Programs: Completeness, Optimality, Time Complexity, Space
Complexity
When designing search programs, there are some issues or criteria that need to be considered,
such as:
o Completeness: The ability of a search algorithm to find a solution if one exists.
o Optimality: The ability of a search algorithm to find the best solution among all possi-
ble solutions, according to some cost function.
o Time complexity: The amount of time or number of steps required by a search algo-
rithm to find a solution.
o Space complexity: The amount of memory or number of nodes stored by a search al-
gorithm to find a solution.
These issues or criteria are often interrelated and trade-off with each other. For example, a
search algorithm that is complete and optimal may have high time and space complexity,
while a search algorithm that has low time and space complexity may be incomplete or
suboptimal.
These issues or criteria also depend on some factors, such as:
o The size and shape of the search space
o The branching factor and depth of the search tree
o The presence or absence of cycles in the search graph
o The availability or quality of heuristics or domain knowledge
A problem-solving agent is an agent that can formulate and solve problems by finding a se-
quence of actions that leads from an initial state to a goal state.
A search algorithm is a method for finding a solution or a path to a goal state in a search
space.
Some common terminology used in search algorithms are:
o State: A representation of a situation or a configuration of the world
o Action: A transformation or a transition from one state to another
o Initial state: The state where the agent starts its search
o Goal state: The state where the agent wants to reach or satisfy some criteria
o Path: A sequence of states and actions that leads from the initial state to a goal state
4.2 Properties of Search Algorithms: Completeness, Optimality, Time and Space Complexity
When evaluating search algorithms, there are some properties or criteria that need to be con-
sidered, such as:
o Completeness: The ability of a search algorithm to find a solution if one exists
o Optimality: The ability of a search algorithm to find the best solution among all possi-
ble solutions, according to some cost function
o Time complexity: The amount of time or number of steps required by a search algo-
rithm to find a solution
o Space complexity: The amount of memory or number of nodes stored by a search al-
gorithm to find a solution
These properties or criteria are often interrelated and trade-off with each other. For example, a
search algorithm that is complete and optimal may have high time and space complexity,
while a search algorithm that has low time and space complexity may be incomplete or
suboptimal.
These properties or criteria also depend on some factors, such as:
o The size and shape of the search space
o The branching factor and depth of the search tree
o The presence or absence of cycles in the search graph
o The availability or quality of heuristics or domain knowledge
4.3 Types of Search Algorithms: Uninformed (Blind) Search and Informed (Heuristic) Search
There are two main types of search algorithms based on the amount and type of information
they use, such as:
o Uninformed search: Search algorithms that do not use any domain-specific
knowledge or heuristics, but only the problem definition. They are also called blind
search, as they explore the search space blindly without any guidance. Examples of
uninformed search algorithms are breadth-first search, depth-first search, uniform-
cost search, etc.
o Informed search: Search algorithms that use some domain-specific knowledge or heu-
ristics to guide the search process. They are also called heuristic search, as they use
heuristics to estimate the quality or the distance of each node to the goal state. Exam-
ples of informed search algorithms are greedy best-first search, A* search, hill-climb-
ing search, etc.
Breadth-first search (BFS) is an uninformed search algorithm that explores the nodes in the
order of their distance from the initial node, i.e., it expands the shallowest nodes first.
BFS uses a queue as its frontier data structure, i.e., it adds new nodes at the end of the queue
and removes nodes from the front of the queue.
BFS is complete, i.e., it can find a solution if one exists.
BFS is optimal if all actions have the same cost, i.e., it can find the lowest-cost solution
among all possible solutions.
BFS has high time and space complexity, i.e., it can take a long time and use a lot of memory
to find a solution. The time and space complexity of BFS are both O(b^d), where b is the
branching factor and d is the depth of the shallowest solution.
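A minimal BFS sketch on a graph given as an adjacency dictionary; the example graph is an assumption for illustration:

from collections import deque

def bfs(graph, start, goal):
    # The frontier is a FIFO queue of paths; the shallowest nodes are expanded first
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None   # no solution exists

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"]}
print(bfs(graph, "A", "F"))   # shortest path in number of edges, e.g. ['A', 'B', 'D', 'F']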
Depth-first search (DFS) is an uninformed search algorithm that explores the nodes in the or-
der of their depth from the initial node, i.e., it expands the deepest nodes first.
DFS uses a stack as its frontier data structure, i.e., it adds new nodes at the top of the stack
and removes nodes from the top of the stack.
DFS is incomplete, i.e., it may not find a solution even if one exists, especially if the search
space is infinite or contains cycles.
DFS is not optimal, i.e., it may not find the lowest-cost solution among all possible solutions,
especially if the search space is not uniform.
DFS has low space complexity, i.e., it uses a small amount of memory to find a solution. The
space complexity of DFS is O(bm), where b is the branching factor and m is the maximum
depth of the search space. However, DFS has high time complexity, i.e., it can take a long
time to find a solution. The time complexity of DFS is O(b^m), where b is the branching fac-
tor and m is the maximum depth of the search space.
Depth-limited search (DLS) is a variation of DFS that limits the depth of the search to a pre-
defined value l, i.e., it expands nodes only up to depth l and ignores nodes beyond that depth.
DLS uses a stack as its frontier data structure, like DFS.
DLS is complete if l is greater than or equal to d, where d is the depth of the shallowest solu-
tion, i.e., it can find a solution if one exists within the depth limit. Otherwise, DLS is incom-
plete, i.e., it may not find a solution even if one exists beyond the depth limit.
DLS is not optimal, like DFS, i.e., it may not find the lowest-cost solution among all possible
solutions.
DLS has low space complexity, like DFS, i.e., it uses a small amount of memory to find a so-
lution. The space complexity of DLS is O(bl), where b is the branching factor and l is the
depth limit. However, DLS has high time complexity, like DFS, i.e., it can take a long time to
find a solution. The time complexity of DLS is O(b^l), where b is the branching factor and l is
the depth limit.
Iterative deepening depth-first search (IDDFS) is a combination of BFS and DLS that itera-
tively increases the depth limit from 0 to infinity, i.e., it performs DLS with increasing depth
limits until finding a solution or exhausting all possibilities.
IDDFS uses a stack as its frontier data structure, like DFS and DLS.
IDDFS is complete, like BFS, i.e., it can find a solution if one exists.
IDDFS is optimal if all actions have the same cost, like BFS, i.e., it can find the lowest-cost
solution among all possible solutions.
IDDFS has low space complexity, like DFS and DLS, i.e., it uses a small amount of memory
to find a solution. The space complexity of IDDFS is O(bd), where b is the branching factor
and d is the depth of the shallowest solution. However, IDDFS has high time complexity, like
BFS, i.e., it can take a long time to find a solution. The time complexity of IDDFS is O(b^d),
where b is the branching factor and d is the depth of the shallowest solution.
Uniform cost search (UCS) is an uninformed search algorithm that explores the nodes in the
order of their path cost from the initial node, i.e., it expands the lowest-cost nodes first.
UCS uses a priority queue as its frontier data structure, i.e., it adds new nodes according to
their path cost and removes nodes with the lowest path cost.
UCS is complete, i.e., it can find a solution if one exists.
UCS is optimal for any action cost function, i.e., it can find the lowest-cost solution among all
possible solutions.
UCS has high time and space complexity, i.e., it can take a long time and use a lot of memory to find a solution. The time and space complexity of UCS are both roughly O(b^(1 + C*/ε)), where b is the branching factor, C* is the cost of the optimal solution, and ε is the smallest action cost.
Bidirectional search (BDS) is a search strategy that runs two simultaneous searches, one forward from the initial state and one backward from the goal state, and stops when the two frontiers meet.
BDS is complete if both searches are breadth-first or uniform-cost, i.e., it can find a solution if one exists.
BDS is optimal if both searches are uniform-cost, i.e., it can find the lowest-cost solution
among all possible solutions.
BDS has low time complexity, i.e., it can find a solution faster than unidirectional search. The
time complexity of BDS is O(b^(d/2)), where b is the branching factor and d is the depth of
the shallowest solution.
BDS has high space complexity, i.e., it uses a lot of memory to store the nodes from both
searches. The space complexity of BDS is O(b^(d/2)), where b is the branching factor and d is
the depth of the shallowest solution.
Greedy best-first search (GBFS) is an informed search algorithm that explores the nodes in
the order of their heuristic value, i.e., it expands the node that is closest to the goal state ac-
cording to some heuristic function.
GBFS uses a priority queue as its frontier data structure, i.e., it adds new nodes according to
their heuristic value and removes nodes with the lowest heuristic value.
GBFS is incomplete, i.e., it may not find a solution even if one exists, especially if the search
space is infinite or contains cycles.
GBFS is not optimal, i.e., it may not find the lowest-cost solution among all possible solu-
tions, especially if the heuristic function is not consistent or admissible.
GBFS has low time complexity, i.e., it can find a solution faster than uninformed search. The
time complexity of GBFS is O(b^m), where b is the branching factor and m is the maximum
depth of the search space.
GBFS has high space complexity, i.e., it uses a lot of memory to store the nodes in the priority
queue. The space complexity of GBFS is O(b^m), where b is the branching factor and m is
the maximum depth of the search space.
A* search (A*) is an informed search algorithm that explores the nodes in the order of their
evaluation function, i.e., it expands the node that has the lowest sum of path cost and heuristic
value.
A* uses a priority queue as its frontier data structure, like GBFS, but it uses a different evalu-
ation function: f(n) = g(n) + h(n), where g(n) is the path cost from the initial node to node n
and h(n) is the heuristic value of node n.
A* is complete, i.e., it can find a solution if one exists.
A* is optimal if the heuristic function is admissible, i.e., it never overestimates the true cost to
reach the goal state from any node. A* is also optimal if the heuristic function is consistent,
i.e., it satisfies the triangle inequality: h(n) <= c(n, n’) + h(n’), where c(n, n’) is the cost of
moving from node n to node n’.
A* has low time complexity, i.e., it can find a solution faster than uninformed search or
GBFS. The time complexity of A* depends on the quality of the heuristic function: the closer
h(n) is to the true cost, the faster A* will find a solution. The worst-case time complexity of
A* is O(b^m), where b is the branching factor and m is the maximum depth of the search
space.
A* has high space complexity, i.e., it uses a lot of memory to store the nodes in the priority
queue. The space complexity of A* is O(b^m), where b is the branching factor and m is the
maximum depth of the search space.
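A minimal A* sketch on a weighted graph, using f(n) = g(n) + h(n) with the heuristic supplied as a dictionary; the graph, costs, and heuristic values are assumptions for illustration:

import heapq

def a_star(graph, h, start, goal):
    # Priority queue ordered by f = g + h; entries are (f, g, node, path)
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier,
                               (new_g + h[neighbor], new_g, neighbor, path + [neighbor]))
    return None, float("inf")

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
h = {"A": 3, "B": 2, "C": 1, "D": 0}   # admissible heuristic: never overestimates
print(a_star(graph, h, "A", "D"))      # expected (['A', 'B', 'C', 'D'], 4)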
Hill climbing algorithm (HCA) is an informed search algorithm that explores the nodes in the
order of their heuristic value, like GBFS, but it only moves to a successor node if it has a
higher heuristic value than the current node, i.e., it always moves uphill.
HCA does not keep a full frontier of unexplored nodes: at each step it evaluates the successors of the current node, moves to the best one, and discards the rest, i.e., it always chooses the best successor node.
HCA is incomplete, i.e., it may not find a solution even if one exists, especially if the search
space is not smooth or contains local maxima or plateaus.
HCA is not optimal, i.e., it may not find the lowest-cost solution among all possible solutions,
especially if the heuristic function is not consistent or admissible.
HCA has low time and space complexity, i.e., it can find a solution quickly and use a small
amount of memory. The time and space complexity of HCA are both O(bm), where b is the
branching factor and m is the maximum depth of the search space.
A constraint satisfaction problem (CSP) is a special type of problem that can be solved by us-
ing search algorithms. A CSP consists of three components:
o A set of variables: Each variable has a domain of possible values
o A set of constraints: Each constraint specifies some restrictions on the values of some
variables
o A goal test: A function that determines whether an assignment of values to variables
satisfies all the constraints
A CSP can be solved by using different methods, such as the following (a small backtracking sketch follows the list):
o Backtracking search: A recursive search algorithm that tries to assign values to varia-
bles one by one, and backtracks if a constraint is violated or no value is available
o Forward checking: An improvement of backtracking search that keeps track of the
remaining values for each variable and prunes them if they are inconsistent with the
current assignment
o Arc consistency: An improvement of forward checking that ensures that every possi-
ble value for a variable has a consistent value for another variable in every constraint
o Heuristics: Techniques that can improve the efficiency and effectiveness of search al-
gorithms, such as variable ordering (choosing which variable to assign next), value
ordering (choosing which value to assign to a variable), and constraint propagation
(reducing the domains of variables based on constraints)
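A minimal backtracking-search sketch for a map-colouring CSP; the variables, domains, and neighbour constraints are assumptions for illustration:

# Map colouring: assign a colour to each region so that neighbours differ.
variables = ["WA", "NT", "SA", "Q", "NSW", "V"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"],
}

def consistent(var, value, assignment):
    # Constraint: a region must not share a colour with an already assigned neighbour
    return all(assignment.get(n) != value for n in neighbors[var])

def backtrack(assignment):
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)   # simple variable ordering
    for value in domains[var]:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value})
            if result is not None:
                return result
    return None   # no value works: backtrack

print(backtrack({}))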
Means-ends analysis (MEA) is an informed search algorithm that uses heuristics to reduce the
difference between the current state and the goal state, i.e., it identifies and solves subprob-
lems that are relevant to achieving the goal.
MEA uses a stack as its frontier data structure, like DFS, but it also uses another stack to store
subgoals and operators.
MEA works as follows:
o It compares the current state and the goal state and finds a difference
o It selects an operator that can reduce or eliminate the difference
o It pushes the operator and its preconditions (subgoals) onto the stack
o It pops and applies an operator from the stack if possible
o It repeats until reaching the goal state or no more operators are available
MEA is incomplete, i.e., it may not find a solution even if one exists, especially if there are
dead ends or irrelevant operators.
MEA is not optimal, i.e., it may not find the lowest-cost solution among all possible solutions,
especially if there are multiple ways to reduce or eliminate a difference.
MEA has low time complexity, i.e., it can find a solution faster than uninformed
search. The time complexity of MEA depends on the quality of the heuristics: the
more relevant and effective they are, the faster MEA will find a solution. The worst-
case time complexity of MEA is O(b^m), where b is the branching factor and m is the
maximum depth of the search space.
MEA has high space complexity, i.e., it uses a lot of memory to store the nodes and
operators in the stacks. The space complexity of MEA is O(b^m), where b is the
branching factor and m is the maximum depth of the search space.
Adversarial search is a type of search that involves more than one agent, where the agents
have conflicting goals or interests, i.e., they compete or cooperate with each other.
Game playing is a special case of adversarial search, where the agents follow some predefined
rules and try to maximize their payoffs or rewards.
Game playing can be formalized as a tuple (S, A, T, R, Terminal), where:
o S is the set of possible states of the game
o A is the set of possible actions or moves that the agents can make
o T is the transition function that maps a state and an action to a new state
o R is the reward function that assigns a numeric value to each state or action for each
agent
o Terminal is the terminal test function that determines whether a state is a terminal state or not, i.e., whether the game is over or not
Game playing can have different characteristics, such as:
o Zero-sum vs. non-zero-sum: A game is zero-sum if the sum of the rewards for all the
agents is zero, i.e., one agent’s gain is another agent’s loss. A game is non-zero-sum if
the sum of the rewards for all the agents is not zero, i.e., the agents can have different
or shared interests.
o Deterministic vs. stochastic: A game is deterministic if the outcome of each action is
fully determined by the current state and the action. A game is stochastic if there is
some uncertainty or randomness in the outcome of each action.
o Perfect information vs. imperfect information: A game has perfect information if all
the agents have complete and accurate knowledge of the current state and the past ac-
tions. A game has imperfect information if some aspects of the current state or the
past actions are hidden or unknown to some agents.
o Turn-taking vs. simultaneous: A game is turn-taking if only one agent can act at a
time and the agents alternate their actions. A game is simultaneous if more than one
agent can act at the same time and the agents act independently or interdependently.
Minimax algorithm is a search algorithm that can be used to find the optimal strategy for two-
player zero-sum turn-taking deterministic perfect information games, such as chess, tic-tac-
toe, etc.
Minimax algorithm works as follows:
o It assumes that both players are rational and play optimally, i.e., they try to maximize
their own reward and minimize their opponent’s reward.
o It builds a search tree that represents all the possible states and actions from the cur-
rent state to the terminal states, where each node corresponds to a state and each edge
corresponds to an action.
o It assigns a value to each node based on the reward function, where positive values
favor one player (the maximizer) and negative values favor the other player (the mini-
mizer).
It propagates the values from the leaf nodes (the terminal states) to the root node (the current
state) by applying the min-max rule: at each level of the tree, it chooses the minimum value
among its children if it is a minimizer’s turn, or it chooses the maximum value among its chil-
dren if it is a maximizer’s turn.
It selects the best action at the root node, i.e., the action that leads to the child node with the
highest value for the maximizer or the lowest value for the minimizer.
Minimax algorithm is complete, i.e., it can find a solution if one exists.
Minimax algorithm is optimal, i.e., it can find the best solution among all possible solutions,
assuming that both players play optimally.
Minimax algorithm has high time complexity, i.e., it can take a long time to find a solution: its time complexity is O(b^m), where b is the branching factor and m is the maximum depth of the search tree. Because the tree is explored depth-first, its space complexity is O(bm).
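A minimal recursive minimax sketch over a generic game interface; the game-specific functions (is_terminal, utility, successors) and the tiny hard-coded game tree are assumed placeholders for illustration:

def minimax(state, is_terminal, utility, successors, maximizing):
    # Returns (value, best_action) for the player to move in `state`
    if is_terminal(state):
        return utility(state), None
    best_action = None
    if maximizing:
        best_value = float("-inf")
        for action, next_state in successors(state):
            value, _ = minimax(next_state, is_terminal, utility, successors, False)
            if value > best_value:
                best_value, best_action = value, action
    else:
        best_value = float("inf")
        for action, next_state in successors(state):
            value, _ = minimax(next_state, is_terminal, utility, successors, True)
            if value < best_value:
                best_value, best_action = value, action
    return best_value, best_action

# Tiny hard-coded game tree: non-leaf states map actions to successor states,
# leaf states map to utility values for the maximizer.
tree = {"root": {"L": "A", "R": "B"}, "A": {"L": "A1", "R": "A2"}, "B": {"L": "B1", "R": "B2"}}
leaves = {"A1": 3, "A2": 5, "B1": 2, "B2": 9}

is_terminal = lambda s: s in leaves
utility = lambda s: leaves[s]
successors = lambda s: list(tree[s].items())

print(minimax("root", is_terminal, utility, successors, True))  # expected (3, 'L')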
Alpha-beta pruning is an optimization technique that can be used to improve the efficiency of
minimax algorithm by pruning or eliminating branches of the search tree that are not relevant
to the final decision, i.e., branches that do not affect the value of the root node.
Alpha-beta pruning works as follows:
o It maintains two values for each node: alpha and beta. Alpha is the best value that the
maximizer can guarantee at that node or above. Beta is the best value that the mini-
mizer can guarantee at that node or below.
o It initializes alpha to negative infinity and beta to positive infinity at the root node.
o It updates alpha and beta as it traverses the search tree in a depth-first manner, using
the min-max rule: at each level of the tree, it sets alpha to the maximum value among
its children if it is a maximizer’s turn, or it sets beta to the minimum value among its
children if it is a minimizer’s turn.
o It prunes a branch when alpha is greater than or equal to beta, i.e., when there is no
need to explore further nodes in that branch because they cannot improve the value of
the root node.
o It returns the best action at the root node, like minimax algorithm.
Alpha-beta pruning does not affect the completeness or optimality of minimax algorithm, i.e.,
it can find a solution if one exists and it can find the best solution among all possible solu-
tions, assuming that both players play optimally.
Alpha-beta pruning reduces the time complexity of minimax algorithm, i.e., it can find the same solution faster by exploring fewer nodes. The time complexity of alpha-beta pruning depends on the order in which the nodes of the search tree are explored: the best case is O(b^(m/2)), where b is the branching factor and m is the maximum depth of the search tree, and the worst case is O(b^m), the same as minimax algorithm. The space complexity of alpha-beta pruning is O(bm), like minimax algorithm.
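The same tree can be searched with alpha-beta pruning; the sketch below (same illustrative nested-list representation as before) skips the remaining children of a node as soon as alpha meets or exceeds beta:

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if isinstance(node, (int, float)):          # terminal state
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                   # the minimizer above will avoid this branch
                break                           # prune the remaining children
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:                   # the maximizer above already has a better option
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, maximizing=True))         # -> 3, the same value as plain minimax
```

Pruning changes only how many nodes are visited, not the value returned at the root.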
8. Knowledge Representation
8.1 Representations and Mappings: Encoding Knowledge for AI Systems
Knowledge representation is the process of encoding knowledge for AI systems, i.e., trans-
forming human knowledge into a form that can be manipulated and used by machines.
Knowledge representation involves two aspects: representations and mappings. Representa-
tions are the data structures or languages that are used to store and express knowledge. Map-
pings are the functions or algorithms that are used to create and manipulate knowledge.
Knowledge representation has three main objectives: to capture the meaning and structure of
human knowledge, to enable efficient and effective reasoning and inference, and to facilitate
communication and interaction between humans and machines.
Knowledge representation is a crucial and challenging task for AI systems, as it requires bal-
ancing trade-offs between expressiveness, efficiency, and uncertainty.
There are different approaches or paradigms to knowledge representation, each with its own
advantages and disadvantages. Some of the common approaches are:
o Logical: This approach uses formal logic systems, such as propositional logic, predi-
cate logic, modal logic, etc., to represent knowledge as a set of symbols and rules that
follow a well-defined syntax and semantics. Logical representations are precise, con-
sistent, and deductive, but they can also be complex, rigid, and incomplete.
o Semantic networks: This approach uses graphs or networks to represent knowledge as
a set of nodes and links that capture the concepts and relations in a domain. Semantic
networks are intuitive, flexible, and associative, but they can also be ambiguous, in-
consistent, and inefficient.
o Frames: This approach uses hierarchical structures to represent knowledge as a set of
frames or schemas that capture the attributes and values of entities or situations in a
domain. Frames are modular, structured, and inheritable, but they can also be redun-
dant, inflexible, and complex.
o Scripts: This approach uses sequences or scenarios to represent knowledge as a set of
scripts or stories that capture the events and actions in a domain. Scripts are natural,
dynamic, and contextual, but they can also be incomplete, stereotypical, and rigid.
o Ontologies: This approach uses formal vocabularies to represent knowledge as a set
of ontologies or taxonomies that capture the categories and subcategories in a do-
main. Ontologies are standardized, reusable, and sharable, but they can also be large,
complex, and evolving.
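As a small illustration of two of these paradigms, the sketch below expresses the same toy piece of knowledge as a semantic network and as frames, using plain Python dictionaries (the concepts, relations, and slot names are made up for the example):

```python
# Semantic network: nodes are concepts, labeled edges are relations.
semantic_network = {
    ("canary", "is_a"): "bird",
    ("bird", "is_a"): "animal",
    ("bird", "can"): "fly",
    ("canary", "color"): "yellow",
}

# Frames: structured records of attribute/value slots, with inheritance
# expressed through the special "is_a" slot.
frames = {
    "bird":   {"is_a": "animal", "can": "fly", "has": "feathers"},
    "canary": {"is_a": "bird", "color": "yellow"},
}

def lookup(frame_name, slot):
    """Look up a slot value, inheriting from parent frames via 'is_a'."""
    frame = frames.get(frame_name, {})
    if slot in frame:
        return frame[slot]
    if "is_a" in frame:                      # walk up the inheritance chain
        return lookup(frame["is_a"], slot)
    return None

print(lookup("canary", "can"))   # -> 'fly', inherited from the 'bird' frame
```

The lookup function shows the inheritance behaviour that makes frames attractive: the canary frame does not store "can: fly" itself but inherits it from the bird frame.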
There are some issues or challenges that need to be addressed when designing or choosing a knowledge representation system, such as:
o Expressiveness: The ability of a representation system to capture the meaning and structure of human knowledge in a domain. Expressiveness depends on factors such as the richness of the vocabulary, the complexity of the syntax, and the clarity of the semantics.
o Uncertainty: The ability of a representation system to handle the incompleteness, inconsistency, or ambiguity of human knowledge in a domain. Uncertainty depends on factors such as the sources of uncertainty, the types of uncertainty, the methods of uncertainty representation, and the techniques of uncertainty reasoning.
Procedural knowledge describes how to perform tasks or actions in a domain, and is typically represented as sequences of steps or procedures. For example:
o To bake a cake, you need to mix the ingredients, pour the batter into a pan, and bake it in an oven
o To solve a math problem, you need to apply some formulas, operations, and methods
Declarative knowledge describes what is true about a domain, and can be represented by using rules, facts, statements, or assertions that describe the features or characteristics of a domain. For example:
o HasWheels(bike, 2)
o MadeOf(cake, [flour, eggs, sugar, butter])
o Equation(problem1, x + 2 = 5)
Logic programming is a paradigm of programming that uses logic as the basis for represent-
ing and manipulating knowledge.
Logic programming uses a logic system, such as predicate logic, to express knowledge as a
set of rules and facts that follow a well-defined syntax and semantics.
Logic programming uses a logic interpreter or engine to execute queries or goals that ask for
some information or solution based on the given knowledge.
Logic programming uses a logic inference or resolution method to derive new facts or an-
swers from the existing rules and facts by applying some logical rules or principles.
Prolog is one of the most popular and widely used logic programming languages. Prolog
stands for PROgramming in LOGic.
Prolog has the following features:
o It uses predicate logic as its logic system
o It uses Horn clauses as its rule and fact format
o It uses unification as its inference method
o It uses backtracking as its search strategy
Prolog has many applications in various domains, such as:
o Artificial intelligence: Prolog can be used to implement expert systems, natural lan-
guage processing, machine learning, computer vision, etc.
o Database: Prolog can be used to query and manipulate relational data using logical
operations
o Education: Prolog can be used to teach and learn logic, programming, and problem-
solving skills
Unification is a method of matching that compares two expressions based on their structure, i.e., it checks whether they can be made identical by substituting some variables with some terms or values. For example:
o Expression 1: IsBlue(x)
o Expression 2: IsBlue(sky)
o Unification: Success, x = sky
o Expression 1: Equals(x + y, 4)
o Expression 2: Equals(2 + 2, 4)
o Unification: Success, x = 2, y = 2
o Expression 1: Equals(x + y, 4)
o Expression 2: Equals(3 + z, 5)
o Unification: Failure, because the constants 4 and 5 cannot be made identical by any substitution
Matching can be used for various purposes, such as:
o Rule application: Matching can be used to apply rules to facts by matching the
condition part of the rule with the fact and deriving the action part of the rule
as a new fact.
o Query answering: Matching can be used to answer queries by matching the
query with the facts or rules and returning the values of the variables or the
truth value of the query.
o Substitution: Matching can be used to substitute variables with values by
matching an expression with another expression and replacing the variables
with the corresponding values.
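A minimal Python sketch of unification over simple terms is shown below; it assumes compound expressions are written as nested tuples such as ("Equals", ("+", "x", "y"), 4) and, purely as a convention for the example, that variables are single lowercase letters:

```python
def is_variable(term):
    # Illustrative convention: variables are single lowercase letters.
    return isinstance(term, str) and len(term) == 1 and term.islower()

def substitute(term, subst):
    # Follow the current substitution until the term is no longer a bound variable.
    while is_variable(term) and term in subst:
        term = subst[term]
    return term

def unify(t1, t2, subst=None):
    """Return a substitution making t1 and t2 identical, or None on failure."""
    if subst is None:
        subst = {}
    t1, t2 = substitute(t1, subst), substitute(t2, subst)
    if t1 == t2:
        return subst
    if is_variable(t1):
        return {**subst, t1: t2}
    if is_variable(t2):
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash between two different constants or functors

# IsBlue(x) vs IsBlue(sky): succeeds with x = sky
print(unify(("IsBlue", "x"), ("IsBlue", "sky")))        # {'x': 'sky'}
# Equals(x + y, 4) vs Equals(2 + 2, 4): succeeds with x = 2, y = 2
print(unify(("Equals", ("+", "x", "y"), 4),
            ("Equals", ("+", 2, 2), 4)))                # {'x': 2, 'y': 2}
# Equals(x + y, 4) vs Equals(3 + z, 5): fails, 4 and 5 clash
print(unify(("Equals", ("+", "x", "y"), 4),
            ("Equals", ("+", 3, "z"), 5)))              # None
```

A full Prolog-style unifier would also need an occurs check and variable renaming, which are omitted here to keep the sketch short.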
Certainty factors are a method of representing and reasoning with uncertain knowledge in
rule-based systems. Certainty factors assign a numeric value between -1 and 1 to each fact or
rule, indicating how certain or confident it is to be true or false.
Certainty factors can be interpreted in different ways, such as:
o Probabilistic: Certainty factors are equivalent to probabilities, i.e., they measure the
likelihood or frequency of an event or proposition.
o Evidential: Certainty factors are based on evidence, i.e., they measure the strength or
quality of the support or opposition for an event or proposition.
o Fuzzy: Certainty factors are based on vagueness, i.e., they measure the degree or ex-
tent of membership or satisfaction for an event or proposition.
Certainty factors can be calculated using different methods, such as:
o Heuristic: Certainty factors are assigned by experts or users based on their intuition or
experience, using some rules of thumb or guidelines.
o Empirical: Certainty factors are derived from data or statistics, using some formulas
or algorithms.
o Learning: Certainty factors are learned from observations or feedback, using some
techniques or models.
Certainty factors can be used for various purposes, such as:
o Rule application: Certainty factors can be used to apply rules to facts by combining
the certainty factors of the condition and the action parts of the rule, using some oper-
ators or functions.
o Query answering: Certainty factors can be used to answer queries by aggregating the
certainty factors of the facts or rules that are relevant to the query, using some opera-
tors or functions.
o Decision making: Certainty factors can be used to make decisions under uncertainty,
by comparing the certainty factors of the alternatives or outcomes and choosing the
best one.
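As one concrete choice of such operators, the sketch below uses MYCIN-style certainty-factor formulas (the evidence values and rule strengths are made up for the example):

```python
# A minimal sketch of MYCIN-style certainty-factor arithmetic.

def cf_rule(cf_evidence, cf_rule_strength):
    # A rule only contributes when its evidence is positively believed.
    return cf_rule_strength * max(0.0, cf_evidence)

def cf_combine(cf1, cf2):
    # Combine two certainty factors for the same conclusion from different rules.
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 <= 0 and cf2 <= 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Two independent rules both support the same hypothesis:
cf_a = cf_rule(cf_evidence=0.8, cf_rule_strength=0.6)   # 0.48
cf_b = cf_rule(cf_evidence=1.0, cf_rule_strength=0.4)   # 0.40
print(round(cf_combine(cf_a, cf_b), 3))                  # 0.688
```

The combined certainty factor is higher than either individual factor, reflecting that two independent pieces of supporting evidence reinforce each other.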
Bayesian networks are a method of representing and reasoning with uncertain knowledge
using probability and graph theory. Bayesian networks are also known as probabilistic graph-
ical models or belief networks.
Bayesian networks consist of two components: a directed acyclic graph (DAG) and a condi-
tional probability table (CPT). The DAG represents the variables and their dependencies in a
domain. The CPT represents the probabilities of each variable given its parents in the DAG.
Bayesian networks have the following features:
o They capture the joint probability distribution of all the variables in a domain, i.e.,
they measure the likelihood of any combination of values for the variables.
o They encode the conditional independence assumptions among the variables, i.e.,
they specify which variables are independent of each other given some other varia-
bles.
o They allow for efficient and effective inference and learning, i.e., they enable compu-
ting the posterior probabilities of some variables given some evidence or updating
the parameters of the network given some data.
Bayesian networks can be used for various purposes, such as:
o Diagnosis: Bayesian networks can be used to diagnose the causes or effects of some
symptoms or observations, by computing the probabilities of some hypotheses given
some evidence.
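A minimal diagnosis example with a two-node network (Disease -> Test) is sketched below; the prior and the conditional probability table are illustrative numbers, and the posterior is computed directly with Bayes' rule rather than a general inference algorithm:

```python
# Tiny Bayesian network: Disease -> Test, with made-up probabilities.

p_disease = 0.01                                  # prior P(Disease = true)
p_pos_given_disease = {True: 0.95, False: 0.05}   # CPT: P(Test = pos | Disease)

# Marginal probability of a positive test (sum over both values of Disease).
p_pos = (p_pos_given_disease[True] * p_disease
         + p_pos_given_disease[False] * (1 - p_disease))

# Posterior probability of the disease given a positive test (Bayes' rule).
p_disease_given_pos = p_pos_given_disease[True] * p_disease / p_pos
print(round(p_disease_given_pos, 3))              # ~0.161
```

Even with a fairly accurate test, the posterior stays modest because the prior probability of the disease is low, which is exactly the kind of reasoning a Bayesian network makes explicit.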
Dempster-Shafer theory is a method of representing and reasoning with uncertain knowledge using belief functions and evidence. It assigns masses to sets of propositions or hypotheses rather than to single propositions, and derives belief and plausibility measures from these masses.
Dempster-Shafer theory has the following features:
o It captures the uncertainty and ignorance of the knowledge in a domain, i.e., it
measures how much information or lack of information is available for each proposi-
tion or hypothesis.
o It encodes the evidential support or opposition for each proposition or hypothesis, i.e.,
it specifies how much evidence or counter-evidence is provided by different sources
or agents.
o It allows for combining and updating uncertain information from multiple sources or
agents, i.e., it enables computing the combined mass and belief functions using some
operators or methods.
Dempster-Shafer theory can be used for various purposes, such as:
o Inference: Dempster-Shafer theory can be used to infer the degree of belief or confi-
dence in some propositions or hypotheses given some evidence or information.
o Learning: Dempster-Shafer theory can be used to learn the mass and belief functions
from data or observations, using some techniques or models.
o Decision making: Dempster-Shafer theory can be used to make optimal decisions un-
der uncertainty, by maximizing the expected utility or reward.
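The core operation, Dempster's rule of combination, can be sketched as follows; the frame of discernment and the mass values are made up for the example, and mass functions are represented as dictionaries from frozensets to masses:

```python
def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb            # mass that falls on the empty set
    # Normalize by the total non-conflicting mass (1 - K).
    return {s: v / (1 - conflict) for s, v in combined.items()}

# Two sources of evidence about whether a fault is in the motor or the sensor.
m1 = {frozenset({"motor"}): 0.6, frozenset({"motor", "sensor"}): 0.4}
m2 = {frozenset({"sensor"}): 0.5, frozenset({"motor", "sensor"}): 0.5}
print(dempster_combine(m1, m2))
# {motor}: ~0.43, {sensor}: ~0.29, {motor, sensor}: ~0.29
```

The combined masses are renormalized by 1 - K, where K is the total mass assigned to conflicting (empty) intersections of the two sources.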
Fuzzy logic is a method of representing and reasoning with uncertain knowledge using fuzzy
sets and fuzzy rules. Fuzzy logic is also known as fuzzy set theory or fuzzy reasoning.
Fuzzy logic consists of two components: fuzzy sets and fuzzy rules. Fuzzy sets are sets that
have fuzzy boundaries or membership functions, i.e., they allow partial or gradual member-
ship for each element. Fuzzy rules are rules that have fuzzy antecedents or consequents, i.e.,
they allow partial or gradual truth values for each condition or action.
Fuzzy logic has the following features:
o It captures the vagueness and ambiguity of the knowledge in a domain, i.e., it
measures how vague or ambiguous each concept or relation is.
o It encodes the linguistic expressions and modifiers for each concept or relation, i.e., it
specifies how natural language terms and phrases can be translated into fuzzy sets and
rules.
o It allows for approximate and flexible reasoning and inference, i.e., it enables compu-
ting the fuzzy truth values and fuzzy outputs using some operators or functions.
Fuzzy logic can be used for various purposes, such as:
o Classification: Fuzzy logic can be used to classify objects or situations into fuzzy cat-
egories or classes, by computing the degree of membership for each category or class.
o Control: Fuzzy logic can be used to control systems or processes that have uncertain
inputs or outputs, by computing the fuzzy actions or commands based on fuzzy rules.
o Decision making: Fuzzy logic can be used to make optimal decisions under uncer-
tainty, by computing the fuzzy utilities or rewards of each alternative or action.
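A minimal fuzzy-control sketch is shown below; the linguistic terms ("warm", "hot"), the membership function parameters, and the rule outputs are illustrative assumptions:

```python
def triangular(x, a, b, c):
    # Membership rises linearly from a to b and falls from b to c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fan_speed(temp):
    # Fuzzify the input: degrees of membership in "warm" and "hot".
    warm = triangular(temp, 15, 25, 35)
    hot = triangular(temp, 25, 40, 55)
    # Rules: IF warm THEN medium speed (50), IF hot THEN high speed (100).
    # Defuzzify with a weighted average of the rule outputs.
    if warm + hot == 0:
        return 0.0
    return (warm * 50 + hot * 100) / (warm + hot)

print(fan_speed(30))   # about 70: the input is partly "warm" and partly "hot"
```

The input is fuzzified into degrees of membership, each rule fires to the degree its antecedent is satisfied, and the result is defuzzified into a single crisp output.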
12. Learning
12.1 Overview of Different Forms of Learning: Supervised, Unsupervised, Reinforcement
Learning
Learning is the process of acquiring and improving knowledge or skills from data or experi-
ence.
Learning can be classified into different forms based on the type and amount of feedback or
guidance available, such as:
o Supervised learning: Learning from labeled data, i.e., data that has the correct or de-
sired output or answer for each input or example. Supervised learning aims to learn a
function or a model that can map the input to the output, and generalize to new or un-
seen data. Examples of supervised learning tasks are classification, regression, etc.
o Unsupervised learning: Learning from unlabeled data, i.e., data that has no output or
answer for each input or example. Unsupervised learning aims to learn the structure
or the distribution of the data, and discover hidden patterns or features. Examples of
unsupervised learning tasks are clustering, dimensionality reduction, etc.
o Reinforcement learning: Learning from trial and error, i.e., learning by interacting
with an environment and receiving rewards or penalties for each action or behavior.
Reinforcement learning aims to learn a policy or a strategy that can maximize the cu-
mulative reward or minimize the cumulative cost over time. Examples of reinforce-
ment learning tasks are game playing, robot control, etc.
12.2 Decision Trees
A decision tree is a graphical representation of a function or a model that can be used for classification or regression tasks. A decision tree consists of nodes and branches that form a tree-
like structure. The nodes represent the features or attributes of the data, and the branches rep-
resent the values or ranges of the features or attributes. The leaf nodes represent the output or
the class of the data.
A decision tree can be learned from data by using different algorithms, such as ID3, C4.5,
CART, etc. The general steps of learning a decision tree are:
o Start with the entire data set as the root node
o Choose a feature or an attribute that best splits the data into subsets based on some criterion or measure, such as information gain, gini index, etc. (a small sketch of the information gain computation follows this list)
o Create a branch for each value or range of the feature or attribute, and assign the cor-
responding subset of data to each branch
o Repeat the process recursively for each branch until reaching a stopping condition,
such as all data in a branch have the same output or class, there are no more features
or attributes to split on, etc.
o Assign the output or class to each leaf node based on the majority vote or the average
value of the data in that node
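The splitting criterion mentioned in the steps above can be sketched as follows; the tiny data set, the feature names, and the class labels are made up for the example, and only information gain (as used by ID3) is computed:

```python
import math

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

def information_gain(rows, feature):
    # Gain = entropy before the split minus the weighted entropy after it.
    labels = [label for _, label in rows]
    before = entropy(labels)
    after = 0.0
    for value in set(attrs[feature] for attrs, _ in rows):
        subset = [label for attrs, label in rows if attrs[feature] == value]
        after += (len(subset) / len(rows)) * entropy(subset)
    return before - after

# Each row is ({feature: value, ...}, class label); the data is made up.
data = [
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "sunny", "windy": True},  "no"),
    ({"outlook": "rainy", "windy": False}, "yes"),
    ({"outlook": "rainy", "windy": True},  "yes"),
]
print(information_gain(data, "outlook"))   # 1.0 -> a perfect split on this data
print(information_gain(data, "windy"))     # 0.0 -> an uninformative split
```

The feature with the highest gain would be chosen as the split at the current node, and the process would then repeat on each resulting subset.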
A decision tree can be used for various purposes, such as:
o Classification: A decision tree can be used to classify new or unseen data by follow-
ing the branches from the root node to a leaf node based on the values or ranges of
the features or attributes of the data, and returning the output or class of that leaf node
o Regression: A decision tree can be used to predict the output or value of new or unseen data by following the branches from the root node to a leaf node based on the values or ranges of the features or attributes of the data, and returning the output or value of that leaf node
o Explanation: A decision tree can be used to explain the reasoning or logic behind a
classification or regression result, by showing the path or sequence of decisions that
led to that result
o Visualization: A decision tree can be used to visualize the structure or distribution of
the data, by showing the features or attributes and their values or ranges that are rele-
vant or important for the output or class
12.3 Neural Networks: Basics of Artificial Neural Networks and Their Applications in Learning
A neural network is a computational model that is inspired by the structure and function of
biological neural networks, such as the brain. A neural network consists of a large number of
interconnected units or nodes called neurons, that can process and transmit information.
A neural network can be represented by a graph or a matrix that shows the neurons and their
connections or weights. The neurons are organized into layers, such as input layer, hidden
layer, and output layer. The connections or weights are the values that determine how much
influence one neuron has on another.
A neural network can be learned from data by using different algorithms, such as backpropa-
gation, gradient descent, etc. The general steps of learning a neural network are:
o Initialize the weights randomly or with some heuristic
o Feed the input data to the input layer and propagate it forward through the hidden
layer(s) to the output layer, using some activation function or transfer function
o Compare the output with the desired output and calculate the error or loss function
o Adjust the weights backward from the output layer to the input layer, using some
learning rule or update rule
o Repeat the process until reaching a stopping condition, such as convergence, mini-
mum error, maximum iteration, etc.
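Putting the steps above together, here is a minimal sketch of training a tiny fully connected network with sigmoid units and plain gradient descent (the layer sizes, learning rate, and XOR-style data are illustrative, and the exact result varies with the random initialization):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)      # input layer -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)      # hidden layer -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # inputs
y = np.array([[0.], [1.], [1.], [0.]])                   # desired outputs (XOR)
lr = 0.5                                                 # learning rate

for step in range(10000):
    h = sigmoid(X @ W1 + b1)                  # forward pass: hidden layer
    out = sigmoid(h @ W2 + b2)                # forward pass: output layer
    loss = ((out - y) ** 2).mean()            # mean squared error
    if step == 0:
        print("initial loss:", round(float(loss), 4))
    grad_out = (out - y) * out * (1 - out)    # error signal at the output layer
    grad_h = (grad_out @ W2.T) * h * (1 - h)  # error propagated to the hidden layer
    W2 -= lr * (h.T @ grad_out)               # weight updates (gradient descent)
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * (X.T @ grad_h)
    b1 -= lr * grad_h.sum(axis=0)

print("final loss:", round(float(loss), 4))   # the loss should have decreased
```

Each iteration performs one forward pass, measures the error, and adjusts the weights backward from the output layer to the input layer, exactly as in the steps listed above.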
A neural network can be used for various purposes, such as:
o Classification: A neural network can be used to classify data into categories or clas-
ses, by mapping the input to the output and assigning a label based on some threshold
or criterion
o Regression: A neural network can be used to predict the output or value of data, by
mapping the input to the output and returning a numeric value
o Clustering: A neural network can be used to group data into clusters or segments, by
learning the features or patterns that distinguish different groups of data
o Association: A neural network can be used to discover associations or correlations
among data, by learning the rules or relationships that link different items or variables