Ai Notes2

Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
30 views

Ai Notes2

Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 168

Contents
UNIT-I: DATA COMMUNICATION COMPONENTS.................................................................................7
1. REPRESENTATION OF DATA: ANALOG VS. DIGITAL SIGNALS ....................................................................................... 7
1.1 Data Communication and Signals ....................................................................................................... 7
1.2 Analog Signals: Characteristics and Examples ..................................................................................... 7
1.3 Digital Signals: Characteristics and Examples ...................................................................................... 8
1.4 Analog-to-Digital Conversion .............................................................................................................. 8
1.5 Digital-to-Analog Conversion .............................................................................................................. 9
2. DATA ENCODING ........................................................................................................................................... 9
2.1 Importance of Data Encoding.............................................................................................................. 9
2.2 ASCII (American Standard Code for Information Interchange) .............................................................. 9

2.3 Unicode and Multibyte Character Encoding ....................................................................................... 10
3. DATA FLOW NETWORKS ................................................................................................................................ 11
3.1 Simplex Communication ................................................................................................................... 11
3.2 Half-duplex Communication .............................................................................................................. 11
3.3 Full-duplex Communication............................................................................................................... 12
3.4 Comparison and Use Cases ............................................................................................................... 12
4. NETWORKS ................................................................................................................................................ 13
4.1 LAN (Local Area Network) ................................................................................................................. 13
4.2 MAN (Metropolitan Area Network) ................................................................................................... 13
4.3 WAN (Wide Area Network) ............................................................................................................... 14
4. CONNECTION TOPOLOGY ............................................................................................................................... 15
4.1 Bus Topology .................................................................................................................................... 15
4.2 Star Topology ................................................................................................................................... 15
4.3 Ring Topology ................................................................................................................................... 16
4.4 Mesh Topology ................................................................................................................................. 16
4.5 Hybrid Topology ............................................................................................................................... 17
5. PROTOCOLS AND STANDARDS ......................................................................................................................... 18
5.1 Importance of Protocols in Data Communication ............................................................................... 18
5.2 Overview of TCP/IP Protocol Suite ..................................................................................................... 18
5.3 Introduction to Ethernet and IEEE 802.3 ............................................................................................ 18
5.4 Other Networking Protocols.............................................................................................................. 19
6. OSI MODEL ............................................................................................................................................ 20
6.1 Seven Layers of the OSI Model and Their Functions ........................................................................... 20
6.2 Explanation of each layer: Physical, Data Link, Network, Transport, Session, Presentation, and
Application............................................................................................................................................. 20
6.3 Encapsulation and De-encapsulation................................................................................................. 23
7. TRANSMISSION MEDIA ............................................................................................................................ 24
7.1 Twisted Pair Cable: UTP vs. STP, Categories ....................................................................................... 24
7.2 Coaxial Cable: Thicknet vs. Thinnet, Uses .......................................................................................... 25


7.3 Fiber-optic Cable: Advantages, Types (Single-mode, Multi-mode) ...................................................... 26
7.4 Wireless Transmission: Radio Waves, Microwaves, Infrared, Bluetooth .............................................. 27
8. LAN (LOCAL AREA NETWORK) ............................................................................................................... 27
8.1 Definition and Scope of LAN .............................................................................................................. 27
8.2 Components of a LAN: Computers, Switches, Routers, Access Points, etc. ........................................... 28
8.4 BENEFITS AND LIMITATIONS OF LANS .................................................................................................... 29
Benefits.................................................................................................................................................. 29
Limitations ............................................................................................................................................. 29
9. WIRED LAN ........................................................................................................................................... 30
9.1 Introduction to Ethernet LAN ............................................................................................................ 30
9.2 Variations of Ethernet: Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps), 10 Gigabit Ethernet (10
Gbps) ..................................................................................................................................................... 30
9.3 Ethernet Frame Format..................................................................................................................... 31
10. WIRELESS LAN..................................................................................................................................... 32
10.1 IEEE 802.11 Standards for Wi-Fi ...................................................................................................... 32
10.2 Wi-Fi Technologies: 802.11a/b/g/n/ac/ax ....................................................................................... 32
Advantages ............................................................................................................................................ 33
Challenges ............................................................................................................................................. 33
11. CONNECTING LANS AND VIRTUAL LAN (VLAN)................................................................................... 33
11.1 Network Bridges and their Role ....................................................................................................... 33
11.2 Routers: Interconnecting LANs and WANs ....................................................................................... 34
11.3 VLAN Concepts: Logical Segmentation, Benefits, and Implementation ............................................. 35
UNIT-II: DATA LINK LAYER AND MEDIUM ACCESS SUB LAYER .................................................... 37
12. ERROR DETECTION AND ERROR CORRECTION ......................................................................................... 37
Learning Objectives ................................................................................................................................ 37
Introduction ........................................................................................................................................... 37
Block Coding .......................................................................................................................................... 37
Hamming Codes ..................................................................................................................................... 38
Cyclic Redundancy Check (CRC) ............................................................................................................... 39
Python Implementation .......................................................................................................................... 41
13. FLOW CONTROL .................................................................................................................................... 48
Learning Objectives ................................................................................................................................ 48
Introduction ........................................................................................................................................... 48
Stop-and-Wait........................................................................................................................................ 48
Sliding Window ...................................................................................................................................... 49
14. ERROR CONTROL PROTOCOLS ................................................................................................................ 50
Learning Objectives ................................................................................................................................ 50
Introduction ........................................................................................................................................... 51
Stop-and-Wait ARQ ................................................................................................................................ 51
Go-Back-N ARQ ...................................................................................................................................... 52
MULTIPLE ACCESS PROTOCOLS ................................................................................................................... 60
CSMA/CD (COLLISION DETECTION) ........................................................................................................... 60
CSMA/CA (COLLISION AVOIDANCE) .......................................................................................................... 61
COMPARISON AND USE CASES ..................................................................................................................... 62
SUMMARY .................................................................................................................................................. 63
UNIT-III: NETWORK LAYER .............................................................................................................................. 64
SWITCHING ................................................................................................................................................. 64
LOGICAL ADDRESSING ................................................................................................................................ 65
ADDRESS MAPPING ..................................................................................................................................... 66
DELIVERY ................................................................................................................................................... 69
FORWARDING .............................................................................................................................................. 70
UNICAST ROUTING PROTOCOLS ................................................................................................................... 71
UNIT-IV: TRANSPORT LAYER ............................................................................................................. 73
PROCESS TO PROCESS COMMUNICATION ...................................................................................................... 73
USER DATAGRAM PROTOCOL (UDP) ............................................................................................................ 74
TRANSMISSION CONTROL PROTOCOL (TCP)................................................................................................. 75
THREE-WAY HANDSHAKE AND CONNECTION TERMINATION ......................................................................... 77
STREAM CONTROL TRANSMISSION PROTOCOL (SCTP) ................................................................................. 79
CONGESTION CONTROL ............................................................................................................................... 80
EXPLICIT CONGESTION NOTIFICATION (ECN) .............................................................................................. 81
QUALITY OF SERVICE (QOS) ........................................................................................................................ 82
QOS IMPROVING TECHNIQUES ..................................................................................................................... 83
UNIT-V: APPLICATION LAYER .......................................................................................................................... 85
DOMAIN NAME SPACE (DNS) ...................................................................................................................... 85
DYNAMIC DNS (DDNS) ............................................................................................................................. 87
TELNET .................................................................................................................................................... 89
FILE TRANSFER PROTOCOL (FTP) ................................................................................................................ 90
WORLD WIDE WEB (WWW) ....................................................................................................................... 91
SIMPLE NETWORK MANAGEMENT PROTOCOL (SNMP)................................................................................. 92
BLUETOOTH ............................................................................................................................................... 93

FIREWALLS .................................................................................................................................................... 94
UNIT-I: INTRODUCTION TO MACHINE LEARNING........................................................................... 96
1. DEFINITION OF MACHINE LEARNING ........................................................................................................ 96
2. TYPES OF MACHINE LEARNING ................................................................................................................ 96
3. APPLICATIONS OF MACHINE LEARNING .................................................................................................... 98
4. CHALLENGES AND ISSUES IN MACHINE LEARNING .................................................................................... 99
5. UNDERSTANDING THE MACHINE LEARNING WORKFLOW ........................................................................ 100
6. BASIC TYPES OF DATA IN MACHINE LEARNING ....................................................................................... 101


7. EXPLORING THE STRUCTURE OF DATA.................................................................................................... 101
8. DATA QUALITY AND REMEDIATION ......................................................................................................... 102
9. DATA PRE-PROCESSING ......................................................................................................................... 104
UNIT-II: MODELLING & EVALUATION, BASICS OF FEATURE ENGINEERING .......................... 107
10. INTRODUCTION TO MODEL SELECTION AND EVALUATION ...................................................................... 107
11. SELECTING A MODEL ........................................................................................................................... 108
12. TRAINING A MODEL (FOR SUPERVISED LEARNING) ................................................................................ 109
13. INTRODUCTION TO FEATURE ENGINEERING ........................................................................................... 110
14. FEATURE TRANSFORMATION ................................................................................................................ 110
15. FEATURE CONSTRUCTION .................................................................................................................... 111
16. FEATURE EXTRACTION ........................................................................................................................ 112
UNIT-III: REGRESSION................................................................................................................................... 114
17. INTRODUCTION TO REGRESSION ANALYSIS...................................................................................................... 114
18. MULTIPLE LINEAR REGRESSION ................................................................................................................... 115
19. MAIN PROBLEMS IN REGRESSION ANALYSIS.................................................................................................... 116
20. LOGISTIC REGRESSION .............................................................................................................................. 117
UNIT-IV: SUPERVISED LEARNING: CLASSIFICATION .................................................................... 119
22. INTRODUCTION TO CLASSIFICATION ..................................................................................................... 119
Example: Email Spam Classification ...................................................................................................... 119
23. CLASSIFICATION LEARNING STEPS ....................................................................................................... 120
Data Preparation for Classification ....................................................................................................... 120
Training a Classification Model ............................................................................................................. 120
Model Evaluation ................................................................................................................................. 121
Model Deployment ............................................................................................................................... 121
24. COMMON CLASSIFICATION ALGORITHMS ............................................................................................. 121
k-Nearest Neighbors (kNN) ................................................................................................................... 121
Support Vector Machines (SVM) ........................................................................................................... 122
Random Forest ..................................................................................................................................... 123
UNIT-V: OTHER TYPES OF LEARNING ............................................................................................... 124
25. ENSEMBLE LEARNING ......................................................................................................................... 124
Bagging ............................................................................................................................................... 124
Boosting............................................................................................................................................... 124
Stacking ............................................................................................................................................... 125
26. ADABOOST ......................................................................................................................................... 125
Example: AdaBoost Algorithm for Binary Classification ......................................................................... 126
27. GRADIENT BOOSTING MACHINES (GBM) ............................................................................................. 127
Example: GBM Algorithm for Regression............................................................................................... 127
XGBoost ............................................................................................................................................... 128
28. REINFORCEMENT LEARNING ................................................................................................................ 129
Example: Reinforcement Learning Paradigm ......................................................................................... 129
Q-learning............................................................................................................................................ 130
UNIT - I: INTRODUCTION TO AI, INTELLIGENT AGENTS, AND PROBLEM SOLVING .............. 131
1. INTRODUCTION TO AI ............................................................................................................................ 131
1.1 Definition of Artificial Intelligence ................................................................................................... 131
1.2 Applications of AI in Various Fields .................................................................................................. 131
1.3 History of AI: Key Milestones and Evolution ..................................................................................... 132
1.4 Types of AI: Narrow AI vs. General AI .............................................................................................. 133
2. INTELLIGENT AGENTS ........................................................................................................................... 134


2.1 Agents and Rationality: What is an Agent, Rational Behavior .......................................................... 134
2.2 Structure of Agents: Perception, Decision-making, Action ................................................................ 134
2.3 Agent-Environment Interaction: Interaction with the Surrounding Environment ............................... 134
2.4 Types of Agents: Simple Reflex Agents, Model-Based Agents, Goal-Based Agents,............................ 135
3. PROBLEM SOLVING ................................................................................................................................ 138
3.1 Problems in AI: Well-defined Problems and Goal States ................................................................... 138
3.2 Search Spaces: Defining Problems as State Space Search ................................................................. 138
3.3 Production System: Rule-Based Approach to Problem Solving .......................................................... 139
3.4 Problem Characteristics: Deterministic vs. Stochastic, Single Agent vs. Multi-Agent ......................... 139
3.5 Issues in Designing Search Programs: Completeness, Optimality, Time Complexity, Space Complexity
............................................................................................................................................................ 140
UNIT - II: SEARCH ALGORITHMS........................................................................................................ 140
4. SEARCH ALGORITHMS ........................................................................................................................... 140
4.1 Problem-Solving Agents and Search Algorithms Terminology ........................................................... 140
4.2 Properties of Search Algorithms: Completeness, Optimality, Time and Space Complexity ................. 141
4.3 Types of Search Algorithms: Uninformed (Blind) Search and Informed (Heuristic) Search ................. 141
5. UNIFORMED/BLIND SEARCH ALGORITHMS ............................................................................................. 143
5.1 Breadth-First Search (BFS) .............................................................................................................. 143
5.2 Depth-First Search (DFS) ................................................................................................................. 144
5.3 Depth-Limited Search ..................................................................................................................... 144
5.4 Iterative Deepening Depth-First Search (IDDFS) ............................................................................... 145
5.5 Uniform Cost Search ....................................................................................................................... 146
5.6 Bidirectional Search ........................................................................................................................ 146
6. INFORMED/HEURISTIC SEARCH ALGORITHMS ......................................................................................... 147
6.1 Greedy Best-First Search Algorithm ................................................................................................. 147
6.2 A* Search Algorithm ....................................................................................................................... 148
6.3 Hill Climbing Algorithm ................................................................................................................... 149
6.4 Constraint Satisfaction Problem (CSP) ............................................................................................. 150
6.5 Means-Ends Analysis ...................................................................................................................... 151
UNIT - III: ADVERSARIAL SEARCH AND KNOWLEDGE REPRESENTATION .............................. 153
7. ADVERSARIAL SEARCH/GAME PLAYING ................................................................................................. 153
7.1 Introduction to Adversarial Search and Game Playing ..................................................................... 153
7.2 Minimax Algorithm: Decision-Making in Two-Player Games ............................................................ 153
7.3 Alpha-Beta Pruning: Reducing the Search Space in Game Trees ....................................................... 154
8. KNOWLEDGE REPRESENTATION ............................................................................................................. 155
8.1 Representations and Mappings: Encoding Knowledge for AI Systems ............................................... 155
8.2 Approaches to Knowledge Representation: Logical, Semantic Networks, Frames, etc. ...................... 156
8.3 Issues in Knowledge Representation: Expressiveness, Efficiency, ...................................................... 157
UNIT - IV: KNOWLEDGE REPRESENTATION USING PREDICATE LOGIC AND RULES ............ 157
9. KNOWLEDGE REPRESENTATION USING PREDICATE LOGIC ....................................................................... 157
9.1 Representing Simple Facts in Logic: Propositions, Predicates, and Connectives................................. 157
10. REPRESENTING KNOWLEDGE USING RULES .......................................................................................... 158
10.1 Procedural vs. Declarative Knowledge ........................................................................................... 158
10.2 Logic Programming: Prolog and Its Applications ............................................................................ 159
10.3 Forward vs. Backward Reasoning: Inference Control in Rule-Based Systems ................................... 160
10.4 Matching: Pattern Matching and Unification ................................................................................. 160
UNIT - V: UNCERTAIN KNOWLEDGE AND REASONING AND LEARNING ................................... 162
11. UNCERTAIN KNOWLEDGE AND REASONING ........................................................................................... 162
11.1 Probability and Bayes’ Theorem: Dealing with Uncertainty............................................................ 162


11.2 Certainty Factors and Rule-Based Systems..................................................................................... 162
11.3 Bayesian Networks: Probabilistic Graphical Models for Representing Uncertain Knowledge ........... 163
11.4 Dempster-Shafer Theory: Theory of Evidence for Combining Uncertain Information ....................... 164
11.5 Fuzzy Logic: Handling Vagueness and Gradual Truth Values........................................................... 165
12. LEARNING ........................................................................................................................................... 166
12.1 Overview of Different Forms of Learning: Supervised, Unsupervised, Reinforcement Learning ........ 166
12.2 Learning Decision Trees: Building Decision Trees from Data ........................................................... 166
12.3 Neural Networks: Basics of Artificial Neural Networks and Their Applications in Learning .............. 167
Computer Networks
UNIT-I: Data Communication Components
1. Representation of Data: Analog vs. Digital Signals
1.1 Data Communication and Signals

 Data communication is the process of transferring information from one point to another us-
ing a medium such as a wire, cable, or wireless channel.
 A signal is a variation of a physical quantity that conveys information. Signals can be classi-
fied into two types: analog and digital.
 Analog signals are continuous signals that can have any value within a range. For example,
sound waves, light waves, and temperature are analog signals.
 Digital signals are discrete signals that take on a limited set of values, most commonly two: 0 and 1. For example, binary numbers, Morse code, and on-off switches are digital signals.

1.2 Analog Signals: Characteristics and Examples

 Analog signals have four main characteristics: amplitude, frequency, phase, and bandwidth.


 Amplitude is the height or strength of the signal. It is measured in volts (V) or decibels (dB).
 Frequency is the number of cycles or oscillations of the signal per second. It is measured in
hertz (Hz) or kilohertz (kHz).
 Phase is the position or angle of the signal relative to a reference point. It is measured in de-
grees (°) or radians (rad).
 Bandwidth is the range of frequencies that the signal occupies. It is measured in hertz (Hz) or
kilohertz (kHz).
 Some examples of analog signals are:
o Voice: Voice is an analog signal that varies in amplitude and frequency according to
the sound produced by the speaker. The human voice has a frequency range of about
300 Hz to 3400 Hz.
o Music: Music is an analog signal that consists of multiple frequencies and amplitudes
that create different sounds and melodies. The musical notes have a frequency range
of about 20 Hz to 20 kHz.
o Video: Video is an analog signal that consists of multiple frames or images that
change rapidly to create motion. Each frame has a different amplitude and frequency
that represent the color and brightness of the pixels. The video signal has a bandwidth
of about 4.2 MHz.
1.3 Digital Signals: Characteristics and Examples

 Digital signals have two main characteristics: bit rate and bit interval.
 Bit rate is the number of bits (0 or 1) transmitted per second. It is measured in bits per second
(bps) or megabits per second (Mbps).
 Bit interval is the time required to transmit one bit. It is measured in seconds (s) or millisec-
onds (ms).


 Some examples of digital signals are:
o Text: Text is a digital signal that consists of a sequence of characters encoded using a
standard code such as ASCII or Unicode. Each character has a fixed number of bits
that represent its value. For example, ASCII uses 7 bits to encode 128 characters,
while the Unicode encodings (UTF-8, UTF-16, UTF-32) use 8 to 32 bits per character
to encode over a million characters.
o Image: Image is a digital signal that consists of a matrix of pixels that represent the
color and brightness of the picture. Each pixel has a fixed number of bits that repre-
sent its value. For example, a black-and-white image uses 1 bit per pixel, while a
color image uses 8, 16, or 24 bits per pixel.
o Audio: Audio is a digital signal that consists of a sequence of samples that represent
the amplitude and frequency of the sound wave. Each sample has a fixed number of
bits that represent its value. For example, CD-quality audio uses 16 bits per sample at
a sampling rate of 44.1 kHz.
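The CD-audio figures above let us compute a bit rate and bit interval directly. A quick Python check (the two-channel stereo factor is an added assumption, not stated in the notes):

```python
# Bit rate and bit interval for CD-quality audio: 16 bits/sample at 44.1 kHz
# (per the notes); the 2-channel stereo factor is an assumption.
bits_per_sample = 16
sampling_rate_hz = 44_100
channels = 2

bit_rate_bps = bits_per_sample * sampling_rate_hz * channels  # bits per second
bit_interval_s = 1 / bit_rate_bps                             # seconds per bit

print(bit_rate_bps)  # 1411200 (about 1.4 Mbps)
```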

1.4 Analog-to-Digital Conversion

 Analog-to-digital conversion (ADC) is the process of converting an analog signal into a digi-
tal signal.
 ADC involves two steps: sampling and quantization.
 Sampling is the process of taking snapshots or measurements of the analog signal at regular
intervals called sampling rate. The sampling rate determines how often the analog signal is
sampled and how accurate the digital signal is. The higher the sampling rate, the more sam-
ples are taken and the more accurate the digital signal is. However, higher sampling rates also
require more bandwidth and storage space.
 Quantization is the process of assigning a discrete value or level to each sample based on its
amplitude. The quantization level determines how many bits are used to represent each sam-
ple and how precise the digital signal is. The higher the quantization level, the more bits are
used and the more precise the digital signal is. However, higher quantization levels also in-
crease the noise and distortion in the digital signal.
 ADC is used for various applications such as digital audio recording, digital photography, dig-
ital video, and digital communication.
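The two ADC steps can be sketched in a few lines of Python. This is an illustrative model only (the `adc` helper, the sine input, and the 3-bit level count are all assumptions chosen for demonstration), not a production converter:

```python
import math

def adc(signal, duration_s, sampling_rate_hz, bits):
    """Convert a continuous signal (a function of time returning values in
    [-1.0, 1.0]) into a list of quantized integer samples."""
    levels = 2 ** bits
    samples = []
    n = int(duration_s * sampling_rate_hz)
    for i in range(n):
        t = i / sampling_rate_hz                   # sampling at regular intervals
        x = signal(t)
        q = round((x + 1.0) / 2.0 * (levels - 1))  # quantization to nearest level
        samples.append(q)
    return samples

# A 5 Hz sine wave sampled at 40 Hz with 3-bit quantization (8 levels, 0..7).
digital = adc(lambda t: math.sin(2 * math.pi * 5 * t), 0.2, 40, 3)
```

Raising `sampling_rate_hz` or `bits` makes the digital signal more faithful, at the cost of more samples and more bits per sample, exactly as described above.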
1.5 Digital-to-Analog Conversion

 Digital-to-analog conversion (DAC) is the process of converting a digital signal into an ana-
log signal.
 DAC involves two steps: reconstruction and filtering.
 Reconstruction is the process of generating a continuous signal from the discrete samples us-
ing a technique called interpolation. Interpolation is the process of filling in the gaps between
the samples using a mathematical function or a curve. The interpolation function determines
how smooth and accurate the analog signal is. The most common interpolation function is the
linear interpolation, which connects the samples with straight lines.
 Filtering is the process of removing the unwanted frequencies or noise from the reconstructed
signal using a device called a filter. A filter is a circuit that allows only certain frequencies to
pass through and blocks others. The filter determines how clean and clear the analog signal is.
The most common filter is the low-pass filter, which allows only the low frequencies to pass
through and blocks the high frequencies.
 DAC is used for various applications such as digital audio playback, digital video display, and
analog communication.
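Reconstruction by linear interpolation, the first DAC step, can likewise be sketched. The helper name and `upsample` parameter are illustrative, and the filtering step is omitted:

```python
def dac_linear(samples, upsample=4):
    """Fill the gaps between adjacent samples with points on the straight
    line connecting them (linear interpolation); the low-pass filtering
    step that would further smooth the output is omitted from this sketch."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(upsample):
            out.append(a + (b - a) * k / upsample)
    out.append(samples[-1])  # keep the final sample
    return out

wave = dac_linear([0.0, 1.0, 0.0], upsample=2)
# wave == [0.0, 0.5, 1.0, 0.5, 0.0]
```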

2. Data Encoding
2.1 Importance of Data Encoding

 Data encoding is the process of transforming data into a format that can be transmitted,
stored, or processed by a system.
 Data encoding is important for several reasons:
o It enables data compression, which reduces the size of data and saves bandwidth and
storage space.
o It enables data encryption, which protects data from unauthorized access and modifi-
cation.
o It enables data error detection and correction, which ensures data integrity and relia-
bility.
o It enables data modulation, which adapts data to the characteristics of the transmis-
sion medium.

2.2 ASCII (American Standard Code for Information Interchange)

 ASCII is a standard code that assigns a 7-bit binary number to each character in the English
alphabet, digits, punctuation marks, and some control characters.
 ASCII can encode 128 characters in total, with values ranging from 0 to 127.
 ASCII is widely used for text-based communication and data processing.
 Some examples of ASCII characters and their binary codes are:

Character   Binary Code
A           01000001
B           01000010
C           01000011
1           00110001
2           00110010
3           00110011
!           00100001
?           00111111
\n          00001010
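The table above can be reproduced with Python's built-in `ord()` and `format()`:

```python
# ord() returns a character's ASCII code point; "08b" renders it as 8 binary
# digits (the 7-bit code with a leading zero, matching the table above).
codes = {ch: format(ord(ch), "08b") for ch in "ABC123!?\n"}

print(codes["A"])  # 01000001
print(codes["?"])  # 00111111
```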

2.3 Unicode and Multibyte Character Encoding

 Unicode is a standard code that assigns a unique number to each character in almost every
language and writing system in the world.
 Unicode can encode over a million characters in total, with values ranging from 0 to 1114111
(hexadecimal 10FFFF).
 Unicode uses different formats to represent characters using different numbers of bytes. These
formats are called UTF (Unicode Transformation Format).
 Some examples of UTF formats are:
o UTF-8: This format uses 1 to 4 bytes to encode each character. It is compatible with
ASCII for the first 128 characters. It is widely used for web pages and email mes-
sages.
o UTF-16: This format uses 2 or 4 bytes to encode each character. It is commonly used
for text files and software applications.
o UTF-32: This format uses 4 bytes to encode each character. It is rarely used because it
requires more space than other formats.
 Multibyte character encoding is a general term that refers to any encoding scheme that uses
more than one byte to encode each character. Unicode is an example of multibyte character
encoding, but there are other schemes as well.
 Some examples of non-Unicode multibyte character encoding are:
o GB2312: This scheme uses 1 or 2 bytes to encode each character in simplified Chi-
nese. It can encode about 7400 characters in total.
o Shift-JIS: This scheme uses 1 or 2 bytes to encode each character in Japanese. It can
encode about 7000 characters in total.
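Python's `str.encode` makes the size differences between the UTF formats easy to verify (the `-le` suffix just fixes the byte order so that no byte-order mark is added):

```python
def utf_sizes(ch):
    """Bytes needed to encode one character in each UTF format."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

sizes_a = utf_sizes("A")       # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
sizes_emoji = utf_sizes("😀")  # 4 bytes in every format (U+1F600)

# UTF-8 is ASCII-compatible for the first 128 characters:
ascii_compat = "A".encode("utf-8") == "A".encode("ascii")
```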
3. Data Flow Networks


3.1 Simplex Communication

 Simplex communication is a type of data flow network that allows data to flow in only one
direction, from the sender to the receiver.
 Simplex communication is simple and cheap, but it has limited functionality and efficiency.


 Some examples of simplex communication are:
o Radio broadcast: The radio station transmits audio signals to the listeners, but the lis-
teners cannot send any feedback to the station.
o Keyboard: The keyboard sends keystrokes to the computer, but the computer does not
send any signals back to the keyboard.
o Printer: The computer sends print commands to the printer, but the printer does not
send any status information back to the computer.

3.2 Half-duplex Communication

 Half-duplex communication is a type of data flow network that allows data to flow in both
directions, but not at the same time. The sender and the receiver have to take turns to transmit
and receive data.


 Half-duplex communication is more flexible and efficient than simplex communication, but it
still has some drawbacks such as delay and collision.
 Some examples of half-duplex communication are:
o Walkie-talkie: The users can talk and listen to each other, but they have to press a but-
ton to switch between transmitting and receiving modes.
o Ethernet: The devices can send and receive data over a shared cable, but they have to
use a protocol called CSMA/CD (Carrier Sense Multiple Access with Collision De-
tection) to avoid interfering with each other’s transmissions.
o Bluetooth: The devices can exchange data wirelessly, but they have to use a protocol
called TDD (Time Division Duplexing) to divide the time into slots for transmitting
and receiving.
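The collision-recovery rule behind CSMA/CD can be sketched as truncated binary exponential backoff. The helper name below is mine, but the exponent cap of 10 follows IEEE 802.3:

```python
import random

def backoff_slots(collision_count, max_exp=10):
    """After the nth collision, wait a random number of slot times chosen
    uniformly from 0 .. 2**min(n, max_exp) - 1 (truncated binary
    exponential backoff, as used by CSMA/CD)."""
    n = min(collision_count, max_exp)
    return random.randrange(2 ** n)

# After a 3rd collision a station waits between 0 and 7 slot times.
waits = [backoff_slots(3) for _ in range(100)]
```

Doubling the waiting range after each collision spreads retransmissions out, which is how the shared half-duplex medium avoids repeated collisions.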
3.3 Full-duplex Communication

 Full-duplex communication is a type of data flow network that allows data to flow in both di-
rections simultaneously. The sender and the receiver can transmit and receive data at the same
time without any interruption or interference.


 Full-duplex communication is the most advanced and efficient type of data flow network, but
it also requires more complex and expensive hardware and software.
 Some examples of full-duplex communication are:
o Telephone: The users can talk and listen to each other at the same time without any
delay or noise.
o Fiber-optic cable: The devices can send and receive data over separate light
signals, typically one fiber per direction, so the two streams travel without
collision and with very low attenuation.
o Wi-Fi: The devices can exchange data wirelessly using a protocol called OFDM (Or-
thogonal Frequency Division Multiplexing) that splits the frequency into subcarriers
for transmitting and receiving.

3.4 Comparison and Use Cases

 The table below summarizes the main features and advantages of each type of data flow net-
work:

TYPE          DIRECTION   SIMULTANEITY   HARDWARE   SOFTWARE   ADVANTAGE
SIMPLEX       One-way     No             Simple     Simple     Cheap
HALF-DUPLEX   Two-way     No             Moderate   Moderate   Flexible
FULL-DUPLEX   Two-way     Yes            Complex    Complex    Efficient

 The choice of data flow network depends on the requirements and constraints of the applica-
tion. Some factors that influence the decision are:
o Data rate: How fast does the data need to be transmitted or received?
o Data volume: How much data needs to be transmitted or received?
o Data quality: How important is the accuracy and reliability of the data?
o Data security: How sensitive is the data and how vulnerable is it to unauthorized ac-
cess or modification?
o Cost: How much budget is available for the hardware and software components?
o Availability: How easy is it to obtain and maintain the hardware and software compo-
nents?

4. Networks
4.1 LAN (Local Area Network)

 A LAN is a network that connects devices within a small geographic area, such as a home,
office, or campus.
 A LAN typically uses wired or wireless technologies, such as Ethernet, Wi-Fi, or Bluetooth,
to transmit data over short distances at high speeds.
 A LAN usually has a single owner or administrator who controls the network configuration
and security policies.
 A LAN can support various applications, such as file sharing, printer sharing, email, web
browsing, video conferencing, gaming, etc.

4.2 MAN (Metropolitan Area Network)

 A MAN is a network that connects devices within a large geographic area, such as a city or a
region.
 A MAN typically uses fiber-optic cables, microwave links, or satellite links to transmit data
over long distances at moderate speeds.
 A MAN usually has multiple owners or operators who cooperate or compete with each other
to provide network services and resources.
 A MAN can support various applications, such as telephony, cable TV, internet access, public
safety, etc.
4.3 WAN (Wide Area Network)

 A WAN is a network that connects devices across a vast geographic area, such as a country or
a continent.
 A WAN typically uses a combination of wired and wireless technologies, such as copper
wires, fiber-optic cables, microwave links, satellite links, cellular networks, etc., to transmit
data over very long distances at low speeds.
 A WAN usually has many owners or operators who follow international standards and proto-
cols to ensure interoperability and compatibility among different networks.
 A WAN can support various applications, such as email, web browsing, online shopping, so-
cial media, etc.


4. Connection Topology
4.1 Bus Topology

 A bus topology is a type of connection topology that connects all the devices on a network
using a single cable called a bus.
 A bus topology has the following advantages and disadvantages:
o Advantages:
 It is simple and cheap to install and maintain.
 It can easily accommodate new devices by adding more taps or connectors
to the bus.
 It does not require any central device or switch to control the data flow.
o Disadvantages:
 It has low performance and reliability due to signal degradation and interfer-
ence.
 It has low security and privacy due to data broadcast and snooping.
 It has low scalability and fault tolerance due to limited cable length and sin-
gle point of failure.

4.2 Star Topology

 A star topology is a type of connection topology that connects all the devices on a network
to a central device called a hub or a switch.
 A star topology has the following advantages and disadvantages:
o Advantages:
 It has high performance and reliability due to dedicated links and isolation.
 It has high security and privacy due to data transmission and filtering.
 It has high scalability and fault tolerance due to easy addition and removal of
devices and multiple paths.
o Disadvantages:
 It is complex and expensive to install and maintain.
 It requires a central device or switch to control the data flow.
 It depends on the functionality and availability of the central device or
switch.
4.3 Ring Topology

 A ring topology is a type of connection topology that connects all the devices on a network in
a circular fashion using a single cable.
 A ring topology has the following advantages and disadvantages:
o Advantages:
 It is simple and cheap to install and maintain.
 It does not require any central device or switch to control the data flow.
 It can support high data rates and long distances due to signal regeneration.
o Disadvantages:
 It has low performance and reliability due to data collision and delay.
 It has low security and privacy due to data circulation and snooping.
 It has low scalability and fault tolerance due to limited cable length and sin-
gle point of failure.

4.4 Mesh Topology

 A mesh topology is a type of connection topology that connects all the devices on a network
directly or indirectly using multiple cables.
 A mesh topology has the following advantages and disadvantages:
o Advantages:
 It has high performance and reliability due to dedicated links and redun-
dancy.
 It has high security and privacy due to data encryption and routing.
 It has high scalability and fault tolerance due to easy addition and removal of
devices and multiple paths.
o Disadvantages:
 It is complex and expensive to install and maintain.
 It requires a lot of cables, ports, and switches to connect all the devices.
 It may cause network congestion and overhead due to excessive routing.

4.5 Hybrid Topology

 A hybrid topology is a type of connection topology that combines two or more different to-
pologies to form a network.
 A hybrid topology has the following advantages and disadvantages:


o Advantages:
 It can leverage the benefits of different topologies according to the needs of
the network.
 It can provide flexibility, diversity, and compatibility among different net-
works.
 It can enhance the performance, reliability, security, scalability, and fault tol-
erance of the network by using appropriate topologies for different seg-
ments or layers of the network.
o Disadvantages:
 It is complex and expensive to design, install, maintain, and troubleshoot.
 It requires careful planning, coordination, integration, and management
among different topologies, devices, protocols, standards, etc.
5. Protocols and Standards


5.1 Importance of Protocols in Data Communication

 A protocol is a set of rules or conventions that governs how data is exchanged between two
or more entities on a network.
 A protocol defines the format, structure, content, meaning, timing, sequence, order, direc-
tion, error handling, etc., of data transmission or reception.
 A protocol ensures that data communication is consistent, reliable, efficient, secure, interop-
erable, compatible, etc., among different entities on a network.
 Some examples of protocols are HTTP (Hypertext Transfer Protocol), FTP (File Transfer Proto-
col), SMTP (Simple Mail Transfer Protocol), TCP (Transmission Control Protocol), IP (Internet
Protocol), etc.

5.2 Overview of TCP/IP Protocol Suite

 TCP/IP protocol suite is a collection of protocols that enables data communication over the
internet or any other network that follows the internet standards.
 TCP/IP protocol suite consists of four layers: application layer, transport layer, internet layer,
and network access layer.
 Each layer performs a specific function and interacts with the adjacent layers using well-de-
fined interfaces.
 Each layer uses one or more protocols to perform its function and provides services to the
upper layer or receives services from the lower layer.
 The figure below shows the TCP/IP protocol suite and some of its protocols:

5.3 Introduction to Ethernet and IEEE 802.3

 Ethernet is a family of protocols that defines how data is transmitted and received over a
LAN using a bus or a star topology.
 Ethernet is based on the IEEE 802.3 standard, which specifies the physical and data link lay-
ers of the TCP/IP protocol suite.
 Ethernet uses a technique called CSMA/CD (Carrier Sense Multiple Access with Collision De-
tection) to share the medium and avoid collisions among multiple devices on a network.
 Ethernet supports various data rates, such as 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps, etc., de-
pending on the type and quality of the cable, the length of the cable, the number of devices,
etc.
 Ethernet uses a 48-bit address called MAC (Media Access Control) address to identify each
device on a network.
 Ethernet uses a frame format to encapsulate data packets from the upper layer and add
header and trailer information for transmission and reception.
 The figure below shows the Ethernet frame format:
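As a rough sketch of the addressing fields, the 14-byte Ethernet II header can be packed with Python's `struct`. The preamble, payload, and FCS are omitted, and the helper names are illustrative:

```python
import struct

def ethernet_header(dst_mac, src_mac, ethertype=0x0800):
    """Pack a 14-byte Ethernet II header: 6-byte destination MAC,
    6-byte source MAC, 2-byte EtherType (0x0800 = IPv4).
    MACs are given in the usual 'aa:bb:cc:dd:ee:ff' notation."""
    def mac_bytes(mac):
        return bytes(int(part, 16) for part in mac.split(":"))
    return mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", ethertype)

# Broadcast destination, example source address.
hdr = ethernet_header("ff:ff:ff:ff:ff:ff", "00:1a:2b:3c:4d:5e")
print(len(hdr))  # 14
```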

5.4 Other Networking Protocols

 Besides TCP/IP and Ethernet, there are many other networking protocols that are used for
different purposes, such as:
o IPX/SPX: This protocol suite is used for data communication over Novell NetWare
networks. It provides connectionless and connection-oriented services at the net-
work and transport layers, respectively.
o NetBEUI: This protocol is used for data communication over Microsoft Windows net-
works. It provides connectionless service at the transport layer and relies on MAC
addresses for addressing.
o ATM (Asynchronous Transfer Mode): This protocol is used for data communication
over high-speed networks that use fiber-optic cables or satellite links. It provides
connection-oriented service at the network layer and uses fixed-length cells for
transmission and switching.
o MPLS (Multiprotocol Label Switching): This protocol is used for data communication
over WANs that use different underlying technologies, such as IP, ATM, Frame Relay,
etc. It provides connection-oriented service at the network layer and uses labels for
routing and forwarding.
6. OSI Model
6.1 Seven Layers of the OSI Model and Their Functions

 OSI (Open Systems Interconnection) model is a conceptual framework that defines how data
communication occurs between different systems or devices on a network.
 OSI model consists of seven layers, each of which performs a specific function and interacts
with the adjacent layers using well-defined interfaces.
 The seven layers of the OSI model are:

Layer   Name           Function

7       Application    Provides services and interfaces to the user applications, such as email, web browsing, file transfer, etc.
6       Presentation   Translates, encrypts, compresses, and formats the data for transmission or reception.
5       Session        Establishes, maintains, and terminates the connection between the communicating entities.
4       Transport      Ensures reliable and efficient delivery of data between the source and destination.
3       Network        Determines the best path and routes the data packets across different networks.
2       Data Link      Transfers data frames between adjacent nodes on the same network.
1       Physical       Transmits and receives raw bits over the physical medium.

6.2 Explanation of each layer: Physical, Data Link, Network, Transport, Session, Presentation, and Application

 Physical layer: This layer is responsible for converting the digital data into electrical, optical,
or radio signals and vice versa. It also defines the characteristics of the physical medium, such
as voltage levels, frequency range, modulation scheme, connector type, cable type, etc. Some
examples of physical layer protocols are RS-232, V.35, Ethernet, Wi-Fi, etc.


 Data Link layer: This layer is responsible for transferring data frames between adjacent nodes
on the same network. It also provides error detection and correction, flow control, and me-
dium access control. It consists of two sublayers: logical link control (LLC) and media access
control (MAC). LLC provides services to the upper layer and controls the frame synchroniza-
tion and sequencing. MAC provides services to the lower layer and controls the access to the
shared medium. Some examples of data link layer protocols are Ethernet, HDLC (High-Level
Data Link Control), PPP (Point-to-Point Protocol), etc.


 Network layer: This layer is responsible for determining the best path and routing the data
packets across different networks. It also provides logical addressing, fragmentation and reas-
sembly, congestion control, and network management. Some examples of network layer pro-
tocols are IP (Internet Protocol), ICMP (Internet Control Message Protocol), ARP (Address
Resolution Protocol), RIP (Routing Information Protocol), OSPF (Open Shortest Path First),
etc.


 Transport layer: This layer is responsible for ensuring reliable and efficient delivery of data
between the source and destination. It also provides port addressing, segmentation and reas-
sembly, flow control, error control, and connection management. Some examples of transport
layer protocols are TCP (Transmission Control Protocol), UDP (User Datagram Protocol),
SCTP (Stream Control Transmission Protocol), etc.
 Session layer: This layer is responsible for establishing, maintaining, and terminating the con-
nection between the communicating entities. It also provides synchronization, dialogue con-
trol, session recovery, and authentication. Some examples of session layer protocols are RPC
(Remote Procedure Call), NFS (Network File System), SQL (Structured Query Language),
etc.

 Presentation layer: This layer is responsible for translating, encrypting, compressing, and
formatting the data for transmission or reception. It provides data representation services
such as character set conversion, encryption and decryption, and compression and
decompression. Some examples of presentation layer protocols are ASCII, Unicode, JPEG,
MPEG, SSL (Secure Sockets Layer), etc.
 Application layer: This layer is responsible for providing services and interfaces to the user
applications, such as email, web browsing, file transfer, etc. It also provides network access,
resource sharing, remote access, directory services, and network management. Some exam-
ples of application layer protocols are HTTP (Hypertext Transfer Protocol), FTP (File Trans-
fer Protocol), SMTP (Simple Mail Transfer Protocol), DNS (Domain Name System), SNMP
(Simple Network Management Protocol), etc.

6.3 Encapsulation and De-encapsulation

 Encapsulation is the process of adding header and trailer information to the data as it moves
down the layers of the OSI model. Each layer adds its own header and trailer to the data re-
ceived from the upper layer. The header contains information such as source and destination
addresses, sequence numbers, error codes, etc. The trailer contains information such as check-
sums, end-of-frame markers, etc.
 De-encapsulation is the process of removing header and trailer information from the data as it
moves up the layers of the OSI model. Each layer removes its own header and trailer from the
data received from the lower layer. The header and trailer are used to verify, interpret, and
process the data before passing it to the upper layer.
 The figure below shows the encapsulation and de-encapsulation process:
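The wrapping and unwrapping of headers can be mimicked with a toy sketch, where short text tags stand in for the binary headers and trailers of real protocols:

```python
def encapsulate(data, layers=("TCP", "IP", "ETH")):
    """Moving down the stack: each layer prepends its own header."""
    for layer in layers:
        data = f"[{layer}]" + data
    return data

def de_encapsulate(frame, layers=("ETH", "IP", "TCP")):
    """Moving up the stack: each layer checks and strips its own header."""
    for layer in layers:
        tag = f"[{layer}]"
        assert frame.startswith(tag), f"expected a {layer} header"
        frame = frame[len(tag):]
    return frame

frame = encapsulate("GET /")     # '[ETH][IP][TCP]GET /'
payload = de_encapsulate(frame)  # 'GET /'
```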

7. Transmission Media
7.1 Twisted Pair Cable: UTP vs. STP, Categories

 A twisted pair cable is a type of transmission media that consists of two insulated copper
wires twisted together to reduce electromagnetic interference and crosstalk.
 A twisted pair cable can be classified into two types: unshielded twisted pair (UTP) and
shielded twisted pair (STP).
o UTP is a twisted pair cable that does not have any additional shielding or protection.
It is cheaper and easier to install, but it is more susceptible to noise and interference.
o STP is a twisted pair cable that has a metallic shield or foil around each pair or the
entire cable. It is more expensive and difficult to install, but it provides better noise
and interference immunity.
 A twisted pair cable can also be classified into different categories based on its data rate,
bandwidth, and quality. The table below shows some of the common categories of twisted
pair cables:

Category   Data Rate                     Bandwidth   Quality
Cat 3      10 Mbps                       16 MHz      Voice grade
Cat 5      100 Mbps                      100 MHz     Data grade
Cat 5e     1 Gbps                        100 MHz     Enhanced data grade
Cat 6      1 Gbps (10 Gbps up to 55 m)   250 MHz     High-performance data grade
Cat 6a     10 Gbps                       500 MHz     Augmented high-performance data grade

7.2 Coaxial Cable: Thicknet vs. Thinnet, Uses

 A coaxial cable is a type of transmission media that consists of a central copper core sur-
rounded by an insulating layer, a braided metal shield, and an outer cover.
 A coaxial cable can be classified into two types: thick coaxial cable and thin coaxial cable.
o Thick coaxial cable, also known as 10Base5 or RG-8, is a thick and rigid coaxial ca-
ble that can support data rates up to 10 Mbps and distances up to 500 meters. It is
used for backbone networks and long-distance connections.
o Thin coaxial cable, also known as 10Base2 or RG-58, is a thin and flexible coaxial
cable that can support data rates up to 10 Mbps and distances up to 185 meters. It is
used for local area networks and short-distance connections.
 A coaxial cable has the following advantages and disadvantages:
o Advantages:
 It has high bandwidth and data rate compared to twisted pair cables.
 It has low attenuation and signal loss compared to twisted pair cables.
 It has high noise and interference immunity compared to twisted pair cables.
o Disadvantages:
 It is more expensive and difficult to install and maintain compared to twisted
pair cables.
 It is less flexible and scalable compared to twisted pair cables.
 It is more prone to security breaches due to tapping compared to twisted pair
cables.

7.3 Fiber-optic Cable: Advantages, Types (Single-mode, Multi-mode)

 A fiber-optic cable is a type of transmission media that consists of one or more thin strands of
glass or plastic that carry light signals.
 A fiber-optic cable has the following advantages over other types of transmission media:
o It has very high bandwidth and data rate compared to copper cables.
o It has very low attenuation and signal loss compared to copper cables.
o It has very high noise and interference immunity compared to copper cables.
o It has very high security and privacy compared to copper cables.
 A fiber-optic cable can be classified into two types based on the mode of light propagation:
single-mode fiber and multi-mode fiber.
o Single-mode fiber, also known as SMF, is a fiber-optic cable that allows only one
mode or path of light to travel through it. It has a very thin core diameter of about 8 to
10 micrometers. It can support very high data rates and long distances up to several
kilometers. It is used for backbone networks and long-distance connections.
o Multi-mode fiber, also known as MMF, is a fiber-optic cable that allows multiple
modes or paths of light to travel through it. It has a larger core diameter of about 50 to
62.5 micrometers. It can support moderate data rates and short distances up to several
hundred meters. It is used for local area networks and short-distance connections.
7.4 Wireless Transmission: Radio Waves, Microwaves, Infrared, Bluetooth

 Wireless transmission is a type of transmission media that uses electromagnetic waves or sig-
nals to transmit data through air or space without any physical medium or cable.
 Wireless transmission can use different types of electromagnetic waves or signals depending
on the frequency, wavelength, range, and application. Some of the common types of wireless
transmission are:
o Radio waves: These are electromagnetic waves that have frequencies ranging from 3
kHz to 300 GHz and wavelengths ranging from 1 mm to 100 km. They can penetrate
through walls and obstacles and travel long distances. They are used for various ap-
plications, such as AM/FM radio, TV, cellular phones, Wi-Fi, etc.
o Microwaves: These are electromagnetic waves that have frequencies ranging from
300 MHz to 300 GHz and wavelengths ranging from 1 mm to 1 m. They can travel in
straight lines and require line-of-sight communication. They are used for various ap-
plications, such as satellite communication, radar, microwave ovens, etc.
o Infrared: These are electromagnetic waves that have frequencies ranging from 300
GHz to 400 THz and wavelengths ranging from 1 mm to 700 nm. They can be
blocked by solid objects and have a short range. They are used for various applica-
tions, such as remote controls, optical communication, thermal imaging, etc.
o Bluetooth: This is a wireless technology that uses radio waves in the 2.4 GHz fre-
quency band to transmit data over short distances up to 10 meters. It is used for vari-
ous applications, such as wireless headphones, keyboards, mice, printers, etc.
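The frequency and wavelength ranges quoted above are linked by λ = c / f, which is easy to check in Python:

```python
C = 3.0e8  # approximate speed of light in m/s

def wavelength_m(freq_hz):
    """Wavelength in metres for a given frequency in hertz."""
    return C / freq_hz

print(wavelength_m(3e3))    # 100000.0 m = 100 km (3 kHz radio wave)
print(wavelength_m(300e9))  # 0.001 m  = 1 mm    (300 GHz)
```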

8. LAN (Local Area Network)


8.1 Definition and Scope of LAN

 A LAN (Local Area Network) is a network that connects devices within a small geographic
area, such as a home, office, or campus.
 A LAN typically uses wired or wireless technologies, such as Ethernet, Wi-Fi, or Bluetooth,
to transmit data over short distances at high speeds.
 A LAN usually has a single owner or administrator who controls the network configuration
and security policies.

 A LAN can support various applications, such as file sharing, printer sharing, email, web
browsing, video conferencing, gaming, etc.

8.2 Components of a LAN: Computers, Switches, Routers, Access Points, etc.

 A LAN consists of various components that perform different functions and roles on the net-
work. Some of the common components of a LAN are:
o Computers: These are the devices that generate, process, store, and consume data on
the network. They can be desktops, laptops, tablets, smartphones, etc. They can run
various operating systems, such as Windows, Linux, MacOS, Android, iOS, etc. They
can use various applications, such as browsers, email clients, word processors, games,
etc.
o Switches: These are the devices that connect multiple computers on the same network
using cables or ports. They can forward data frames between the computers based on
their MAC addresses. They can also divide the network into smaller segments or sub-
nets to reduce traffic and improve performance.
o Routers: These are the devices that connect multiple networks using different proto-
cols or technologies. They can route data packets between the networks based on their
IP addresses. They can also perform various functions, such as NAT (Network Ad-
dress Translation), DHCP (Dynamic Host Configuration Protocol), firewall, VPN
(Virtual Private Network), etc.
o Access Points: These are the devices that provide wireless connectivity to the com-
puters on the network using radio waves or signals. They can broadcast a wireless

network name or SSID (Service Set Identifier) and a password or key to authenticate
the computers. They can also support various wireless standards or protocols, such as
Wi-Fi (IEEE 802.11a/b/g/n/ac/ax) and Bluetooth (IEEE 802.15.1).

8.4 Benefits and Limitations of LANs

 A LAN (Local Area Network) has many benefits and limitations for the users and the network
administrators. Some of the benefits and limitations are:

Benefits

 A LAN provides fast and reliable data communication within a small geographic area, such as
a home, office, or campus.
 A LAN allows the users to share resources, such as files, printers, scanners, cameras, etc.,
among the devices on the network.
 A LAN enables the users to access various applications, such as email, web browsing, video
conferencing, gaming, etc., on the network or the internet.
 A LAN reduces the cost and complexity of data communication by using common hardware
and software components and protocols.
 A LAN increases the security and privacy of data communication by using encryption, au-
thentication, firewall, VPN, etc., on the network or the internet.
 A LAN improves the performance and efficiency of data communication by using switches,
routers, access points, etc., to optimize the data flow and reduce traffic and congestion on the
network.

Limitations

 A LAN has a limited geographic scope and cannot connect devices over long distances or
across different networks.
 A LAN requires a lot of maintenance and management by the network administrator to ensure
the proper functioning and security of the network.
 A LAN may face various challenges and issues, such as network failure, device malfunction,
data loss, data corruption, data theft, data breach, etc., on the network or the internet.
 A LAN may have compatibility and interoperability problems with other networks or devices
that use different hardware and software components and protocols.

9. Wired LAN
9.1 Introduction to Ethernet LAN

 A wired LAN (Local Area Network) is a network that connects devices using cables or wires
as the transmission medium.
 A wired LAN typically uses Ethernet as the protocol to define how data is transmitted and re-
ceived over the network.
 Ethernet is a family of protocols, standardized as IEEE 802.3, that operates at the
physical and data link layers of the network architecture.


 Ethernet uses a technique called CSMA/CD (Carrier Sense Multiple Access with Collision
Detection) to share the medium and avoid collisions among multiple devices on the network.
 Ethernet supports various data rates, such as 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps, etc., de-
pending on the type and quality of the cable, the length of the cable, the number of devices,
etc.
 Ethernet uses a 48-bit address called MAC (Media Access Control) address to identify each
device on the network.
 Ethernet uses a frame format to encapsulate data packets from the upper layer and add header
and trailer information for transmission and reception.
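The collision-handling side of CSMA/CD follows the classic 802.3 truncated binary exponential backoff rule: after the n-th collision, a station waits a random number of slot times drawn from 0 to 2^min(n, 10) - 1. A minimal sketch in Python (the function name and seeding are invented for illustration):

```python
import random

def backoff_slots(collision_count, seed=None):
    """After the n-th collision on the shared medium, wait a random number
    of slot times chosen uniformly from 0 .. 2**min(n, 10) - 1."""
    rng = random.Random(seed)
    k = min(collision_count, 10)      # the exponent is capped at 10
    return rng.randint(0, 2 ** k - 1)

print(backoff_slots(1, seed=1) in range(0, 2))   # after 1st collision: 0 or 1 slots
print(backoff_slots(3, seed=1) in range(0, 8))   # after 3rd collision: 0..7 slots
```

The growing backoff range is what lets many stations share one medium: repeated collisions spread retransmissions over ever-wider windows.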

9.2 Variations of Ethernet: Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps), 10 Gigabit
Ethernet (10 Gbps)

 Ethernet has evolved over time to meet the increasing demands of data communication. Some
of the variations of Ethernet are:
o Fast Ethernet: This is a variation of Ethernet that supports data rates up to 100 Mbps.
It uses the same frame format as the original Ethernet, but it requires higher quality
cables, such as Category 5 twisted pair or fiber-optic cables. It is also known as
100Base-T or 100Base-FX.
o Gigabit Ethernet: This is a variation of Ethernet that supports data rates up to 1 Gbps.
It uses a slightly modified frame format than the original Ethernet, but it is compati-
ble with Fast Ethernet. It requires higher quality cables, such as Category 5e twisted
pair or fiber-optic cables. It is also known as 1000Base-T or 1000Base-X.
o 10 Gigabit Ethernet: This is a variation of Ethernet that supports data rates up to 10
Gbps. It uses a different frame format than the original Ethernet, but it is compatible
with Gigabit Ethernet. It requires higher quality cables, such as Category 6 twisted
pair or fiber-optic cables. It is also known as 10GBase-T or 10GBase-X.

9.3 Ethernet Frame Format

 An Ethernet frame is a unit of data that is transmitted and received over an Ethernet network.
It consists of various fields that contain information such as source and destination addresses,
type of data, error detection, etc.
 An Ethernet frame has two main parts: header and payload. The header contains information
that is used by the data link layer to deliver the frame to the correct destination. The payload
contains information that is used by the upper layers to process the data.
 An Ethernet frame has different formats depending on the variation of Ethernet. The table be-
low shows some of the common formats of Ethernet frames:

o Original: Header: Preamble (7 bytes), Start Frame Delimiter (1 byte), Destination
MAC Address (6 bytes), Source MAC Address (6 bytes), Length/Type (2 bytes);
Payload: Data (46 to 1500 bytes), Frame Check Sequence (4 bytes)
o Fast Ethernet: same header fields as the original; Payload: Data (46 to 1500
bytes), Frame Check Sequence (4 bytes)
o Gigabit Ethernet: same header fields as the original; Payload: Data (46 to 1500
bytes), Pad (0 to 42 bytes), Frame Check Sequence (4 bytes)
o 10 Gigabit Ethernet: same header fields as the original; Payload: Data (64 to
1500 bytes), Pad (0 to 36 bytes), Frame Check Sequence (4 bytes)

 The figure below shows an example of an original Ethernet frame format:



10. Wireless LAN


10.1 IEEE 802.11 Standards for Wi-Fi

 A wireless LAN (Local Area Network) is a network that connects devices using radio waves
or signals as the transmission medium.
 A wireless LAN typically uses Wi-Fi as the protocol to define how data is transmitted and re-
ceived over the network.
 Wi-Fi is a family of protocols that is based on the IEEE 802.11 standard, which specifies the
physical and data link layers of the TCP/IP protocol suite.
 Wi-Fi uses a technique called CSMA/CA (Carrier Sense Multiple Access with Collision
Avoidance) to share the medium and avoid collisions among multiple devices on the network.
 Wi-Fi supports various data rates, such as 11 Mbps, 54 Mbps, 600 Mbps, 1.3 Gbps, etc., de-
pending on the type and quality of the signal, the distance between the devices, the number of
devices, etc.
 Wi-Fi uses a 48-bit address called MAC (Media Access Control) address to identify each de-
vice on the network.
 Wi-Fi uses a frame format to encapsulate data packets from the upper layer and add header
and trailer information for transmission and reception.

10.2 Wi-Fi Technologies: 802.11a/b/g/n/ac/ax

 Wi-Fi has evolved over time to meet the increasing demands of data communication. Some of
the variations of Wi-Fi are:
o 802.11a: This is a variation of Wi-Fi that operates in the 5 GHz frequency band and
supports data rates up to 54 Mbps. It has less interference and more channels than
802.11b, but it has shorter range and higher cost.
o 802.11b: This is a variation of Wi-Fi that operates in the 2.4 GHz frequency band and
supports data rates up to 11 Mbps. It has longer range and lower cost than 802.11a,
but it has more interference and fewer channels than 802.11a.
o 802.11g: This is a variation of Wi-Fi that operates in the 2.4 GHz frequency band and
supports data rates up to 54 Mbps. It is compatible with 802.11b, but it has higher
performance and security than 802.11b.
o 802.11n: This is a variation of Wi-Fi that operates in both the 2.4 GHz and 5 GHz fre-
quency bands and supports data rates up to 600 Mbps. It uses a technique called
MIMO (Multiple Input Multiple Output) to increase the number of antennas and
streams for transmission and reception. It also uses a technique called OFDM (Or-
thogonal Frequency Division Multiplexing) to split the frequency into subcarriers for
transmission and reception.
o 802.11ac: This is a variation of Wi-Fi that operates in the 5 GHz frequency band and
supports data rates up to 1.3 Gbps. It uses a technique called MU-MIMO (Multi-User
Multiple Input Multiple Output) to allow multiple devices to transmit and receive
data simultaneously. It also uses a technique called Wider Channel Bandwidth to in-
crease the channel width from 20 MHz to 80 MHz or 160 MHz for transmission and
reception.
o 802.11ax: This is a variation of Wi-Fi that operates in both the 2.4 GHz and 5 GHz
frequency bands and supports data rates up to 9.6 Gbps. It uses a technique called
OFDMA (Orthogonal Frequency Division Multiple Access) to divide the subcarriers
into smaller units called resource units for transmission and reception. It also uses a
technique called BSS Coloring to reduce the interference from neighboring networks.

Wireless LANs have many advantages and challenges for the users and the network adminis-
trators. Some of the advantages and challenges are:

Advantages

 Wireless LANs provide mobility and flexibility to the users, as they can access the network
from anywhere within the coverage area without any cables or wires.
 Wireless LANs reduce the cost and complexity of data communication, as they do not require
any physical infrastructure or installation of cables or wires.
 Wireless LANs enable the users to connect various devices, such as laptops, tablets,
smartphones, etc., to the network using different wireless standards or protocols, such as Wi-
Fi, Bluetooth, etc.
 Wireless LANs support various applications, such as email, web browsing, video conferenc-
ing, gaming, etc., on the network or the internet.

Challenges

 Wireless LANs have a limited range and capacity, as they depend on the signal strength and
quality, the distance between the devices, the number of devices, etc.
 Wireless LANs require a lot of security and management by the network administrator to en-
sure the proper functioning and protection of the network.
 Wireless LANs may face various challenges and issues, such as signal interference, noise, at-
tenuation, multipath fading, hidden node problem, etc., on the network or the internet.
 Wireless LANs may have compatibility and interoperability problems with other networks or
devices that use different wireless standards or protocols.

11. Connecting LANs and Virtual LAN (VLAN)


11.1 Network Bridges and their Role

 A network bridge is a device that connects two or more LANs that use the same protocol or
technology, such as Ethernet.


 A network bridge operates at the data link layer of the OSI model and forwards data frames
between the LANs based on their MAC addresses.
 A network bridge has the following roles and functions:
o It extends the range and capacity of a LAN by connecting multiple segments or sub-
nets.
o It reduces the traffic and congestion on a LAN by filtering and forwarding only the
relevant frames to the destination segment or subnet.
o It improves the performance and reliability of a LAN by dividing it into smaller colli-
sion domains.
o It maintains a table of MAC addresses and ports to keep track of the devices on each
segment or subnet.
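The MAC-table behaviour described above can be sketched as a toy "learning bridge" in Python (the class and method names are invented for illustration; a real bridge also ages out stale entries):

```python
class LearningBridge:
    """Toy model of a transparent bridge's learn-and-forward logic."""

    def __init__(self):
        self.table = {}                 # MAC address -> port it was last seen on

    def handle_frame(self, src_mac, dst_mac, in_port, ports):
        self.table[src_mac] = in_port   # learn where the sender is reachable
        if dst_mac in self.table:
            return [self.table[dst_mac]]              # forward on the known port
        return [p for p in ports if p != in_port]     # unknown: flood all others

bridge = LearningBridge()
print(bridge.handle_frame("AA", "BB", 1, [1, 2, 3]))  # BB unknown -> flood: [2, 3]
print(bridge.handle_frame("BB", "AA", 2, [1, 2, 3]))  # AA learned  -> [1]
```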

11.2 Routers: Interconnecting LANs and WANs

 A router is a device that connects two or more networks that use different protocols or tech-
nologies, such as Ethernet, Wi-Fi, ATM, etc.
 A router operates at the network layer of the OSI model and routes data packets between the
networks based on their IP addresses.
 A router has the following roles and functions:
o It enables data communication across different networks or internetworks, such as
LANs, WANs, or the internet.
o It determines the best path and routes the data packets using various algorithms, such
as shortest path, least cost, etc.
o It performs various functions, such as NAT (Network Address Translation), DHCP
(Dynamic Host Configuration Protocol), firewall, VPN (Virtual Private Network),
etc.
o It maintains a table of IP addresses and interfaces to keep track of the networks and
devices connected to it.
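The route-selection idea above (picking the best matching entry for a packet's destination IP) can be illustrated with a toy longest-prefix-match lookup using Python's standard ipaddress module; the prefixes and interface names are made up:

```python
import ipaddress

# A toy routing table: destination prefix -> outgoing interface
routes = {
    ipaddress.ip_network("10.0.0.0/8"): "eth0",
    ipaddress.ip_network("10.1.0.0/16"): "eth1",
    ipaddress.ip_network("0.0.0.0/0"): "eth2",   # default route
}

def next_hop(addr: str) -> str:
    """Pick the matching route with the longest prefix (most specific match)."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in routes if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

print(next_hop("10.1.2.3"))   # eth1 (the /16 beats the /8 and the default)
print(next_hop("8.8.8.8"))    # eth2 (only the default route matches)
```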

11.3 VLAN Concepts: Logical Segmentation, Benefits, and Implementation

 A VLAN (Virtual LAN) is a logical grouping of devices on a physical LAN that share com-
mon characteristics or requirements, such as location, department, function, security, etc.
 A VLAN operates at the data link layer of the OSI model and segments a physical LAN into
multiple logical LANs using software configuration rather than hardware wiring.


 A VLAN has the following concepts, benefits, and implementation methods:
o Logical segmentation: This is the process of dividing a physical LAN into multiple
logical LANs based on various criteria, such as IP address, MAC address, port num-
ber, protocol type, etc. Each logical LAN is assigned a unique identifier called VLAN
ID or VID. The devices on each logical LAN can communicate with each other as if
they are on the same physical LAN, but they cannot communicate with the devices on
other logical LANs unless they use a router or a switch that supports inter-VLAN
routing.
o Benefits: These are the advantages of using VLANs over traditional LANs. Some of
the benefits are:
 It increases the security and privacy of data communication by isolating dif-
ferent groups of devices from each other and preventing unauthorized access
or eavesdropping.

 It improves the performance and efficiency of data communication by reducing
traffic and congestion on the physical LAN and enhancing quality of service
(QoS) for different applications or services.
 It provides flexibility and scalability of data communication by allowing easy
addition, removal, or relocation of devices without affecting the physical
LAN structure or configuration.
 It reduces the cost and complexity of data communication by using fewer ca-
bles, switches, routers, etc., and simplifying network management and trou-
bleshooting.
o Implementation: These are the methods of creating and managing VLANs on a
physical LAN. Some of the common methods are:
 Port-based VLAN: This is a method of assigning devices to VLANs based on
their physical ports on a switch. Each port can belong to only one VLAN.
This method is simple and easy to implement, but it is inflexible and ineffi-
cient when devices need to move or change VLANs frequently.
 MAC-based VLAN: This is a method of assigning devices to VLANs based
on their MAC addresses. Each MAC address can belong to only one VLAN.
This method is flexible and efficient when devices need to move or change
VLANs frequently, but it is complex and difficult to implement and maintain
when there are many devices with dynamic MAC addresses.
 IP-based VLAN: This is a method of assigning devices to VLANs based on
their IP addresses or subnets. Each IP address or subnet can belong to only
one VLAN. This method is flexible and efficient when devices need to move
or change VLANs frequently, but it requires coordination with the network
administrator and the router configuration to ensure proper routing and ad-
dressing.
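The port-based method above can be modelled in a few lines of Python; the port-to-VLAN mapping and VLAN IDs below are invented for illustration:

```python
# Toy model of port-based VLAN segmentation on a single switch:
# each switch port is statically assigned to exactly one VLAN ID
port_vlan = {1: 10, 2: 10, 3: 20, 4: 20}

def same_vlan(port_a: int, port_b: int) -> bool:
    """Two ports can exchange frames directly only if they share a VLAN;
    otherwise the traffic must go through inter-VLAN routing."""
    return port_vlan[port_a] == port_vlan[port_b]

print(same_vlan(1, 2))   # True  (both ports are in VLAN 10)
print(same_vlan(1, 3))   # False (VLAN 10 vs VLAN 20: a router is needed)
```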

UNIT-II: Data Link Layer and Medium Access Sub Layer


12. Error Detection and Error Correction
Learning Objectives

 Understand the importance of error detection and correction in data communication


 Learn the basic concepts and principles of block coding, hamming codes, and cyclic redun-
dancy check (CRC)
 Implement hamming codes and CRC in Python

Introduction

 Data communication is the process of transmitting and receiving data over a communication
channel, such as a wired or wireless network
 Data communication can be affected by various types of errors, such as noise, interference,
distortion, or corruption, that can alter or damage the transmitted data
 Error detection and correction are techniques that enable the sender and the receiver to detect
and correct errors in the transmitted data, ensuring reliable and accurate data communication
 Error detection is the process of identifying errors in the received data by using some extra
information or redundancy added by the sender
 Error correction is the process of recovering the original data from the received data by using
some additional information or redundancy added by the sender or by requesting retransmis-
sion

Block Coding

 Block coding is a technique that divides the data into fixed-size blocks of bits and adds some
extra bits to each block to form a code word
 The extra bits are called parity bits or check bits, and they are calculated based on some
rules or algorithms applied to the data bits
 The parity bits provide redundancy that can be used to detect and correct errors in the code
words
 The ratio of data bits to code bits is called the code rate, and it indicates the efficiency of the
block coding scheme
 A higher code rate means less redundancy and more efficiency, but also less error detection
and correction capability

 A lower code rate means more redundancy and less efficiency, but also more error detection
and correction capability
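The trade-off can be made concrete with a one-line computation; for example, a Hamming(7,4)-style code carries d = 4 data bits and p = 3 parity bits in each 7-bit code word:

```python
# Code rate = data bits / total code bits
d, p = 4, 3                  # 4 data bits, 3 parity bits per code word
code_rate = d / (d + p)
print(round(code_rate, 3))   # 0.571: roughly 43% of the transmitted bits are redundancy
```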

Hamming Codes
 Hamming codes are a type of block coding scheme that can detect and correct
single-bit errors in code words
 Hamming codes use even parity for calculating the parity bits, which means that
each parity bit is set so that the number of 1s in its group of covered bits is even
 Hamming codes follow these steps to generate code words from data bits:
o Determine the number of parity bits (p) needed for a given number of data bits (d) by
solving the equation: 2^p >= d + p + 1
o Assign the parity bits to positions that are powers of 2 in the code word, such as 1, 2,
4, 8, etc.
o Fill in the remaining positions with the data bits
o Calculate the value of each parity bit by using an exclusive OR (XOR) operation on
all the bits whose position number has a 1 in the binary place of that parity bit
o For example, to generate a Hamming code for d = 4 data bits (1011), we need p = 3
parity bits, and we get the following code word: p1 p2 d1 p3 d2 d3 d4 = 0110011
o The value of each parity bit is calculated as follows:
 p1 = d1 XOR d2 XOR d4 = 1 XOR 0 XOR 1 = 0
 p2 = d1 XOR d3 XOR d4 = 1 XOR 1 XOR 1 = 1
 p3 = d2 XOR d3 XOR d4 = 0 XOR 1 XOR 1 = 0
 Hamming codes follow these steps to detect and correct errors in code words:
o Recalculate the value of each parity bit by using an exclusive OR (XOR) operation on
all the bits that have a 1 in its position
o Compare the recalculated parity bits with the received parity bits
o If all the parity bits match, then there is no error in the code word
o If some parity bits do not match, then there is an error in the code word
o To locate the error bit, add up all the positions that have a mismatched parity bit
o The sum gives the position of the error bit in the code word
o To correct the error bit, flip its value from 0 to 1 or from 1 to 0
o For example, suppose the code word 0110011 is transmitted but 0110111 is
received (the bit in position 5, which holds d2, has been flipped). Detection
and correction proceed as follows:
 Recalculate the value of each parity bit from the received bits:
 p1’ = d1 XOR d2 XOR d4 = 1 XOR 1 XOR 1 = 1
 p2’ = d1 XOR d3 XOR d4 = 1 XOR 1 XOR 1 = 1
 p3’ = d2 XOR d3 XOR d4 = 1 XOR 1 XOR 1 = 1
 Compare the recalculated parity bits with the received parity bits:
 p1’ != p1 (mismatch; p1 is at position 1)
 p2’ = p2 (no mismatch)
 p3’ != p3 (mismatch; p3 is at position 4)
 Locate the error bit by adding up the positions of the mismatched parity
bits:
 Error bit position = 1 + 4 = 5
 Correct the error bit by flipping its value:
 Received bit 5 (d2) = 1, corrected bit = 0
 Corrected code word = 0110011

Cyclic Redundancy Check (CRC)


 Cyclic redundancy check (CRC) is another technique that can detect errors in data
transmission by using polynomial division
 CRC uses a predefined generator polynomial or divisor that is known to both the
sender and the receiver
 The generator polynomial has a degree of n, which means it has n + 1 coefficients
(for the terms from x^n down to x^0)
 The generator polynomial is represented by a binary string of n + 1 bits, where each
bit corresponds to a coefficient of the polynomial
 For example, the generator polynomial x^3 + x + 1 can be represented by the binary
string 1011
 CRC follows these steps to generate code words from data bits:
o Append n zeros to the right of the data bits, where n is the degree of the gener-
ator polynomial
o Perform a binary division of the modified data bits by the generator polyno-
mial, using XOR operations instead of subtraction
o The remainder of the division is called the CRC or checksum, and it has n
bits
o Append the CRC to the right of the original data bits to form the code word
o For example, to generate a CRC code for d = 7 data bits (1101011) using the
generator polynomial x^3 + x + 1 (1011), we append n = 3 zeros and divide
1101011000 by the generator
o The binary (modulo-2) division, XORing the generator under each leading 1, is
performed as follows:

  1101011000
  1011
  ----------
  0110011000
   1011
  ----------
  0011111000
    1011
  ----------
  0001001000
     1011
  ----------
  0000010000
       1011
  ----------
  0000000110   (remainder = CRC = 110)

o The CRC is 110, so the transmitted code word is: d CRC = 1101011 110
 CRC follows these steps to detect errors in code words:
o Perform a binary division of the received code word by the same generator
polynomial, using XOR operations instead of subtraction
o If the remainder of the division is zero, then there is no error in the code word
o If the remainder of the division is non-zero, then there is an error in the code
word

o For example, suppose the code word 1101011110 is transmitted but its last bit
is flipped in transit, so 1101011111 is received. Using the generator
polynomial x^3 + x + 1 (1011), error detection proceeds as follows:
 Perform the binary division of the received code word by the generator:

  1101011111
  1011
  ----------
  0110011111
   1011
  ----------
  0011111111
    1011
  ----------
  0001001111
     1011
  ----------
  0000010111
       1011
  ----------
  0000000001   (remainder = 001)

 The remainder is non-zero (001), which means there is an error in the code
word; if the remainder had been zero, the code word would be accepted as
error-free

Python Implementation

 Here is a Python program that implements Hamming codes and CRC for error detection
and correction

Python

```python
import math

def hamming_code(data_bits):
    """Generate an even-parity Hamming code word for a string of data bits."""
    data_bits = [int(b) for b in data_bits]
    # Determine the number of parity bits p by solving 2**p >= d + p + 1
    p = 0
    while 2 ** p < len(data_bits) + p + 1:
        p += 1
    n = len(data_bits) + p
    code_word = [0] * n
    # Parity bits occupy the power-of-2 positions (1, 2, 4, 8, ...);
    # data bits fill the remaining positions in order
    j = 0
    for i in range(1, n + 1):
        if i & (i - 1) != 0:              # i is not a power of 2
            code_word[i - 1] = data_bits[j]
            j += 1
    # Each parity bit at position 2**i is the XOR of every other bit whose
    # position number has a 1 in that binary place (even parity)
    for i in range(p):
        pos = 2 ** i
        val = 0
        for j in range(1, n + 1):
            if j & pos and j != pos:
                val ^= code_word[j - 1]
        code_word[pos - 1] = val
    return "".join(str(b) for b in code_word)

def hamming_error(code_word):
    """Detect and correct a single-bit error in a Hamming code word.
    Returns (corrected_code_word, error_position); position 0 means no error."""
    code_word = [int(b) for b in code_word]
    n = len(code_word)
    p = int(math.log2(n)) + 1             # number of power-of-2 positions <= n
    error_bit = 0
    for i in range(p):
        pos = 2 ** i
        val = 0
        for j in range(1, n + 1):         # XOR over the whole group, parity bit included
            if j & pos:
                val ^= code_word[j - 1]
        if val != 0:                      # even parity violated for this group
            error_bit += pos              # mismatched positions sum to the error position
    if error_bit:
        code_word[error_bit - 1] ^= 1     # flip the erroneous bit
    return "".join(str(b) for b in code_word), error_bit

def crc_remainder(bits, generator):
    """Binary (modulo-2) division: return the remainder as a list of n bits."""
    work = [int(b) for b in bits]         # copy so the caller's data is untouched
    gen = [int(b) for b in generator]
    n = len(gen) - 1
    for i in range(len(work) - n):
        if work[i] == 1:                  # XOR the generator in at each leading 1
            for j in range(len(gen)):
                work[i + j] ^= gen[j]
    return work[-n:]

def crc_code(data_bits, generator):
    """Append n zeros, divide by the generator, and append the remainder (CRC)."""
    n = len(generator) - 1
    remainder = crc_remainder(data_bits + "0" * n, generator)
    return data_bits + "".join(str(b) for b in remainder)

def crc_error(code_word, generator):
    """Return True if the received code word fails the CRC check."""
    return any(crc_remainder(code_word, generator))

# Example usage (matching the worked examples above):
print(hamming_code("1011"))               # 0110011
print(hamming_error("0110111"))           # ('0110011', 5)
print(crc_code("1101011", "1011"))        # 1101011110
print(crc_error("1101011110", "1011"))    # False (valid code word)
print(crc_error("1101011111", "1011"))    # True (error detected)
```

13. Flow Control


Learning Objectives

 Understand the need for flow control in data communication


 Learn the basic concepts and principles of flow control mechanisms, such as stop-and-wait
and sliding window
 Compare the advantages and limitations of different flow control techniques

Introduction

 Flow control is a technique that regulates the amount and rate of data transmission between a
sender and a receiver
 Flow control ensures that the sender does not overwhelm the receiver with more data than it
can process or store
 Flow control also prevents data loss or corruption due to buffer overflow, congestion, or errors in the communication channel
 Flow control can be implemented at different layers of the network architecture, such as the
data link layer or the transport layer
 Flow control can be classified into two types: feedback-based and rate-based
o Feedback-based flow control relies on the receiver to send feedback messages to the
sender, indicating its readiness or capacity to receive more data
o Rate-based flow control relies on the sender to estimate the available bandwidth or
congestion level of the channel, and adjust its transmission rate accordingly

Stop-and-Wait

 Stop-and-wait is a simple and basic feedback-based flow control mechanism that works as
follows:
o The sender sends one frame of data to the receiver and waits for an acknowledgment
(ACK) from the receiver before sending the next frame
o The receiver sends an ACK to the sender after receiving and processing a frame successfully
o If the sender does not receive an ACK within a specified time interval, called
the timeout, it assumes that the frame was lost or corrupted, and retransmits the same
frame
o If the receiver receives a duplicate frame, it discards it and sends an ACK for the previous frame
o To distinguish between original and duplicate frames, each frame is assigned a sequence number (0 or 1) that alternates between successive frames

 Stop-and-wait has some advantages and limitations, such as:


o Advantages:
 It is easy to implement and understand
 It guarantees reliable delivery of data frames
 It avoids buffer overflow at the receiver side
o Limitations:
 It has low efficiency and utilization of the channel bandwidth, as the sender
has to wait for an ACK after each frame
 It suffers from long delays due to propagation time, processing time, and
timeout intervals
 It cannot handle multiple senders or receivers simultaneously
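The send-wait-retransmit cycle described above can be sketched as a short simulation. This is an illustrative sketch only: the function name `stop_and_wait` and the loss model (one random drop chance per transmission attempt) are assumptions for teaching purposes, not part of any standard.

```python
import random

def stop_and_wait(frames, loss_rate=0.3, seed=42):
    """Simulate stop-and-wait: send one frame, await the ACK, and
    retransmit the same frame whenever the (simulated) timeout fires."""
    rng = random.Random(seed)
    delivered = []       # (sequence number, frame) pairs accepted in order
    transmissions = 0
    seq = 0              # alternating 1-bit sequence number
    for frame in frames:
        while True:
            transmissions += 1                 # send (or resend) the frame
            if rng.random() >= loss_rate:      # frame and ACK both arrive
                delivered.append((seq, frame))
                seq ^= 1                       # flip the bit for the next frame
                break
            # otherwise: no ACK before the timeout, so loop and retransmit
    return delivered, transmissions
```

Because every frame must be acknowledged before the next one is sent, `transmissions` can never be smaller than the number of frames, which is exactly the efficiency limitation listed above.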

Sliding Window

 Sliding window is a more advanced and efficient feedback-based flow control mechanism that
works as follows:
o The sender maintains a window of frames that it can send without waiting for an
ACK from the receiver
o The window size is determined by the receiver’s buffer capacity, which is communicated to the sender through feedback messages
o The sender slides its window forward as it receives ACKs from the receiver, allowing
it to send new frames
o The receiver also maintains a window of frames that it can receive and process
o The receiver slides its window forward as it sends ACKs to the sender, indicating its
readiness to receive new frames
o If a frame is lost or corrupted, the sender retransmits all the frames in its window after
a timeout, or after receiving a negative acknowledgment (NAK) from the receiver
o To distinguish between different frames, each frame is assigned a sequence number that ranges from 0 to 2^n - 1, where n is the number of bits used for sequence numbers

 Sliding window has some advantages and limitations, such as:


o Advantages:
 It improves efficiency and utilization of the channel bandwidth, as the sender
can send multiple frames without waiting for an ACK from the receiver
 It reduces delays due to propagation time, processing time, and timeout intervals
 It can handle multiple senders or receivers simultaneously by using different
sequence numbers or addresses
o Limitations:
 It is more complex to implement and understand than stop-and-wait
 It requires more memory and computation at both ends to maintain and update windows and sequence numbers
 It may still cause buffer overflow or congestion if the window size is too
large or too small
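The window bookkeeping described above can be traced on an error-free channel. The helper below is a hypothetical sketch: it logs a `send` event while the window has room and an `ack` event as each cumulative acknowledgment slides the window forward.

```python
def sliding_window_trace(num_frames, window_size):
    """Trace sender-side sliding-window behaviour with no losses."""
    trace = []
    base = 0        # oldest unacknowledged frame
    next_seq = 0    # next frame to transmit
    while base < num_frames:
        # Keep sending while the window is not full
        while next_seq < base + window_size and next_seq < num_frames:
            trace.append(("send", next_seq))
            next_seq += 1
        # An ACK for the oldest outstanding frame slides the window forward
        trace.append(("ack", base))
        base += 1
    return trace
```

With `window_size=2` the sender gets two frames onto the channel before the first ACK is needed, illustrating the improved utilization compared with stop-and-wait.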

14. Error Control Protocols


Learning Objectives

 Understand the difference between error detection and error control in data communication
 Learn the basic concepts and principles of error control protocols, such as stop-and-wait
ARQ, go-back-N ARQ, and selective repeat ARQ
 Compare the performance and trade-offs of different error control protocols

Introduction

 Error control is a technique that ensures reliable and accurate data transmission between a
sender and a receiver
 Error control involves both error detection and error correction, as well as retransmission and
acknowledgment of data frames
 Error control protocols are algorithms that define the rules and procedures for error detection,
correction, retransmission, and acknowledgment
 Error control protocols can be classified into two types: automatic repeat request
(ARQ) and forward error correction (FEC)
o ARQ protocols rely on the receiver to send feedback messages to the sender, indicating whether a frame was received correctly or not
o If a frame was received incorrectly, the sender retransmits the frame until it is received correctly
o ARQ protocols can be further divided into three types: stop-and-wait ARQ, go-back-N ARQ, and selective repeat ARQ
o FEC protocols rely on the sender to add extra information or redundancy to the data
frames, enabling the receiver to correct errors without requesting retransmission
o FEC protocols use error-correcting techniques such as block coding and Hamming codes; CRC, by contrast, is used only for error detection

Stop-and-Wait ARQ

 Stop-and-wait ARQ is a simple and basic ARQ protocol that works as follows:
o The sender sends one frame of data to the receiver and waits for an acknowledgment
(ACK) from the receiver before sending the next frame
o The receiver sends an ACK to the sender after receiving and processing a frame successfully


o If the sender does not receive an ACK within a specified time interval, called
the timeout, it assumes that the frame was lost or corrupted, and retransmits the same
frame
o If the receiver receives a duplicate frame, it discards it and sends an ACK for the previous frame
o To distinguish between original and duplicate frames, each frame is assigned a sequence number (0 or 1) that alternates between successive frames
 Stop-and-wait ARQ has some advantages and limitations, such as:
o Advantages:
 It is easy to implement and understand
 It guarantees reliable delivery of data frames
 It avoids buffer overflow at the receiver side
o Limitations:
 It has low efficiency and utilization of the channel bandwidth, as the sender
has to wait for an ACK after each frame
 It suffers from long delays due to propagation time, processing time, and
timeout intervals
 It cannot handle multiple senders or receivers simultaneously

Go-Back-N ARQ

 Go-back-N ARQ is a more advanced and efficient ARQ protocol that works as follows:
o The sender maintains a window of frames that it can send without waiting for an
ACK from the receiver
o The window size is the maximum number of frames that can be sent before an ACK must be received
o The sender slides its window forward as it receives ACKs from the receiver, allowing
it to send new frames
o The receiver sends a cumulative ACK to the sender after receiving a frame successfully, indicating the next expected sequence number
o If a frame is lost or corrupted, the receiver discards all subsequent frames until it receives the missing frame
o The sender retransmits all the frames in its window after a timeout, or after receiving
a duplicate ACK from the receiver
o To distinguish between different frames, each frame is assigned a sequence number that ranges from 0 to 2^n - 1, where n is the number of bits used for sequence numbers

 Go-back-N ARQ has some advantages and limitations, such as:


o Advantages:
 It improves efficiency and utilization of the channel bandwidth, as the sender
can send multiple frames without waiting for an ACK from the receiver
 It reduces delays due to propagation time, processing time, and timeout intervals
 It can handle multiple senders or receivers simultaneously by using different
sequence numbers or addresses
o Limitations:
 It is more complex to implement and understand than stop-and-wait ARQ
 It requires more memory and computation at both ends to maintain and update windows and sequence numbers
 It may cause unnecessary retransmissions of frames that were received correctly by the receiver but discarded due to a missing frame
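The retransmission behaviour just described can be seen in a toy simulation. Everything below is an illustrative sketch (the drop model and the cumulative-ACK shortcut are assumptions): when frame 1 is lost, the receiver discards frame 2 even though it arrived, and the sender resends both.

```python
def go_back_n(num_frames, window_size, lost):
    """Simulate Go-Back-N over a channel that drops each frame listed in
    `lost` exactly once. Returns the order in which frames were put on
    the wire, including retransmissions."""
    lost = set(lost)
    log = []
    base = 0       # start of the sender's window
    expected = 0   # receiver's next expected sequence number
    while base < num_frames:
        for seq in range(base, min(base + window_size, num_frames)):
            log.append(seq)                 # frame goes on the wire
            if seq in lost:
                lost.discard(seq)           # this copy is dropped in transit
            elif seq == expected:
                expected += 1               # in-order frame accepted
            # out-of-order frames are silently discarded by the receiver
        base = expected                     # cumulative ACK slides the window
    return log
```

Losing frame 1 in a window of 2 forces frames 1 and 2 to be retransmitted even though frame 2 arrived intact, which is the "unnecessary retransmissions" limitation noted above.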

15. Sliding Window Protocol

Learning Objectives

 Understand the concepts and principles of sliding window technique for data transmission
 Learn how to determine the optimal window size and its impact on data transmission efficiency and reliability
 Compare and contrast the two window management strategies: selective repeat and go-back-N

Introduction

 Sliding window protocol is a technique that regulates the amount and rate of data transmission between a sender and a receiver
 Sliding window protocol uses a window of frames that can be sent or received at a time, without waiting for an acknowledgment (ACK) or a negative acknowledgment (NAK)


 Sliding window protocol ensures reliable and efficient data transmission by using feedback
messages, sequence numbers, timers, and retransmission mechanisms
 Sliding window protocol can be implemented at different layers of the network architecture,
such as the data link layer or the transport layer
 Sliding window protocol can be classified into two types: sender-initiated and receiver-initiated
o Sender-initiated sliding window protocol relies on the sender to determine the window size and slide the window forward as it receives ACKs from the receiver
o Receiver-initiated sliding window protocol relies on the receiver to determine the
window size and slide the window forward as it sends ACKs to the sender

Window Size and its Impact on Data Transmission

 Window size is a parameter that determines how many frames can be sent or received at a
time, without waiting for an acknowledgment or a negative acknowledgment


 Window size affects the efficiency and reliability of data transmission, as well as the complexity and overhead of sliding window protocol
 Window size can be determined by various factors, such as:
o The capacity of the sender’s and receiver’s buffers
o The bandwidth of the communication channel
o The propagation delay of the communication channel
o The error rate of the communication channel
o The feedback mechanism used by the sender and receiver
 Window size can be calculated by using various formulas, such as:
o Window size = 1 + 2a, where a = (bandwidth × one-way propagation delay) / frame size; bandwidth is the data rate of the channel in bits per second, delay is the one-way propagation time in seconds, and frame size is the number of bits in a frame
o Equivalently, Window size = 1 + (bandwidth × round-trip time) / frame size, since the round-trip time is twice the one-way propagation delay
o Window size = min(sender buffer, receiver buffer), where sender buffer and receiver buffer are the number of frames that can be stored at each end
 Window size has some advantages and limitations, such as:
o Advantages:
 A larger window size can improve efficiency and utilization of the channel bandwidth, as more frames can be sent or received without waiting for feedback messages
 A larger window size can reduce delays due to propagation time, processing
time, and timeout intervals, as fewer feedback messages are needed
 A smaller window size can improve reliability and accuracy of data transmission, as fewer frames are affected by errors or losses
 A smaller window size can avoid buffer overflow or congestion at either end,
as fewer frames are stored in memory
o Limitations:
 A larger window size can increase complexity and overhead of sliding window protocol, as more sequence numbers, timers, and retransmission mechanisms are needed
 A larger window size can cause unnecessary retransmissions of frames that
were received correctly but discarded due to a missing frame

 A smaller window size can decrease efficiency and utilization of the channel bandwidth, as fewer frames can be sent or received without waiting for feedback messages
 A smaller window size can increase delays due to propagation time, processing time, and timeout intervals, as more feedback messages are needed
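The first formula above can be evaluated directly. This is a sketch with assumed units (bits per second, seconds, bits); the function name `optimal_window_size` is illustrative, not a standard API.

```python
import math

def optimal_window_size(bandwidth_bps, one_way_delay_s, frame_size_bits):
    """W = 1 + 2a, where a is the ratio of one-way propagation delay to
    frame transmission time; W in-flight frames keep the channel busy."""
    transmission_time = frame_size_bits / bandwidth_bps
    a = one_way_delay_s / transmission_time
    return math.ceil(1 + 2 * a)

# Example: a 1 Mbps link with 10 ms one-way delay and 1000-bit frames has a
# transmission time of 1 ms, so a = 10 and W = 1 + 2*10 = 21 frames
```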

Piggybacking

 Piggybacking is a technique that optimizes acknowledgment messages by combining them with data frames sent in the opposite direction.

 Piggybacking reduces the overhead and improves the efficiency of data transmission
by avoiding sending separate acknowledgment frames.

 Piggybacking works as follows:

o Suppose there are two stations, A and B, that want to exchange data frames.

o Station A sends a data frame to station B and waits for an acknowledgment.

o Station B receives the data frame and stores the acknowledgment in a buffer.

o If station B has a data frame to send to station A, it piggybacks the acknowledgment on the data frame and sends it to station A.

o Station A receives the data frame and extracts the acknowledgment from it.
o If station B does not have a data frame to send to station A, it waits for a timeout period before sending a separate acknowledgment frame to station A.
 The advantage of piggybacking is that it reduces the number of frames sent on the channel
and saves bandwidth.

 The disadvantage of piggybacking is that it increases the delay of acknowledgment messages and may cause unnecessary retransmissions if the timeout period is too short.

 An example of piggybacking is shown in the following diagram:


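The decision rule above (piggyback when outgoing data exists, otherwise fall back to a standalone ACK after the timeout) can be sketched with a toy frame format. The dictionary-based frames and both function names are illustrative assumptions, not a real frame layout.

```python
def build_frame(data, pending_ack):
    """Build an outgoing data frame, piggybacking a buffered ACK if any."""
    frame = {"data": data}
    if pending_ack is not None:
        frame["ack"] = pending_ack  # the ACK rides along with the data
    return frame

def next_transmission(outgoing_data, pending_ack):
    """Piggyback the ACK on outgoing data; otherwise (as if the piggyback
    timer expired) send a separate ACK-only frame, or nothing at all."""
    if outgoing_data is not None:
        return build_frame(outgoing_data, pending_ack)
    if pending_ack is not None:
        return {"ack": pending_ack}   # standalone acknowledgment frame
    return None
```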

Random Access

 Random access protocols are a class of medium access control (MAC) protocols that allow
multiple stations to share a common channel without any coordination or reservation.

 Random access protocols are suitable for bursty and unpredictable traffic, where stations
have variable and independent data rates.

 Random access protocols are based on the principle of contention, where stations compete
for the channel and resolve any collisions that may occur.

 Random access protocols can be classified into two types: unslotted and slotted.

o Unslotted protocols do not divide the channel into fixed time slots, and stations can
transmit at any time.

o Slotted protocols divide the channel into fixed time slots, and stations can transmit
only at the beginning of a slot.

Pure ALOHA and Slotted ALOHA

 Pure ALOHA and Slotted ALOHA are two examples of random access protocols that were developed for wireless networks.

 Pure ALOHA works as follows:

o Stations transmit their frames whenever they have data to send, without checking
the channel status.

o If a station receives an acknowledgment from the receiver, it knows that the transmission was successful.

o If a station does not receive an acknowledgment within a specified time, it assumes that a collision occurred and retransmits the frame after a random delay.

 Slotted ALOHA works as follows:

o Stations synchronize their clocks and transmit their frames only at the beginning of a
slot, which is equal to the frame transmission time.

o If a station receives an acknowledgment from the receiver, it knows that the transmission was successful.

o If a station does not receive an acknowledgment within a slot, it assumes that a collision occurred and retransmits the frame after a random delay.

 The advantage of Slotted ALOHA over Pure ALOHA is that it reduces the collision probability by avoiding overlapping transmissions.

 The disadvantage of Slotted ALOHA over Pure ALOHA is that it requires clock synchronization among stations.

 The maximum efficiency of Pure ALOHA is 18%, which means that only 18% of the channel
capacity can be utilized by successful transmissions.

 The maximum efficiency of Slotted ALOHA is 37%, which means that only 37% of the channel
capacity can be utilized by successful transmissions.
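The 18% and 37% figures come from the classical throughput formulas: in pure ALOHA a frame is vulnerable for two frame times, giving S = G·e^(−2G) with a maximum of 1/(2e) ≈ 0.184 at G = 0.5, while slotting halves the vulnerable period, giving S = G·e^(−G) with a maximum of 1/e ≈ 0.368 at G = 1. These can be checked numerically:

```python
import math

def pure_aloha_throughput(G):
    """Throughput of pure ALOHA at offered load G (frames per frame time):
    success requires no other start within a 2-frame vulnerable window."""
    return G * math.exp(-2 * G)

def slotted_aloha_throughput(G):
    """Slotted ALOHA: the vulnerable period shrinks to a single slot."""
    return G * math.exp(-G)

print(round(pure_aloha_throughput(0.5), 3))     # 0.184 -> the "18%" figure
print(round(slotted_aloha_throughput(1.0), 3))  # 0.368 -> the "37%" figure
```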

 An example of Pure ALOHA and Slotted ALOHA is shown in the following diagram:

Carrier Sense Multiple Access (CSMA)

 Carrier Sense Multiple Access (CSMA) is another example of random access protocol that
was developed for wired networks.

 CSMA works as follows:

o Stations sense the channel before transmitting their frames, and transmit only if the
channel is idle.

o If a station detects a collision while transmitting, it aborts the transmission and retransmits the frame after a random delay.

 CSMA can be further classified into three types: 1-persistent CSMA, non-persistent CSMA,
and p-persistent CSMA.

o 1-persistent CSMA works as follows:

 Stations sense the channel continuously, and transmit as soon as the channel
becomes idle.

 If a collision occurs, stations retransmit the frame after a random delay.

 1-persistent CSMA is aggressive and has a high collision probability.

o Non-persistent CSMA works as follows:

 Stations sense the channel at discrete intervals, and transmit if the channel
is idle.

 If the channel is busy, stations wait for a random delay and sense the channel again.

 If a collision occurs, stations retransmit the frame after a random delay.

 Non-persistent CSMA is conservative and has a low collision probability, but also a high channel idle time.

o p-persistent CSMA works as follows:

 Stations sense the channel at discrete intervals, and transmit with a probability p if the channel is idle.

 If the channel is busy or the station decides not to transmit, stations wait for
the next slot and repeat the process.

 If a collision occurs, stations retransmit the frame after a random delay.

 p-persistent CSMA is adaptive and has a moderate collision probability and a moderate channel idle time.

 The advantage of CSMA over ALOHA is that it reduces the collision probability by sensing
the channel before transmitting.

 The disadvantage of CSMA over ALOHA is that it requires carrier sensing capability among
stations.

 An example of CSMA is shown in the following diagram:


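The per-slot decision of p-persistent CSMA described above can be written as a one-step function. The function name and the injectable random source are illustrative assumptions, added for testability.

```python
import random

def p_persistent_decision(channel_idle, p, rng=random.random):
    """One slot of p-persistent CSMA: transmit with probability p when the
    channel is idle; defer to the next slot otherwise."""
    if channel_idle and rng() < p:
        return "transmit"
    return "defer"
```

Setting p = 1 recovers 1-persistent behaviour on an idle channel, which is one way to see why 1-persistent CSMA is the most collision-prone variant.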

Summary

 In this lecture, we have learned about two techniques for data transmission: piggybacking
and random access.

 Piggybacking is a technique that optimizes the acknowledgment messages in data frames by combining them with data frames sent in the opposite direction.

 Random access protocols are a class of medium access control protocols that allow multiple
stations to share a common channel without any coordination or reservation.

 We have discussed two examples of random access protocols for wireless networks: Pure
ALOHA and Slotted ALOHA, which are based on contention and retransmission.

 We have also discussed another example of random access protocol for wired networks: Carrier Sense Multiple Access (CSMA), which is based on sensing and transmission.

Multiple Access Protocols

 Multiple access protocols are a class of medium access control (MAC) protocols that allow
multiple stations to share a common channel without interfering with each other.
 Multiple access protocols are essential for efficient and fair utilization of the channel re-
sources and for reliable and secure communication among stations.
 Multiple access protocols can be classified into three categories: channel partitioning, random access, and demand access.
o Channel partitioning protocols divide the channel into smaller units, such as time
slots, frequency bands, or codes, and assign them to different stations.
o Random access protocols allow stations to transmit their frames whenever they have
data to send, without any reservation or coordination, and resolve any collisions that
may occur.
o Demand access protocols allow stations to request the channel before transmitting
their frames, and grant the channel to one or more stations based on some criteria.

CSMA/CD (Collision Detection)

 CSMA/CD (Carrier Sense Multiple Access with Collision Detection) is a type of random access protocol that is used in Ethernet networks.

 CSMA/CD works as follows:


o Stations sense the channel before transmitting their frames, and transmit only if the
channel is idle.
o Stations monitor the channel while transmitting their frames, and detect any collisions
that may occur.
o If a station detects a collision, it aborts the transmission and sends a jamming signal to notify other stations of the collision.
o Stations wait for a random delay, called backoff time, before attempting to retransmit
their frames.
 The advantage of CSMA/CD is that it adapts to the traffic load and maximizes the channel
utilization when the traffic is low or moderate.
 The disadvantage of CSMA/CD is that it wastes bandwidth and increases delay when the
traffic is high or the channel is long.
 An example of CSMA/CD is shown in the following diagram:
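The "random delay, called backoff time" mentioned above is, in classic Ethernet, a truncated binary exponential backoff: after the n-th collision a station waits a random number of slot times drawn uniformly from 0 … 2^min(n,10) − 1, and abandons the frame after 16 attempts. The sketch below follows that formulation; the function name is an assumption.

```python
import random

def backoff_slots(collision_count, rng=None):
    """Truncated binary exponential backoff as used by CSMA/CD."""
    if collision_count > 16:
        raise RuntimeError("too many collisions; frame is dropped")
    k = min(collision_count, 10)          # the exponent is capped at 10
    rng = rng or random.Random()
    return rng.randrange(2 ** k)          # uniform in 0 .. 2^k - 1
```

Doubling the range after each collision is what lets CSMA/CD adapt to the traffic load: light contention resolves in a slot or two, while heavy contention spreads retries over a much wider interval.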

CSMA/CA (Collision Avoidance)

 CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) is another type of random access protocol that is used in Wi-Fi networks.
 CSMA/CA works as follows:
o Stations sense the channel before transmitting their frames, and transmit only if the
channel is idle for a certain period, called Distributed Interframe Space (DIFS).
o Stations perform a random delay, called backoff time, before transmitting their
frames, to reduce the collision probability.
o Stations send a short frame, called Request to Send (RTS), to the receiver, and wait
for a short frame, called Clear to Send (CTS), from the receiver.
o Stations transmit their frames only after receiving the CTS from the receiver, and wait
for an acknowledgment from the receiver.
o Stations defer their transmissions for a certain period, called Network Allocation
Vector (NAV), after hearing an RTS or CTS from other stations.
 The advantage of CSMA/CA is that it avoids collisions by using RTS/CTS handshake and
NAV mechanism.

 The disadvantage of CSMA/CA is that it introduces overhead and reduces throughput by using RTS/CTS handshake and backoff time.
 An example of CSMA/CA is shown in the following diagram:

Comparison and Use Cases

 CSMA/CD and CSMA/CA are two different approaches to deal with the problem of collisions
in random access protocols.
 CSMA/CD relies on detecting collisions and retransmitting frames after a random delay. It is
suitable for wired networks where collision detection is feasible and propagation delay is
small. It is used in Ethernet networks that use coaxial cables or twisted pair cables as the
physical medium.
 CSMA/CA relies on avoiding collisions by using RTS/CTS handshake and NAV mechanism.
It is suitable for wireless networks where collision detection is difficult and propagation delay
is large. It is used in Wi-Fi networks that use radio waves as the physical medium.

Summary

 In this lecture, we have learned about multiple access protocols, which are a class of medium access control protocols that allow multiple stations to share a common channel without interfering with each other.
 We have discussed two examples of random access protocols: CSMA/CD and CSMA/CA,
which are used in Ethernet and Wi-Fi networks respectively.
 We have compared the advantages and disadvantages of CSMA/CD and CSMA/CA, and explained their use cases.

UNIT-III: Network Layer


Switching

 Switching is a process that connects multiple devices in a network and transfers data from
one device to another.
 Switching can be classified into two types: circuit switching and packet switching.
o Circuit switching is a type of switching that establishes a dedicated physical path between the source and the destination devices for the duration of the communication.
o Packet switching is a type of switching that divides the data into smaller
units called packets and sends them independently over the network.
 Packet switching can be further classified into two types: virtual circuits and datagram networks.
o Virtual circuits are a packet-switching approach that maintains a logical connection between the source and the destination devices for the duration of the communication.
o Datagram networks are a packet-switching approach that does not maintain any connection between the source and the destination devices and treats each packet separately.
 The advantages and disadvantages of circuit switching and packet switching are as follows:
o Circuit switching has the advantages of guaranteed quality of service, no congestion, and simple routing, but the disadvantages of low utilization, high setup time, and lack of flexibility.
o Packet switching has the advantages of high utilization, low setup time, and flexibility, but the disadvantages of variable quality of service, congestion, and complex routing.
 An example of circuit switching is shown in the following diagram:

 An example of packet switching is shown in the following diagram:



Logical Addressing

 Logical addressing is a process that assigns a unique identifier to each device in a network
and enables communication among devices across different networks.
 Logical addressing can be classified into two types: IPv4 addressing and IPv6 addressing.
o IPv4 addressing is a type of logical addressing that uses a 32-bit binary number to
identify each device in an IPv4 network. An IPv4 address consists of two parts: network ID and host ID. The network ID identifies the network to which the device belongs, and the host ID identifies the device within the network. An IPv4 address is
usually written in dotted decimal notation, where each byte is converted to its decimal
equivalent and separated by dots. For example, 192.168.1.100 is an IPv4 address.
o IPv6 addressing is a type of logical addressing that uses a 128-bit binary number to
identify each device in an IPv6 network. An IPv6 address consists of eight groups of
four hexadecimal digits, separated by colons. Each group represents 16 bits of the address. An IPv6 address can be abbreviated by omitting leading zeros within each
group and replacing consecutive groups of zeros with a double colon. For example,
2001:0db8:0000:0000:0000:ff00:0042:8329 is an IPv6 address, which can be abbreviated as 2001:db8::ff00:42:8329.
 IPv6 addressing has some features and advantages over IPv4 addressing, such as:
o Larger address space: IPv6 can support 2^128 (approximately 3.4 x 10^38) addresses,
compared to IPv4’s 2^32 (approximately 4.3 x 10^9) addresses, which allows more
devices to connect to the Internet and avoids address exhaustion.
o Hierarchical structure: IPv6 has a hierarchical structure that allows efficient routing
and aggregation of addresses. An IPv6 address consists of three parts: global routing
prefix, subnet ID, and interface ID. The global routing prefix identifies the network
provider, the subnet ID identifies the subnet within the provider’s network, and the
interface ID identifies the device within the subnet.
o Stateless address autoconfiguration: IPv6 allows devices to automatically configure
their own addresses without relying on a server, such as DHCP. A device can generate
its own interface ID based on its MAC address or a random number, and append it to
the subnet prefix obtained from a router advertisement message.
o Improved security: IPv6 supports IPsec, which is a protocol suite that provides authentication, encryption, and integrity protection for IP packets. IPsec can be applied to all IPv6 communications, unlike IPv4, where it is optional and requires additional configuration.
 Subnetting and supernetting are two techniques that allow efficient allocation and management of IP addresses in a network.
o Subnetting is a technique that divides a large network into smaller subnetworks, or
subnets, by extending the network ID portion of an IP address. Subnetting allows better utilization of the address space, reduces network congestion, and enhances security and administration.
o Supernetting is a technique that combines multiple contiguous networks into a larger network, or supernetwork, by reducing the network ID portion of an IP address. Supernetting allows better aggregation of the address space, reduces routing table size, and improves routing performance.
 An example of subnetting is shown in the following diagram:

 An example of supernetting is shown in the following diagram:
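Python's standard `ipaddress` module implements the notation, subnetting, and supernetting rules described above, so they can be checked directly (the specific addresses are just examples):

```python
import ipaddress

# IPv6 abbreviation: drop leading zeros, collapse the longest zero run to "::"
addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:ff00:0042:8329")
print(addr.compressed)  # 2001:db8::ff00:42:8329
print(addr.exploded)    # the full eight-group form, leading zeros restored

# An IPv4 address is a 32-bit number written in dotted decimal notation
print(int(ipaddress.IPv4Address("192.168.1.100")))  # the underlying integer

# Subnetting: extending the /24 network ID by two bits yields four /26 subnets
net = ipaddress.ip_network("192.168.1.0/24")
print([str(s) for s in net.subnets(prefixlen_diff=2)])

# Supernetting goes the other way: shrinking the network ID merges subnets
print(ipaddress.ip_network("192.168.1.0/26").supernet(prefixlen_diff=2))
```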

Address Mapping

 Address mapping is a process that converts one type of address to another in a network.
Address mapping can be classified into four types: Address Resolution Protocol (ARP), Reverse Address Resolution Protocol (RARP), Bootstrap Protocol (BOOTP), and Dynamic Host Configuration Protocol (DHCP).
o Address Resolution Protocol (ARP) is a protocol that maps an IPv4 address to a
MAC address in a local area network (LAN). ARP allows devices to communicate with each other within the same LAN without knowing their MAC addresses beforehand. ARP works as follows:
 A device that wants to communicate with another device sends an ARP request message, which contains the sender’s IPv4 and MAC addresses and the
target’s IPv4 address, to the broadcast MAC address (FF:FF:FF:FF:FF:FF).
 All devices in the LAN receive the ARP request message, but only the device
that has the target IPv4 address responds with an ARP reply message, which
contains the sender’s and target’s IPv4 and MAC addresses.


 The device that sent the ARP request message receives the ARP reply message and updates its ARP cache, which is a table that stores the mappings of
IPv4 and MAC addresses. The device then uses the target MAC address to
send data frames to the target device.
o Reverse Address Resolution Protocol (RARP) is a protocol that maps a MAC address
to an IPv4 address in a LAN. RARP allows devices that do not have an IPv4 address,
such as diskless workstations, to obtain one from a RARP server. RARP works as follows:
 A device that wants to obtain an IPv4 address sends a RARP request message, which contains the sender’s MAC address and a placeholder for the
sender’s IPv4 address, to the broadcast MAC address (FF:FF:FF:FF:FF:FF).


 A RARP server in the LAN receives the RARP request message and searches
its RARP table, which is a table that stores the mappings of MAC and IPv4
addresses. If it finds a match, it responds with a RARP reply message, which
contains the sender’s MAC and IPv4 addresses.
 The device that sent the RARP request message receives the RARP reply
message and configures its IPv4 address accordingly.
o Bootstrap Protocol (BOOTP) is a protocol that maps a MAC address to an IPv4 address and other configuration parameters in a LAN or a wide area network (WAN).
BOOTP allows devices that do not have an IPv4 address or other configuration parameters, such as diskless workstations or routers, to obtain them from a BOOTP server. BOOTP works as follows:
 A device that wants to obtain an IPv4 address and other configuration parameters sends a BOOTP request message, which contains the sender’s MAC address and other information, such as requested parameters and relay agent information, to the broadcast IP address (255.255.255.255).
 A BOOTP server in the LAN or WAN receives the BOOTP request message
and searches its BOOTP table, which is a table that stores the mappings of
MAC addresses and configuration parameters. If it finds a match, it responds
with a BOOTP reply message, which contains the sender’s MAC address and
configuration parameters.
 The device that sent the BOOTP request message receives the BOOTP reply
message and configures its IPv4 address and other parameters accordingly.
o Dynamic Host Configuration Protocol (DHCP) is a protocol that dynamically assigns
an IPv4 address and other configuration parameters to devices in a LAN or WAN.
DHCP allows devices to obtain an IPv4 address and other configuration parameters
without manual intervention or preconfiguration. DHCP works as follows:
 A device that wants to obtain an IPv4 address and other configuration parameters sends a DHCP discover message, which contains the sender’s MAC address and other information, such as requested parameters and relay agent information, to the broadcast IP address (255.255.255.255).
 One or more DHCP servers in the LAN or WAN receive the DHCP discover
message and respond with a DHCP offer message, which contains an availa-
ble IPv4 address and other configuration parameters for the sender.
 The device that sent the DHCP discover message receives one or more DHCP offer
messages and selects one of them based on some criteria. It then sends a DHCP re-
quest message, which contains the selected IPv4 address and other configuration pa-
rameters, to the broadcast IP address (255.255.255.255).
 The DHCP server that offered the selected IPv4 address receives the DHCP request
message and verifies that the IPv4 address is still available. It then sends a DHCP
acknowledge message, which confirms the assignment of the IPv4 address and other
configuration parameters to the sender.
 The device that sent the DHCP request message receives the DHCP acknowledge
message and configures its IPv4 address and other parameters accordingly. It also up-
dates its lease time, which is the duration for which the IPv4 address is valid.
 An example of DHCP is shown in the following diagram:

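The discover/offer/request/acknowledge exchange above can be sketched as a toy state machine. The class, pool addresses, and MAC below are illustrative only; a real server implements the full RFC 2131 packet formats and lease timers.

```python
from dataclasses import dataclass, field

# Toy DHCP server: hands out addresses from a small pool and records leases.
@dataclass
class DhcpServer:
    pool: list
    leases: dict = field(default_factory=dict)

    def offer(self, mac):                  # DHCPDISCOVER -> DHCPOFFER
        return self.pool[0] if self.pool else None

    def acknowledge(self, mac, ip):        # DHCPREQUEST -> DHCPACK / DHCPNAK
        if ip in self.pool:
            self.pool.remove(ip)
            self.leases[mac] = ip
            return True                    # DHCPACK: address assigned
        return False                       # DHCPNAK: no longer available

server = DhcpServer(pool=["192.168.1.10", "192.168.1.11"])
mac = "aa:bb:cc:dd:ee:ff"
offered = server.offer(mac)                # client broadcasts DISCOVER, gets OFFER
ok = server.acknowledge(mac, offered)      # client broadcasts REQUEST, gets ACK
print(offered, ok, server.leases[mac])
```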
Delivery

 Delivery is a process that transfers packets from the source device to the destination device
in a network.
 Delivery can be classified into two types: direct delivery and indirect delivery.
o Direct delivery is a type of delivery that occurs when the source and the destination
devices are in the same network. In direct delivery, the source device sends the
packet to the destination device using its MAC address as the destination address.
o Indirect delivery is a type of delivery that occurs when the source and the destination
devices are in different networks. In indirect delivery, the source device sends the
packet to a router, which forwards the packet to another router or the destination de-
vice using its MAC address as the destination address.
 An example of direct delivery is shown in the following diagram:

 An example of indirect delivery is shown in the following diagram:

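The direct-versus-indirect decision above reduces to a subnet test: if the destination lies inside the source's network, delivery is direct; otherwise the packet goes to a router. A minimal sketch with Python's ipaddress module (addresses and prefix length are illustrative):

```python
import ipaddress

def delivery_type(src_ip, dst_ip, prefix):
    # Direct delivery when source and destination share the same network.
    net = ipaddress.ip_network(f"{src_ip}/{prefix}", strict=False)
    return "direct" if ipaddress.ip_address(dst_ip) in net else "indirect"

print(delivery_type("192.168.1.5", "192.168.1.9", 24))   # direct
print(delivery_type("192.168.1.5", "10.0.0.9", 24))      # indirect
```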
Forwarding

 Forwarding is a process that determines the next hop for a packet in a network and sends
the packet to that hop.
 Forwarding is performed by routers, which are devices that connect multiple networks and
forward packets between them.
 Forwarding relies on a data structure called a forwarding table, which stores the mappings of
destination network IDs and next hop addresses.
 A forwarding table can be created and updated by using static or dynamic methods.
o Static methods involve manually configuring the forwarding table entries by an ad-
ministrator. Static methods are simple, secure, and consistent, but they are also inflex-
ible, error-prone, and inefficient.
o Dynamic methods involve automatically updating the forwarding table entries by us-
ing routing protocols. Dynamic methods are flexible, adaptive, and efficient, but they
are also complex, insecure, and inconsistent.
 An example of a forwarding table is shown in the following table:

Destination Network ID Next Hop Address

192.168.1.0/24 192.168.1.1

192.168.2.0/24 192.168.2.1

192.168.3.0/24 192.168.3.1

0.0.0.0/0 192.168.4.1

 The forwarding process works as follows:


o A router receives a packet from a source device or another router.
o The router extracts the destination IP address from the packet header and matches it
with the destination network ID in the forwarding table.
o If there is an exact match, the router sends the packet to the next hop address corre-
sponding to that destination network ID.
o If there is no exact match, but there is a default entry (0.0.0.0/0), the router sends the
packet to the next hop address corresponding to that default entry.
o If there is no match at all, the router drops the packet and sends an error message to
the source device.
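The lookup steps above can be sketched against the example forwarding table. This sketch uses standard longest-prefix matching, under which the default route (0.0.0.0/0) matches every address but is chosen only when no more specific entry exists:

```python
import ipaddress

# Forwarding table from the example above: (destination network, next hop).
table = [
    ("192.168.1.0/24", "192.168.1.1"),
    ("192.168.2.0/24", "192.168.2.1"),
    ("192.168.3.0/24", "192.168.3.1"),
    ("0.0.0.0/0",      "192.168.4.1"),    # default entry
]

def next_hop(dst_ip):
    addr = ipaddress.ip_address(dst_ip)
    matches = [(ipaddress.ip_network(net), hop) for net, hop in table
               if addr in ipaddress.ip_network(net)]
    if not matches:
        return None                        # no route at all: drop the packet
    # Longest-prefix match: the most specific matching network wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(next_hop("192.168.2.77"))   # 192.168.2.1
print(next_hop("8.8.8.8"))        # 192.168.4.1 (falls through to the default)
```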

Unicast Routing Protocols

 Unicast routing protocols are protocols that exchange routing information among routers
and build forwarding tables for unicast packets, which are packets that have a single desti-
nation device.
 Unicast routing protocols can be classified into two types: interior gateway protocols
(IGPs) and exterior gateway protocols (EGPs).
o Interior gateway protocols (IGPs) are protocols that exchange routing information
within an autonomous system (AS), which is a group of networks under a single ad-
ministrative authority. IGPs can be further classified into two types: distance vector
protocols and link state protocols.
 Distance vector protocols are protocols that exchange routing information
based on the distance (or cost) and direction (or vector) to each destination
network. Distance vector protocols use a distributed algorithm called Bell-
man-Ford algorithm to calculate the shortest paths to each destination net-
work. Distance vector protocols are simple, easy to implement, and scalable,
but they are also slow to converge, prone to loops, and inefficient in band-
width usage.
 Link state protocols are protocols that exchange routing information based on
the state (or status) of each link (or connection) in the network. Link state
protocols use a centralized algorithm called Dijkstra’s algorithm to calcu-
late the shortest paths to each destination network based on a complete map
of the network topology. Link state protocols are fast to converge, loop-free,
and efficient in bandwidth usage, but they are also complex, difficult to im-
plement, and resource-intensive.
o Exterior gateway protocols (EGPs) are protocols that exchange routing information
between autonomous systems (ASes). EGPs can be further classified into two
types: path vector protocols and policy-based routing protocols.
 Path vector protocols are protocols that exchange routing information based
on the path (or sequence) of ASes to each destination network. Path vector
protocols use an extension of the distance vector protocol called Border
Gateway Protocol (BGP) to calculate the best paths to each destination net-
work based on various attributes, such as AS path length, origin, local prefer-
ence, and MED. Path vector protocols are robust, flexible, and scalable, but
they are also complex, slow to converge, and prone to instability.
 Policy-based routing protocols are protocols that exchange routing infor-
mation based on the policies (or rules) of each AS. Policy-based routing pro-
tocols use a mechanism called route filtering to select or reject routes based
on various criteria, such as source, destination, protocol, port, or traffic type.
Policy-based routing protocols are secure, customizable, and efficient, but
they are also subjective, inconsistent, and unpredictable.
 An example of a distance vector protocol is Routing Information Protocol (RIP), which is
an IGP that uses hop count as the distance metric and exchanges routing information every 30
seconds. RIP has a maximum hop count of 15, which limits its scalability. RIP also uses vari-
ous techniques to prevent loops and speed up convergence, such as split horizon, poison re-
verse, triggered updates, and hold-down timers.
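The Bellman-Ford relaxation behind distance-vector protocols like RIP can be sketched in a few lines. The three-node topology and link costs below are illustrative, not from the text:

```python
# Distance-vector sketch: each node repeatedly relaxes its distance table
# using its neighbours' advertised vectors (Bellman-Ford).
INF = float("inf")
nodes = ["A", "B", "C"]
edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 5)]
edges += [(v, u, c) for u, v, c in edges]          # links are bidirectional

dist = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}

# After at most len(nodes) - 1 rounds of exchanges, the vectors converge.
for _ in range(len(nodes) - 1):
    for u, v, c in edges:
        for dst in nodes:
            if c + dist[v][dst] < dist[u][dst]:
                dist[u][dst] = c + dist[v][dst]

print(dist["A"]["C"])   # 3: A-B-C (1 + 2) beats the direct A-C link (5)
```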
 An example of a link state protocol is Open Shortest Path First (OSPF), which is an IGP
that uses cost as the link state metric and exchanges routing information based on events.
OSPF has no maximum hop count, which enhances its scalability. OSPF also uses various
features to improve performance and reliability, such as hierarchical structure, designated
routers, equal-cost multipath, authentication, and multicast.
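Link-state protocols like OSPF run Dijkstra's algorithm over a complete map of the topology. A minimal sketch on the same illustrative three-node graph:

```python
import heapq

# Adjacency map of neighbour -> link cost; every router holds the full map.
graph = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 2},
    "C": {"A": 5, "B": 2},
}

def shortest_paths(source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry, already improved
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist

print(shortest_paths("A"))   # {'A': 0, 'B': 1, 'C': 3}
```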
 An example of a path vector protocol is Border Gateway Protocol (BGP), which is an EGP
that uses AS path as the path vector attribute and exchanges routing information based on
events. BGP has no maximum AS path length, which enhances its scalability. BGP also uses
various features to improve performance and stability, such as route aggregation, route reflec-
tors, confederations, communities, and route dampening.
 An example of a policy-based routing protocol is Cisco IOS Policy-Based Routing (PBR),
which is a mechanism that allows network administrators to define policies for routing pack-
ets based on various criteria. PBR can be configured using access lists, route maps, and set
commands. PBR can be used to implement various functions, such as load balancing, traffic
engineering, quality of service, and security.
Unit-4 Transport Layer


Process to Process Communication

 Process to process communication is a process that enables data exchange between two or
more processes running on the same or different devices in a network.
 Process to process communication can be achieved by using ports and sockets.
o Ports are logical identifiers that distinguish different processes or applications on a
device. Ports are associated with the transport layer protocols, such as TCP or UDP,
and are represented by 16-bit numbers. For example, port 80 is used for HTTP, port
25 is used for SMTP, and port 53 is used for DNS.
o Sockets are endpoints of communication between processes or applications on differ-
ent devices. Sockets are created by the operating system and are identified by a com-
bination of IP address and port number. For example, a socket can be represented as
192.168.1.100:80, which means the process or application running on the device with
IP address 192.168.1.100 and using port 80.
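Process-to-process communication over sockets can be seen in a minimal loopback echo exchange using Python's socket API. The message is illustrative; binding to port 0 asks the operating system for any free ephemeral port:

```python
import socket
import threading

# Each endpoint is identified by an (IP address, port) pair, i.e. a socket.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))              # port 0 = pick a free port
server.listen(1)
host, port = server.getsockname()

def serve():
    conn, _addr = server.accept()
    conn.sendall(conn.recv(1024))          # echo one message back
    conn.close()

t = threading.Thread(target=serve)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
client.sendall(b"hello")
data = client.recv(1024)
t.join()
client.close()
server.close()
print(data)   # b'hello'
```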

 Socket APIs are application programming interfaces that provide functions and data struc-
tures for creating, managing, and using sockets in various programming languages. Socket
APIs allow programmers to implement process to process communication without worrying
about the low-level details of the network protocols. Some examples of socket APIs are:

o Berkeley sockets: A socket API that was developed at the University of California,
Berkeley, and is widely used in Unix-like operating systems, such as Linux, macOS,
and FreeBSD. Berkeley sockets support both TCP and UDP protocols, as well as
other protocols, such as ICMP and IGMP. Berkeley sockets use functions such as
socket(), bind(), listen(), accept(), connect(), send(), recv(), close(), and so on.
o Winsock: A socket API that was developed by Microsoft and is used in Windows op-
erating systems. Winsock is based on Berkeley sockets, but has some differences and
extensions, such as support for overlapped I/O, asynchronous notification, and lay-
ered service providers. Winsock uses functions such as WSAStartup(), WSAC-
leanup(), WSASocket(), WSAConnect(), WSASend(), WSARecv(),
WSACloseSocket(), and so on.
o Java sockets: A socket API that was developed by Sun Microsystems and is used in
Java programming language. Java sockets are part of the java.net package and pro-
vide an object-oriented approach to socket programming. Java sockets support both
TCP and UDP protocols, as well as multicast and secure sockets. Java sockets use
classes such as Socket, ServerSocket, DatagramSocket, MulticastSocket, SSLSocket,
and so on.

User Datagram Protocol (UDP)

 User Datagram Protocol (UDP) is a transport layer protocol that provides unrelia-
ble and connectionless data transmission between processes or applications in a network.
 Features and characteristics of UDP are as follows:
o UDP is unreliable, which means that it does not guarantee the delivery, order, or in-
tegrity of the data packets. UDP does not perform any error detection, correction, or
retransmission of the data packets. UDP leaves these tasks to the application layer or
the user.
o UDP is connectionless, which means that it does not establish or maintain any logical
connection between the source and the destination processes or applications. UDP
does not perform any handshake, synchronization, or termination of the communica-
tion. UDP treats each packet independently and statelessly.
o UDP is simple, which means that it has a minimal overhead and complexity. UDP has
a fixed header size of 8 bytes, which contains only four fields: source port, destina-
tion port, length, and checksum. UDP does not have any options or flags in its header.
o UDP is fast, which means that it has a low latency and high throughput. UDP does
not have any congestion control or flow control mechanisms that may slow down or
block the data transmission. UDP can send data packets as fast as the network allows.
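The fixed 8-byte header described above can be packed and unpacked in a few lines. The port numbers and payload are illustrative, and the checksum is left at zero for the sketch:

```python
import struct

# UDP header: source port, destination port, length, checksum (all 16-bit,
# network byte order). Length covers the header plus the data.
payload = b"hello"
src_port, dst_port = 5000, 53
length = 8 + len(payload)
header = struct.pack("!HHHH", src_port, dst_port, length, 0)

sp, dp, ln, ck = struct.unpack("!HHHH", header)
print(len(header), sp, dp, ln)   # 8 5000 53 13
```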
 Advantages and disadvantages of UDP are as follows:
o UDP has the advantages of simplicity, efficiency, and flexibility, but the disad-
vantages of unreliability, lack of security, and lack of quality of service.
o Simplicity: UDP is easy to implement and understand, as it has a minimal overhead
and complexity. UDP does not require any connection establishment or maintenance,
which reduces the processing time and resource consumption.
o Efficiency: UDP is fast and scalable, as it has a low latency and high throughput.
UDP does not have any congestion control or flow control mechanisms that may slow
down or block the data transmission. UDP can handle bursty and unpredictable traffic
better than TCP.
o Flexibility: UDP is adaptable and customizable, as it leaves the reliability and quality
of service tasks to the application layer or the user. UDP can support various types of
applications that have different requirements and preferences, such as real-time, mul-
timedia, or interactive applications.
o Unreliability: UDP does not guarantee the delivery, order, or integrity of the data
packets. UDP does not perform any error detection, correction, or retransmission of
the data packets. UDP may cause data loss, duplication, corruption, or reordering,
which may affect the performance and functionality of the applications.
o Lack of security: UDP does not provide any security features, such as authentication,
encryption, or integrity protection for the data packets. UDP is vulnerable to various
attacks, such as spoofing, modification, or replaying of the data packets. UDP may
compromise the confidentiality, integrity, or availability of the communication.
o Lack of quality of service: UDP does not provide any quality of service features, such
as bandwidth allocation, delay control, jitter control, or packet loss control for the
data packets. UDP does not differentiate between different types of data or applica-
tions that may have different levels of priority or importance. UDP may cause poor
quality of service for some applications that are sensitive to delay, jitter, or packet
loss.
 Use cases of UDP are as follows:
o UDP is suitable for applications that require speed, simplicity, and flexibility over
reliability, security, and quality of service. Some examples of such applications are:
 Real-time applications: These are applications that require timely and contin-
uous delivery of data, such as voice over IP (VoIP), video conferencing,
online gaming, or live streaming. These applications can tolerate some data
loss, but not delay or jitter.
 Multimedia applications: These are applications that require efficient and
scalable delivery of data, such as audio and video streaming, file sharing, or
web browsing. These applications can adapt to the network conditions and
use error correction or compression techniques to improve the quality of ser-
vice.
 Interactive applications: These are applications that require responsive and
user-friendly delivery of data, such as online chat, instant messaging, or re-
mote desktop. These applications can handle some data loss or duplication,
but not reordering or corruption.
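The connectionless behaviour described above shows up directly in code: there is no handshake, and each datagram stands alone. A minimal loopback exchange (addresses and message are illustrative):

```python
import socket

# Receiver binds first; port 0 lets the OS pick a free port.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
addr = recv_sock.getsockname()

# Sender needs no connect(): it just addresses each datagram individually.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b"ping", addr)

data, sender = recv_sock.recvfrom(1024)
send_sock.close()
recv_sock.close()
print(data)   # b'ping'
```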

Transmission Control Protocol (TCP)

 Transmission Control Protocol (TCP) is a transport layer protocol that provides relia-
ble and connection-oriented data transmission between processes or applications in a net-
work.
 Features and characteristics of TCP are as follows:
o TCP is reliable, which means that it guarantees the delivery, order, and integrity of
the data packets. TCP performs various mechanisms to ensure the reliability of the
data transmission, such as error detection, correction, retransmission, acknowledg-
ment, sequencing, and checksum.
o TCP is connection-oriented, which means that it establishes and maintains a logical
connection between the source and the destination processes or applications for the
duration of the communication. TCP performs various mechanisms to ensure the con-
nection-oriented nature of the data transmission, such as handshake, synchronization,
termination, state management, and window management.
o TCP is complex, which means that it has a high overhead and complexity. TCP has a
variable header size of 20 to 60 bytes, which contains many fields and options. TCP
also has many flags and states in its header and operation.
o TCP is slow, which means that it has a high latency and low throughput. TCP has var-
ious congestion control and flow control mechanisms that may slow down or block
the data transmission. TCP also has a slow start and a three-way handshake that may
delay the data transmission.
 TCP header format and flags are as follows:
o TCP header format is shown in the following diagram:
- TCP header consists of the following fields:


- Source port: A 16-bit field that identifies the port number of the
source process or application.
- Destination port: A 16-bit field that identifies the port number of
the destination process or application.
- Sequence number: A 32-bit field that identifies the sequence number
of the first byte in the data segment. The sequence number is used to order
and reassemble the data segments at the receiver side.
- Acknowledgment number: A 32-bit field that identifies the sequence
number of the next expected byte from the sender. The acknowledgment number
is used to confirm the receipt of the data segments at the receiver side.
- Data offset: A 4-bit field that specifies the size of the TCP header
in 32-bit words. The data offset is used to locate the beginning of the
data segment in the packet.
- Reserved: A 6-bit field that is reserved for future use and must be
set to zero.
- Flags: A 6-bit field that contains six flags that indicate various
control information for the TCP operation. The flags are:
- URG: Urgent pointer flag. It indicates that the packet contains
urgent data that must be processed immediately by the receiver. The urgent
pointer field specifies where the urgent data ends in the packet.
 ACK: Acknowledgment flag. It indicates that the packet contains a valid
acknowledgment number that confirms the receipt of the previous data segments
from the sender. The acknowledgment number field specifies the sequence number
of the next expected byte from the sender.
 PSH: Push flag. It indicates that the packet contains data that must be delivered to the applica-
tion layer as soon as possible by the receiver. The push function is used to avoid buffering de-
lays at the receiver side.
 RST: Reset flag. It indicates that the packet contains a request or a response to reset the con-
nection due to an error or a refusal. The reset function is used to abort the connection and re-
lease the resources.
 SYN: Synchronize flag. It indicates that the packet contains a request or a response to estab-
lish a connection between the sender and the receiver. The synchronize function is used to ini-
tiate and negotiate the connection parameters, such as sequence numbers and window sizes.
 FIN: Finish flag. It indicates that the packet contains a request or a response to terminate the
connection between the sender and the receiver. The finish function is used to gracefully close
the connection and release the resources.
 Window size: A 16-bit field that specifies the size of the receive window in bytes. The receive
window is the amount of data that the receiver can accept at a time. The window size is used
to implement flow control, which is a mechanism that prevents the sender from overwhelm-
ing the receiver with too much data.
 Checksum: A 16-bit field that contains a value that is calculated based on the contents of the
TCP header and data segment. The checksum is used to detect any errors or corruption in the
packet during transmission.
 Urgent pointer: A 16-bit field that specifies the offset of the last byte of urgent data in the
packet. The urgent pointer is used in conjunction with the URG flag to indicate and locate ur-
gent data in the packet.
 Options: A variable-length field that contains optional parameters for TCP operation, such as
maximum segment size, window scaling, selective acknowledgment, timestamp, and so on.
The options field is used to enhance the performance and functionality of TCP.
 Padding: A variable-length field that contains zeros to make the TCP header a multiple of 32
bits. The padding field is used to align the TCP header with the data segment.
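A minimal sketch that builds and decodes a 20-byte TCP header (no options) along the field layout above. The field values are hand-picked for illustration, not captured traffic, and the checksum is left at zero:

```python
import struct

# !HHIIBBHHH = src port, dst port, seq, ack, data offset byte, flags byte,
# window, checksum, urgent pointer (network byte order, 20 bytes total).
header = struct.pack("!HHIIBBHHH",
                     443,          # source port
                     51000,        # destination port
                     1000,         # sequence number
                     2000,         # acknowledgment number
                     5 << 4,       # data offset: 5 x 32-bit words = 20 bytes
                     0b00010010,   # flags byte: SYN (0x02) + ACK (0x10)
                     65535,        # window size
                     0,            # checksum (zero for this sketch)
                     0)            # urgent pointer

sp, dp, seq, ack, off, flags, win, ck, urg = struct.unpack("!HHIIBBHHH", header)
print(off >> 4, bool(flags & 0x02), bool(flags & 0x10))  # 5 True True
```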

Three-Way Handshake and Connection Termination

 Three-way handshake is a process that establishes a connection between two processes or


applications using TCP in a network.
 Three-way handshake works as follows:
o The client process or application that wants to initiate a connection sends a SYN seg-
ment to the server process or application, which contains its initial sequence number
(ISN) and other parameters, such as maximum segment size, window size, and so on.
o The server process or application that receives the SYN segment responds with a
SYN+ACK segment, which contains its ISN, an acknowledgment number equal to
the client’s ISN plus one, and other parameters, such as maximum segment size, win-
dow size, and so on.
o The client process or application that receives the SYN+ACK segment responds with
an ACK segment, which contains an acknowledgment number equal to the server’s
ISN plus one, and other parameters, such as window size, and so on.
o After completing this three-way handshake, both processes or applications have syn-
chronized their sequence numbers and parameters, and are ready to exchange data
segments over an established connection.
 An example of three-way handshake is shown in the following diagram:
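The sequence/acknowledgment arithmetic of the three steps above can be checked in a few lines. The initial sequence numbers are illustrative; real stacks choose them unpredictably:

```python
# Each side acknowledges the other's ISN plus one; the SYN consumes one
# sequence number even though it carries no data.
client_isn, server_isn = 100, 300

syn     = {"flags": "SYN",     "seq": client_isn}
syn_ack = {"flags": "SYN+ACK", "seq": server_isn, "ack": syn["seq"] + 1}
ack     = {"flags": "ACK",     "seq": client_isn + 1,
           "ack": syn_ack["seq"] + 1}

print(syn_ack["ack"], ack["ack"])   # 101 301
```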
 Connection termination is a process that closes a connection between two processes or appli-
cations using TCP in a network.
 Connection termination works as follows:
o The process or application that wants to terminate the connection sends a FIN seg-
ment to the other process or application, which indicates that it has no more data to
send and requests to close the connection. It then enters a FIN-WAIT-1 state.
o The process or application that receives the FIN segment responds with an ACK seg-
ment, which acknowledges the receipt of the FIN segment, and enters a CLOSE-
WAIT state. It may continue to send any remaining data from its side.
o The process or application that receives this ACK segment enters a FIN-WAIT-2
state, which lasts until it receives a FIN segment from the other side, indicating that
the other side also wants to close the connection.
o When the other process or application has no more data to send, it sends its own FIN
segment and enters a LAST-ACK state.
o The process or application that receives this FIN segment responds with an ACK seg-
ment and enters a TIME-WAIT state, which lasts for twice the maximum segment
lifetime (MSL), to ensure that the ACK segment has reached the other side and to
handle any delayed or duplicated segments. It then enters a CLOSED state and re-
leases the resources.
o The process or application that receives the final ACK segment enters a CLOSED
state, which means that it has successfully terminated the connection and released
the resources.
 An example of connection termination is shown in the following diagram:

Stream Control Transmission Protocol (SCTP)

 Stream Control Transmission Protocol (SCTP) is a transport layer protocol that provides re-
liable and connection-oriented data transmission between processes or applications in a net-
work, similar to TCP, but with some additional features and advantages.
 Features and advantages of SCTP over TCP are as follows:
o SCTP supports multihoming, which means that it allows a process or application to
have multiple IP addresses and establish multiple paths to the other process or appli-
cation. Multihoming enhances the availability and reliability of the communication,
as it can switch to an alternate path in case of a failure or congestion in the primary
path.
o SCTP supports multistreaming, which means that it allows a process or application
to send multiple streams of data within a single connection. Multistreaming improves
the performance and efficiency of the communication, as it can avoid head-of-line
blocking, which is a problem that occurs when a lost or delayed packet in one stream
blocks the delivery of packets in other streams.
o SCTP supports partial reliability, which means that it allows a process or application
to specify a lifetime for each data chunk. Partial reliability enables the communica-
tion to discard outdated or irrelevant data chunks, which can save bandwidth and re-
duce latency.
o SCTP supports message-oriented delivery, which means that it preserves the bound-
aries and order of the data chunks sent by the process or application. Message-ori-
ented delivery simplifies the data processing and reduces the overhead at the applica-
tion layer.
 An example of SCTP is shown in the following diagram:
Congestion Control

 Congestion control is a process that prevents or mitigates network congestion, which is a


situation where the demand for network resources exceeds the available capacity.
 Congestion control can be performed by using various techniques, such as:
o End-to-end congestion control: This is a technique that relies on the feedback from
the network or the receiver to adjust the sending rate of the sender. End-to-end con-
gestion control can be further classified into two types: reactive and proactive.
 Reactive congestion control is a technique that reacts to the signs of conges-
tion, such as packet loss, delay, or duplicate acknowledgments, and reduces
the sending rate accordingly. Reactive congestion control can be implemented
by using algorithms such as Additive Increase Multiplicative Decrease
(AIMD), Slow Start, Congestion Avoidance, and Fast Retransmit.
 Proactive congestion control is a technique that anticipates the onset of con-
gestion, such as by measuring the available bandwidth, queue length, or
packet marking, and adjusts the sending rate accordingly. Proactive conges-
tion control can be implemented by using algorithms such as Traffic Shap-
ing, Leaky Bucket, Token Bucket, and Explicit Congestion Notification
(ECN).
o Network-assisted congestion control: This is a technique that relies on the coopera-
tion of the network devices, such as routers or switches, to regulate the traffic flow
and avoid congestion. Network-assisted congestion control can be further classified
into two types: open-loop and closed-loop.
 Open-loop congestion control is a technique that prevents congestion before
it happens, such as by limiting the admission of new flows, dropping packets
selectively, or marking packets accordingly. Open-loop congestion control
can be implemented by using mechanisms such as Admission Control, Ran-
dom Early Detection (RED), and Active Queue Management (AQM).
 Closed-loop congestion control is a technique that alleviates congestion after
it happens, such as by notifying the sender or receiver about the congestion
status, rerouting traffic to less congested paths, or allocating resources dy-
namically. Closed-loop congestion control can be implemented by using
mechanisms such as Backpressure, Explicit Congestion Notification
(ECN), and Resource Reservation Protocol (RSVP).
 Congestion window and slow start are two concepts related to end-to-end reactive congestion
control using TCP.
o Congestion window is a variable that determines how many packets can be sent by
the sender before receiving an acknowledgment from the receiver. The size of the
congestion window reflects the sender’s estimation of the network capacity and con-
gestion level. The sender adjusts the size of the congestion window based on various
algorithms, such as AIMD, Slow Start, Congestion Avoidance, and Fast Retransmit.
 Slow start is an algorithm that initializes the size of the congestion window to one
segment and increases it exponentially for every acknowledgment received from the
receiver. The slow start algorithm allows the sender to probe the network capacity and
avoid sending too many packets that may cause congestion. The slow start algorithm
stops when the congestion window reaches a threshold value or a packet loss occurs.
 An example of slow start is shown in the following diagram:
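The growth pattern above can be simulated in a few lines: the congestion window doubles each round-trip until it reaches the threshold, after which congestion avoidance adds one segment per round-trip. The threshold and round count are illustrative:

```python
def grow(cwnd, ssthresh, rtts):
    # Slow start (exponential) below ssthresh, then congestion
    # avoidance (linear, +1 segment per RTT) at or above it.
    history = [cwnd]
    for _ in range(rtts):
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1
        history.append(cwnd)
    return history

print(grow(1, 16, 6))   # [1, 2, 4, 8, 16, 17, 18]
```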
Explicit Congestion Notification (ECN)

 Explicit Congestion Notification (ECN) is a mechanism that signals the occurrence of con-
gestion in the network to the sender and the receiver of the data packets, without dropping the
packets.
 ECN works as follows:
o ECN requires the support of both the transport layer protocols, such as TCP or SCTP,
and the network layer protocols, such as IPv4 or IPv6, to implement its functionality.
o ECN uses two bits in the IP header, called the ECN field, to indicate the congestion
status of the packet. The ECN field can have four values: 00 (Not-ECT), 01 (ECT(1)),
10 (ECT(0)), and 11 (CE).
 Not-ECT means that the packet is not ECN-capable and should be dropped if
congestion occurs.
 ECT(1) and ECT(0) mean that the packet is ECN-capable and can be marked
if congestion occurs. The difference between ECT(1) and ECT(0) is used for
experimental purposes.
 CE means that the packet has been marked by a router as experiencing con-
gestion.
o ECN also uses two bits in the TCP header, called the ECE and CWR flags, to com-
municate the congestion status between the sender and the receiver. The ECE flag is
used by the receiver to notify the sender that it has received a packet with CE mark.
The CWR flag is used by the sender to notify the receiver that it has reduced its send-
ing rate in response to the ECE flag.
o The sender and the receiver negotiate the use of ECN during the connection establish-
ment phase: the initiator sets the ECE and CWR flags in the SYN segment, and the
responder sets the ECE flag in the SYN+ACK segment. If both sides agree to use
ECN, they set the ECN field in the IP header to ECT(0) or ECT(1) for all subsequent
data segments.
o When a router detects congestion in its queue, it randomly selects one or more ECN-
capable packets and sets their ECN field to CE, instead of dropping them. The router
forwards the marked packets to their destination.
o When the receiver receives a packet with CE mark, it sets its ECE flag to 1 in the next
ACK segment that it sends to the sender. The receiver also echoes back the sequence
number of the marked packet in the ACK segment.
o When the sender receives an ACK segment with ECE flag set to 1, it infers that there
is congestion in the network and reduces its sending rate accordingly, by using algo-
rithms such as AIMD or Slow Start. The sender also sets its CWR flag to 1 in the next
data segment that it sends to the receiver, to acknowledge that it has received the con-
gestion notification.
o When the receiver receives a data segment with CWR flag set to 1, it clears its ECE
flag to 0 in the next ACK segment that it sends to the sender. This completes one
round of ECN feedback loop.
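One round of the feedback loop above, reduced to a toy function. The window-halving rule and return values are illustrative simplifications of RFC 3168 behaviour:

```python
def ecn_round(router_congested, cwnd):
    # Sender marks the packet ECN-capable in the IP header.
    ip_ecn = "ECT(0)"
    if router_congested:
        ip_ecn = "CE"                      # router marks instead of dropping
    ece = (ip_ecn == "CE")                 # receiver echoes congestion via ECE
    if ece:
        cwnd = max(1, cwnd // 2)           # sender reduces its rate (AIMD-style)
    cwr = ece                              # sender acknowledges with CWR
    return ip_ecn, ece, cwr, cwnd

print(ecn_round(True, 8))    # ('CE', True, True, 4)
print(ecn_round(False, 8))   # ('ECT(0)', False, False, 8)
```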
 The advantages of ECN over traditional congestion control mechanisms are as follows:
o ECN reduces packet loss and retransmission, which improves throughput and effi-
ciency.
o ECN reduces delay and jitter, which improves quality of service and user experience.
o ECN reduces network oscillation and instability, which improves fairness and robust-
ness.
 An example of ECN is shown in the following diagram:

Quality of Service (QoS)

 Quality of Service (QoS) is a process that ensures different levels of service for different
types of data or applications in a network, according to their requirements and preferences.
 QoS can be achieved by using various techniques, such as:
o Classification: This is a technique that identifies and categorizes different types of
data or applications based on various criteria, such as source, destination, protocol,
port, or traffic type.

 Classification allows the network to apply different policies and mechanisms for different
classes of data or applications, such as prioritization, scheduling, or shaping.
o Prioritization: This is a technique that assigns different levels of priority or im-
portance to different classes of data or applications, based on their requirements and
preferences. Prioritization allows the network to allocate more resources and provide
better service to higher-priority classes, while reducing or limiting the service to
lower-priority classes.
o Scheduling: This is a technique that determines the order and timing of transmitting
different classes of data or applications, based on their priority and characteristics.

Scheduling allows the network to optimize the utilization and performance of the net-
work resources and avoid congestion and starvation.
o Shaping: This is a technique that regulates the rate and volume of transmitting differ-
ent classes of data or applications, based on their characteristics and network condi-
tions. Shaping allows the network to smooth out the traffic fluctuations and match the
traffic profile with the network capacity and policies.
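Classification, prioritization, and scheduling can be combined in a small sketch using Python's heapq module (the class numbers are illustrative, with 0 as the highest priority; a real scheduler would also guard against starving low-priority classes):

```python
import heapq

class PriorityScheduler:
    """Strict-priority scheduler: always dequeue a packet from the
    highest-priority (lowest-numbered) class; FIFO within a class."""
    def __init__(self):
        self._heap = []
        self._seq = 0            # tie-breaker preserving FIFO order

    def enqueue(self, priority, packet):
        heapq.heappush(self._heap, (priority, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

sched = PriorityScheduler()
sched.enqueue(2, "bulk-1")
sched.enqueue(0, "voice-1")   # voice classified as priority 0 (highest)
sched.enqueue(2, "bulk-2")
sched.enqueue(1, "video-1")
```

Dequeuing now yields voice-1 first, then video-1, then the two bulk packets in arrival order.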
 QoS can be measured by using various parameters, such as:
o Bandwidth: This is a parameter that measures the amount of data that can be trans-
mitted or received per unit time in a network. Bandwidth is usually expressed in bits
per second (bps) or multiples thereof, such as kilobits per second (Kbps), megabits
per second (Mbps), or gigabits per second (Gbps). Bandwidth affects the throughput
and efficiency of the communication.
o Delay: This is a parameter that measures the time it takes for a data packet to travel
from the source to the destination in a network. Delay is usually expressed in milli-
seconds (ms) or microseconds (µs). Delay affects the latency and responsiveness of
the communication.
o Jitter: This is a parameter that measures the variation or deviation of delay for differ-
ent data packets in a network. Jitter is usually expressed in milliseconds (ms) or mi-
croseconds (µs). Jitter affects the quality and smoothness of the communication, espe-
cially for real-time applications such as voice or video.
o Packet loss: This is a parameter that measures the percentage or ratio of data packets
that are lost or dropped during transmission in a network. Packet loss is usually ex-
pressed as a percentage (%) or a fraction. Packet loss affects the reliability and integ-
rity of the communication.

QoS Improving Techniques

 QoS improving techniques are techniques that aim to enhance the quality of service for differ-
ent types of data or applications in a network, by using various mechanisms such as traffic
shaping, traffic policing, or leaky bucket.
 Traffic shaping is a technique that regulates the rate and volume of transmitting data packets
in a network, by using a mechanism called token bucket.
o Token bucket is a mechanism that uses a virtual bucket that holds tokens, which rep-
resent the permission to send data packets. The bucket has a certain capacity and a
certain rate of token generation. The token bucket works as follows:
 When a source wants to send a data packet, it checks if there are enough to-
kens in the bucket to cover the size of the packet. If there are enough tokens,
it removes them from the bucket and sends the packet. If there are not enough
tokens, it either waits until there are enough tokens or drops the packet, de-
pending on the policy.
 The bucket generates tokens at a constant rate, up to its capacity. If the bucket
is full, any new tokens are discarded. The rate and capacity of the bucket de-
termine the maximum and average sending rate of the source.
o The advantage of token bucket is that it allows bursty traffic to be smoothed
out and shaped according to the network capacity and policy. The disad-
vantage of token bucket is that it may cause delay or packet loss if the traffic
exceeds the bucket capacity or rate.
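The token bucket described above can be sketched as follows (a minimal sketch: the rate is in tokens per second, timestamps are supplied by the caller, and packets that lack tokens are simply rejected rather than queued):

```python
class TokenBucket:
    """Token-bucket shaper: tokens accrue at `rate` per second up to
    `capacity`; a packet needing `size` tokens is sent only if the
    bucket holds enough, otherwise it is rejected."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # the bucket starts full
        self.last = 0.0

    def allow(self, size, now):
        # replenish tokens for the elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False
```

A bucket with rate 100 and capacity 200 admits a 150-token burst immediately, rejects a further 100-token packet at the same instant, and admits it one second later once tokens have accrued.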
 Traffic policing is a technique that regulates the rate and volume of receiving data
packets in a network, by using a mechanism called leaky bucket.
o Leaky bucket is a mechanism that uses a virtual bucket that holds data packets,
which represent the incoming traffic. The bucket has a certain capacity and a
certain rate of packet leakage. The leaky bucket works as follows:
 When a destination receives a data packet, it checks if there is enough
space in the bucket to store the packet. If there is enough space, it adds
the packet to the bucket. If there is not enough space, it drops the
packet.
 The bucket leaks packets at a constant rate, up to its capacity. If the
bucket is empty, no packets are leaked. The rate and capacity of the
bucket determine the maximum and average receiving rate of the desti-
nation.
o The advantage of leaky bucket is that it prevents congestion and ensures fair-
ness by limiting the receiving rate of the destination. The disadvantage of
leaky bucket is that it may cause delay or packet loss if the traffic exceeds the
bucket capacity or rate.
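The leaky bucket can be sketched in the same style (a simplification that counts whole packets and drains a fixed number per tick rather than per second):

```python
class LeakyBucket:
    """Leaky-bucket policer: arriving packets fill a queue of at most
    `capacity` packets; the bucket drains `leak_rate` packets per tick."""
    def __init__(self, capacity, leak_rate):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.queue = 0

    def arrive(self, packets):
        accepted = min(packets, self.capacity - self.queue)
        self.queue += accepted
        return packets - accepted          # number of dropped packets

    def tick(self):
        leaked = min(self.queue, self.leak_rate)
        self.queue -= leaked
        return leaked                      # packets forwarded this tick
```

With capacity 5 and leak rate 2, a burst of 7 packets loses 2 on arrival, and the remaining 5 drain at 2, 2, and 1 per tick, smoothing the burst into a constant output rate.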
 An example of traffic shaping using token bucket and traffic policing using leaky
bucket is shown in the following diagram:

UNIT-V: Application Layer


Domain Name System (DNS)

 Domain Name System (DNS) is a system that maps domain names to IP addresses and vice
versa in a network, such as the Internet. The domain name space is the hierarchical tree of
names that DNS manages.
 DNS works as follows:
o DNS uses a hierarchical structure that divides the domain name space into various
levels, such as top-level domains (TLDs), second-level domains, third-level domains,
and so on. Each level is separated by a dot (.) in the domain name. For example,
www.example.com is a domain name that consists of three levels: www is the third-
level domain, example is the second-level domain, and com is the TLD.
o DNS uses a distributed database that stores the mappings of domain names and IP ad-
dresses in various servers, called name servers. Each name server is responsible for a
certain portion of the domain name space, called a zone. For example, a name server
for the com zone can store the mappings of all the second-level domains under the
com TLD, such as example.com, google.com, or amazon.com.
o DNS uses a client-server model that involves two types of entities: DNS clients, also
called resolvers, and DNS servers, also called name servers. A DNS client is an appli-
cation or a device that requests the IP address of a domain name or vice versa from a
DNS server. A DNS server is an application or a device that responds to the DNS que-
ries from the DNS clients by providing the requested information or forwarding the
query to another DNS server.
o DNS uses a resolution process that involves two types of queries: recursive queries
and iterative queries.
 Recursive query is a type of query that requires the DNS server to provide a
definitive answer to the DNS client, either by providing the requested infor-
mation or by returning an error message. If the DNS server does not have the

requested information in its cache or zone, it has to contact other DNS serv-
ers until it finds the answer or an error. For example, if a DNS client asks a
DNS server for the IP address of www.example.com using a recursive query,
the DNS server has to either provide the IP address or return an error message
to the DNS client.
 Iterative query is a type of query that allows the DNS server to provide either
a definitive answer or a referral to another DNS server to the DNS client. If
the DNS server does not have the requested information in its cache or zone,
it can refer the DNS client to another DNS server that may have the answer
or another referral. For example, if a DNS client asks a DNS server for the IP
address of www.example.com using an iterative query, the DNS server can
either provide the IP address or refer the DNS client to another DNS server,
such as the name server for the com zone or the example.com zone.
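The referral chain of iterative resolution can be illustrated with toy zone data (a sketch under stated assumptions: the server names and the zone database below are hypothetical, and the returned address is merely an illustrative value, not a live lookup):

```python
# Toy name-server database: each server either knows the answer ("A")
# or refers the resolver to a more specific server ("REFER").
SERVERS = {
    "root":           {"com.": ("REFER", "com-server")},
    "com-server":     {"example.com.": ("REFER", "example-server")},
    "example-server": {"www.example.com.": ("A", "93.184.216.34")},
}

def resolve_iteratively(name):
    """Follow referrals from the root until an A record is found.
    (Assumes a match exists; error handling is omitted.)"""
    server, path = "root", []
    while True:
        path.append(server)
        zone = SERVERS[server]
        # pick the most specific entry whose suffix matches the query
        match = max((k for k in zone if name.endswith(k)), key=len)
        rtype, value = zone[match]
        if rtype == "A":
            return value, path
        server = value                 # follow the referral
```

Resolving www.example.com. walks root, then the com server, then the example.com server, mirroring the iterative query sequence described above; in a recursive query, the first server contacted would perform this walk on the client's behalf.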
 An example of DNS hierarchy is shown in the following diagram:

 An example of DNS resolution process using recursive and iterative queries is shown in the
following diagram:

Dynamic DNS (DDNS)

 Dynamic DNS (DDNS) is a system that dynamically updates the DNS records of a domain
name in response to the changes in its IP address or other parameters in a network, such as the
Internet.


 DDNS works as follows:
o DDNS requires the support of three components: a DDNS client, a DDNS server, and
a DDNS provider. A DDNS client is an application or a device that monitors the IP
address or other parameters of a domain name and sends updates to a DDNS server. A
DDNS server is an application or a device that receives the updates from the DDNS
client and modifies the DNS records of the domain name accordingly. A DDNS pro-
vider is an entity that offers the DDNS service and maintains the DDNS server and
the domain name registration.
o The DDNS client and the DDNS server communicate using various protocols, such as
HTTP, HTTPS, or DNS update. The DDNS client and the DDNS server authenticate
each other using various methods, such as username and password, digital signature,
or IP address verification.
o The DDNS client periodically checks the IP address or other parameters of the do-
main name and compares them with the previous values. If there is any change, the
DDNS client sends an update request to the DDNS server, which contains the new
values and other information, such as hostname, TTL, or record type.
o The DDNS server receives the update request from the DDNS client and verifies its
validity and authenticity. If the update request is valid and authentic, the DDNS
server modifies the DNS records of the domain name accordingly and sends an up-
date response to the DDNS client, which contains the status and result of the update.
 The use cases and benefits of DDNS are as follows:
o DDNS is useful for devices or applications that have dynamic or changing IP ad-
dresses, such as home networks, mobile devices, or cloud services. DDNS allows
these devices or applications to be accessible by using a consistent and memorable
domain name, instead of a variable and complex IP address.
o DDNS is beneficial for users who want to host their own websites, servers, or ser-
vices on their devices or applications, without relying on a third-party hosting pro-
vider. DDNS allows these users to have more control and flexibility over their online
presence and identity, as well as save money and resources.

TELNET

 TELNET is a remote terminal access protocol that allows a user to access and control an-
other device or system over a network, such as the Internet.

 TELNET works as follows:


o TELNET uses a client-server model that involves two types of entities: TELNET cli-
ent and TELNET server. A TELNET client is an application or a device that initiates a
connection to a TELNET server. A TELNET server is an application or a device that
accepts and responds to the connection requests from the TELNET clients.
o The TELNET client and the TELNET server communicate using the Transmission
Control Protocol (TCP) on port 23. They exchange data using a standard format called
the Network Virtual Terminal (NVT), which consists of 7-bit US-ASCII characters car-
ried in 8-bit bytes, plus control codes.
o The TELNET client and the TELNET server negotiate the parameters and options for
the connection using a mechanism called option negotiation, which involves exchang-
ing commands and responses. The option negotiation allows the TELNET client and
the TELNET server to agree on various features, such as echo, terminal type, or au-
thentication.
o After completing the option negotiation, the TELNET client and the TELNET server
establish a virtual terminal session, which allows the user to access and control the
remote device or system as if it were a local terminal. The user can enter commands
and receive responses from the remote device or system using the keyboard and the
screen of the local device or system.
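Option negotiation is carried as 3-byte command sequences: the IAC byte (255), a verb (WILL=251, WONT=252, DO=253, DONT=254), and an option code such as ECHO (1). These constants come from RFC 854 and its option RFCs; the decoder below is a minimal sketch that ignores subnegotiation:

```python
IAC, WILL, WONT, DO, DONT = 255, 251, 252, 253, 254
VERBS = {WILL: "WILL", WONT: "WONT", DO: "DO", DONT: "DONT"}
OPTIONS = {1: "ECHO", 3: "SUPPRESS-GO-AHEAD", 24: "TERMINAL-TYPE"}

def decode_negotiation(data):
    """Decode IAC <verb> <option> triples from a raw TELNET byte stream;
    bytes outside a negotiation triple are skipped as ordinary NVT data."""
    commands, i = [], 0
    while i + 3 <= len(data):
        if data[i] == IAC and data[i + 1] in VERBS:
            verb, opt = data[i + 1], data[i + 2]
            commands.append((VERBS[verb], OPTIONS.get(opt, str(opt))))
            i += 3
        else:
            i += 1
    return commands
```

For instance, the bytes 255 251 1 255 253 24 decode to a server offering to echo (WILL ECHO) and asking the client for its terminal type (DO TERMINAL-TYPE).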

 The limitations of TELNET are as follows:


o TELNET is insecure, which means that it does not provide any security features,
such as encryption, authentication, or integrity protection for the data exchanged be-
tween the TELNET client and the TELNET server. TELNET is vulnerable to various
attacks, such as eavesdropping, modification, or replaying of the data exchanged be-
tween the TELNET client and the TELNET server. TELNET may compromise the
confidentiality, integrity, or availability of the communication.
o TELNET is obsolete, which means that it has been replaced by more advanced and
secure protocols, such as Secure Shell (SSH), which provides encryption, authentica-
tion, and integrity protection for the data exchanged between the SSH client and the

SSH server. SSH also provides additional features, such as port forwarding, file trans-
fer, or tunneling.

File Transfer Protocol (FTP)

 File Transfer Protocol (FTP) is a protocol for transferring files between systems over a net-
work, such as the Internet.
 FTP works as follows:
o FTP uses a client-server model that involves two types of entities: FTP client and FTP
server. An FTP client is an application or a device that initiates a connection to an
FTP server and requests to upload or download files. An FTP server is an application
or a device that accepts and responds to the connection requests from the FTP clients
and provides access to the files stored on the server.
o FTP uses two connections for transferring files: control connection and data connec-
tion. The control connection is used for exchanging commands and responses be-
tween the FTP client and the FTP server. The data connection is used for transferring
the actual files between the FTP client and the FTP server.
o FTP uses various commands for transferring files, such as USER, PASS, LIST,
RETR, STOR, QUIT, and so on. The FTP client sends these commands to the FTP
server using the control connection. The FTP server responds to these commands with
numerical codes and messages using the control connection. The FTP client and the
FTP server use the data connection to transfer the files according to the commands.
o FTP uses two modes for transferring files: active mode and passive mode. The active
mode and the passive mode differ in how they establish the data connection between
the FTP client and the FTP server.
 Active mode is a mode where the FTP client initiates the data connection to
the FTP server. The FTP client sends its IP address and port number to the
FTP server using the PORT command. The FTP server connects to the IP ad-
dress and port number provided by the FTP client using the data connection.
 Passive mode is a mode where the FTP server initiates the data connection to
the FTP client. The FTP client requests the IP address and port number from
the FTP server using the PASV command. The FTP server responds with its
IP address and port number using the control connection. The FTP client con-
nects to the IP address and port number provided by the FTP server using the
data connection.
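The PORT command mentioned above carries the client's IPv4 address and port as six comma-separated decimal bytes, with the 16-bit port split into a high and a low byte. A small sketch of encoding and decoding that argument:

```python
def port_argument(ip, port):
    """Encode an IPv4 address and TCP port as the argument of the
    FTP PORT command: h1,h2,h3,h4,p1,p2 where port = p1*256 + p2."""
    return ",".join(ip.split(".") + [str(port // 256), str(port % 256)])

def parse_port_argument(arg):
    """Inverse: recover (ip, port) from a PORT argument string."""
    parts = arg.split(",")
    return ".".join(parts[:4]), int(parts[4]) * 256 + int(parts[5])
```

For example, a client listening on 192.168.1.2 port 1234 sends PORT 192,168,1,2,4,210, since 1234 = 4*256 + 210.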
 An example of FTP operation using active mode is shown in the following diagram:

World Wide Web (WWW)

 World Wide Web (WWW) is a system that allows users to access and share infor-
mation over a network, such as the Internet, using web browsers and web servers.
 WWW works as follows:
o WWW uses a client-server model that involves two types of entities: web browsers
and web servers. A web browser is an application or a device that allows the user to
request and view web pages from a web server. A web server is an application or a
device that stores and delivers web pages to the web browsers.
o WWW uses a protocol called Hypertext Transfer Protocol (HTTP) to communicate
between the web browsers and the web servers. HTTP is a stateless and request-re-
sponse protocol that transfers hypertext documents, which are documents that contain
text, images, audio, video, or other multimedia elements, as well as links to other doc-
uments, called hyperlinks.
o WWW uses a standard format for the web pages, called Hypertext Markup Language
(HTML), which defines the structure and content of the web pages using tags and at-
tributes. HTML also allows the inclusion of other formats, such as Cascading Style
Sheets (CSS), which define the style and appearance of the web pages, or JavaScript,
which define the behavior and interactivity of the web pages.
 HTTP request and response format are as follows:
o HTTP request is a message that is sent by the web browser to the web server to re-
quest a web page or a resource. HTTP request consists of three parts: request line, re-
quest header, and request body. The request line contains the method, the URL, and
the version of HTTP. The request header contains various fields that provide infor-
mation about the request, such as Host, User-Agent, Accept, Cookie, and so on. The
request body contains optional data that is sent to the server, such as form data or file
upload.
o HTTP response is a message that is sent by the web server to the web browser to re-
spond to a request. HTTP response consists of three parts: status line, response
header, and response body. The status line contains the version of HTTP, the status
code, and the status message. The status code is a three-digit number that indicates
the outcome of the request, such as 200 (OK), 404 (Not Found), or 500 (Internal
Server Error). The status message is a short phrase that describes the status code. The
response header contains various fields that provide information about the response,
such as Content-Type, Content-Length, Server, Set-Cookie, and so on. The response

body contains optional data that is sent to the browser, such as HTML document or
image file.
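The three-part request layout can be made concrete by parsing a raw request string (a simplified sketch that assumes a well-formed request with no duplicate header fields):

```python
def parse_http_request(raw):
    """Split a raw HTTP request into (method, url, version, headers, body).
    The header section and the body are separated by a blank line."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, url, version = lines[0].split(" ")   # the request line
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return method, url, version, headers, body

raw = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: demo\r\n"
    "\r\n"
)
```

Parsing this request yields method GET, URL /index.html, version HTTP/1.1, two header fields, and an empty body, exactly mirroring the request-line/header/body structure described above.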
 An example of HTTP request and response format is shown in the following diagram:

Simple Network Management Protocol (SNMP)

 Simple Network Management Protocol (SNMP) is a protocol that allows network manage-
ment and monitoring of various devices and systems in a network, such as routers, switches,
servers, printers, or computers.
 SNMP works as follows:
o SNMP uses a client-server model that involves two types of entities: SNMP managers
and SNMP agents. An SNMP manager is an application or a device that requests and
receives information from the SNMP agents. An SNMP agent is an application or a
device that provides and sends information to the SNMP managers.
o SNMP uses the User Datagram Protocol (UDP) to communicate between the SNMP
managers and the SNMP agents. UDP is a connectionless and unreliable protocol that
transfers data packets without guaranteeing their delivery, order, or integrity. The
SNMP agent listens on UDP port 161 for requests from the manager, while the man-
ager listens on UDP port 162 for traps and notifications from the agent.
o SNMP uses a data structure called Management Information Base (MIB) to store and
organize the information about the devices and systems in the network. MIB consists
of various objects that represent different aspects or attributes of the devices and sys-
tems, such as name, status, performance, or configuration. MIB uses a hierarchical
structure that divides the objects into various branches, such as system, interfaces, ip,
tcp, udp, and so on. Each object has a unique identifier, called an object identifier
(OID), which consists of a sequence of numbers separated by dots (.). For example,
the OID for the system name object is 1.3.6.1.2.1.1.5.
o SNMP uses various operations for network management and monitoring, such as
GET, SET, GETNEXT, GETBULK, TRAP, INFORM, and so on. The SNMP man-
ager sends these operations to the SNMP agent using a message format called Proto-
col Data Unit (PDU), which contains the operation type, the OID, the value, and other
information. The SNMP agent responds to these operations with another PDU mes-
sage that contains the result or status of the operation.
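The OID tree and the GETNEXT operation can be sketched with plain tuple comparisons (a minimal sketch: an OID lies under a subtree when the subtree's numbers are a prefix of the OID, and GETNEXT returns the lexicographically next known OID):

```python
def oid_tuple(oid):
    """Parse a dotted OID string such as '1.3.6.1.2.1.1.5' into a tuple."""
    return tuple(int(part) for part in oid.split("."))

def in_subtree(oid, subtree):
    """True if `oid` lies under the MIB subtree rooted at `subtree`."""
    o, s = oid_tuple(oid), oid_tuple(subtree)
    return o[:len(s)] == s

def getnext(oids, current):
    """GETNEXT semantics: the smallest known OID lexicographically
    greater than `current`, or None at the end of the MIB view."""
    for cand in sorted(oid_tuple(o) for o in oids):
        if cand > oid_tuple(current):
            return ".".join(map(str, cand))
    return None
```

For example, the system-name OID 1.3.6.1.2.1.1.5 lies under the system branch 1.3.6.1.2.1.1, and repeated GETNEXT calls walk the agent's objects in OID order, which is how managers enumerate a MIB table.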

 SNMP has three versions: SNMPv1, SNMPv2, and SNMPv3.


o SNMPv1 is the first and simplest version of SNMP that supports basic operations
such as GET, SET, GETNEXT, and TRAP. SNMPv1 uses a simple authentication
mechanism based on community strings, which are passwords that are shared be-
tween the SNMP manager and the SNMP agent. SNMPv1 does not provide any en-
cryption or integrity protection for the data exchanged between the SNMP manager
and the SNMP agent.
o SNMPv2 is an improved version of SNMP that supports additional operations such as
GETBULK and INFORM. SNMPv2 also supports more data types and error codes
than SNMPv1. SNMPv2 uses the same authentication mechanism as SNMPv1 based
on community strings. SNMPv2 does not provide any encryption or integrity protec-
tion for the data exchanged between the SNMP manager and the SNMP agent.
o SNMPv3 is the latest and most secure version of SNMP. It supports all the opera-
tions of SNMPv2 as well as some new features such as report and notification mes-
sages. SNMPv3 adds security features that SNMPv2 lacks, namely encryption, au-
thentication, and integrity protection for the data exchanged between the SNMP man-
ager and the SNMP agent. SNMPv3 uses a user-based security model with usernames
and passwords as well as cryptographic keys and algorithms.

Bluetooth

 Bluetooth is a wireless technology for short-range communication between various


devices and systems in a network, such as smartphones, laptops, headphones, speak-
ers, printers, or sensors.
 Bluetooth works as follows:
o Bluetooth uses a radio frequency band of 2.4 GHz to transmit and receive data
between the devices and systems. Bluetooth uses a technique called frequency
hopping spread spectrum (FHSS) to avoid interference and increase security.
FHSS means that the devices and systems change their frequency of transmis-
sion and reception randomly and rapidly among 79 channels in the 2.4 GHz
band.
o Bluetooth uses a network topology called piconet to connect the devices and
systems. A piconet consists of up to eight devices or systems that communi-
cate with each other using a master-slave relationship. One device or system
acts as the master and controls the communication with the other devices or
systems, which act as the slaves. The master assigns a unique frequency hop-
ping pattern to each slave and synchronizes the transmission and reception of
data with them. Multiple piconets can form a larger network called scatternet,
where some devices or systems can act as bridges between different piconets.
o Bluetooth uses a protocol stack that consists of various layers that provide dif-
ferent functions and services for the communication between the devices and
systems. The Bluetooth protocol stack includes the following layers: radio
layer, baseband layer, link manager layer, logical link control and adaptation
protocol (L2CAP) layer, service discovery protocol (SDP) layer, and applica-
tion layer. The application layer contains various profiles that define the spe-
cific roles and capabilities of the devices and systems, such as headset profile,
hands-free profile, file transfer profile, or personal area network profile.
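The lockstep hopping of a piconet can be sketched with a seeded pseudo-random generator (an assumption worth flagging: a real Bluetooth radio derives the hop sequence from the master's device address and clock through a defined hop-selection kernel, for which the PRNG here is only a stand-in):

```python
import random

def hop_sequence(master_address, clock, hops, channels=79):
    """Pseudo-random hop sequence over the 79 classic-Bluetooth channels.
    Master and slaves seed identically, so they hop in lockstep."""
    rng = random.Random((master_address << 28) | clock)
    return [rng.randrange(channels) for _ in range(hops)]

# master and slave derive the same sequence from the shared parameters
master = hop_sequence(0x1234AB, clock=0, hops=5)
slave = hop_sequence(0x1234AB, clock=0, hops=5)
```

Because both sides compute the identical channel list, they transmit and receive on the same frequency in every slot, which is how FHSS keeps the piconet synchronized while spreading traffic across the 2.4 GHz band.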
 An example of Bluetooth communication is shown in the following diagram:

Firewalls

 Firewalls are network security mechanisms that monitor and control the incoming and out-
going traffic between different networks or devices, based on predefined rules and policies.
 Firewalls have the following roles:
o Firewalls protect the network or device from unauthorized or malicious access, such
as hackers, viruses, worms, or denial-of-service attacks.
o Firewalls filter the network or device from unwanted or harmful traffic, such as spam,
phishing, or malware.
o Firewalls enforce the network or device policies and regulations, such as authentica-
tion, authorization, encryption, or logging.
 Firewalls can be classified into three types: packet filtering, proxy, and stateful inspection.
o Packet filtering is a type of firewall that examines each packet that passes through the
firewall and decides whether to allow or block it based on its source address, destina-
tion address, protocol, port number, or other criteria. Packet filtering is simple and
fast, but it does not inspect the content or the state of the packets.
o Proxy is a type of firewall that acts as an intermediary between the network or device
and the external network or device. Proxy intercepts and modifies the packets that
pass through the firewall and creates a new connection with the external network or
device. Proxy can inspect the content and the state of the packets, as well as provide
caching and authentication services. Proxy is complex and slow, but it provides more
security and functionality than packet filtering.
o Stateful inspection is a type of firewall that combines the features of packet filtering
and proxy. Stateful inspection examines each packet that passes through the firewall
and decides whether to allow or block it based on its source address, destination ad-
dress, protocol, port number, content, state, or other criteria. Stateful inspection is
more intelligent and flexible than packet filtering and proxy, but it requires more
memory and processing power.
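Packet filtering can be sketched as a first-match-wins rule table (a simplified sketch: the rule fields and addresses below are illustrative, None acts as a wildcard, and the default policy blocks anything unmatched):

```python
def matches(rule, packet):
    """A rule field of None is a wildcard; otherwise it must equal
    the corresponding packet field."""
    return all(rule.get(k) is None or rule[k] == packet[k]
               for k in ("src", "dst", "proto", "port"))

def filter_packet(rules, packet, default="block"):
    """Return the action of the first matching rule, or the default
    policy (deny-by-default) if no rule matches."""
    for rule in rules:
        if matches(rule, packet):
            return rule["action"]
    return default

rules = [
    {"src": None, "dst": "10.0.0.5", "proto": "tcp", "port": 80,
     "action": "allow"},                       # web traffic to one host
    {"src": "203.0.113.9", "dst": None, "proto": None, "port": None,
     "action": "block"},                       # blacklisted source
]
```

With these rules, web traffic to 10.0.0.5 is allowed, anything from the blacklisted source is blocked, and everything else falls through to the deny-by-default policy.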
 An example of firewalls is shown in the following diagram:

Machine Learning
UNIT-I: Introduction to Machine Learning
1. Definition of Machine Learning

 Machine learning is the study of computer algorithms that can learn from data and make pre-
dictions or decisions based on the data.
 Machine learning is important because it enables computers to perform tasks that are difficult
or impossible to program explicitly, such as recognizing faces, understanding natural lan-
guage, playing games, etc.

 Machine learning can also help us discover new knowledge and insights from large and com-
plex data sets, such as genomic data, social media data, etc.
 Some examples of machine learning in real-world applications are:
o Spam filtering: Machine learning algorithms can learn to classify email messages as
spam or not spam based on the content and metadata of the messages.
o Recommendation systems: Machine learning algorithms can learn to recommend
products, movies, music, etc. to users based on their preferences and behavior.
o Self-driving cars: Machine learning algorithms can learn to control a car autono-
mously by sensing the environment and making decisions accordingly.
o Speech recognition: Machine learning algorithms can learn to recognize and tran-
scribe spoken words into text.

2. Types of Machine Learning

 Machine learning can be broadly classified into three types based on the type and amount of
data available for learning:
o Supervised learning: Learning with labeled data
o Unsupervised learning: Discovering patterns in unlabeled data
o Reinforcement learning: Learning through trial and error
 Supervised learning is the most common type of machine learning, where the algorithm is
given a set of input-output pairs (called training data) and learns to find a function that maps
the input to the output. The goal of supervised learning is to generalize the learned function to
new and unseen inputs (called test data) and make accurate predictions. Some examples of
supervised learning tasks are:
o Classification: Predicting a discrete label for an input, such as whether an email is
spam or not, whether a tumor is benign or malignant, etc.
o Regression: Predicting a continuous value for an input, such as the price of a house,
the age of a person, etc.
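A minimal example of supervised regression is fitting a line y = w*x + b to labeled training pairs by closed-form least squares (toy data chosen so the fit is exact):

```python
def fit_line(xs, ys):
    """Closed-form least squares for a one-variable linear model
    y = w*x + b, the simplest supervised regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - w * mean_x
    return w, b

# toy training data generated from y = 2x + 1 (labels are exact here)
w, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

The learned function generalizes to unseen inputs: the prediction for x = 10 is w*10 + b = 21, which is the sense in which the algorithm "maps input to output" for test data.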

 Unsupervised learning is the type of machine learning where the algorithm is given a set of
inputs without any labels or outputs and learns to discover patterns or structure in the data.
The goal of unsupervised learning is to find hidden or latent variables that explain the varia-
tion in the data. Some examples of unsupervised learning tasks are:
o Clustering: Grouping similar inputs into clusters, such as customers based on their
purchase behavior, documents based on their topics, etc.
o Dimensionality reduction: Reducing the number of features or variables in the data
while preserving the essential information, such as principal component analysis
(PCA), singular value decomposition (SVD), etc.

 Reinforcement learning is the type of machine learning where the algorithm learns from its
own actions and feedback from the environment. The algorithm (called an agent) interacts
with the environment (which can be stochastic or deterministic) and observes the state and
reward (which can be positive or negative) of the environment. The goal of reinforcement
learning is to find an optimal policy that maximizes the expected cumulative reward over
time. Some examples of reinforcement learning tasks are:
o Control: Learning to control a system or a device, such as a robot arm, a helicopter,
etc.
o Games: Learning to play games against human or computer opponents, such as chess,
Go, Atari games, etc.
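The trial-and-error loop of reinforcement learning can be sketched with an epsilon-greedy multi-armed bandit (a toy stand-in for full reinforcement learning: there is a single state, the reward means below are invented for illustration, and a fixed seed keeps the run reproducible):

```python
import random

def epsilon_greedy(true_means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy bandit agent: explore a random arm with
    probability eps, otherwise exploit the best estimate so far;
    returns the index of the arm it believes is best."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(len(true_means))       # explore
        else:
            arm = estimates.index(max(estimates))      # exploit
        reward = rng.gauss(true_means[arm], 1.0)       # noisy feedback
        counts[arm] += 1
        # incremental running mean of observed rewards for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates.index(max(estimates))

best = epsilon_greedy([0.0, 0.3, 1.0])   # arm 2 pays best on average
```

The agent never sees the true means; it learns them from reward feedback alone, and after enough interactions its policy converges on the highest-paying arm, illustrating how an agent maximizes expected cumulative reward.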

3. Applications of Machine Learning

 Machine learning has been applied to various fields and domains, such as healthcare, finance,
robotics, etc., to solve complex and challenging problems that require human-like intelligence
and decision making. Some examples of machine learning applications in different fields are:
o Healthcare: Machine learning can help diagnose diseases, predict outcomes, recom-
mend treatments, analyze medical images, discover new drugs, etc. For instance,
 DeepMind’s AlphaFold system can predict the three-dimensional structure of
proteins from their amino acid sequences using deep neural networks.
 IBM’s Watson system can assist doctors in diagnosing cancer and suggesting
personalized treatments based on natural language processing and knowledge
representation.
 Google’s DeepMind Health system can detect eye diseases from retinal scans
using convolutional neural networks.
o Finance: Machine learning can help detect fraud, optimize portfolios, predict market
movements, automate trading, analyze customer behavior, etc. For instance,
 PayPal’s fraud detection system can identify fraudulent transactions and flag
them for review using supervised learning and anomaly detection.

 BlackRock’s Aladdin system can help investors manage their portfolios and
risks using optimization and simulation techniques.
 Renaissance Technologies’ Medallion fund can generate high returns by using
quantitative and statistical methods to trade in the financial markets.
o Robotics: Machine learning can help robots learn to perform tasks that require per-
ception, manipulation, navigation, coordination, etc. For instance,
 Boston Dynamics’ Atlas robot can perform acrobatic feats such as backflips,
somersaults, etc. using reinforcement learning and control theory.
 OpenAI’s Dactyl system can manipulate objects with a robotic hand using
reinforcement learning and computer vision.
 Amazon’s Kiva system can optimize the warehouse operations by using au-
tonomous mobile robots that can move shelves and products.

4. Challenges and Issues in Machine Learning

 Machine learning is not a magic bullet that can solve any problem without any difficulties or
drawbacks. There are many challenges and issues that need to be addressed when applying
machine learning to real-world problems, such as data quality and quantity, bias and fairness,
ethical considerations, etc.
 Data quality and quantity: Machine learning algorithms depend on the data that they are given
to learn from. Therefore, the quality and quantity of the data are crucial for the performance
and reliability of the algorithms. Some of the common problems with data are:
o Missing data: Some data points may have missing values for some features or varia-
bles, which can affect the accuracy and completeness of the analysis.
o Outliers: Some data points may have extreme or abnormal values that deviate from
the rest of the data, which can affect the robustness and stability of the analysis.
o Noise: Some data points may have errors or inaccuracies due to measurement errors,
human errors, transmission errors, etc., which can affect the validity and reliability of
the analysis.
o Inconsistencies: Some data points may have conflicting or contradictory values due to
different sources, formats, standards, etc., which can affect the consistency and com-
parability of the analysis.
 Bias and fairness: Machine learning algorithms may inherit or amplify the biases that exist in
the data, the models, or the users, which can lead to unfair or discriminatory outcomes or de-
cisions. Some of the common sources of bias are:
o Data bias: The data may not represent the true population or distribution of interest,
due to sampling bias, selection bias, reporting bias, etc., which can affect the generali-
zability and representativeness of the analysis.
o Model bias: The model may not capture the true relationship or function of interest,
due to overfitting, underfitting, confounding, etc., which can affect the accuracy and
interpretability of the analysis.
o User bias: The user may not use or evaluate the model in an objective or impartial
manner, due to confirmation bias, anchoring bias, hindsight bias, etc., which can af-
fect the trustworthiness and accountability of the analysis.
 Ethical considerations: Machine learning algorithms may have ethical implications or conse-
quences that need to be considered when designing, deploying, or using them. Some of the
common ethical issues are:
o Privacy: The data may contain sensitive or personal information that may be exposed
or misused by unauthorized parties, which can affect the confidentiality and security
of the analysis.
o Transparency: The model may not be transparent or explainable enough to justify or
understand its predictions or decisions, which can affect the interpretability and ac-
countability of the analysis.

o Responsibility: The user may not be responsible or liable for the outcomes or actions
of the model, which can affect the trustworthiness and accountability of the analysis.

5. Understanding the Machine Learning Workflow

 Machine learning is not a one-shot process that can be done in a single step. It is a complex
and iterative process that involves multiple steps from data collection to model deployment.
The machine learning workflow can be summarized as follows:
o Data collection: The first step is to collect or acquire the data that is relevant and suf-
ficient for the problem at hand. The data can come from various sources such as data-
bases, web pages, sensors, surveys, etc.
o Data exploration: The next step is to explore or analyze the data to understand its
characteristics and structure. This can involve descriptive statistics, data visualization,
feature engineering, etc.
o Data pre-processing: The next step is to pre-process or transform the data to make it
suitable and ready for machine learning. This can involve scaling, normalization, en-
coding, imputation, etc.

o Model training: The next step is to train or fit the machine learning model to the data
using a suitable algorithm and technique. This can involve choosing a model
architecture, setting hyperparameters, optimizing a loss function, etc.
o Model evaluation: The next step is to evaluate or test the machine learning model on
new and unseen data to measure its performance and generalization. This can involve
choosing a metric, splitting the data into train, validation, and test sets,
cross-validation, etc.
o Model fine-tuning: The next step is to fine-tune or improve the machine learning
model by adjusting its parameters or features based on the evaluation results. This can
involve regularization, feature selection, grid search, etc.
o Model deployment: The final step is to deploy or use the machine learning model in a
real-world setting or application. This can involve saving, loading, updating, and
monitoring the model.

6. Basic Types of Data in Machine Learning

 Data is the raw material or input for machine learning. Data can be of different types depend-
ing on the nature and format of the information it contains. The basic types of data in machine
learning are:
o Numerical data: Data that consists of numbers or quantities that can be measured or
calculated. Numerical data can be further divided into two subtypes:
 Continuous data: Data that can take any value within a range or interval, such
as height, weight, temperature, etc.
 Discrete data: Data that can take only certain distinct, countable values, such as
number of children, number of rooms, shoe size, etc.
o Categorical data: Data that consists of labels or names that represent groups or classes
of items or entities. Categorical data can be further divided into two subtypes:
 Nominal data: Data that has no inherent order or ranking among the catego-
ries, such as color, nationality, animal, etc.
 Ordinal data: Data that has a meaningful order or ranking among the categories,
such as education level, satisfaction rating, clothing size (S/M/L), etc.
o Text data: Data that consists of words or sentences that convey meaning or infor-
mation. Text data can be in the form of natural language (such as English, French,
etc.) or artificial language (such as HTML, SQL, etc.). Text data requires special pre-
processing techniques for machine learning, such as:
 Tokenization: Splitting the text into smaller units or tokens, such as words,
characters, symbols, etc.
 Stemming: Reducing the tokens to their root or base form, such as playing ->
play, cats -> cat, etc.
 Lemmatization: Converting the tokens to their canonical or dictionary form,
such as played -> play, mice -> mouse, etc.
 Stopword removal: Removing the tokens that are very common or irrelevant
for the analysis, such as the, a, and, etc.
 Vectorization: Converting the tokens into numerical vectors that represent
their frequency or importance in the text, such as bag-of-words (BOW), term
frequency-inverse document frequency (TF-IDF), word embeddings
(Word2Vec), etc.
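The preprocessing steps above can be sketched in plain Python; a minimal bag-of-words example using only the standard library (the stopword list and tokenizer here are deliberate simplifications — real pipelines use libraries such as NLTK or scikit-learn):

```python
from collections import Counter

# A tiny illustrative stopword list; real pipelines use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "is", "of"}

def tokenize(text):
    # Split on whitespace, strip edge punctuation, and lowercase.
    return [tok.strip(".,!?;:").lower() for tok in text.split()]

def bag_of_words(text):
    # Count each token that survives stopword removal.
    return Counter(t for t in tokenize(text) if t and t not in STOPWORDS)

bow = bag_of_words("The cat and the dog chased the cat.")
print(bow)  # Counter({'cat': 2, 'dog': 1, 'chased': 1})
```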

7. Exploring the Structure of Data

 Before applying machine learning algorithms to the data, it is important to explore or under-
stand the structure and characteristics of the data. This can help to identify patterns, trends,
outliers, relationships, etc. in the data and gain insights for further analysis. Some of the tech-
niques for exploring the structure of data are:
o Descriptive statistics: Calculating numerical summaries or measures that describe the
central tendency, variability, and distribution of the data. Some common descriptive
statistics are:
 Mean: The average value of the data points.
 Median: The middle value of the data points when sorted in ascending or de-
scending order.
 Mode: The most frequent value of the data points.
 Variance: The measure of how much the data points deviate from the mean.
 Standard deviation: The square root of the variance.
 Range: The difference between the maximum and minimum values of the
data points.
 Percentile: The value below which a certain percentage of the data points fall.
 Quartile: The value that divides the data points into four equal groups (Q1 =
25th percentile, Q2 = 50th percentile = median, Q3 = 75th percentile).

 Interquartile range (IQR): The difference between Q3 and Q1.
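These summaries can be computed directly with Python's standard `statistics` module; a small worked example (the data values are made up for illustration, and `quantiles` uses the exclusive method by default):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # 5.0
median = statistics.median(data)       # 4.5
mode = statistics.mode(data)           # 4
variance = statistics.pvariance(data)  # population variance: 4.0
stdev = statistics.pstdev(data)        # its square root: 2.0
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles
iqr = q3 - q1
print(mean, median, mode, stdev, iqr)
```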


o Data visualization: Creating graphical representations or plots that display the data in
a visual or intuitive way. Some common data visualization techniques are:
 Scatter plot: A plot that shows the relationship between two numerical varia-
bles by using dots to represent each data point.
 Histogram: A plot that shows the frequency distribution of a numerical varia-
ble by using bars to represent each bin or interval.
 Box plot: A plot that shows the summary statistics of a numerical variable by
using a box to represent Q1, Q2 (median), and Q3 and whiskers to represent
minimum and maximum values (or 1.5 * IQR).
 Bar chart: A plot that shows the frequency distribution of a categorical varia-
ble by using bars to represent each category.
 Pie chart: A plot that shows the proportion of each category in a categorical
variable by using slices to represent each category.

8. Data Quality and Remediation

 Data quality is an essential factor for machine learning success. Poor quality data can lead to
poor quality results and unreliable predictions or decisions. Therefore, it is important to check
and improve the quality of the data before applying machine learning algorithms. Some of the
common problems with data quality and their remediation techniques are:
o Missing data: Some data points may have missing values for some features or varia-
bles due to various reasons such as incomplete records, data entry errors, etc. Missing
data can affect the accuracy and completeness of the analysis and introduce bias or
uncertainty in the results. Some of the techniques for dealing with missing data are:
 Imputation: Replacing the missing values with some estimated or plausible
values based on the available data. Some common imputation methods are:
 Mean imputation: Replacing the missing values with the mean value
of the feature.
 Median imputation: Replacing the missing values with the median
value of the feature.
 Mode imputation: Replacing the missing values with the mode value
of the feature.
 K-nearest neighbors (KNN) imputation: Replacing the missing val-
ues with the average value of the k most similar data points based on
some distance metric.
 Deletion: Removing the data points or features that have missing values. This
can be done in two ways:
 Listwise deletion: Removing the entire data point if any of its fea-
tures have missing values.
 Pairwise deletion: Removing only the feature that has missing values
and keeping the rest of the data point.
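A minimal mean/median imputation sketch in Python (the `impute` helper and its data are hypothetical; scikit-learn's `SimpleImputer` implements the same idea):

```python
import statistics

def impute(values, strategy="mean"):
    # Replace None entries with the mean or median of the observed values.
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed) if strategy == "mean" else statistics.median(observed)
    return [fill if v is None else v for v in values]

ages = [25, None, 31, 28, None, 40]
print(impute(ages, "mean"))    # fills the gaps with 31.0
print(impute(ages, "median"))  # fills the gaps with 29.5
```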
o Outliers: Some data points may have extreme or abnormal values that deviate from
the rest of the data due to various reasons such as measurement errors, data entry er-
rors, natural variation, etc. Outliers can affect the robustness and stability of the anal-
ysis and introduce noise or distortion in the results. Some of the techniques for detect-
ing and handling outliers are:
 Detection: Identifying the data points that are outliers based on some criteria
or threshold. Some common detection methods are:
 Box plot: Using a box plot to visualize the distribution of a numerical
feature and finding the data points that lie outside the whiskers (or
1.5 * IQR) as outliers.
 Z-score: Calculating the standardized score or z-score for each data
point as (value - mean) / standard deviation and finding the data
points that have z-scores greater than 3 or less than -3 as outliers.
 Interquartile range (IQR): Calculating the interquartile range or IQR
for each feature as Q3 - Q1 and finding the data points that have val-
ues greater than Q3 + 1.5 * IQR or less than Q1 - 1.5 * IQR as outli-
ers.

 Handling: Dealing with the data points that are outliers based on some strat-
egy or action. Some common handling methods are:
 Removal: Removing the outliers from the data set.
 Capping: Replacing the outliers with some maximum or minimum
value within a reasonable range.
 Transformation: Applying some mathematical function to reduce or
eliminate the effect of outliers, such as log, square root, etc.
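The IQR detection rule and the capping strategy above can be combined in a short sketch (the helper names and data are illustrative):

```python
import statistics

def iqr_bounds(values):
    # Tukey's fences: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is flagged.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def cap_outliers(values):
    # Capping: clip outliers to the nearest fence instead of removing them.
    low, high = iqr_bounds(values)
    return [min(max(v, low), high) for v in values]

data = [10, 12, 11, 13, 12, 95]   # 95 is an obvious outlier
capped = cap_outliers(data)
print(capped)  # the 95 is pulled down to the upper fence
```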

 Data cleaning: Removing or correcting the errors or inconsistencies in the data due to various
reasons such as human errors, data entry errors, formatting errors, etc. Data cleaning can im-
prove the validity and reliability of the analysis and reduce the noise or distortion in the re-
sults. Some of the techniques for data cleaning are:

o Validation: Checking the data for errors or inconsistencies using some rules or crite-
ria, such as data type, data range, data format, etc.
o Correction: Fixing the errors or inconsistencies in the data using some methods or
tools, such as manual editing, automatic correction, etc.
o Standardization: Converting the data to a common or consistent format or standard,
such as date format, currency format, unit of measurement, etc.

9. Data Pre-Processing

 Data pre-processing is the process of transforming or modifying the data to make it suitable
and ready for machine learning. Data pre-processing can improve the performance and gener-
alization of the machine learning algorithms and enhance the quality and usability of the re-
sults. Some of the techniques for data pre-processing are:
o Scaling and normalization: Changing the range or scale of the numerical features to
make them comparable or compatible with each other and with the machine learning
algorithms. Some common scaling and normalization methods are:
 Min-max scaling: Scaling the features to a fixed range between 0 and 1 by
using the formula (value - min) / (max - min).
 Standardization: Scaling the features to have zero mean and unit variance by
using the formula (value - mean) / standard deviation.
 Normalization: Scaling the features to have unit norm (length) by dividing
each value by the square root of the sum of squares of all values.
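The min-max and standardization formulas above, as a small Python sketch:

```python
import statistics

def min_max_scale(values):
    # (value - min) / (max - min): maps the feature into [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    # (value - mean) / stdev: zero mean, unit variance.
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

heights = [150.0, 160.0, 170.0, 180.0, 190.0]
print(min_max_scale(heights))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```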

o One-hot encoding: Converting the categorical features into numerical features by cre-
ating dummy variables for each category. For example, if a feature has three catego-
ries A, B, and C, then one-hot encoding will create three new features A’, B’, and C’
such that A’ = 1 if A is present and 0 otherwise, B’ = 1 if B is present and 0 otherwise,
and C’ = 1 if C is present and 0 otherwise.
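The A/B/C example above, written out as a small helper (the function name is illustrative; pandas' `get_dummies` and scikit-learn's `OneHotEncoder` do this in practice):

```python
def one_hot(values):
    # One binary indicator column per distinct category, in sorted order.
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

colors = ["A", "C", "B", "A"]
print(one_hot(colors))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```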

o Handling imbalanced data: Dealing with the data that has unequal or disproportionate
distribution of classes or labels. For example, if a classification problem has two clas-
ses positive and negative, and 90% of the data points belong to the negative class and
only 10% belong to the positive class, then the data is imbalanced. Imbalanced data
can affect the accuracy and fairness of the machine learning algorithms and introduce
bias or skewness in the results. Some of the techniques for handling imbalanced data
are:
 Oversampling: Increasing the number of data points in the minority class by
duplicating or generating new data points based on some methods or tech-
niques, such as random oversampling, synthetic minority oversampling tech-
nique (SMOTE), etc.
 Undersampling: Decreasing the number of data points in the majority class by
removing or selecting some data points based on some methods or techniques,
such as random undersampling, cluster centroids, etc.
 Weighting: Assigning different weights or importance to different classes
based on their frequency or proportion in the data set. For example, assigning
higher weights to the minority class and lower weights to the majority class.
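Random oversampling, the simplest of the techniques above, can be sketched as follows (SMOTE would instead synthesize new minority points rather than duplicate existing ones; the helper name is illustrative):

```python
import random

def oversample(data, labels, minority, seed=0):
    # Duplicate randomly chosen minority-class points until the classes balance.
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_n = len(labels) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(majority_n - len(minority_idx))]
    return (data + [data[i] for i in extra],
            labels + [minority] * len(extra))

X = [[1], [2], [3], [4], [5]]
y = ["neg", "neg", "neg", "neg", "pos"]
X2, y2 = oversample(X, y, "pos")
print(y2.count("pos"), y2.count("neg"))  # 4 4
```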

UNIT-II: Modelling & Evaluation, Basics of Feature Engineering


10. Introduction to Model Selection and Evaluation

 Model selection and evaluation are important steps in the machine learning workflow, where
the goal is to choose the best model among a set of candidate models and measure its perfor-
mance and generalization on new and unseen data.
 Model evaluation metrics are quantitative measures that assess how well a model performs on
a given task or problem. Different metrics may be suitable for different types of tasks or prob-
lems, such as classification, regression, clustering, etc. Some common model evaluation met-
rics are:
o Accuracy: The proportion of correct predictions among the total number of predic-
tions. Accuracy is a simple and intuitive metric for classification problems, but it can
be misleading for imbalanced data, where always predicting the majority class
already yields high accuracy.
o Precision: The proportion of correct positive predictions among the total number of
positive predictions. Precision measures how precise or exact a model is in identify-
ing the positive class, but it may not account for the negative class or the false nega-
tives.
o Recall: The proportion of correct positive predictions among the total number of ac-
tual positives. Recall measures how complete or exhaustive a model is in identifying
the positive class, but it may not account for the negative class or the false positives.
o F1-score: The harmonic mean of precision and recall. F1-score balances both preci-
sion and recall and gives a single score that reflects the overall performance of a
model on the positive class.
o Mean squared error (MSE): The average of the squared differences between the ac-
tual and predicted values. MSE measures how close a model is to the true values, but
it may be sensitive to outliers or large errors.
o Root mean squared error (RMSE): The square root of the mean squared error. RMSE
measures how close a model is to the true values, but it may be easier to interpret than
MSE as it has the same unit as the values.
o R-squared: The proportion of the variance in the actual values that is explained by the
model. R-squared measures how well a model fits the data, but it may not indicate
how well a model predicts new data.
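The classification metrics above can all be derived from the four confusion-matrix counts; a worked example with made-up counts:

```python
def classification_metrics(tp, fp, fn, tn):
    # Accuracy, precision, recall, and F1 from confusion-matrix counts.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# 80 true positives, 20 false positives, 40 false negatives, 860 true negatives.
acc, prec, rec, f1 = classification_metrics(80, 20, 40, 860)
print(acc, prec)  # 0.94 0.8
```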
 Cross-validation is a technique for validating or testing a model on different subsets of the
data to reduce overfitting or underfitting and improve generalization. Cross-validation in-
volves splitting the data into k folds or groups, using k-1 folds for training and one fold for
testing, and repeating this process k times with different folds for testing. Some common
cross-validation methods are:
o k-fold cross-validation: Splitting the data into k equal-sized folds and using each fold
as a test set once and as a training set k-1 times.


o Holdout method: Splitting the data into two parts, one for training and one for testing,
and using them only once. This is a simple and fast method, but it may not use all the
data or reflect all the variability in the data.
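A minimal index-level sketch of k-fold splitting (scikit-learn's `KFold` provides the production version):

```python
def k_fold_indices(n, k):
    # Yield (train, test) index lists; each fold is the test set exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(6, 3))
print([test for _, test in folds])  # [[0, 1], [2, 3], [4, 5]]
```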

11. Selecting a Model

 Selecting a model is the process of choosing the best model among a set of candidate models
based on some criteria or objectives. Selecting a model is important because different models
may have different strengths and weaknesses, advantages and disadvantages, and suitability
and applicability for different tasks or problems.
 Criteria for choosing appropriate models are quantitative or qualitative measures that evaluate
or compare the performance and quality of different models. Different criteria may be rele-
vant for different types of tasks or problems, such as classification, regression, clustering, etc.
Some common criteria for choosing appropriate models are:
o Accuracy: The degree to which a model makes correct predictions or decisions on
new and unseen data. Accuracy is a simple and intuitive criterion for selecting a
model, but it may not account for other factors such as complexity, interpretability,
robustness, etc.
o Complexity: The degree to which a model has many parameters or features that affect
its behavior or output. Complexity is a trade-off criterion for selecting a model, as it
may affect both the accuracy and the interpretability of the model. A complex model
may have high accuracy but low interpretability, while a simple model may have low
accuracy but high interpretability.
o Interpretability: The degree to which a model can be understood or explained by hu-
mans. Interpretability is an important criterion for selecting a model, especially for
applications that require trust, transparency, or accountability. An interpretable model
can provide insights into the logic or reasoning behind its predictions or decisions,
while an uninterpretable model can be seen as a black box that produces outputs with-
out explanations.
o Robustness: The degree to which a model can handle noise, uncertainty, or variability
in the data or the environment. Robustness is a desirable criterion for selecting a
model, as it indicates how well a model can cope with real-world situations that may
not be ideal or perfect. A robust model can maintain its performance and quality un-
der different conditions or scenarios, while a non-robust model can be sensitive or un-
stable under different conditions or scenarios.
 Bias-variance tradeoff is a fundamental concept in machine learning that describes the rela-
tionship between the complexity and the accuracy of a model. Bias-variance tradeoff states
that:
o Bias is the error or difference between the expected or average prediction of a model
and the true value. Bias measures how accurate a model is on average across different
data sets.
o Variance is the variability of a model’s predictions around their expected or average
prediction across different training data sets. Variance measures how consistent a
model is across different data sets.
o As model complexity increases, bias tends to decrease while variance tends to
increase, and vice versa. This means that there is a tradeoff between bias and
variance, such that reducing both at the same time is difficult.
o A high-bias model is a simple or underfitting model that has low complexity and high
error on both training and test data. A high-bias model does not capture the true rela-
tionship or function of interest and makes systematic errors.
o A high-variance model is a complex or overfitting model that has high complexity
and low error on training data but high error on test data. A high-variance model cap-
tures the noise or randomness in the data and makes random errors.

o An optimal model is a balanced or well-fitting model that has moderate complexity
and low error on both training and test data. An optimal model captures the true rela-
tionship or function of interest and makes minimal errors.

12. Training a Model (for Supervised Learning)

 Training a model is the process of building or fitting a machine learning model to the data us-
ing a suitable algorithm and technique. Training a model is an essential step in supervised
learning, where the goal is to find a function that maps the input to the output based on la-
beled data.
 Using training data to build a model involves providing input-output pairs (called training ex-
amples) to the machine learning algorithm and adjusting the parameters or weights of the
model to minimize the error or loss between the actual and predicted outputs. The training
data should be representative and sufficient for the problem at hand, as it determines how well
the model can learn from the data and generalize to new data.
 Model representation and interpretability are important aspects of training a machine
learning model, as they determine how the model can be understood or explained by
humans. Different models may have different representations and interpretability, de-
pending on their complexity and structure. Some common types of model representa-
tion and interpretability are:
o Linear models: Models that have a linear or additive relationship between the
input and the output, such as linear regression, logistic regression, etc. Linear
models are easy to represent and interpret, as they can be expressed by a sim-
ple equation or formula that shows the contribution or effect of each input fea-
ture on the output.
o Tree-based models: Models that have a hierarchical or branching structure that
splits the input space into smaller regions based on some criteria or rules, such
as decision trees, random forests, etc. Tree-based models are moderately easy
to represent and interpret, as they can be visualized by a tree diagram or graph
that shows the path or sequence of decisions that lead to the output.
o Neural network models: Models that have a layered or networked structure
that consists of interconnected nodes or units that perform some computation
or transformation on the input, such as artificial neural networks, deep neural
networks, etc. Neural network models are difficult to represent and interpret,
as they can have many layers and nodes with complex and nonlinear interac-
tions that are hard to explain or understand.
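As a concrete instance of an interpretable linear model, ordinary least squares for a single feature fits in a few lines; the fitted slope reads directly as "change in y per unit change in x" (data values are made up):

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a + b*x (closed-form solution).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0 -> the line y = 1 + 2x fits exactly
```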

13. Introduction to Feature Engineering

 Feature engineering is the process of creating or modifying the features or variables that are
used as input for machine learning models. Feature engineering can enhance the performance
and quality of the machine learning models by improving the representation and suitability of
the data for the problem at hand.
 Techniques to improve feature representations are methods or strategies that can transform,
construct, or extract new or better features from the existing data. Different techniques may be
suitable for different types of data or problems, such as numerical, categorical, text, etc. Some
common techniques to improve feature representations are:
o Feature transformation: Changing the scale, shape, or distribution of the features to
make them more compatible or appropriate for the machine learning models. For ex-
ample, scaling, normalization, log, square root, etc.
o Feature construction: Creating new features from existing ones by using some logic,
computation, or combination. For example, adding, subtracting, multiplying, divid-
ing, etc.
o Feature extraction: Reducing the dimensionality or number of features by using some
technique that preserves the essential information or structure of the data. For exam-
ple, principal component analysis (PCA), singular value decomposition (SVD), latent
Dirichlet allocation (LDA), etc.
o Feature selection: Choosing a subset of features that are relevant or important for the
problem at hand by using some criterion or method. For example, correlation, mutual
information, chi-square test, etc.

14. Feature Transformation

 Feature transformation is a technique for feature engineering that involves changing the scale,
shape, or distribution of the features to make them more compatible or appropriate for the ma-
chine learning models. Feature transformation can improve the model fit and performance by
reducing skewness, outliers, heteroscedasticity, multicollinearity, etc.
 Transforming features to improve model fit involves applying some mathematical function or
operation to the features to change their values or properties. Different functions or operations
may have different effects or benefits on the features and the models. Some common func-
tions or operations for feature transformation are:

o Scaling: Changing the range or magnitude of the features to make them comparable
or consistent with each other and with the models. Scaling can help to avoid numeri-
cal issues or errors due to large or small values and improve the convergence or sta-
bility of the models. Some common scaling methods are:
 Min-max scaling: Scaling the features to a fixed range between 0 and 1 by
using the formula (value - min) / (max - min).
 Standardization: Scaling the features to have zero mean and unit variance by
using the formula (value - mean) / standard deviation.
 Normalization: Scaling the features to have unit norm (length) by dividing
each value by the square root of the sum of squares of all values.
o Logarithm: Applying the logarithm function to the features to reduce their range or
magnitude and make them more symmetric or normal. Logarithm can help to handle
skewed or exponential features and reduce the effect of outliers or extreme values.
The logarithm function can be expressed as log(value).
o Square root: Applying the square root function to the features to reduce their range or
magnitude and make them more symmetric or normal. Square root can help to handle
skewed or quadratic features and reduce the effect of outliers or extreme values. The
square root function can be expressed as sqrt(value).
o Power: Applying the power function to the features to increase their range or
magnitude and make them more skewed or exponential. Power can help to
handle symmetric or normal features and increase the effect of outliers or ex-
treme values. The power function can be expressed as value^p, where p is a
positive or negative exponent.
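A quick illustration of the log transform compressing a right-skewed feature (the income values are made up):

```python
import math

incomes = [20_000, 25_000, 30_000, 40_000, 1_000_000]  # heavily right-skewed
logged = [math.log(v) for v in incomes]

# The raw values span a 50x ratio; after the log transform the spread
# shrinks to log(50) ~ 3.9, so the extreme value no longer dominates.
print(max(incomes) / min(incomes))          # 50.0
print(round(max(logged) - min(logged), 2))  # 3.91
```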

15. Feature Construction

 Feature construction is a technique for feature engineering that involves creating new features
from existing ones by using some logic, computation, or combination. Feature construction
can enhance the performance and quality of the machine learning models by adding more in-
formation or structure to the data and capturing more complex or nonlinear relationships be-
tween the features and the output.
 Creating new features from existing ones involves applying some operation or function to the
existing features to generate new features that are more relevant or useful for the problem at
hand. Different operations or functions may have different effects or benefits on the features
and the models. Some common operations or functions for feature construction are:
o Arithmetic: Performing arithmetic operations such as addition, subtraction, multipli-
cation, division, etc. on the existing features to create new features that represent
some meaningful or interesting quantity or ratio. For example, creating a new feature
that represents the body mass index (BMI) by dividing the weight by the square of the
height.
o Logical: Performing logical operations such as and, or, not, etc. on the existing fea-
tures to create new features that represent some condition or rule. For example, creat-
ing a new feature that represents whether a person is obese or not by using a logical
expression such as BMI > 30.
o Polynomial: Performing polynomial operations such as raising to a power, taking a
root, etc. on the existing features to create new features that represent some higher-
order or lower-order term. For example, creating a new feature that represents the
square of the age by using a polynomial expression such as age^2.
o Trigonometric: Performing trigonometric operations such as sine, cosine, tangent, etc.
on the existing features to create new features that represent some periodic or cyclic
pattern. For example, creating a new feature that represents the sine of the hour by
using a trigonometric expression such as sin(hour).
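The constructed features above (BMI, an obesity flag, a cyclic hour encoding) in one sketch; note that for a truly cyclic feature the hour is usually scaled to a full period, sin(2*pi*hour/24), rather than the bare sin(hour):

```python
import math

def construct_features(weight_kg, height_m, hour):
    bmi = weight_kg / height_m ** 2               # arithmetic: ratio feature
    is_obese = bmi > 30                           # logical: threshold rule
    hour_sin = math.sin(2 * math.pi * hour / 24)  # trigonometric: cyclic feature
    return bmi, is_obese, hour_sin

bmi, is_obese, hour_sin = construct_features(90.0, 1.70, 18)
print(round(bmi, 1), is_obese)  # 31.1 True
```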

 Feature engineering for specific applications involves creating new features that are tailored
or customized for a particular domain or field that requires some domain knowledge or exper-
tise. For example, creating new features for text data may involve using natural language pro-
cessing techniques such as tokenization, stemming, lemmatization, etc., while creating new
features for image data may involve using computer vision techniques such as edge detection,
segmentation, feature extraction, etc.

16. Feature Extraction

 Feature extraction is a technique for feature engineering that involves reducing the dimen-
sionality or number of features by using some technique that preserves the essential infor-
mation or structure of the data. Feature extraction can improve the performance and quality of
the machine learning models by removing noise, redundancy, or irrelevance from the data and
simplifying or optimizing the computation or storage of the models.
 Dimensionality reduction techniques are methods or algorithms that can transform high-di-
mensional data into low-dimensional data by using some principle or criterion. Different tech-
niques may have different objectives or assumptions for dimensionality reduction, such as
variance, distance, correlation, etc. Some common dimensionality reduction techniques are:

o Principal component analysis (PCA): A technique that can reduce the dimensionality
of numerical data by finding linear combinations of the original features that capture
the maximum variance in the data. PCA can produce orthogonal and uncorrelated fea-
tures called principal components (PCs) that can be ranked by their importance or ex-
plained variance.
o Singular value decomposition (SVD): A technique that can decompose any matrix
into three matrices called U, S, and V, where U and V are orthogonal matrices and S
is a diagonal matrix. SVD can be used to reduce the dimensionality of numerical data
by selecting only the largest singular values in S and their corresponding columns in
U and V.
o Latent Dirichlet allocation (LDA): A technique that can reduce the dimensionality of
text data by finding probabilistic distributions of words over topics and of topics over
documents. LDA can produce latent and interpretable features called topics that can
capture the main themes or concepts in the text data.
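As a minimal sketch of how PCA works (NumPy only; the synthetic data is illustrative), the principal components can be read off the SVD of the centered data and ranked by explained variance:

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 samples of 3 numerical features; the first two are strongly correlated
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

# Center the data, then decompose: X_centered = U S Vt
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Explained variance of each principal component (already in decreasing order)
explained_var = S ** 2 / (len(X) - 1)

# Project onto the first two principal components (dimensionality 3 -> 2)
X_reduced = Xc @ Vt[:2].T
print(X_reduced.shape)
print(np.round(explained_var / explained_var.sum(), 3))
```

Because two of the three features are nearly redundant, the first principal component captures most of the variance on its own.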
 Feature selection methods are methods or algorithms that can select a subset of features that
are relevant or important for the problem at hand by using some criterion or method. Different
methods may have different approaches or strategies for feature selection, such as filter, wrap-
per, embedded, etc. Some common feature selection methods are:
o Correlation: A method that can measure how strongly two variables are related to
each other by using some statistic such as Pearson’s correlation coefficient, Spear-
man’s rank correlation coefficient, etc. Correlation can be used to select features that
have high correlation with the output and low correlation with each other.
o Mutual information: A method that can measure how much information two variables
share with each other by using some metric such as entropy, conditional entropy, etc.
Mutual information can be used to select features that have high mutual information
with the output and low mutual information with each other.
o Chi-square test: A method that can test how likely two categorical variables are inde-
pendent of each other by using some statistic such as chi-square statistic, p-value, etc.
Chi-square test can be used to select features that have high chi-square statistic or low
p-value with the output.
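The correlation method above can be sketched as a simple filter: rank features by their absolute Pearson correlation with the output and keep the top k (the synthetic data and the choice of k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 4))
# The output depends strongly on features 0 and 2, not on 1 and 3
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=n)

# Absolute Pearson correlation of each feature with the output
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])

k = 2
selected = np.argsort(corr)[::-1][:k]   # indices of the top-k features
print(sorted(selected.tolist()))        # the informative features 0 and 2
```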
UNIT-III: Regression
17. Introduction to Regression Analysis

 Regression analysis is a type of supervised learning that involves finding a function that best
fits the relationship between one or more input variables (called predictors or independent
variables) and one output variable (called response or dependent variable).
 Regression analysis is useful for understanding how the output variable changes with respect
to the input variables, predicting the output variable for new or unseen input values, testing
hypotheses or theories about the relationship between the variables, etc.

 Linear regression is the simplest and most common type of regression analysis, where the
function that models the relationship between the input and output variables is a linear or
straight line equation of the form:

y=β0+β1x+ϵ

where:

 y is the output variable


 x is the input variable
 β0 is the intercept or constant term
 β1 is the slope or coefficient term
 ϵ is the error or residual term
 Linear regression can model one output variable with a single input variable (called
simple linear regression) or one output variable with multiple input variables (called
multiple linear regression).
 Linear regression can be estimated or fitted to the data using various methods or tech-
niques, such as ordinary least squares (OLS), maximum likelihood estimation (MLE),
gradient descent, etc. The goal of these methods is to find the values of β0 and β1 that
minimize the sum of squared errors (SSE) or the mean squared error (MSE) between
the actual and predicted output values.
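The OLS estimate mentioned above has a closed form, β = (XᵀX)⁻¹Xᵀy, which can be sketched directly in NumPy on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)   # true beta0=2, beta1=3

# OLS closed form: beta = (X^T X)^-1 X^T y, with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
beta0, beta1 = beta

# The fitted parameters minimize the mean squared error on this data
mse = np.mean((y - X @ beta) ** 2)
print(round(beta0, 2), round(beta1, 2), round(mse, 3))
```

The recovered intercept and slope land close to the true values 2 and 3, with the gap shrinking as the noise decreases or the sample grows.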

18. Multiple Linear Regression

 Multiple linear regression is a type of linear regression that involves modeling one output
variable with two or more input variables. Multiple linear regression can capture more com-
plex or nonlinear relationships between the variables and explain more variation in the out-
put variable.


 Multiple linear regression can be expressed by a linear equation of the form:

y=β0+β1x1+β2x2+...+βnxn+ϵ

where:

 y is the output variable


 x1,x2,...,xn are the input variables
 β0 is the intercept or constant term
 β1,β2,...,βn are the slope or coefficient terms
 ϵ is the error or residual term
 Multiple linear regression can be estimated or fitted to the data using similar methods
or techniques as simple linear regression, such as ordinary least squares (OLS), maxi-
mum likelihood estimation (MLE), gradient descent, etc. The goal of these methods is
to find the values of β0,β1,...,βn that minimize the sum of squared errors (SSE) or the
mean squared error (MSE) between the actual and predicted output values.
 Assumptions in regression analysis are conditions or requirements that need to be sat-
isfied by the data and the model for the regression analysis to be valid and reliable.
Some common assumptions in regression analysis are:
o Linearity: The relationship between the input and output variables is linear or can be
approximated by a linear function.
o Independence: The error terms are independent of each other and of the input vari-
ables.
o Homoscedasticity: The error terms have constant or equal variance across different
values of the input variables.
o Normality: The error terms follow a normal or Gaussian distribution with zero mean
and constant variance.
o No multicollinearity: The input variables are not correlated or related to each other.

19. Main Problems in Regression Analysis

 Regression analysis is not a perfect or flawless technique that can solve any problem without
any difficulties or drawbacks. There are some main problems or challenges that need to be
addressed when applying regression analysis to real-world problems, such as overfitting and
underfitting, non-linearity in data, etc.
 Overfitting and underfitting are two common problems that affect the accuracy and generali-
zation of regression models. Overfitting and underfitting are related to the bias-variance
tradeoff, which states that there is an inverse relationship between bias and variance, such
that increasing one will decrease the other and vice versa.
o Overfitting is the problem where the model fits the training data too well and cap-
tures the noise or randomness in the data, but fails to generalize to new or unseen
data. Overfitting results in high variance and low bias, which means that the model
is highly sensitive or unstable across different data sets, even though its error on the
training data is low.
o Underfitting is the problem where the model fits the training data too poorly and
misses the true relationship or function of interest, performing similarly poorly on
new or unseen data. Underfitting results in low variance and high bias, which means
that the model is insensitive to changes across different data sets but has high error
between the expected and actual output values.
o An optimal model is a balanced or well-fitting model that fits the training data rea-
sonably well and generalizes to new or unseen data. An optimal model has moderate
variance and bias, which means that the model has moderate sensitivity or instabil-
ity across different data sets and low error or difference between the expected and
actual output values.
 Polynomial regression is a type of regression analysis that involves modeling a non-linear re-
lationship between the input and output variables by using a polynomial function of the
form:

y = β0 + β1x + β2x² + ... + βnxⁿ + ϵ

where:

 y is the output variable


 x is the input variable
 β0 is the intercept term and β1,...,βn are the coefficient terms
 ϵ is the error or residual term
 n is the degree or order of the polynomial
 Polynomial regression can handle non-linearity in data by adding higher-order or
lower-order terms to the linear equation. Polynomial regression can improve the
model fit and performance by capturing more complex or nonlinear patterns in the
data, but it may also increase the risk of overfitting or underfitting depending on the
degree of the polynomial.
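Polynomial regression can be sketched with NumPy's polyfit; on data generated from a quadratic, a degree-2 fit reduces the training MSE far below a degree-1 (linear) fit (the data and degrees are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(-3, 3, 100)
y = 1.0 + 0.5 * x + 2.0 * x ** 2 + rng.normal(scale=0.3, size=100)

def fit_mse(degree):
    coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
    pred = np.polyval(coeffs, x)
    return np.mean((y - pred) ** 2)

mse_linear = fit_mse(1)      # underfits: misses the x^2 term
mse_quadratic = fit_mse(2)   # matches the true generating function
print(round(mse_linear, 2), round(mse_quadratic, 2))
```

Raising the degree further would keep lowering the training MSE but, as the section above warns, at a growing risk of overfitting.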
20. Logistic Regression

 Logistic regression is a type of regression analysis that involves modeling a binary classifica-
tion problem using a regression function. Binary classification is a supervised learning prob-
lem where the goal is to predict whether an input belongs to one of two classes or catego-
ries, such as yes or no, positive or negative, etc.


 Logistic regression can model a binary classification problem by using a logistic function or
sigmoid function of the form:

p = 1 / (1 + e^(-(β0 + β1x)))

where:

 p is the probability of belonging to the positive class


 x is the input variable
 β0 is the intercept or constant term
 β1 is the slope or coefficient term
 Logistic regression can estimate or fit the parameters β0 and β1 using various meth-
ods or techniques, such as maximum likelihood estimation (MLE), gradient descent,
etc. The goal of these methods is to find the values of β0 and β1 that maximize the
likelihood or probability of observing the actual output values given the input values.
 Probability estimation and decision threshold are important aspects of logistic regression, as they determine how to make predictions or decisions based on the logistic function:
o Probability estimation: the logistic function maps any input to a value p between 0 and 1, which can be interpreted as the estimated probability that the input belongs to the positive class.
o Decision threshold: a cutoff value (commonly 0.5) that converts the estimated probability into a class label, assigning the input to the positive class if p is at or above the threshold and to the negative class otherwise.

21. Regularization in Regression

 Regularization is a technique that adds a penalty term to the objective function of a regression model in order to shrink the parameters and reduce overfitting.
o Ridge regression: A regularized linear regression model that uses the L2 norm (the sum of squared parameters) as the regularization term. Ridge regression shrinks the parameters toward zero without eliminating any of them.
o Lasso regression: A regularized linear regression model that uses the L1 norm (the sum of absolute parameters) as the regularization term. Lasso regression can shrink some parameters exactly to zero, effectively performing feature selection. Lasso regression can be expressed by an objective function of the form:

min over β of ∑i=1..n (yi − (β0 + β1xi))² + λ ∑j=1..p ∣βj∣

where:

 yi is the actual output value for the i-th data point

 xi is the input value for the i-th data point


 β0,β1,...,βp are the parameters of the model
 n is the number of data points
 p is the number of parameters
 λ is a hyperparameter that controls the strength of regularization
 Elastic net regression: A regularized linear regression model that uses a combination
of L1 and L2 norms as the regularization term. Elastic net regression can balance the
advantages and disadvantages of ridge and lasso regression by reducing and eliminat-
ing some of the parameters. Elastic net regression can be expressed by an objective
function of the form:

min over β of ∑i=1..n (yi − (β0 + β1xi))² + λ1 ∑j=1..p ∣βj∣ + λ2 ∑j=1..p βj²

where:

 yi is the actual output value for the i-th data point


 xi is the input value for the i-th data point
 β0,β1,...,βp are the parameters of the model
 n is the number of data points
 p is the number of parameters
 λ1 and λ2 are hyperparameters that control the strength of regularization and the bal-
ance between L1 and L2 norms.
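Of the regularized models above, ridge regression has a closed-form solution, β = (XᵀX + λI)⁻¹Xᵀy, which makes the shrinkage effect easy to demonstrate (lasso and elastic net need iterative solvers instead). The data here is synthetic and the intercept is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = X @ np.array([4.0, -3.0, 2.0, 0.0, 0.0]) + rng.normal(scale=1.0, size=n)

def ridge(X, y, lam):
    # Closed form: beta = (X^T X + lam * I)^-1 X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)      # lam = 0 recovers ordinary least squares
beta_ridge = ridge(X, y, 10.0)   # a larger lam shrinks the parameters

print(round(np.linalg.norm(beta_ols), 3), round(np.linalg.norm(beta_ridge), 3))
```

The L2 norm of the ridge solution is smaller than that of the OLS solution, and it keeps shrinking as λ grows.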
UNIT-IV: Supervised Learning: Classification


22. Introduction to Classification

 Classification is a type of supervised learning, where the goal is to predict the class label of
an input based on some features.
 A class label is a categorical variable that represents the category or group that the input be-
longs to, such as “spam” or “not spam” for an email, or “positive” or “negative” for a senti-
ment analysis.
 Features are numerical or categorical variables that describe some characteristics or proper-
ties of the input, such as the number of words, the presence of certain keywords, or the tone
of the text.
 A classification model is a function that maps an input to a class label, based on some param-
eters that are learned from a set of training data.
 Training data is a collection of inputs and their corresponding class labels, which are used
to train or fit the classification model.
 A decision boundary is a surface that separates the input space into different regions, each
corresponding to a class label. For example, in a two-dimensional space, a decision boundary
can be a line, a curve, or a complex shape.
 A classification model can be evaluated based on its accuracy, which is the proportion of in-
puts that are correctly classified by the model. Other metrics such as precision, recall,
and F1-score can also be used to measure the performance of a classification model.

Example: Email Spam Classification

 One of the applications of classification is email spam detection, where the goal is to classify
an email as either “spam” or “not spam” based on its content.
 The features for this task can be the frequency of certain words, the length of the email, the
sender’s address, etc.
 The class label for this task is either “spam” or “not spam”, which can be represented by 1 or
0 respectively.
 A classification model for this task can be a logistic regression model, which learns a set of
parameters that define a linear decision boundary in the feature space.
 The training data for this task can be a collection of emails and their labels, which can be ob-
tained from public datasets or from user feedback.
 The accuracy of the classification model can be measured by comparing its predictions with
the true labels on a separate set of test data, which is not used for training.
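The spam example above can be sketched with scikit-learn (assumed available); the tiny corpus below is invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up emails: label 1 = spam, 0 = not spam
emails = [
    "win free money now", "claim your free prize today", "cheap loans win big",
    "meeting agenda for monday", "project status report attached", "lunch with the team",
]
labels = [1, 1, 1, 0, 0, 0]

# Features: word counts for each email
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# A logistic regression model learns a linear decision boundary in this space
model = LogisticRegression()
model.fit(X, labels)

test_email = ["free prize money"]
prob_spam = model.predict_proba(vectorizer.transform(test_email))[0, 1]
print(round(prob_spam, 2))
```

In practice the model would be trained on thousands of emails and evaluated on a held-out test set rather than on the training data.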

23. Classification Learning Steps

 The general steps for classification learning are as follows:


o Data preparation
o Model training
o Model evaluation
o Model deployment

Data Preparation for Classification

 Data preparation is the process of transforming raw data into a suitable format for classifica-
tion learning. It involves the following steps:
o Data collection: obtaining relevant and sufficient data from various sources
o Data cleaning: removing or correcting missing, noisy, or inconsistent data
o Data exploration: analyzing and visualizing data to understand its characteristics and
distribution
o Data preprocessing: applying techniques such as feature extraction, feature selection,
feature scaling, feature encoding, etc. to enhance the quality and usability of data
o Data splitting: dividing data into training, validation, and test sets

Training a Classification Model

 Training a classification model is the process of finding the optimal parameters that minimize
a loss function or maximize an objective function on the training data. It involves the fol-
lowing steps:
o Model selection: choosing an appropriate classification algorithm and its hyperparam-
eters
o Model fitting: applying an optimization algorithm such as gradient descent, stochastic
gradient descent, etc. to update the parameters iteratively until convergence
o Model validation: using the validation data to tune the hyperparameters and select the
best model
Model Evaluation

 Model evaluation is the process of assessing the performance and generalization ability of a
classification model on unseen data. It involves the following steps:
o Model testing: using the test data to measure the accuracy and other metrics of the
model
o Model comparison: comparing different models based on their metrics and trade-offs
o Model interpretation: explaining how the model works and why it makes certain pre-
dictions

Model Deployment

 Model deployment is the process of making a classification model available for practical use
in real-world scenarios. It involves the following steps:
o Model integration: integrating the model with other systems or applications
o Model monitoring: tracking and updating the model based on new data and feedback
o Model maintenance: fixing and improving the model based on errors and changes

24. Common Classification Algorithms

 There are many classification algorithms that can be used for different tasks and data types.
Some of the common ones are:

k-Nearest Neighbors (kNN)

 kNN is a simple and intuitive classification algorithm that predicts the class label of an input
based on the k most similar inputs in the training data.
 The similarity between inputs is measured by a distance metric, such as Euclidean distance,
Manhattan distance, etc.
 The class label of an input is determined by a majority vote of its k nearest neighbors, or by
a weighted vote based on the inverse of the distances.
 kNN is a lazy learner, which means it does not learn any parameters from the training data,
but rather stores the entire data and performs the computation at the time of prediction.
 kNN is easy to implement and understand, but it can be slow and memory-intensive for large
datasets. It can also be sensitive to noise, outliers, and irrelevant features.
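The kNN prediction step described above can be sketched in a few lines of NumPy (toy clusters, Euclidean distance, unweighted majority vote):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote of its k nearest neighbors."""
    # Euclidean distance from x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]            # majority class

# Toy training data: two well-separated clusters
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # → 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.0])))  # → 1
```

Note that no parameters are learned here: the "model" is the stored training data itself, which is exactly why kNN is called a lazy learner.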

Support Vector Machines (SVM)

 SVM is a powerful and flexible classification algorithm that learns a linear or non-linear de-
cision boundary that maximizes the margin between different classes.
 The margin is the distance between the decision boundary and the closest points from each
class, which are called support vectors.
 SVM can learn non-linear decision boundaries by using a kernel function, which transforms
the original feature space into a higher-dimensional space where the data becomes more sepa-
rable.
 SVM can also handle multi-class classification by using strategies such as one-vs-one, one-
vs-all, etc.
 SVM is effective and robust for high-dimensional and complex data, but it can be computa-
tionally expensive and sensitive to hyperparameters and kernel choice.
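A minimal SVM sketch with scikit-learn (assumed available) on two separable toy clusters; a linear kernel is used here, and swapping in kernel="rbf" would give a non-linear boundary:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (toy data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(20, 2)),
               rng.normal(loc=5.0, scale=0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

model = SVC(kernel="linear", C=1.0)
model.fit(X, y)

# The support vectors are the training points closest to the decision boundary
print(len(model.support_vectors_), model.score(X, y))
```

Only a handful of points end up as support vectors; the rest of the training data does not influence the fitted boundary at all.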
Random Forest

 Random forest is an ensemble classification algorithm that combines multiple decision
trees to produce a more accurate and stable prediction.
 A decision tree is a simple and interpretable classification algorithm that splits the feature
space into regions based on a series of if-then rules that are learned from the training data.
 A random forest builds many decision trees by using two sources of randomness: bootstrap
sampling and feature sampling.
o Bootstrap sampling is a technique that creates different subsets of the training data by
sampling with replacement
o Feature sampling is a technique that selects a random subset of features at each split
of a decision tree
 A random forest predicts the class label of an input by taking the majority vote or the aver-
age probability of all the decision trees in the ensemble.
 Random forest is fast and scalable for large datasets, and can handle noise, outliers, missing
values, and unbalanced data. It can also provide measures of feature importance and uncer-
tainty estimation.
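A random forest sketch with scikit-learn (assumed available) on synthetic data where only two of four features are informative, showing the feature-importance measure mentioned above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
# Only features 0 and 1 actually determine the class
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Feature importances sum to 1; the informative features should dominate
importances = model.feature_importances_
print(np.round(importances, 2), round(model.score(X, y), 2))
```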


UNIT-V: Other Types of Learning


25. Ensemble Learning

 Ensemble learning is a technique that combines multiple base learners or weak learners to
create a strong learner or an ensemble that has better performance and generalization ability
than any individual learner.
 A base learner is a simple and basic machine learning algorithm, such as a decision tree, a lo-
gistic regression, a kNN, etc.
 A strong learner is a complex and powerful machine learning algorithm, such as a random for-
est, a boosted tree, a neural network, etc.
 The motivation for ensemble learning is based on the wisdom of crowds principle, which
states that the collective opinion of a group of individuals is more accurate and reliable than
the opinion of any single individual.
 There are three main techniques for ensemble learning: bagging, boosting, and stacking.

Bagging

 Bagging stands for bootstrap aggregating, which is a technique that creates multiple base
learners by using bootstrap sampling and then aggregates their predictions by taking
the majority vote or the average probability.
 Bootstrap sampling is a technique that creates different subsets of the training data by sam-
pling with replacement, which means that some samples may appear more than once or not at
all in each subset.
 Bagging reduces the variance of the base learners, which means that it makes them less sen-
sitive to small changes in the data. It also reduces the risk of overfitting, which means that it
makes them more adaptable to unseen data.
 An example of bagging is random forest, which is an ensemble of decision trees that are
trained on different bootstrap samples and different feature subsets.

Boosting

 Boosting is a technique that creates multiple base learners by using sequential learning,
which means that each base learner is trained on a modified version of the training data that
depends on the performance of the previous base learners. The final prediction is obtained by
taking a weighted vote or a weighted sum of all the base learners.
 Boosting increases the accuracy of the base learners, which means that it makes them more
capable of capturing complex patterns in the data. It also reduces the risk of underfitting,
which means that it makes them more expressive and flexible.
 An example of boosting is AdaBoost, which is an adaptive boosting algorithm that assigns
higher weights to the samples that are misclassified by the previous base learners and lower
weights to the samples that are correctly classified.

Stacking

 Stacking is a technique that creates multiple base learners by using any machine learning al-
gorithm and then trains another machine learning algorithm, called a meta learner or
a blender, on the predictions of the base learners. The final prediction is obtained by applying
the meta learner on the predictions of the base learners.
 Stacking improves the diversity of the base learners, which means that it makes them more
complementary and independent from each other. It also improves the robustness of the en-
semble, which means that it makes it less prone to errors and biases.
 An example of stacking is stacked generalization, which is a general framework for stacking
that can use any machine learning algorithm as a base learner or a meta learner.
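Stacking can be sketched with scikit-learn's StackingClassifier (assumed available); the particular base learners and meta learner chosen here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Base learners of different types improve the diversity of the ensemble
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]
# The meta learner (blender) is trained on the base learners' predictions
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X, y)
print(round(stack.score(X, y), 2))
```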

26. AdaBoost

 AdaBoost stands for adaptive boosting, which is one of the most popular and influential
boosting algorithms. It was proposed by Yoav Freund and Robert Schapire in 1996.
 AdaBoost is based on the idea of creating a strong learner by combining multiple weak learn-
ers, where each weak learner is a binary classifier that has an accuracy slightly better than
random guessing.
 AdaBoost works by iteratively adding weak learners to the ensemble, where each weak
learner is trained on a weighted version of the training data, where the weights are updated
based on the errors made by the previous weak learners. The final prediction is obtained by
taking a weighted vote of all the weak learners in the ensemble.

 The concept and workflow of AdaBoost can be summarized as follows:
o Initialize all the sample weights to be equal and normalized, such that they sum up to
one
o For each iteration:
 Train a weak learner on the weighted training data
 Calculate the error rate and the weight coefficient of the weak learner
 Update the sample weights by increasing the weights of the misclassified
samples and decreasing the weights of the correctly classified samples
 Normalize the sample weights to make them sum up to one
o Output the final strong learner as a weighted combination of all the weak learners

Example: AdaBoost Algorithm for Binary Classification


 Suppose we have a binary classification problem with two classes: +1 and -1. We
have N training samples: (x1, y1), (x2, y2), …, (xN, yN), where xi is the feature vec-
tor and yi is the class label. We want to use AdaBoost to create a strong learner H(x)
that can predict the class label of any input x.
 The AdaBoost algorithm for binary classification can be described as follows:
o Initialize the sample weights: wi = 1/N for i = 1, 2, …, N
o For t = 1, 2, …, T:
 Train a weak learner ht(x) on the weighted training data
 Calculate the error rate of ht(x): et = sum(wi * I(yi != ht(xi))) / sum(wi),
where I is the indicator function that returns 1 if the condition is true and 0
otherwise
 Calculate the weight coefficient of ht(x): at = 0.5 * log((1 - et) / et)
 Update the sample weights: wi = wi * exp(-at * yi * ht(xi)) for i = 1, 2, …, N
 Normalize the sample weights: wi = wi / sum(wi) for i = 1, 2, …, N
o Output the final strong learner: H(x) = sign(sum(at * ht(x))) for t = 1, 2, …, T, where
sign is the sign function that returns +1 if the argument is positive and -1 otherwise
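The algorithm above can be sketched from scratch in NumPy, using one-dimensional threshold "stumps" as the weak learners (the toy data and number of rounds are illustrative):

```python
import numpy as np

def best_stump(x, y, w):
    """Find the threshold stump with the lowest weighted error on 1-D data."""
    best = (None, None, np.inf)   # (threshold, polarity, weighted error)
    for thr in np.unique(x):
        for polarity in (1, -1):
            pred = np.where(x <= thr, polarity, -polarity)
            err = np.sum(w[pred != y])
            if err < best[2]:
                best = (thr, polarity, err)
    return best

def adaboost_fit(x, y, T=10):
    n = len(x)
    w = np.full(n, 1.0 / n)                # initialize equal, normalized weights
    learners = []
    for _ in range(T):
        thr, pol, err = best_stump(x, y, w)
        err = max(err, 1e-10)              # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(x <= thr, pol, -pol)
        w = w * np.exp(-alpha * y * pred)  # raise weights of misclassified samples
        w = w / w.sum()                    # renormalize to sum to one
        learners.append((thr, pol, alpha))
        if err < 1e-9:                     # training data already fit perfectly
            break
    return learners

def adaboost_predict(learners, x):
    score = sum(alpha * np.where(x <= thr, pol, -pol)
                for thr, pol, alpha in learners)
    return np.sign(score)                  # weighted vote of all weak learners

# Toy data with labels +1 / -1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1, 1, 1, -1, -1, -1])
learners = adaboost_fit(x, y)
print(adaboost_predict(learners, x))
```

On harder data no single stump classifies everything correctly, and the weight updates force later stumps to focus on the samples the earlier ones got wrong.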
27. Gradient Boosting Machines (GBM)

 Gradient boosting machines (GBM) are a type of boosting algorithm that use gradient de-
scent to optimize the loss function of the ensemble. They were proposed by Jerome Friedman
in 2001.


 GBM are based on the idea of creating a strong learner by adding multiple base learners to the
ensemble, where each base learner is a regression tree that fits the negative gradient of the
loss function at each iteration. The final prediction is obtained by taking a weighted sum of all
the base learners in the ensemble.
 A regression tree is a type of decision tree that predicts a continuous value instead of a class
label. It splits the feature space into regions based on a series of if-then rules that are learned
from the training data. The prediction of a regression tree is the average value of the samples
in each region.
 The negative gradient of the loss function is the direction that points to the steepest descent of
the loss function. It indicates how much and in what direction the prediction should be ad-
justed to reduce the loss.
 The concept and workflow of GBM can be summarized as follows:
o Initialize the prediction to be a constant value that minimizes the loss function
o For each iteration:
 Calculate the negative gradient of the loss function for each sample
 Train a regression tree on the negative gradient as the target value
 Calculate the weight coefficient of the regression tree by using line search or
shrinkage
 Update the prediction by adding the weighted regression tree
o Output the final strong learner as the sum of all the weighted regression trees

Example: GBM Algorithm for Regression


 Suppose we have a regression problem with one output variable: y. We have N train-
ing samples: (x1, y1), (x2, y2), …, (xN, yN), where xi is the feature vector and yi is
the output value. We want to use GBM to create a strong learner F(x) that can predict
the output value of any input x. We use the mean squared error (MSE) as our loss
function: L(y, F) = (y - F)^2 / 2.
 The GBM algorithm for regression can be described as follows:
o Initialize the prediction to be a constant value: F0(x) = argminc(sum((yi - c)^2 / 2))
o For m = 1, 2, …, M:
 Calculate the negative gradient of the loss function for each sample: ri = yi -
Fm-1(xi) (for the squared loss, the negative gradient is simply the residual)
 Train a regression tree hm(x) on (xi, ri) as the target value
 Calculate the weight coefficient of hm(x) by using line search or shrinkage:
am = argminc(sum((ri - c * hm(xi))^2 / 2)) or am = v * argminc(sum((ri - c *
hm(xi))^2 / 2)), where v is a small positive constant
 Update the prediction by adding the weighted regression tree: Fm(x) = Fm-
1(x) + am * hm(x)
o Output the final strong learner: F(x) = F0(x) + sum(am * hm(x)) for m = 1, 2, …, M
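The regression workflow above can be sketched using depth-1 regression trees from scikit-learn (assumed available) as the base learners, with a fixed shrinkage factor v in place of a line search:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(scale=0.1, size=100)

M, v = 50, 0.1                    # number of rounds and shrinkage factor
F = np.full_like(y, y.mean())     # F0: constant that minimizes the squared loss
trees = []
for _ in range(M):
    residual = y - F              # negative gradient of L = (y - F)^2 / 2
    tree = DecisionTreeRegressor(max_depth=1).fit(x, residual)
    F = F + v * tree.predict(x)   # update the prediction with the shrunken tree
    trees.append(tree)

mse_initial = np.mean((y - y.mean()) ** 2)
mse_final = np.mean((y - F) ** 2)
print(round(mse_initial, 3), round(mse_final, 3))
```

Each round fits a small tree to the current residuals, so the training MSE drops steadily as trees are added; the shrinkage factor slows this down to reduce the risk of overfitting.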

XGBoost

 XGBoost stands for extreme gradient boosting, which is an efficient and scalable implemen-
tation of GBM.

 It was developed by Tianqi Chen and his team at the University of Washington in 2014.

 XGBoost improves the performance and speed of GBM by using several techniques, such as:
o Regularization: adding a penalty term to the loss function to prevent overfitting and
improve generalization
o Sparsity-awareness: handling missing values and sparse features efficiently and au-
tomatically
o Weighted quantile sketch: using a novel data structure to find the optimal split
points for the regression trees
o Out-of-core computation: using external memory to handle large-scale data that
cannot fit into the main memory
o Parallel and distributed learning: using multiple CPU cores or machines to speed
up the training process
o Cache optimization: using hardware-aware design to optimize the memory access
and reduce the computation cost

 XGBoost has become one of the most popular and widely used machine learning algorithms,
especially for Kaggle competitions, where it has won several awards and prizes. It has also
been adopted by many companies and organizations for various applications, such as web
search, recommendation systems, fraud detection, etc.

28. Reinforcement Learning

 Reinforcement learning is a type of machine learning that deals with learning from ac-
tions and rewards, rather than from features and labels. It is inspired by the way humans and
animals learn from trial and error and feedback.
 Reinforcement learning is based on the idea of creating an agent that can interact with an en-
vironment and learn an optimal policy that maximizes a long-term reward or value. The
agent does not have any prior knowledge or supervision about the environment or the task,
but rather learns from its own experience and exploration.
 The concept and workflow of reinforcement learning can be summarized as follows:
o Define the agent, the environment, the state space, the action space, and the reward
function
o Initialize the agent’s policy or value function randomly or heuristically
o For each episode or iteration:
 Observe the current state of the environment
 Select an action based on the current policy or value function, possibly with
some exploration
 Execute the action and observe the next state and the reward
 Update the policy or value function based on the observed transition and re-
ward
o Output the final optimal policy or value function

Example: Reinforcement Learning Paradigm

 One of the classic examples of reinforcement learning is the game of chess, where the goal is
to create an agent that can play chess against human or computer opponents.
 The agent is the chess player, who can control the pieces on the board.
 The environment is the chess board, which has 64 squares and 32 pieces of two colors: white
and black.
 The state space is the set of all possible configurations of the board, which is astronomically large (the number of legal positions is estimated at around 10^43–10^47; the often-quoted 10^120 is the Shannon estimate of the game-tree complexity).
 The action space is the set of all possible moves that can be made by the agent at each state,
which depends on the rules of chess and the position of the pieces.
 The reward function is a scalar value that indicates how desirable a state or an action is for the
agent. For example, a simple reward function can be +1 for winning, -1 for losing, and 0 for
drawing or continuing. A more sophisticated reward function can take into account other fac-
tors, such as material advantage, positional advantage, checkmate threat, etc.

Q-learning
Q-learning is a type of reinforcement learning, which is a branch of machine learning that
deals with learning from actions and rewards. Q-learning is a model-free, off-policy algo-
rithm that learns the value of an action in a given state, without requiring a model of the envi-
ronment or the transition probabilities. Q-learning can find an optimal policy for any finite
Markov decision process, given enough exploration time and a partly-random policy.
The basic idea of Q-learning is to create a table, called a Q-table, that stores the expected fu-
ture reward, or Q-value, for each state-action pair. The Q-value represents how good it is to
take a certain action in a certain state. The Q-learning algorithm updates the Q-table itera-
tively, based on the observed rewards and the next states. The final Q-table can be used to se-
lect the best action in each state, by choosing the action with the highest Q-value.
Q-learning follows these steps:

 Initialize the Q-table with zeros or random values


 Observe the current state
 Choose an action based on exploration or exploitation
 Execute the action and observe the reward and the next state
 Update the Q-value for the state-action pair using the Bellman equation
 Repeat until convergence or termination

The Bellman equation is a recursive formula that relates the Q-value of a state-action pair to
the Q-values of the next state-action pairs. It is given by:
Q(state, action) = reward + gamma * max Q(next_state, all_actions)
where gamma is a discount factor that controls how much future rewards are valued. In practice, the Q-table entry is moved toward this target by a step of size alpha, the learning rate:
Q(state, action) ← Q(state, action) + alpha * [reward + gamma * max Q(next_state, all_actions) - Q(state, action)]
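The steps above can be sketched in Python on a toy problem. The three-state chain environment, the reward of +1 at the rightmost state, and all parameter values below are illustrative, not part of the notes:

```python
import random

# Hypothetical toy environment: states 0..2, actions 0 (left) / 1 (right);
# reaching state 2 gives reward +1 and ends the episode.
def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(2, state + 1)
    reward = 1.0 if next_state == 2 else 0.0
    done = next_state == 2
    return next_state, reward, done

alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(3)]      # Q-table: Q[state][action]

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, else exploit
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = step(state, action)
        # Bellman update: move Q toward reward + gamma * max future value
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# The learned policy should prefer moving right (action 1) from the start state.
best_action_from_start = 0 if Q[0][0] > Q[0][1] else 1
```

After enough episodes the Q-values approach their fixed point (here Q[1][1] → 1 and Q[0][1] → gamma · 1 = 0.9), so reading the argmax of each row of the table yields the optimal policy.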
Q-learning is a simple and powerful algorithm that can learn to solve complex problems with
stochastic and dynamic environments. However, it also has some limitations, such as:

 It requires a large amount of memory and computation to store and update the Q-table for
large state and action spaces
 It can be slow to converge or even diverge in some cases
 It can be sensitive to the choice of parameters, such as learning rate, discount factor, and ex-
ploration strategy

To overcome these limitations, some extensions and variations of Q-learning have been pro-
posed, such as deep Q-learning, which uses a neural network to approximate the Q-function;
double Q-learning, which reduces the overestimation bias of Q-learning; and dueling Q-learn-
ing, which separates the estimation of state value and action advantage.
PRINCIPLES OF ARTIFICIAL INTELLIGENCE


UNIT - I: Introduction to AI, Intelligent Agents, and Problem Solving
1. Introduction to AI
1.1 Definition of Artificial Intelligence

 Artificial Intelligence (AI) is the study of how to create machines and systems that can per-
form tasks that normally require human intelligence, such as reasoning, learning, perception,
decision making, and problem solving.
 AI is also the field that aims to understand the nature of intelligence and its underlying mech-
anisms.
 AI is a broad and interdisciplinary field that draws from computer science, mathematics, psy-
chology, linguistics, philosophy, neuroscience, and other disciplines.

1.2 Applications of AI in Various Fields

 AI has many applications in various fields, such as:


o Robotics: AI can be used to design and control robots that can perform tasks in com-
plex and dynamic environments, such as manufacturing, exploration, surgery, and en-
tertainment.
o Healthcare: AI can be used to diagnose diseases, recommend treatments, analyze
medical images, discover new drugs, and assist doctors and patients.
o Finance: AI can be used to predict market trends, optimize portfolios, detect fraud,
automate trading, and provide financial advice.
o Education: AI can be used to personalize learning, tutor students, grade assignments,
generate questions, and enhance teaching.
o Entertainment: AI can be used to create games, music, art, stories, movies, and virtual
reality experiences.
o And many more: AI can be used to improve transportation, security, communication,
social media, e-commerce, agriculture, environment, etc.

1.3 History of AI: Key Milestones and Evolution

 The history of AI can be divided into four main periods:


o The Birth of AI (1943-1955): The foundations of AI were laid by pioneers such as
Alan Turing (who proposed the Turing test and the theory of computation), John von
Neumann (who developed the stored-program computer architecture), Warren McCulloch
and Walter Pitts (who modeled artificial neurons), and Claude Shannon (who applied
information theory to chess playing).
o The Golden Age of AI (1956-1973): The term “artificial intelligence” was coined by
John McCarthy at the Dartmouth conference in 1956. This period saw the develop-
ment of many influential AI systems and techniques, such as the Logic Theorist (the
first AI program), the General Problem Solver (the first general-purpose problem
solver), ELIZA (the first chatbot), SHRDLU (a natural language understanding sys-
tem), Perceptrons (the first neural networks), and DENDRAL (an expert system for
organic chemistry).
o The AI Winter (1974-1980): The limitations and challenges of AI became apparent in
this period. Some of the factors that contributed to the decline of AI were the diffi-
culty of scaling up symbolic systems to handle common sense knowledge and reason-
ing, the lack of funding and support from governments and industries, and the criti-
cism from philosophers and psychologists.
o The Renaissance of AI (1981-present): The revival of AI was driven by several fac-
tors, such as the emergence of new paradigms (such as connectionism, evolutionary
computation, probabilistic reasoning), the availability of large amounts of data and
computing power, the integration of different subfields and disciplines (such as ma-
chine learning, computer vision, natural language processing), and the success of var-
ious applications (such as IBM’s Deep Blue beating the world chess champion in
1997, Google’s AlphaGo beating the world Go champion in 2016, OpenAI’s GPT-3
generating natural language texts in 2020).

1.4 Types of AI: Narrow AI vs. General AI

 There are different ways to classify AI systems based on their capabilities and goals:
o Narrow AI: Also known as weak AI or applied AI. It refers to AI systems that are de-
signed to perform specific tasks or domains that require a limited amount of intelli-
gence, such as face recognition, speech recognition, spam filtering, self-driving cars,
etc. Most of the current AI systems fall into this category.
o General AI: Also known as strong AI or artificial general intelligence (AGI). It refers
to AI systems that can achieve human-level intelligence or beyond across a wide
range of tasks or domains that require general intelligence, such as common sense
reasoning, natural language understanding, creativity, planning, etc. This is the ulti-
mate goal of many AI researchers, but it is still far from being realized.


2. Intelligent Agents
2.1 Agents and Rationality: What is an Agent, Rational Behavior

 An agent is anything that can perceive its environment through sensors and act upon it
through actuators.
 An agent can be a physical entity (such as a robot, a car, or a human) or a software entity
(such as a chatbot, a game, or a program).
 An agent can be simple (such as a thermostat or a calculator) or complex (such as a chess
player or a self-driving car).
 An agent can be autonomous (such as a Mars rover or a Roomba) or interactive (such as a
personal assistant or a social media platform).
 Rationality is the quality of doing the right thing, given what the agent knows and what the
agent wants.
 Rationality depends on four factors: the agent’s performance measure, the agent’s prior
knowledge, the agent’s perceptual capabilities, and the agent’s actions.
 A rational agent is an agent that always acts to achieve the best outcome or, when there is un-
certainty, the best expected outcome, according to its performance measure.

2.2 Structure of Agents: Perception, Decision-making, Action

 The basic structure of an agent consists of three components: perception, decision-making,


and action.
 Perception: The process of acquiring information from the environment through sensors. Sen-
sors can be physical devices (such as cameras, microphones, or thermometers) or abstract in-
puts (such as databases, files, or messages).
 Decision-making: The process of selecting an action to perform based on the current state and
the desired goals. Decision-making can be based on different methods, such as rules, models,
goals, utilities, or learning.
 Action: The process of executing an action in the environment through actuators. Actuators
can be physical devices (such as motors, speakers, or printers) or abstract outputs (such as dis-
plays, sounds, or texts).
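The perceive-decide-act cycle can be sketched with a minimal thermostat-style agent (the class name, target temperature, and environment representation below are illustrative):

```python
# A minimal sense-decide-act loop for a thermostat-style agent
# (all names and thresholds here are illustrative).
class ThermostatAgent:
    def __init__(self, target_temp):
        self.target_temp = target_temp

    def perceive(self, environment):
        # Sensor: read the current temperature from the environment
        return environment["temperature"]

    def decide(self, temperature):
        # Decision-making: a simple condition-action rule
        if temperature < self.target_temp:
            return "heat_on"
        return "heat_off"

    def act(self, action, environment):
        # Actuator: heating nudges the temperature upward
        if action == "heat_on":
            environment["temperature"] += 1.0
        return environment

agent = ThermostatAgent(target_temp=21.0)
env = {"temperature": 18.0}
for _ in range(5):                       # five perceive-decide-act cycles
    percept = agent.perceive(env)
    action = agent.decide(percept)
    env = agent.act(action, env)
# env["temperature"] rises to the 21.0 target, then heating stops
```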

2.3 Agent-Environment Interaction: Interaction with the Surrounding Environment

 The interaction between an agent and its environment can be characterized by several proper-
ties:
o Fully observable vs. partially observable: An environment is fully observable if the
agent can access the complete state of the environment at each point in time. An envi-
ronment is partially observable if the agent can access only some aspects of the envi-
ronment or if its sensors are noisy or inaccurate.
o Deterministic vs. stochastic: An environment is deterministic if the next state of the
environment is completely determined by the current state and the action executed by
the agent. An environment is stochastic if there is some uncertainty about the next
state of the environment.
o Episodic vs. sequential: An environment is episodic if the agent’s experience is di-
vided into atomic episodes, where each episode consists of one perception and one
action, and the outcome of each action does not depend on previous actions. An envi-
ronment is sequential if the agent’s current decision affects all future decisions.
o Static vs. dynamic: An environment is static if the environment does not change while
the agent is deliberating. An environment is dynamic if the environment can change
while the agent is deliberating.

o Discrete vs. continuous: An environment is discrete if there are a finite number of dis-
tinct and clearly defined states, actions, and percepts. An environment is continuous if
there are an infinite number of possible states, actions, and percepts.
o Single-agent vs. multi-agent: An environment is single-agent if the agent is the only
entity that affects the environment. An environment is multi-agent if there are other
agents that affect the environment.

2.4 Types of Agents: Simple Reflex Agents, Model-Based Agents, Goal-Based Agents,

Utility-Based Agents, Learning Agents

 There are different types of agents that differ in their decision-making methods:
o Simple reflex agents: These are agents that act based on their current percept
only, without any memory or knowledge of the past or future. They use condi-
tion-action rules to map percepts to actions. For example, a simple reflex agent
for driving a car might have rules like:
 If car-in-front-is-braking then initiate-braking
 If light-is-green then accelerate
 If light-is-red then stop

Simple reflex agents are easy to implement but they are limited by their lack
of memory and knowledge. They can only handle fully observable environ-
ments and they cannot plan ahead or learn from their experience.
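The condition-action rules above can be sketched as a simple lookup over the current percept (the percept keys are illustrative):

```python
# The driving rules from the notes as condition-action pairs
# (percept names are illustrative).
RULES = [
    (lambda p: p.get("car_in_front_is_braking"), "initiate-braking"),
    (lambda p: p.get("light") == "green", "accelerate"),
    (lambda p: p.get("light") == "red", "stop"),
]

def simple_reflex_agent(percept):
    # Act on the current percept only: the first matching rule wins;
    # no memory of past percepts is kept anywhere.
    for condition, action in RULES:
        if condition(percept):
            return action
    return "no-op"

action = simple_reflex_agent({"car_in_front_is_braking": True, "light": "green"})
```

Note that braking wins over the green light simply because its rule is listed first, which is the crudest possible conflict-resolution strategy.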
o Model-based agents: These are agents that maintain an internal state that rep-
resents some aspects of the environment that are not directly observable. They
use a model of how the world works to update their state based on their per-
cepts and actions. They also use condition-action rules to map states to ac-
tions. For example, a model-based agent for driving a car might have rules
like:

 If car-in-front-is-braking then initiate-braking


 If light-is-green then accelerate
 If light-is-red then stop
 If state-is-off-road then steer-back-to-road

Model-based agents are more flexible and robust than simple reflex agents, as they can han-
dle partially observable environments and adapt to changes. However, they still cannot plan
ahead or pursue long-term goals.
 Goal-based agents: These are agents that have some explicit goals that they want to
achieve, and they act based on their current state and their goal state. They use a
search or planning algorithm to find a sequence of actions that leads from the current
state to the goal state. For example, a goal-based agent for driving a car might have a
goal like:
 Reach destination X in the shortest time possible

Goal-based agents are more intelligent and rational than model-based agents, as they can plan
ahead and optimize their actions. However, they still cannot handle uncertain or complex en-
vironments or trade-off between conflicting goals.

 Utility-based agents: These are agents that have some preferences or values that they
want to maximize, and they act based on their current state and their utility function.
They use a decision theory or reinforcement learning algorithm to find the best action
that maximizes their expected utility. For example, a utility-based agent for driving a
car might have a utility function like:
 Utility = - (travel time) - (fuel cost) - (traffic violations) + (comfort) + (safety)

Utility-based agents are more flexible and realistic than goal-based agents, as they can handle
uncertain or complex environments and trade-off between conflicting goals. However, they
still cannot learn from their experience or improve their performance over time.
 Learning agents: These are agents that can learn from their experience and improve
their performance over time. They have four components: a learning element, a per-
formance element, a critic, and a problem generator. The learning element uses feed-
back from the critic to improve the agent’s knowledge or behavior. The performance
element uses the agent’s knowledge or behavior to select actions. The critic evaluates
the agent’s actions and provides feedback. The problem generator suggests actions
that lead to new and informative experiences. For example, a learning agent for driv-
ing a car might use the following components:



 Learning element: A neural network that learns to map states to actions based on re-
wards and penalties
 Performance element: A controller that executes the actions suggested by the neural
network
 Critic: A sensor that measures the distance to the destination, the fuel level, the traffic
rules, the comfort level, and the safety level
 Problem generator: A randomizer that occasionally chooses different routes or speeds
Learning agents are the most advanced and adaptive type of agents, as they can learn from
their experience and improve their performance over time. They can also discover new
knowledge or behavior that was not programmed by the designer.
3. Problem Solving
3.1 Problems in AI: Well-defined Problems and Goal States

 A problem is a situation that requires an agent to find a solution or a course of action that sat-
isfies some criteria or constraints.
 A problem can be formalized as a tuple (S, A, T, G, C), where:
o S is the set of possible states of the world
o A is the set of possible actions that the agent can perform
o T is the transition function that maps a state and an action to a new state
o G is the goal test function that determines whether a state is a goal state or not
o C is the cost function that assigns a numeric value to each action or path
 A well-defined problem is a problem that has a clear and complete specification of all the
components of the tuple. A well-defined problem has the following properties:
o The initial state is known and unique
o The goal state or states are known and well-defined
o The actions and their effects are known and deterministic
o The cost of each action or path is known and consistent
 A well-defined problem can be solved by finding a sequence of actions that leads from the in-
itial state to a goal state, while minimizing the total cost.
 Examples of well-defined problems in AI are:
o The 8-puzzle: The state is the configuration of eight tiles and a blank space on a 3x3
grid. The actions are moving the blank space up, down, left, or right. The transition
function swaps the blank space with the adjacent tile. The goal state is the configura-
tion where the tiles are ordered from 1 to 8. The cost of each action is 1.
o The Tower of Hanoi: The state is the configuration of n disks of different sizes on
three pegs. The actions are moving a disk from one peg to another. The transition
function moves the disk to the top of another peg. The goal state is the configuration
where all the disks are on the rightmost peg, with the largest disk at the bottom and
the smallest disk at the top. The cost of each action is 1.
o The Traveling Salesperson Problem: The state is the location of a salesperson who
has to visit n cities. The actions are traveling from one city to another. The transition
function changes the location of the salesperson to another city. The goal state is the
location where the salesperson has visited all the cities exactly once and returned to
the starting city. The cost of each action is the distance between the two cities.
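The 8-puzzle example can be sketched as a concrete (S, A, T, G, C) formulation; encoding each state as a 9-tuple read row by row, with 0 for the blank, is one common choice:

```python
# The 8-puzzle as a (S, A, T, G, C) problem; states are 9-tuples with 0
# for the blank, read row by row (a minimal sketch).
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

MOVES = {"up": -3, "down": 3, "left": -1, "right": 1}

def actions(state):
    # A: legal moves of the blank in the current state
    blank = state.index(0)
    row, col = divmod(blank, 3)
    legal = []
    if row > 0: legal.append("up")
    if row < 2: legal.append("down")
    if col > 0: legal.append("left")
    if col < 2: legal.append("right")
    return legal

def transition(state, action):
    # T: swap the blank with the adjacent tile
    blank = state.index(0)
    target = blank + MOVES[action]
    board = list(state)
    board[blank], board[target] = board[target], board[blank]
    return tuple(board)

def goal_test(state):
    # G: is this the ordered configuration?
    return state == GOAL

STEP_COST = 1  # C: every move costs the same
```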

3.2 Search Spaces: Defining Problems as State Space Search

 A search space is a representation of all the possible states and actions that are relevant to
solving a problem.
 A search space can be visualized as a graph, where:
o The nodes are the states

o The edges are the actions
o The weights are the costs
 A search space can be defined by applying the transition function T to generate all the succes-
sor states from an initial state, until reaching a goal state or exhausting all possibilities.
 A search space can have different characteristics, such as:
o Size: The number of nodes in the graph
o Shape: The branching factor (the average number of successors per node) and the
depth (the length of the longest path from the initial node to any node)
o Cycles: The presence of loops or repeated states in the graph
 A search space can be explored by using different search algorithms, such as:
o Uninformed search: Search algorithms that do not use any domain-specific
knowledge or heuristics, such as breadth-first search, depth-first search, uniform-cost
search, etc.
o Informed search: Search algorithms that use some domain-specific knowledge or heu-
ristics, such as greedy search, A* search, hill-climbing search, etc.

3.3 Production System: Rule-Based Approach to Problem Solving


 A production system is a system that uses rules to represent and manipulate
knowledge in order to solve problems.
 A production system consists of four components:
o A set of production rules: Each rule has an IF part (the condition) and a THEN
part (the action). For example:
 IF there is smoke THEN there is fire
 IF X is a bird THEN X can fly
o A working memory: A collection of facts or data that represent the current
state of knowledge or belief. For example:
 There is smoke
 Tweety is a bird
o A control strategy: A method for selecting which rule to apply next. For exam-
ple:
 Forward chaining: Start from the facts in the working memory and apply
rules that match them until reaching a goal or no more rules can be applied.
 Backward chaining: Start from a goal and apply rules that can infer it from
other facts until reaching the facts in the working memory or no more rules
can be applied.
o A conflict resolution: A method for resolving conflicts when more than one
rule can be applied at the same time. For example:
 Random selection: Choose a rule randomly
 Priority order: Choose a rule based on some predefined order or ranking
 Specificity: Choose a rule that has the most specific condition
 A production system can be used to solve problems by applying rules to transform the
working memory from the initial state to the goal state, while following the control
strategy and the conflict resolution.
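Forward chaining over the example rules can be sketched as follows (the facts are illustrative strings):

```python
# A minimal forward-chaining production system over the example rules
# (facts and rules are illustrative).
rules = [
    ({"there is smoke"}, "there is fire"),
    ({"Tweety is a bird"}, "Tweety can fly"),
]
working_memory = {"there is smoke", "Tweety is a bird"}

# Forward chaining: fire any rule whose condition is satisfied by the
# working memory, and repeat until no rule adds a new fact.
changed = True
while changed:
    changed = False
    for condition, conclusion in rules:
        if condition <= working_memory and conclusion not in working_memory:
            working_memory.add(conclusion)
            changed = True
```

Here conflict resolution is simply rule order; a real production system would plug in priority or specificity at the point where the next rule to fire is chosen.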
3.4 Problem Characteristics: Deterministic vs. Stochastic, Single Agent vs. Multi-Agent

 Problems can have different characteristics that affect the difficulty and the methods of solv-
ing them. Some of these characteristics are:
o Deterministic vs. stochastic: A problem is deterministic if the outcome of each action
is fully determined by the current state and the action. A problem is stochastic if there
is some uncertainty or randomness in the outcome of each action.

o Single agent vs. multi-agent: A problem is single agent if there is only one agent that
affects the environment. A problem is multi-agent if there are other agents that affect
the environment, either cooperatively or competitively.
 Deterministic problems can be solved by using search algorithms that find a sequence of ac-
tions that leads to a goal state with certainty. Stochastic problems can be solved by using deci-
sion-making algorithms that find a policy or a strategy that maximizes the expected utility or
reward over time.
 Single agent problems can be solved by assuming that the agent has full control over the envi-
ronment and its actions. Multi-agent problems can be solved by taking into account the ac-
tions and reactions of other agents, either by using game theory or negotiation techniques.

3.5 Issues in Designing Search Programs: Completeness, Optimality, Time Complexity, Space
Complexity

 When designing search programs, there are some issues or criteria that need to be considered,
such as:
o Completeness: The ability of a search algorithm to find a solution if one exists.
o Optimality: The ability of a search algorithm to find the best solution among all possi-
ble solutions, according to some cost function.
o Time complexity: The amount of time or number of steps required by a search algo-
rithm to find a solution.
o Space complexity: The amount of memory or number of nodes stored by a search al-
gorithm to find a solution.
 These issues or criteria are often interrelated and trade-off with each other. For example, a
search algorithm that is complete and optimal may have high time and space complexity,
while a search algorithm that has low time and space complexity may be incomplete or
suboptimal.
 These issues or criteria also depend on some factors, such as:
o The size and shape of the search space
o The branching factor and depth of the search tree
o The presence or absence of cycles in the search graph
o The availability or quality of heuristics or domain knowledge

UNIT - II: Search Algorithms


4. Search Algorithms
4.1 Problem-Solving Agents and Search Algorithms Terminology

 A problem-solving agent is an agent that can formulate and solve problems by finding a se-
quence of actions that leads from an initial state to a goal state.
 A search algorithm is a method for finding a solution or a path to a goal state in a search
space.
 Some common terminology used in search algorithms are:
o State: A representation of a situation or a configuration of the world
o Action: A transformation or a transition from one state to another
o Initial state: The state where the agent starts its search
o Goal state: The state where the agent wants to reach or satisfy some criteria
o Path: A sequence of states and actions that leads from the initial state to a goal state

o Solution: A path that reaches a goal state
o Cost: A numeric value assigned to each action or path that measures the quality or the
difficulty of the action or path
o Node: A data structure that contains a state, a parent node, an action, a path cost, and
optionally other information
o Frontier: A data structure that stores the nodes that are generated but not yet expanded
o Expanded: A node that has been visited and its successors have been generated
o Generated: A node that has been created by applying an action to another node

4.2 Properties of Search Algorithms: Completeness, Optimality, Time and Space Complexity

 When evaluating search algorithms, there are some properties or criteria that need to be con-
sidered, such as:
o Completeness: The ability of a search algorithm to find a solution if one exists
o Optimality: The ability of a search algorithm to find the best solution among all possi-
ble solutions, according to some cost function
o Time complexity: The amount of time or number of steps required by a search algo-
rithm to find a solution
o Space complexity: The amount of memory or number of nodes stored by a search al-
gorithm to find a solution
 These properties or criteria are often interrelated and trade-off with each other. For example, a
search algorithm that is complete and optimal may have high time and space complexity,
while a search algorithm that has low time and space complexity may be incomplete or
suboptimal.
 These properties or criteria also depend on some factors, such as:
o The size and shape of the search space
o The branching factor and depth of the search tree
o The presence or absence of cycles in the search graph
o The availability or quality of heuristics or domain knowledge

4.3 Types of Search Algorithms: Uninformed (Blind) Search and Informed (Heuristic) Search

 There are two main types of search algorithms based on the amount and type of information
they use, such as:
o Uninformed search: Search algorithms that do not use any domain-specific
knowledge or heuristics, but only the problem definition. They are also called blind
search, as they explore the search space blindly without any guidance. Examples of
uninformed search algorithms are breadth-first search, depth-first search, uniform-
cost search, etc.
o Informed search: Search algorithms that use some domain-specific knowledge or heu-
ristics to guide the search process. They are also called heuristic search, as they use
heuristics to estimate the quality or the distance of each node to the goal state. Exam-
ples of informed search algorithms are greedy best-first search, A* search, hill-climb-
ing search, etc.

5. Uninformed/Blind Search Algorithms


5.1 Breadth-First Search (BFS)

 Breadth-first search (BFS) is an uninformed search algorithm that explores the nodes in the
order of their distance from the initial node, i.e., it expands the shallowest nodes first.
 BFS uses a queue as its frontier data structure, i.e., it adds new nodes at the end of the queue
and removes nodes from the front of the queue.
 BFS is complete, i.e., it can find a solution if one exists.
 BFS is optimal if all actions have the same cost, i.e., it can find the lowest-cost solution
among all possible solutions.
 BFS has high time and space complexity, i.e., it can take a long time and use a lot of memory
to find a solution. The time and space complexity of BFS are both O(b^d), where b is the
branching factor and d is the depth of the shallowest solution.
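A minimal BFS sketch using a queue of paths (the example graph is illustrative):

```python
from collections import deque

# BFS over an explicit graph; the FIFO frontier guarantees that the
# shallowest solution is found first (graph is illustrative).
def bfs(graph, start, goal):
    frontier = deque([[start]])          # queue of paths
    explored = {start}
    while frontier:
        path = frontier.popleft()        # expand the shallowest node first
        node = path[-1]
        if node == goal:
            return path
        for successor in graph.get(node, []):
            if successor not in explored:
                explored.add(successor)
                frontier.append(path + [successor])
    return None                          # no solution exists

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
path = bfs(graph, "A", "E")              # ['A', 'B', 'D', 'E']
```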



5.2 Depth-First Search (DFS)

 Depth-first search (DFS) is an uninformed search algorithm that explores the nodes in the or-
der of their depth from the initial node, i.e., it expands the deepest nodes first.
 DFS uses a stack as its frontier data structure, i.e., it adds new nodes at the top of the stack
and removes nodes from the top of the stack.
 DFS is incomplete, i.e., it may not find a solution even if one exists, especially if the search
space is infinite or contains cycles.
 DFS is not optimal, i.e., it may not find the lowest-cost solution among all possible solutions,
especially if the search space is not uniform.
 DFS has low space complexity, i.e., it uses a small amount of memory to find a solution. The
space complexity of DFS is O(bm), where b is the branching factor and m is the maximum
depth of the search space. However, DFS has high time complexity, i.e., it can take a long
time to find a solution. The time complexity of DFS is O(b^m), where b is the branching fac-
tor and m is the maximum depth of the search space.
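A minimal DFS sketch; the only change from BFS is that the frontier is a stack, so the most recently generated node is expanded first (the example graph is illustrative):

```python
# DFS with an explicit stack; the LIFO frontier means the deepest
# nodes are expanded first (graph is illustrative).
def dfs(graph, start, goal):
    frontier = [[start]]                 # stack of paths
    explored = set()
    while frontier:
        path = frontier.pop()            # expand the most recently added node
        node = path[-1]
        if node == goal:
            return path
        if node in explored:
            continue                     # skip stale duplicate entries
        explored.add(node)
        for successor in graph.get(node, []):
            if successor not in explored:
                frontier.append(path + [successor])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
path = dfs(graph, "A", "E")              # ['A', 'C', 'D', 'E']
```

Note the solution differs from the BFS one on the same graph: DFS commits to the last branch generated ("C") and never reconsiders "B", which is exactly why it is not optimal.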

5.3 Depth-Limited Search

 Depth-limited search (DLS) is a variation of DFS that limits the depth of the search to a pre-
defined value l, i.e., it expands nodes only up to depth l and ignores nodes beyond that depth.
 DLS uses a stack as its frontier data structure, like DFS.
 DLS is complete if l is greater than or equal to d, where d is the depth of the shallowest solu-
tion, i.e., it can find a solution if one exists within the depth limit. Otherwise, DLS is incom-
plete, i.e., it may not find a solution even if one exists beyond the depth limit.
 DLS is not optimal, like DFS, i.e., it may not find the lowest-cost solution among all possible
solutions.
 DLS has low space complexity, like DFS, i.e., it uses a small amount of memory to find a so-
lution. The space complexity of DLS is O(bl), where b is the branching factor and l is the
depth limit. However, DLS has high time complexity, like DFS, i.e., it can take a long time to
find a solution. The time complexity of DLS is O(b^l), where b is the branching factor and l is
the depth limit.

5.4 Iterative Deepening Depth-First Search (IDDFS)

 Iterative deepening depth-first search (IDDFS) is a combination of BFS and DLS that itera-
tively increases the depth limit from 0 to infinity, i.e., it performs DLS with increasing depth
limits until finding a solution or exhausting all possibilities.
 IDDFS uses a stack as its frontier data structure, like DFS and DLS.
 IDDFS is complete, like BFS, i.e., it can find a solution if one exists.
 IDDFS is optimal if all actions have the same cost, like BFS, i.e., it can find the lowest-cost
solution among all possible solutions.
 IDDFS has low space complexity, like DFS and DLS, i.e., it uses a small amount of memory
to find a solution. The space complexity of IDDFS is O(bd), where b is the branching factor
and d is the depth of the shallowest solution. However, IDDFS has high time complexity, like
BFS, i.e., it can take a long time to find a solution. The time complexity of IDDFS is O(b^d),
where b is the branching factor and d is the depth of the shallowest solution.
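IDDFS can be sketched as repeated depth-limited searches with a growing limit (the example graph is illustrative):

```python
# Iterative deepening: run depth-limited DFS with limits 0, 1, 2, ...
# until a solution appears (graph is illustrative).
def depth_limited(graph, node, goal, limit, path):
    if node == goal:
        return path
    if limit == 0:
        return None                      # depth limit reached; cut off
    for successor in graph.get(node, []):
        if successor not in path:        # avoid cycles along this path
            found = depth_limited(graph, successor, goal, limit - 1,
                                  path + [successor])
            if found:
                return found
    return None

def iddfs(graph, start, goal, max_depth=20):
    for limit in range(max_depth + 1):   # increase the depth limit each round
        result = depth_limited(graph, start, goal, limit, [start])
        if result:
            return result
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
path = iddfs(graph, "A", "E")            # ['A', 'B', 'D', 'E']
```

Because each round restarts from scratch, shallow levels are re-expanded many times, yet the total work is still dominated by the final round, which is why the overall time stays O(b^d).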




5.5 Uniform Cost Search

 Uniform cost search (UCS) is an uninformed search algorithm that explores the nodes in the
order of their path cost from the initial node, i.e., it expands the lowest-cost nodes first.
 UCS uses a priority queue as its frontier data structure, i.e., it adds new nodes according to
their path cost and removes nodes with the lowest path cost.
 UCS is complete, i.e., it can find a solution if one exists.
 UCS is optimal for any action cost function, i.e., it can find the lowest-cost solution among all
possible solutions.
 UCS has high time and space complexity, i.e., it can take a long time and use a lot of memory
to find a solution. The time and space complexity of UCS are both O(b^(1 + C*/ε)), where b is
the branching factor, C* is the cost of the optimal solution, and ε is the smallest action cost.
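A minimal UCS sketch using a priority queue keyed on path cost (the weighted graph is illustrative):

```python
import heapq

# Uniform cost search: the priority queue always yields the node with
# the lowest path cost so far (the weighted graph is illustrative).
def uniform_cost_search(graph, start, goal):
    frontier = [(0, start, [start])]     # (path cost, node, path)
    best_cost = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)   # cheapest node first
        if node == goal:
            return cost, path
        for successor, step_cost in graph.get(node, []):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(successor, float("inf")):
                best_cost[successor] = new_cost
                heapq.heappush(frontier,
                               (new_cost, successor, path + [successor]))
    return None

graph = {"A": [("B", 1), ("C", 5)], "B": [("C", 1), ("D", 4)], "C": [("D", 1)]}
cost, path = uniform_cost_search(graph, "A", "D")    # cost 3 via A-B-C-D
```

Note that the direct-looking route A-B-D (cost 5) is rejected: UCS keeps expanding cheapest-first until the goal itself is popped, which is what makes it optimal for any positive action costs.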

5.6 Bidirectional Search

 Bidirectional search (BDS) is an uninformed search algorithm that simultaneously explores


two search spaces: one from the initial node forward and one from the goal node backward,
i.e., it expands nodes from both ends until meeting in the middle.
 BDS uses two queues as its frontier data structures: one for the forward search and one for the
backward search. It also uses two sets to store the expanded nodes: one for the forward search
and one for the backward search.

 BDS is complete if both searches are breadth-first or uniform-cost, i.e., it can find a solution
if one exists.
 BDS is optimal if both searches are uniform-cost, i.e., it can find the lowest-cost solution
among all possible solutions.

 BDS has low time complexity, i.e., it can find a solution faster than unidirectional search. The
time complexity of BDS is O(b^(d/2)), where b is the branching factor and d is the depth of
the shallowest solution.
 BDS has high space complexity, i.e., it uses a lot of memory to store the nodes from both
searches. The space complexity of BDS is O(b^(d/2)), where b is the branching factor and d is
the depth of the shallowest solution.

6. Informed/Heuristic Search Algorithms


6.1 Greedy Best-First Search Algorithm

 Greedy best-first search (GBFS) is an informed search algorithm that explores the nodes in
the order of their heuristic value, i.e., it expands the node that is closest to the goal state ac-
cording to some heuristic function.
 GBFS uses a priority queue as its frontier data structure, i.e., it adds new nodes according to
their heuristic value and removes nodes with the lowest heuristic value.
 GBFS is incomplete, i.e., it may not find a solution even if one exists, especially if the search
space is infinite or contains cycles.
 GBFS is not optimal, i.e., it may not find the lowest-cost solution among all possible solu-
tions, especially if the heuristic function is not consistent or admissible.
 GBFS is often fast in practice, because the heuristic steers the search directly toward the
goal, but its worst-case time complexity is O(b^m), where b is the branching factor and m is
the maximum depth of the search space.
 GBFS has high space complexity, i.e., it uses a lot of memory to store the nodes in the priority
queue. The space complexity of GBFS is O(b^m), where b is the branching factor and m is
the maximum depth of the search space.

6.2 A* Search Algorithm

 A* search (A*) is an informed search algorithm that explores the nodes in the order of their
evaluation function, i.e., it expands the node that has the lowest sum of path cost and heuristic
value.
 A* uses a priority queue as its frontier data structure, like GBFS, but it uses a different evalu-
ation function: f(n) = g(n) + h(n), where g(n) is the path cost from the initial node to node n
and h(n) is the heuristic value of node n.
 A* is complete, i.e., it can find a solution if one exists.
 A* is optimal if the heuristic function is admissible, i.e., it never overestimates the true cost to
reach the goal state from any node. A* is also optimal if the heuristic function is consistent,
i.e., it satisfies the triangle inequality: h(n) <= c(n, n’) + h(n’), where c(n, n’) is the cost of
moving from node n to node n’.

 A* has low time complexity, i.e., it can find a solution faster than uninformed search or
GBFS. The time complexity of A* depends on the quality of the heuristic function: the closer
h(n) is to the true cost, the faster A* will find a solution. The worst-case time complexity of
A* is O(b^m), where b is the branching factor and m is the maximum depth of the search
space.
 A* has high space complexity, i.e., it uses a lot of memory to store the nodes in the priority
queue. The space complexity of A* is O(b^m), where b is the branching factor and m is the
maximum depth of the search space.
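The evaluation function f(n) = g(n) + h(n) can be sketched in Python as follows; the weighted graph and the (admissible) heuristic values below are invented for illustration:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """neighbors(n) yields (successor, step_cost) pairs; h is the heuristic."""
    frontier = [(h(start), 0, start)]     # entries are (f = g + h, g, node)
    came_from = {start: None}
    best_g = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            path = []                     # reconstruct the optimal path
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1], g
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float('inf')):   # found a cheaper path to nxt
                best_g[nxt] = g2
                came_from[nxt] = node
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt))
    return None, float('inf')

# invented weighted graph; h never overestimates the true remaining cost
graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 6)], 'B': [('G', 1)], 'G': []}
h = {'S': 4, 'A': 3, 'B': 1, 'G': 0}
path, cost = a_star('S', 'G', lambda n: graph[n], lambda n: h[n])
# path == ['S', 'A', 'B', 'G'], cost == 4
```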
6.3 Hill Climbing Algorithm

 Hill climbing algorithm (HCA) is an informed local search algorithm that evaluates the successors of the current node by their heuristic (objective) value, like GBFS, but it only moves to a successor node if it improves on the current node, i.e., it always moves uphill.
 HCA does not maintain a frontier of unexplored nodes; it keeps only the current node, generates its successors, and moves to the best one.
 HCA is incomplete, i.e., it may not find a solution even if one exists, especially if the search
space is not smooth or contains local maxima or plateaus.
 HCA is not optimal, i.e., it may not find the lowest-cost solution among all possible solutions,
especially if the heuristic function is not consistent or admissible.
 HCA has low time and space complexity, i.e., it can find a solution quickly using little memory. It stores only the current node and its successors, so its space complexity is O(b), where b is the branching factor; however, it may terminate at a local maximum rather than a solution.
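A minimal steepest-ascent hill-climbing sketch, maximizing an assumed one-dimensional objective:

```python
def hill_climb(start, neighbors, value):
    """Steepest-ascent hill climbing: keep only the current node, move to the
    best neighbor while it improves the objective; stop at a local maximum."""
    current = start
    while True:
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current                # a local (not necessarily global) maximum
        current = best

# maximize f(x) = -(x - 3)^2 over the integers, stepping by +/- 1
f = lambda x: -(x - 3) ** 2
result = hill_climb(0, lambda x: [x - 1, x + 1], f)
# result == 3
```

With a multimodal objective, the same loop can stop at a local maximum or plateau instead of the global optimum, which is the incompleteness noted above.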

6.4 Constraint Satisfaction Problem (CSP)

 A constraint satisfaction problem (CSP) is a special type of problem that can be solved by us-
ing search algorithms. A CSP consists of three components:
o A set of variables: Each variable has a domain of possible values
o A set of constraints: Each constraint specifies some restrictions on the values of some
variables
o A goal test: A function that determines whether an assignment of values to variables
satisfies all the constraints
 A CSP can be solved by using different methods, such as:
o Backtracking search: A recursive search algorithm that tries to assign values to varia-
bles one by one, and backtracks if a constraint is violated or no value is available
o Forward checking: An improvement of backtracking search that keeps track of the
remaining values for each variable and prunes them if they are inconsistent with the
current assignment
o Arc consistency: An improvement of forward checking that ensures that every possi-
ble value for a variable has a consistent value for another variable in every constraint
o Heuristics: Techniques that can improve the efficiency and effectiveness of search al-
gorithms, such as variable ordering (choosing which variable to assign next), value
ordering (choosing which value to assign to a variable), and constraint propagation
(reducing the domains of variables based on constraints)
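A plain backtracking solver for a small CSP can be sketched as follows; the three-region map-colouring instance is invented for illustration, and forward checking, arc consistency, and ordering heuristics are omitted:

```python
def backtrack(assignment, variables, domains, constraints):
    """Assign variables one by one; undo (backtrack) on constraint violation."""
    if len(assignment) == len(variables):
        return dict(assignment)                       # all variables assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if all(check(assignment) for check in constraints):
            result = backtrack(assignment, variables, domains, constraints)
            if result is not None:
                return result
        del assignment[var]                           # backtrack
    return None

# invented instance: three mutually adjacent regions must get different colours
variables = ['WA', 'NT', 'SA']
domains = {v: ['red', 'green', 'blue'] for v in variables}
def different(a, b):
    # the constraint holds vacuously until both variables are assigned
    return lambda asg: a not in asg or b not in asg or asg[a] != asg[b]
constraints = [different('WA', 'NT'), different('WA', 'SA'), different('NT', 'SA')]
solution = backtrack({}, variables, domains, constraints)
# a valid assignment with three distinct colours
```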
6.5 Means-Ends Analysis

 Means-ends analysis (MEA) is an informed search algorithm that uses heuristics to reduce the
difference between the current state and the goal state, i.e., it identifies and solves subprob-
lems that are relevant to achieving the goal.
 MEA uses a stack as its frontier data structure, like DFS, but it also uses another stack to store
subgoals and operators.
 MEA works as follows:
o It compares the current state and the goal state and finds a difference
o It selects an operator that can reduce or eliminate the difference
o It pushes the operator and its preconditions (subgoals) onto the stack
o It pops and applies an operator from the stack if possible
o It repeats until reaching the goal state or no more operators are available
 MEA is incomplete, i.e., it may not find a solution even if one exists, especially if there are
dead ends or irrelevant operators.
 MEA is not optimal, i.e., it may not find the lowest-cost solution among all possible solutions,
especially if there are multiple ways to reduce or eliminate a difference.
 MEA has low time complexity, i.e., it can find a solution faster than uninformed
search. The time complexity of MEA depends on the quality of the heuristics: the
more relevant and effective they are, the faster MEA will find a solution. The worst-
case time complexity of MEA is O(b^m), where b is the branching factor and m is the
maximum depth of the search space.
 MEA has high space complexity, i.e., it uses a lot of memory to store the nodes and
operators in the stacks. The space complexity of MEA is O(b^m), where b is the
branching factor and m is the maximum depth of the search space.
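The difference-reduction loop above can be sketched as follows; the set-based state representation and the door-opening operators are invented for illustration, and dead-end handling is reduced to a simple depth bound:

```python
def mea(state, goal, operators, depth=10):
    """Means-ends analysis sketch over set-based states. Each operator is a
    (name, preconditions, additions, deletions) tuple of sets.
    Returns (plan, resulting_state) or None."""
    if goal <= state:                        # no difference left
        return [], state
    if depth == 0:                           # crude guard against dead ends
        return None
    for name, pre, add, delete in operators:
        if add & (goal - state):             # operator reduces the difference
            sub = mea(state, pre, operators, depth - 1)   # subgoal: its preconditions
            if sub is None:
                continue
            plan, s = sub
            s = (s - delete) | add           # apply the operator
            rest = mea(s, goal, operators, depth - 1)     # finish remaining goals
            if rest is None:
                continue
            plan2, s2 = rest
            return plan + [name] + plan2, s2
    return None

# invented example: get the door open, starting from the desk
ops = [('walk_to_door', set(), {'at_door'}, {'at_desk'}),
       ('open_door', {'at_door'}, {'door_open'}, set())]
plan, final = mea({'at_desk'}, {'door_open'}, ops)
# plan == ['walk_to_door', 'open_door']
```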
UNIT - III: Adversarial Search and Knowledge Representation


7. Adversarial Search/Game Playing
7.1 Introduction to Adversarial Search and Game Playing

 Adversarial search is a type of search that involves more than one agent, where the agents
have conflicting goals or interests, i.e., they compete or cooperate with each other.
 Game playing is a special case of adversarial search, where the agents follow some predefined
rules and try to maximize their payoffs or rewards.
 Game playing can be formalized as a tuple (S, A, T, R, Z), where:
o S is the set of possible states of the game
o A is the set of possible actions or moves that the agents can make
o T is the transition function that maps a state and an action to a new state
o R is the reward function that assigns a numeric value to each state or action for each agent
o Z is the terminal test function that determines whether a state is terminal, i.e., whether the game is over
 Game playing can have different characteristics, such as:
o Zero-sum vs. non-zero-sum: A game is zero-sum if the sum of the rewards for all the
agents is zero, i.e., one agent’s gain is another agent’s loss. A game is non-zero-sum if
the sum of the rewards for all the agents is not zero, i.e., the agents can have different
or shared interests.
o Deterministic vs. stochastic: A game is deterministic if the outcome of each action is
fully determined by the current state and the action. A game is stochastic if there is
some uncertainty or randomness in the outcome of each action.
o Perfect information vs. imperfect information: A game has perfect information if all
the agents have complete and accurate knowledge of the current state and the past ac-
tions. A game has imperfect information if some aspects of the current state or the
past actions are hidden or unknown to some agents.
o Turn-taking vs. simultaneous: A game is turn-taking if only one agent can act at a
time and the agents alternate their actions. A game is simultaneous if more than one
agent can act at the same time and the agents act independently or interdependently.

7.2 Minimax Algorithm: Decision-Making in Two-Player Games

 Minimax algorithm is a search algorithm that can be used to find the optimal strategy for two-
player zero-sum turn-taking deterministic perfect information games, such as chess, tic-tac-
toe, etc.
 Minimax algorithm works as follows:
o It assumes that both players are rational and play optimally, i.e., they try to maximize
their own reward and minimize their opponent’s reward.
o It builds a search tree that represents all the possible states and actions from the cur-
rent state to the terminal states, where each node corresponds to a state and each edge
corresponds to an action.
o It assigns a value to each node based on the reward function, where positive values
favor one player (the maximizer) and negative values favor the other player (the mini-
mizer).

 It propagates the values from the leaf nodes (the terminal states) to the root node (the current
state) by applying the min-max rule: at each level of the tree, it chooses the minimum value
among its children if it is a minimizer’s turn, or it chooses the maximum value among its chil-
dren if it is a maximizer’s turn.
 It selects the best action at the root node, i.e., the action that leads to the child node with the
highest value for the maximizer or the lowest value for the minimizer.
 Minimax algorithm is complete, i.e., it can find a solution if one exists.
 Minimax algorithm is optimal, i.e., it can find the best solution among all possible solutions,
assuming that both players play optimally.
 Minimax algorithm has high time complexity, i.e., it can take a long time to find a solution. The time complexity of minimax algorithm is O(b^m), where b is the branching factor and m is the maximum depth of the search tree; because it explores the tree depth-first, its space complexity is only O(bm).
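A generic minimax sketch in Python; the two-level game tree below is invented for illustration, with MAX at the root and MIN one level down:

```python
def minimax(state, maximizing, successors, utility, is_terminal):
    """Return the minimax value of `state`, assuming optimal play by both sides."""
    if is_terminal(state):
        return utility(state)
    values = [minimax(s, not maximizing, successors, utility, is_terminal)
              for s in successors(state)]
    return max(values) if maximizing else min(values)

# invented two-level tree: internal nodes are dicts, leaves are terminal payoffs
tree = {'left': {'b': 3, 'c': 12}, 'right': {'e': 2, 'f': 8}}
value = minimax(tree, True,
                lambda s: list(s.values()),   # successors of a node
                lambda s: s,                  # the utility of a leaf is its payoff
                lambda s: not isinstance(s, dict))
# MAX chooses max(min(3, 12), min(2, 8)) = max(3, 2) = 3
```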

7.3 Alpha-Beta Pruning: Reducing the Search Space in Game Trees

 Alpha-beta pruning is an optimization technique that can be used to improve the efficiency of
minimax algorithm by pruning or eliminating branches of the search tree that are not relevant
to the final decision, i.e., branches that do not affect the value of the root node.
 Alpha-beta pruning works as follows:
o It maintains two values for each node: alpha and beta. Alpha is the best value that the
maximizer can guarantee at that node or above. Beta is the best value that the mini-
mizer can guarantee at that node or below.
o It initializes alpha to negative infinity and beta to positive infinity at the root node.
o It updates alpha and beta as it traverses the search tree in a depth-first manner, using
the min-max rule: at each level of the tree, it sets alpha to the maximum value among
its children if it is a maximizer’s turn, or it sets beta to the minimum value among its
children if it is a minimizer’s turn.
o It prunes a branch when alpha is greater than or equal to beta, i.e., when there is no
need to explore further nodes in that branch because they cannot improve the value of
the root node.
o It returns the best action at the root node, like minimax algorithm.
 Alpha-beta pruning does not affect the completeness or optimality of minimax algorithm, i.e.,
it can find a solution if one exists and it can find the best solution among all possible solu-
tions, assuming that both players play optimally.
 Alpha-beta pruning reduces the time and space complexity of minimax algorithm, i.e.,
it can find a solution faster and use less memory than minimax algorithm. The time
and space complexity of alpha-beta pruning depend on the order of exploration of the
nodes in the search tree: the best case is O(b^(m/2)), where b is the branching factor
and m is the maximum depth of the search tree, and the worst case is O(b^m), like
minimax algorithm. The space complexity of alpha-beta pruning is O(bm), like mini-
max algorithm.
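Adding the two bounds to the minimax recursion gives the following sketch; the tree is the same invented example as above, and the test alpha >= beta implements the cutoff described in the steps:

```python
def alphabeta(state, maximizing, successors, utility, is_terminal,
              alpha=float('-inf'), beta=float('inf')):
    """Minimax with alpha-beta pruning: same value, fewer nodes explored."""
    if is_terminal(state):
        return utility(state)
    if maximizing:
        value = float('-inf')
        for s in successors(state):
            value = max(value, alphabeta(s, False, successors, utility,
                                         is_terminal, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                     # beta cutoff: MIN will avoid this branch
        return value
    value = float('inf')
    for s in successors(state):
        value = min(value, alphabeta(s, True, successors, utility,
                                     is_terminal, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                         # alpha cutoff: MAX will avoid this branch
    return value

# invented tree; after seeing the leaf 2, the leaf 8 is pruned and never examined
tree = {'left': {'b': 3, 'c': 12}, 'right': {'e': 2, 'f': 8}}
value = alphabeta(tree, True, lambda s: list(s.values()),
                  lambda s: s, lambda s: not isinstance(s, dict))
# value == 3, identical to plain minimax
```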

8. Knowledge Representation
8.1 Representations and Mappings: Encoding Knowledge for AI Systems

 Knowledge representation is the process of encoding knowledge for AI systems, i.e., trans-
forming human knowledge into a form that can be manipulated and used by machines.
 Knowledge representation involves two aspects: representations and mappings. Representa-
tions are the data structures or languages that are used to store and express knowledge. Map-
pings are the functions or algorithms that are used to create and manipulate knowledge.
 Knowledge representation has three main objectives: to capture the meaning and structure of
human knowledge, to enable efficient and effective reasoning and inference, and to facilitate
communication and interaction between humans and machines.
 Knowledge representation is a crucial and challenging task for AI systems, as it requires bal-
ancing trade-offs between expressiveness, efficiency, and uncertainty.
8.2 Approaches to Knowledge Representation: Logical, Semantic Networks, Frames, etc.

 There are different approaches or paradigms to knowledge representation, each with its own
advantages and disadvantages. Some of the common approaches are:
o Logical: This approach uses formal logic systems, such as propositional logic, predi-
cate logic, modal logic, etc., to represent knowledge as a set of symbols and rules that
follow a well-defined syntax and semantics. Logical representations are precise, con-
sistent, and deductive, but they can also be complex, rigid, and incomplete.
o Semantic networks: This approach uses graphs or networks to represent knowledge as
a set of nodes and links that capture the concepts and relations in a domain. Semantic
networks are intuitive, flexible, and associative, but they can also be ambiguous, in-
consistent, and inefficient.
o Frames: This approach uses hierarchical structures to represent knowledge as a set of
frames or schemas that capture the attributes and values of entities or situations in a
domain. Frames are modular, structured, and inheritable, but they can also be redun-
dant, inflexible, and complex.
o Scripts: This approach uses sequences or scenarios to represent knowledge as a set of
scripts or stories that capture the events and actions in a domain. Scripts are natural,
dynamic, and contextual, but they can also be incomplete, stereotypical, and rigid.
o Ontologies: This approach uses formal vocabularies to represent knowledge as a set
of ontologies or taxonomies that capture the categories and subcategories in a do-
main. Ontologies are standardized, reusable, and sharable, but they can also be large,
complex, and evolving.
8.3 Issues in Knowledge Representation: Expressiveness, Efficiency, Uncertainty

 There are some issues or challenges that need to be addressed when designing or choosing a
knowledge representation system, such as:


o Expressiveness: The ability of a representation system to capture the meaning and structure of human knowledge in a domain. Expressiveness depends on factors such as the richness of the vocabulary, the complexity of the syntax, and the clarity of the semantics.
o Efficiency: The ability of a representation system to support fast storage, retrieval, and inference. Efficiency depends on factors such as the size of the knowledge base, the complexity of the reasoning algorithms, and the available computational resources.
o Uncertainty: The ability of a representation system to handle the incompleteness, inconsistency, or ambiguity of human knowledge in a domain. Uncertainty depends on factors such as the sources of uncertainty, the types of uncertainty, the methods of uncertainty representation, and the techniques of uncertainty reasoning.

UNIT - IV: Knowledge Representation Using Predicate Logic and Rules
9. Knowledge Representation Using Predicate Logic
9.1 Representing Simple Facts in Logic: Propositions, Predicates, and Connectives
 Logic is a formal system of reasoning that uses symbols and rules to represent and
manipulate knowledge.
 Logic can be used to represent simple facts in a domain by using propositions, predi-
cates, and connectives.
 A proposition is a statement that can be either true or false. For example:
o The sky is blue


o 2+2=4
o Paris is the capital of France
 A proposition can be represented by a propositional symbol, such as p, q, r, etc. For
example:
o p: The sky is blue
o q: 2 + 2 = 4
o r: Paris is the capital of France
 A predicate is a statement that can be true or false depending on the values of its argu-
ments or variables. For example:
o IsBlue(x): x is blue
o Equals(x, y): x equals y
o CapitalOf(x, y): x is the capital of y
 A predicate can be represented by a predicate symbol followed by a list of arguments
or variables, such as P(x), Q(x, y), R(x, y, z), etc. For example:
o IsBlue(sky): The sky is blue
o Equals(2 + 2, 4): 2 + 2 equals 4
o CapitalOf(Paris, France): Paris is the capital of France
 A connective is a symbol that can be used to combine or modify propositions or pred-
icates to form more complex statements. For example:
o ¬: Negation (not)
o ∧: Conjunction (and)
o ∨: Disjunction (or)
o →: Implication (if … then)
o ↔: Equivalence (if and only if)
 A connective can be represented by a logical operator that follows some syntax and
semantics rules. For example:
o ¬p: It is not the case that p
o p ∧ q: Both p and q are true
o p ∨ q: Either p or q or both are true
o p → q: If p is true then q is true
o p ↔ q: p and q have the same truth value
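The truth tables of these connectives can be checked directly in Python, where the conditional p → q is expressed as ¬p ∨ q:

```python
# truth values for the propositional symbols
p, q = True, False

# the five connectives expressed with Python's boolean operators
neg_p   = not p                  # negation: not p
p_and_q = p and q                # conjunction: p and q
p_or_q  = p or q                 # disjunction: p or q
p_imp_q = (not p) or q           # implication: false only when p is true and q is false
p_iff_q = p == q                 # equivalence: p and q have the same truth value

# with p = True and q = False: negation, conjunction, implication, and
# equivalence are all False, while the disjunction is True
```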

10. Representing Knowledge using Rules


10.1 Procedural vs. Declarative Knowledge
 Knowledge can be classified into two types based on how it is represented and used:
procedural knowledge and declarative knowledge.
 Procedural knowledge is knowledge that specifies how to perform a task or achieve a
goal, i.e., it is knowledge about actions and processes. For example:
o How to ride a bike
o How to bake a cake
o How to solve a math problem
 Declarative knowledge is knowledge that describes facts or properties about a do-
main, i.e., it is knowledge about objects and relations. For example:
o A bike has two wheels and a handlebar
o A cake is made of flour, eggs, sugar, and butter
o A math problem has an equation and a solution
 Procedural knowledge can be represented by using rules, procedures, algorithms, or
programs that specify the steps or instructions to perform a task or achieve a goal. For
example:
o To ride a bike, you need to balance on the seat, pedal with your feet, and steer with
your hands
o To bake a cake, you need to mix the ingredients, pour the batter into a pan, and bake
it in an oven
o To solve a math problem, you need to apply some formulas, operations, and methods
 Declarative knowledge can be represented by using rules, facts, statements, or asser-
tions that describe the features or characteristics of a domain. For example:
o HasWheels(bike, 2)
o MadeOf(cake, [flour, eggs, sugar, butter])
o Equation(problem1, x + 2 = 5)

10.2 Logic Programming: Prolog and Its Applications

 Logic programming is a paradigm of programming that uses logic as the basis for represent-
ing and manipulating knowledge.
 Logic programming uses a logic system, such as predicate logic, to express knowledge as a
set of rules and facts that follow a well-defined syntax and semantics.


 Logic programming uses a logic interpreter or engine to execute queries or goals that ask for
some information or solution based on the given knowledge.
 Logic programming uses a logic inference or resolution method to derive new facts or an-
swers from the existing rules and facts by applying some logical rules or principles.
 Prolog is one of the most popular and widely used logic programming languages. Prolog
stands for PROgramming in LOGic.
 Prolog has the following features:
o It uses predicate logic as its logic system
o It uses Horn clauses as its rule and fact format
o It uses unification as its inference method
o It uses backtracking as its search strategy
 Prolog has many applications in various domains, such as:
o Artificial intelligence: Prolog can be used to implement expert systems, natural lan-
guage processing, machine learning, computer vision, etc.
o Database: Prolog can be used to query and manipulate relational data using logical
operations
o Education: Prolog can be used to teach and learn logic, programming, and problem-
solving skills
10.3 Forward vs. Backward Reasoning: Inference Control in Rule-Based Systems


 Forward reasoning and backward reasoning are two methods of inference control in
rule-based systems, i.e., two ways of applying rules to facts to derive new facts or an-
swers.
 Forward reasoning is a method of inference control that starts from the given facts and
applies rules that match them until reaching a goal or no more rules can be applied.
For example:
o Given facts: John is a student, John studies hard
o Rules: If X is a student then X has an ID, If X studies hard then X gets good grades
o Goal: John gets good grades
o Forward reasoning: John is a student -> John has an ID, John studies hard -> John
gets good grades -> Goal reached
 Backward reasoning is a method of inference control that starts from a goal and ap-
plies rules that can infer it from other facts until reaching the given facts or no more
rules can be applied. For example:
o Given facts: John is a student, John studies hard
o Rules: If X is a student then X has an ID, If X studies hard then X gets good grades
o Goal: John has an ID
o Backward reasoning: John has an ID <- If X is a student then X has an ID <- John is a student (a given fact) -> Goal proved
 Forward reasoning and backward reasoning have different advantages and disad-
vantages. For example:
o Forward reasoning is data-driven, i.e., it follows the data flow from the facts to the
goal. It is suitable for situations where the facts are known and the goal is unknown
or flexible. It can generate multiple or unexpected goals. However, it can also be inef-
ficient or irrelevant, as it may apply rules that do not lead to the goal or generate
facts that are not needed.
o Backward reasoning is goal-driven, i.e., it follows the data flow from the goal to the
facts. It is suitable for situations where the goal is known and the facts are unknown
or incomplete. It can focus on the relevant rules and facts. However, it can also be
incomplete or inconsistent, as it may not find a solution even if one exists or gener-
ate contradictory facts.
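The forward-reasoning example above can be sketched as a simple forward-chaining loop; facts and rule conditions are encoded as plain strings for illustration:

```python
def forward_chain(facts, rules):
    """Data-driven inference: repeatedly fire rules whose premises all hold,
    until no rule can add a new fact. Each rule is (premises, conclusion)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)      # derive a new fact
                changed = True
    return facts

# the student example from the text
rules = [(('student(John)',), 'has_id(John)'),
         (('studies_hard(John)',), 'good_grades(John)')]
derived = forward_chain({'student(John)', 'studies_hard(John)'}, rules)
# both 'has_id(John)' and 'good_grades(John)' are derived
```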

10.4 Matching: Pattern Matching and Unification


 Matching is a process of finding correspondences or similarities between two expres-
sions, such as rules and facts, queries and answers, etc.
 Matching can be performed by using different methods, such as pattern matching and
unification.
 Pattern matching is a method of matching that compares two expressions based on
their syntactic structure and literal values, i.e., it checks whether they have the same
shape and content. For example:
o Pattern: IsBlue(x)
o Expression: IsBlue(sky)
o Pattern matching: Success, x = sky
o Pattern: Equals(x + y, 4)
o Expression: Equals(2 + 2, 4)
o Pattern matching: Success, x = 2, y = 2
o Pattern: Equals(x + y, 4)
o Expression: Equals(3 + 2, 5)
o Pattern matching: Failure, no match
 Unification is a method of matching that makes two expressions identical by substituting variables with terms, where variables may appear in either expression (unlike pattern matching, where only the pattern contains variables). Unification is purely syntactic: it does not evaluate arithmetic or use semantic knowledge. For example:
o Expression 1: IsBlue(x)
o Expression 2: IsBlue(sky)
o Unification: Success, x = sky
o Expression 1: Equals(x + y, 4)
o Expression 2: Equals(2 + 2, 4)
o Unification: Success, x = 2, y = 2
o Expression 1: Equals(x + y, 4)
o Expression 2: Equals(3 + z, 4)
o Unification: Success, x = 3, y = z (a variable can be bound to another variable)
 Matching can be used for various purposes, such as:
o Rule application: Matching can be used to apply rules to facts by matching the
condition part of the rule with the fact and deriving the action part of the rule
as a new fact.
o Query answering: Matching can be used to answer queries by matching the
query with the facts or rules and returning the values of the variables or the
truth value of the query.
o Substitution: Matching can be used to substitute variables with values by
matching an expression with another expression and replacing the variables
with the corresponding values.
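A minimal unification sketch (the occurs check is omitted for brevity); terms are represented as nested tuples, and by an assumed convention strings starting with a lowercase letter are variables while everything else is a constant or functor:

```python
def unify(x, y, subst=None):
    """Syntactic unification of two terms; returns a substitution dict or False."""
    if subst is None:
        subst = {}
    if subst is False:
        return False
    if x == y:
        return subst
    if isinstance(x, str) and x[0].islower():       # x is a variable
        return unify_var(x, y, subst)
    if isinstance(y, str) and y[0].islower():       # y is a variable
        return unify_var(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):                    # unify argument by argument
            subst = unify(xi, yi, subst)
            if subst is False:
                return False
        return subst
    return False

def unify_var(var, term, subst):
    if var in subst:                                # follow an existing binding
        return unify(subst[var], term, subst)
    return {**subst, var: term}

# IsBlue(x) with IsBlue(Sky): x is bound to the constant Sky
s1 = unify(('IsBlue', 'x'), ('IsBlue', 'Sky'))
# Equals(x + y, 4) with Equals(3 + z, 4): x = 3, y = z (variable-to-variable)
s2 = unify(('Equals', ('Plus', 'x', 'y'), '4'),
           ('Equals', ('Plus', '3', 'z'), '4'))
# purely syntactic: unifying against Equals(3 + z, 5) fails, since 4 and 5 differ
```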
UNIT - V: Uncertain Knowledge and Reasoning and Learning


11. Uncertain Knowledge and Reasoning
11.1 Probability and Bayes’ Theorem: Dealing with Uncertainty
 Uncertainty is a condition where the state of the world or the outcome of an action is
not fully known or predictable.
 Uncertainty can arise from various sources, such as incomplete information, noisy
data, unreliable sensors, ambiguous language, conflicting evidence, etc.
 Probability is a mathematical framework that can be used to quantify and reason
about uncertainty. Probability assigns a numeric value between 0 and 1 to each possi-
ble event or proposition, indicating how likely or confident it is to occur or be true.
 Probability can be interpreted in different ways, such as:
o Frequentist: Probability is the relative frequency or proportion of an event or proposi-
tion in a large number of repeated trials or observations.
o Bayesian: Probability is the degree of belief or subjective confidence in an event or
proposition based on prior knowledge and evidence.
 Probability can be calculated using different methods, such as:
o Classical: Probability is the ratio of the number of favorable outcomes to the number
of possible outcomes, assuming that all outcomes are equally likely.
o Empirical: Probability is the ratio of the number of observed occurrences to the num-
ber of total trials, based on empirical data or statistics.
o Subjective: Probability is the personal judgment or estimation of an individual or a
group, based on intuition or experience.
 Bayes’ theorem is a formula that can be used to update or revise the probability of an
event or proposition based on new evidence or information. Bayes’ theorem states
that:
o P(A|B) = P(B|A) * P(A) / P(B), where P(A|B) is the posterior probability of A given B, P(B|A) is the likelihood of B given A, P(A) is the prior probability of A, and P(B) is the marginal probability of B.
 Bayes’ theorem can be used for various purposes, such as:
o Inference: Bayes’ theorem can be used to infer the probability of a cause given an ef-
fect, or a hypothesis given some data.
o Learning: Bayes’ theorem can be used to learn the probability of a parameter given
some observations, or a model given some evidence.
o Decision making: Bayes’ theorem can be used to make optimal decisions under un-
certainty, by maximizing the expected utility or reward.
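A worked numeric example of Bayes' theorem; the prevalence and test accuracies below are assumed purely for illustration:

```python
# assumed numbers: a disease with 1% prevalence, a test with 90% sensitivity
# and a 5% false-positive rate
p_disease = 0.01                      # P(A): prior
p_pos_given_disease = 0.90            # P(B|A): likelihood
p_pos_given_healthy = 0.05            # false-positive rate

# marginal P(B) by the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
# roughly 0.154: even after a positive test, the disease is still unlikely,
# because the prior P(A) is so small
```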

11.2 Certainty Factors and Rule-Based Systems

 Certainty factors are a method of representing and reasoning with uncertain knowledge in
rule-based systems. Certainty factors assign a numeric value between -1 and 1 to each fact or
rule, indicating how certain or confident it is to be true or false.
 Certainty factors can be interpreted in different ways, such as:
o Probabilistic: Certainty factors are equivalent to probabilities, i.e., they measure the
likelihood or frequency of an event or proposition.
o Evidential: Certainty factors are based on evidence, i.e., they measure the strength or
quality of the support or opposition for an event or proposition.
o Fuzzy: Certainty factors are based on vagueness, i.e., they measure the degree or ex-
tent of membership or satisfaction for an event or proposition.
 Certainty factors can be calculated using different methods, such as:
o Heuristic: Certainty factors are assigned by experts or users based on their intuition or
experience, using some rules of thumb or guidelines.
o Empirical: Certainty factors are derived from data or statistics, using some formulas
or algorithms.
o Learning: Certainty factors are learned from observations or feedback, using some
techniques or models.
 Certainty factors can be used for various purposes, such as:
o Rule application: Certainty factors can be used to apply rules to facts by combining
the certainty factors of the condition and the action parts of the rule, using some oper-
ators or functions.
o Query answering: Certainty factors can be used to answer queries by aggregating the
certainty factors of the facts or rules that are relevant to the query, using some opera-
tors or functions.
o Decision making: Certainty factors can be used to make decisions under uncertainty,
by comparing the certainty factors of the alternatives or outcomes and choosing the
best one.
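One common scheme for combining two certainty factors that bear on the same hypothesis is the MYCIN combination rule, sketched below:

```python
def combine_cf(cf1, cf2):
    """MYCIN-style combination of two certainty factors (each in [-1, 1])
    for the same hypothesis, derived from independent rules."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)              # both support the hypothesis
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)              # both oppose the hypothesis
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))   # mixed evidence

# two rules both support the hypothesis: 0.6 + 0.5 * (1 - 0.6) = 0.8
supporting = combine_cf(0.6, 0.5)
# conflicting evidence: (0.6 - 0.4) / (1 - 0.4) = 1/3
conflicting = combine_cf(0.6, -0.4)
```

Note that two moderately confident supporting rules yield a combined CF higher than either alone, while conflicting evidence pulls the result toward zero.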

11.3 Bayesian Networks: Probabilistic Graphical Models for Representing Uncertain Knowledge

 Bayesian networks are a method of representing and reasoning with uncertain knowledge
using probability and graph theory. Bayesian networks are also known as probabilistic graph-
ical models or belief networks.
 Bayesian networks consist of two components: a directed acyclic graph (DAG) and a condi-
tional probability table (CPT). The DAG represents the variables and their dependencies in a
domain. The CPT represents the probabilities of each variable given its parents in the DAG.


 Bayesian networks have the following features:
o They capture the joint probability distribution of all the variables in a domain, i.e.,
they measure the likelihood of any combination of values for the variables.
o They encode the conditional independence assumptions among the variables, i.e.,
they specify which variables are independent of each other given some other varia-
bles.
o They allow for efficient and effective inference and learning, i.e., they enable compu-
ting the posterior probabilities of some variables given some evidence or updating
the parameters of the network given some data.
 Bayesian networks can be used for various purposes, such as:
o Diagnosis: Bayesian networks can be used to diagnose the causes or effects of some
symptoms or observations, by computing the probabilities of some hypotheses given
some evidence.
o Prediction: Bayesian networks can be used to predict the outcomes or consequences of some actions or events, by computing the probabilities of some effects given some causes.
o Decision making: Bayesian networks can be used to make optimal decisions under
uncertainty, by computing the expected utilities or rewards of some alternatives or
actions.

11.4 Dempster-Shafer Theory: Theory of Evidence for Combining Uncertain Information

 Dempster-Shafer theory is a method of representing and reasoning with uncertain knowledge using evidence and belief functions. It is also known as the theory of evidence or the belief function theory.
 Dempster-Shafer theory consists of three components: a frame of discernment, a mass func-
tion, and a belief function. The frame of discernment is a set of mutually exclusive and ex-
haustive propositions or hypotheses in a domain. The mass function is a function that assigns
a value between 0 and 1 to each subset of the frame of discernment, indicating how much evi-
dence or support it has. The belief function is a function that derives the degree of belief or
confidence in each proposition or hypothesis from the mass function, using some rules or op-
erations.


 Dempster-Shafer theory has the following features:
o It captures the uncertainty and ignorance of the knowledge in a domain, i.e., it
measures how much information or lack of information is available for each proposi-
tion or hypothesis.
o It encodes the evidential support or opposition for each proposition or hypothesis, i.e.,
it specifies how much evidence or counter-evidence is provided by different sources
or agents.
o It allows for combining and updating uncertain information from multiple sources or
agents, i.e., it enables computing the combined mass and belief functions using some
operators or methods.
 Dempster-Shafer theory can be used for various purposes, such as:
o Inference: Dempster-Shafer theory can be used to infer the degree of belief or confi-
dence in some propositions or hypotheses given some evidence or information.
o Learning: Dempster-Shafer theory can be used to learn the mass and belief functions
from data or observations, using some techniques or models.
o Decision making: Dempster-Shafer theory can be used to make optimal decisions un-
der uncertainty, by maximizing the expected utility or reward.
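Dempster's rule of combination can be sketched as follows, with mass functions represented as dicts over frozensets; the two-element weather frame and the mass values are invented for illustration:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: masses map frozensets (subsets of the frame of
    discernment) to values summing to 1; conflict mass is normalized away."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc           # mass that would go to the empty set
    # normalize by 1 - K, where K is the total conflict between the sources
    return {a: v / (1 - conflict) for a, v in combined.items()}

# invented frame {rain, sun}; two sources of evidence, each partly ignorant
A = frozenset({'rain'})
theta = frozenset({'rain', 'sun'})        # the whole frame: total ignorance
m1 = {A: 0.6, theta: 0.4}
m2 = {A: 0.5, theta: 0.5}
m12 = combine(m1, m2)
# m12[A] == 0.8, m12[theta] == 0.2: agreeing sources reinforce each other
```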

11.5 Fuzzy Logic: Handling Vagueness and Gradual Truth Values

 Fuzzy logic is a method of representing and reasoning with uncertain knowledge using fuzzy
sets and fuzzy rules. Fuzzy logic is also known as fuzzy set theory or fuzzy reasoning.
 Fuzzy logic consists of two components: fuzzy sets and fuzzy rules. Fuzzy sets are sets that
have fuzzy boundaries or membership functions, i.e., they allow partial or gradual member-
ship for each element. Fuzzy rules are rules that have fuzzy antecedents or consequents, i.e.,
they allow partial or gradual truth values for each condition or action.
 Fuzzy logic has the following features:
o It captures the vagueness and ambiguity of the knowledge in a domain, i.e., it
measures how vague or ambiguous each concept or relation is.
o It encodes the linguistic expressions and modifiers for each concept or relation, i.e., it
specifies how natural language terms and phrases can be translated into fuzzy sets and
rules.
o It allows for approximate and flexible reasoning and inference, i.e., it enables compu-
ting the fuzzy truth values and fuzzy outputs using some operators or functions.
 Fuzzy logic can be used for various purposes, such as:
o Classification: Fuzzy logic can be used to classify objects or situations into fuzzy cat-
egories or classes, by computing the degree of membership for each category or class.
o Control: Fuzzy logic can be used to control systems or processes that have uncertain
inputs or outputs, by computing the fuzzy actions or commands based on fuzzy rules.
o Decision making: Fuzzy logic can be used to make optimal decisions under uncer-
tainty, by computing the fuzzy utilities or rewards of each alternative or action.
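A minimal sketch of these ideas in Python, using triangular membership functions; the temperature ranges, heater settings, and rule choices below are invented for illustration:

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzy sets for room temperature in degrees C (hypothetical ranges)
def cold(t): return tri(t, -5, 5, 18)
def warm(t): return tri(t, 10, 21, 30)

t = 16.0
# Partial membership: 16 C is a little "cold" AND fairly "warm" at once
mu_cold, mu_warm = cold(t), warm(t)

# Two fuzzy rules: IF cold THEN heater high; IF warm THEN heater low.
# Each rule fires to the degree its antecedent is true.
firing = {"high": mu_cold, "low": mu_warm}
setting = {"high": 1.0, "low": 0.2}    # crisp heater level per action

# Weighted-average defuzzification turns the fuzzy output into one number
heater = sum(firing[k] * setting[k] for k in firing) / sum(firing.values())
```

The same temperature belongs to both sets to different degrees, and the defuzzified heater level falls between the two crisp settings in proportion to how strongly each rule fired.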


12. Learning
12.1 Overview of Different Forms of Learning: Supervised, Unsupervised, Reinforcement
Learning

 Learning is the process of acquiring and improving knowledge or skills from data or experi-
ence.
 Learning can be classified into different forms based on the type and amount of feedback or
guidance available, such as:
o Supervised learning: Learning from labeled data, i.e., data that has the correct or de-
sired output or answer for each input or example. Supervised learning aims to learn a
function or a model that can map the input to the output, and generalize to new or un-
seen data. Examples of supervised learning tasks are classification, regression, etc.
o Unsupervised learning: Learning from unlabeled data, i.e., data that has no output or
answer for each input or example. Unsupervised learning aims to learn the structure
or the distribution of the data, and discover hidden patterns or features. Examples of
unsupervised learning tasks are clustering, dimensionality reduction, etc.
o Reinforcement learning: Learning from trial and error, i.e., learning by interacting
with an environment and receiving rewards or penalties for each action or behavior.
Reinforcement learning aims to learn a policy or a strategy that can maximize the cu-
mulative reward or minimize the cumulative cost over time. Examples of reinforce-
ment learning tasks are game playing, robot control, etc.
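Of the three forms, reinforcement learning is often the least familiar; a minimal tabular Q-learning sketch on an invented five-state corridor (reward only at the goal state, all parameter values arbitrary) illustrates the trial-and-error idea:

```python
import random
random.seed(0)

# Hypothetical corridor: states 0..4, actions step left (-1) or right (+1),
# reward 1.0 only for reaching the goal state 4.
N_STATES, ACTIONS = 5, (-1, +1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the current Q, sometimes explore
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-update: nudge the estimate toward reward + discounted future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# Greedy policy learned from rewards alone: move right in every state
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
```

No state is ever labeled with a correct action; the agent discovers the right-moving policy purely from the delayed reward signal, which is the defining contrast with supervised learning.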

12.2 Learning Decision Trees: Building Decision Trees from Data

 A decision tree is a graphical representation of a function or a model that can be used for clas-
sification or regression tasks. A decision tree consists of nodes and branches that form a tree-
like structure. The nodes represent the features or attributes of the data, and the branches rep-
resent the values or ranges of the features or attributes. The leaf nodes represent the output or
the class of the data.
 A decision tree can be learned from data by using different algorithms, such as ID3, C4.5,
CART, etc. The general steps of learning a decision tree are:
o Start with the entire data set as the root node
o Choose a feature or an attribute that best splits the data into subsets based on some
criterion or measure, such as information gain, Gini index, etc.
o Create a branch for each value or range of the feature or attribute, and assign the cor-
responding subset of data to each branch
o Repeat the process recursively for each branch until reaching a stopping condition,
such as all data in a branch have the same output or class, there are no more features
or attributes to split on, etc.
o Assign the output or class to each leaf node based on the majority vote or the average
value of the data in that node
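The split-selection step above can be sketched with entropy and information gain on a made-up five-row dataset (a miniature of the classic play-tennis example; attribute names and values are invented):

```python
from math import log2
from collections import Counter

# Toy training data (invented): decide "play" from two attributes
rows = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": True,  "play": "no"},
]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, feature, target="play"):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    gain = entropy([r[target] for r in rows])
    n = len(rows)
    for v in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# ID3 picks the attribute with the highest information gain for the root
best = max(["outlook", "windy"], key=lambda f: info_gain(rows, f))
print(best)   # prints outlook
```

Here "outlook" wins because two of its three values yield pure subsets (all-"no" or all-"yes"), so splitting on it removes more entropy than splitting on "windy"; the same computation is then repeated recursively on each impure branch.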
 A decision tree can be used for various purposes, such as:
o Classification: A decision tree can be used to classify new or unseen data by follow-
ing the branches from the root node to a leaf node based on the values or ranges of
the features or attributes of the data, and returning the output or class of that leaf node

o Regression: A decision tree can be used to predict the output or value of new or unseen
data by following the branches from the root node to a leaf node based on the values or
ranges of the features or attributes of the data, and returning the output or value of
that leaf node
o Explanation: A decision tree can be used to explain the reasoning or logic behind a
classification or regression result, by showing the path or sequence of decisions that
led to that result
o Visualization: A decision tree can be used to visualize the structure or distribution of
the data, by showing the features or attributes and their values or ranges that are rele-
vant or important for the output or class

12.3 Neural Networks: Basics of Artificial Neural Networks and Their Applications in Learn-
ing

 A neural network is a computational model that is inspired by the structure and function of
biological neural networks, such as the brain. A neural network consists of a large number of
interconnected units or nodes called neurons, that can process and transmit information.
 A neural network can be represented by a graph or a matrix that shows the neurons and their
connections or weights. The neurons are organized into layers, such as input layer, hidden
layer, and output layer. The connections or weights are the values that determine how much
influence one neuron has on another.
 A neural network can be learned from data by using different algorithms, such as backpropa-
gation, gradient descent, etc. The general steps of learning a neural network are:
o Initialize the weights randomly or with some heuristic
o Feed the input data to the input layer and propagate it forward through the hidden
layer(s) to the output layer, using some activation function or transfer function
o Compare the output with the desired output and calculate the error or loss function
o Adjust the weights backward from the output layer to the input layer, using some
learning rule or update rule
o Repeat the process until reaching a stopping condition, such as convergence, mini-
mum error, maximum iteration, etc.
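The forward and backward steps above can be sketched as a tiny fully connected network trained by backpropagation on the XOR function; the layer size, learning rate, epoch count, and seed are arbitrary sketch choices, and the point is only that repeated weight updates reduce the squared error:

```python
import math, random
random.seed(1)

def sigmoid(x):
    """Activation function squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# XOR training data: not linearly separable, so a hidden layer is needed
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]

H = 4                                  # hidden-layer size (arbitrary choice)
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.5

def forward(x):
    """Propagate an input through hidden and output layers."""
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(H)]
    o = sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)
    return h, o

def total_error():
    return sum((forward(x)[1] - y) ** 2 for x, y in zip(X, Y))

before = total_error()
for epoch in range(5000):
    for x, y in zip(X, Y):
        h, o = forward(x)
        # backward pass: chain rule through the sigmoid derivatives
        d_o = (o - y) * o * (1 - o)
        d_h = [d_o * w2[j] * h[j] * (1 - h[j]) for j in range(H)]
        # update each weight against its error gradient
        for j in range(H):
            w2[j] -= lr * d_o * h[j]
            b1[j] -= lr * d_h[j]
            for i in range(2):
                w1[j][i] -= lr * d_h[j] * x[i]
        b2 -= lr * d_o
after = total_error()
```

A single linear neuron cannot represent XOR at all, so the error reduction here depends entirely on the hidden layer learning an intermediate representation of the inputs.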
 A neural network can be used for various purposes, such as:
o Classification: A neural network can be used to classify data into categories or clas-
ses, by mapping the input to the output and assigning a label based on some threshold
or criterion
o Regression: A neural network can be used to predict the output or value of data, by
mapping the input to the output and returning a numeric value

o Clustering: A neural network can be used to group data into clusters or segments, by
learning the features or patterns that distinguish different groups of data
o Association: A neural network can be used to discover associations or correlations
among data, by learning the rules or relationships that link different items or variables
