Unit-1


1. INTRODUCTION

ARTIFICIAL INTELLIGENCE
Artificial Intelligence is one of the booming technologies of computer science, and it is ready to create a
new revolution in the world by making intelligent machines. AI aims to make a machine work like a human.
The term "Artificial Intelligence" was first adopted by the American computer scientist John McCarthy at
the Dartmouth Conference in 1956.

What is Artificial Intelligence?


Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial means
"man-made" and Intelligence means "thinking power"; hence AI means "a man-made thinking power."
So we can define AI as "a branch of computer science by which we can create intelligent machines
which can behave like humans, think like humans, and make decisions."

Why Artificial Intelligence?


 With the help of AI, we can create software or devices which can solve real-world problems easily
and accurately, in areas such as healthcare, marketing, traffic, etc.
 With the help of AI, we can create personal virtual assistants, such as Cortana, Google Assistant,
etc.
 With the help of AI, we can build robots which can work in environments where human survival
can be at risk.
 AI opens a path for other new technologies, new devices, and new opportunities.

Goals of Artificial Intelligence: Following are the main goals of Artificial Intelligence:
1. Replicate human intelligence
2. Solve knowledge-intensive tasks
3. Form an intelligent connection of perception and action
4. Build a machine which can perform tasks that require human intelligence, such as:
 Proving a theorem
 Playing chess
 Planning a surgical operation
 Driving a car in traffic
5. Create a system which can exhibit intelligent behavior, learn new things by itself, demonstrate,
explain, and advise its users.

Definition: “AI is the study of how to make computers do things at which, at the moment, people are better”.

Advantages of Artificial Intelligence:


1. High accuracy with fewer errors: AI machines or systems make fewer errors and achieve high accuracy, as
they take decisions based on prior experience or information.
2. High speed: AI systems can be very fast in decision making; because of this, AI systems can beat a chess
champion at chess.
3. High reliability: AI machines are highly reliable and can perform the same action many times with high
accuracy.
4. Useful in risky areas: AI machines can be helpful in situations such as defusing a bomb or exploring the
ocean floor, where employing a human would be risky.
5. Digital assistance: AI can be very useful for providing digital assistance to users; for example, AI technology
is currently used by various e-commerce websites to show products matching customer requirements.
6. Useful as a public utility: AI can be very useful for public utilities, such as self-driving cars which can make
our journeys safer and hassle-free, facial recognition for security purposes, natural language processing to
communicate with humans in human language, etc.

Disadvantages of Artificial Intelligence:


1. High cost: The hardware and software requirements of AI are very costly, as AI systems require a lot of
maintenance to meet current world requirements.
2. Can't think outside the box: Even though we are making smarter machines with AI, they cannot work
outside their training: a robot will only do the work for which it is trained or programmed.
3. No feelings and emotions: An AI machine can be an outstanding performer, but it has no feelings, so it
cannot form any kind of emotional attachment with humans and may sometimes be harmful to users if
proper care is not taken.
4. Increased dependency on machines: With the advance of technology, people are becoming more dependent
on devices and hence losing their mental capabilities.
5. No original creativity: Humans are highly creative and can imagine new ideas; AI machines cannot match
this power of human intelligence and cannot be creative and imaginative.

AI has now developed to a remarkable level. Deep learning, big data, and data science are booming,
and companies like Google, Facebook, IBM, and Amazon are working with AI and creating amazing devices.
The future of Artificial Intelligence is inspiring and will come with high intelligence.

Applications of AI:
1. Gaming: AI plays a crucial role in strategic games such as chess, poker, tic-tac-toe, etc., where the machine
can think of a large number of possible positions based on heuristic knowledge.
2. Natural Language Processing: It is possible to interact with a computer that understands the natural
language spoken by humans.
3. Expert Systems: Some applications integrate machines, software, and special information to impart
reasoning and advising. They provide explanations and advice to their users.
4. Vision Systems: These systems understand, interpret, and comprehend visual input on the computer. For
example,
a. A spying aeroplane takes photographs, which are used to figure out spatial information or a map of the
area.
b. Doctors use a clinical expert system to diagnose patients.
c. Police use computer software that can match the face of a criminal with a stored portrait made by a
forensic artist.
5. Speech Recognition: Some intelligent systems are capable of hearing and comprehending language in
terms of sentences and their meanings while a human talks to them. They can handle different accents, slang
words, noise in the background, changes in a human's voice due to a cold, etc.
6. Handwriting Recognition: Handwriting recognition software reads text written on paper with a pen or on a
screen with a stylus. It can recognize the shapes of the letters and convert them into editable text.
7. Intelligent Robots: Robots are able to perform tasks given by humans. They have special sensors to
detect physical data from the real world such as light, heat, temperature, movement, sound, bumps, and
pressure. They have efficient processors, multiple sensors, and huge memory, to exhibit intelligence. In
addition, they are capable of learning from their mistakes and can adapt to new environments.

MACHINE LEARNING
Machine learning is a growing technology which enables computers to learn automatically from past
data. Machine learning uses various algorithms for building mathematical models and making predictions using
historical data or information. Currently, it is used for various tasks such as image recognition, speech
recognition, email filtering, Facebook auto-tagging, recommender systems, and many more.
Machine learning is a subset of artificial intelligence that is mainly concerned with the
development of algorithms which allow a computer to learn from data and past experience on its own.
The term machine learning was first introduced by Arthur Samuel in 1959. We can define it as follows:
"Machine learning enables a machine to automatically learn from data, improve performance from
experience, and predict things without being explicitly programmed."
 A machine learning system learns from historical data, builds prediction models, and, whenever it
receives new data, predicts the output for it.
 The accuracy of the predicted output depends upon the amount of data: a large amount of data helps
build a better model which predicts the output more accurately.
 Suppose we have a complex problem where we need to perform some predictions. Instead of writing
code for it, we just need to feed the data to generic algorithms; with the help of these algorithms, the
machine builds the logic as per the data and predicts the output.
 Machine learning has changed our way of thinking about problems.

Definition: “A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience E”.

A checkers learning problem:


 Task T: playing checkers
 Performance measure P: percent of games won against opponents
 Training experience E: playing practice games against itself
We can specify many learning problems in this fashion, such as learning to recognize handwritten words or
learning to drive a robotic automobile autonomously.

Features of Machine Learning:
 Machine learning uses data to detect various patterns in a given dataset
 It can learn from past data and improve automatically
 It is a data-driven technology
 Machine learning is similar to data mining, as both deal with huge amounts of data

Importance of Machine Learning:

 Rapid increase in the production of data
 Solving complex problems which are difficult for a human
 Decision making in various sectors, including finance
 Finding hidden patterns and extracting useful information from data

TYPES (OR) CLASSIFICATION OF MACHINE LEARNING:


 Supervised Learning
 Unsupervised Learning
 Semi-Supervised Learning
 Reinforcement Learning

Supervised Learning: Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it predicts the output.
The system creates a model using labeled data to understand the datasets and learn about each datum;
once training and processing are done, we test the model by providing sample data to check whether it
predicts the correct output or not.
The goal of supervised learning is to map input data to output data. Supervised learning is based on
supervision; it is the same as when a student learns under the supervision of a teacher. An example of
supervised learning is spam filtering. Supervised learning can be grouped further into two categories of
algorithms (a minimal code sketch follows the lists below):
 Regression: Regression algorithms are used if there is a relationship between the input variable and the
output variable. They are used for the prediction of continuous variables, such as weather forecasting,
market trends, etc.
 Linear Regression
 Regression Trees
 Non-Linear Regression
 Bayesian Linear Regression
 Polynomial Regression
 Classification: Classification algorithms are used when the output variable is categorical, which means
there are two classes such as Yes-No, Male-Female, True-False, etc.
 Random Forest
 Decision Trees
 Logistic Regression
 Support Vector Machines
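
As a minimal illustration of supervised learning in code, the following sketch trains a classifier on labeled data and then tests it on held-out samples. The Iris dataset and the choice of logistic regression are illustrative assumptions, not prescribed by this unit.

# Supervised learning sketch: learn from labeled data, then predict on new data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                  # features X with known labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=200)           # a classification algorithm
model.fit(X_train, y_train)                        # train on the labeled training data
print(model.score(X_test, y_test))                 # accuracy on unseen test data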

Advantages of Supervised Learning:


 With the help of supervised learning, the model can predict the output on the basis of prior experience
 In supervised learning, we can have an exact idea about the classes of objects
 Supervised learning models help us solve various real-world problems such as fraud detection, spam
filtering, etc.

Disadvantages of Supervised Learning:


 Supervised learning models are not suitable for handling complex tasks
 Supervised learning cannot predict the correct output if the test data is different from the training dataset
 Training requires a lot of computation time
 In supervised learning, we need sufficient knowledge about the classes of objects

Unsupervised Learning: Unsupervised learning is a learning method in which a machine learns without any
supervision.
The training is provided to the machine with a set of data that has not been labeled, classified, or
categorized, and the algorithm needs to act on that data without any supervision. The goal of unsupervised
learning is to restructure the input data into new features or groups of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result; the machine tries to find useful insights
from a huge amount of data. It can be further classified into two categories of algorithms, each illustrated
with a short sketch after its algorithm list:
 Clustering: Clustering is a method of grouping objects into clusters such that objects with the most
similarities remain in one group and have few or no similarities with the objects of another group. Cluster
analysis finds the commonalities between the data objects and categorizes them as per the presence and
absence of those commonalities.
 K-Means Clustering algorithm
 Mean-shift algorithm
 DBSCAN Algorithm
 Principal Component Analysis
 Independent Component Analysis
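
A minimal clustering sketch, assuming scikit-learn and two synthetic blobs of points; note that no labels are given to the algorithm, it discovers the groups itself.

# Unsupervised clustering sketch: group unlabeled points into two clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 1, (50, 2)),     # one blob around (0, 0)
                    rng.normal(5, 1, (50, 2))])    # another blob around (5, 5)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_[:5])                          # cluster assignment found per point
print(kmeans.cluster_centers_)                     # the two discovered cluster centers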

 Association: An association rule is an unsupervised learning method which is used for finding
relationships between variables in a large database. It determines the set of items that occur together
in the dataset. Association rules make marketing strategy more effective; for example, people who buy
item X (say, bread) also tend to purchase item Y (butter or jam). A typical example of association
rules is Market Basket Analysis.
 Apriori algorithm
 FP-growth algorithm
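
A toy sketch of the association idea, computing support and confidence by hand on a hypothetical market-basket dataset; library implementations such as Apriori automate this search over all item sets.

# Support and confidence for the rule {bread} -> {butter}, computed directly.
transactions = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "jam"},
    {"milk", "bread"},
    {"milk", "butter"},
]

n = len(transactions)
support_bread = sum("bread" in t for t in transactions) / n                      # 0.8
support_bread_butter = sum({"bread", "butter"} <= t for t in transactions) / n   # 0.4
# Confidence: of the baskets containing bread, the share that also contain butter.
confidence = support_bread_butter / support_bread
print(support_bread_butter, confidence)            # 0.4 0.5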

Advantages of Unsupervised Learning:


 Unsupervised learning is used for more complex tasks as compared to supervised learning because, in
unsupervised learning, we don't have labeled input data.
 Unsupervised learning is preferable as it is easy to get unlabeled data in comparison to labeled data.

Disadvantages of Unsupervised Learning:


 Unsupervised learning is intrinsically more difficult than supervised learning, as it does not have
corresponding output to learn from.
 The result of an unsupervised learning algorithm might be less accurate, as the input data is not labeled
and the algorithm does not know the exact output in advance.

Semi-Supervised Learning: To overcome the drawbacks of supervised and unsupervised learning
algorithms, the concept of semi-supervised learning is introduced. The main aim of semi-supervised learning
is to effectively use all the available data, rather than only the labeled data as in supervised learning.
Initially, similar data is clustered with an unsupervised learning algorithm, and this then helps to label the
unlabeled data.

Advantages of Semi-Supervised Learning:


 The algorithm is simple and easy to understand
 It is highly efficient
 It is used to overcome the drawbacks of supervised and unsupervised learning algorithms

Disadvantages of Semi-Supervised Learning:


 Iteration results may not be stable
 We cannot apply these algorithms to network-level data
 Accuracy is low

Reinforcement Learning: Reinforcement learning is a feedback-based learning method in which a learning
agent gets a reward for each right action and a penalty for each wrong action. The agent learns
automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts
with the environment and explores it. The goal of the agent is to collect the most reward points, and in doing
so, it improves its performance. A robotic dog which automatically learns the movement of its arms is an
example of reinforcement learning.
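
A minimal reward/penalty sketch, assuming a hypothetical five-state corridor environment and tabular Q-learning (one common reinforcement learning algorithm): the agent earns +1 for reaching the goal, a small penalty per step, and improves from that feedback alone.

# Tabular Q-learning on a toy corridor: states 0..4, goal at state 4.
import random

n_states, actions = 5, [0, 1]                      # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]          # value estimates per state-action
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # explore occasionally, otherwise take the currently best action
        a = random.choice(actions) if random.random() < epsilon else max(actions, key=lambda x: Q[s][x])
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else -0.01   # reward at the goal, penalty otherwise
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([max(q) for q in Q])                         # learned values grow toward the goal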

Advantages of Reinforcement Learning:


 It helps in solving complex real-world problems which are difficult to solve with general techniques
 The learning model of RL is similar to the learning of human beings; hence very accurate results can be
found
 It helps in achieving long-term results

Disadvantages of Reinforcement Learning:


 RL algorithms are not preferred for simple problems
 RL algorithms require huge data and computations
 Too much reinforcement learning can lead to an overload of states which can weaken the results

Machine Learning Life Cycle: The machine learning life cycle involves seven major steps:

1. Gathering Data:
 The goal of this step is to identify the different data sources and obtain all the data related to the
problem
 Data can be collected from various sources such as files, databases, the internet, or mobile devices
 The more data there is, the more accurate the prediction will be
2. Data Preparation:
 Data preparation is the step where we put our data into a suitable place and prepare it for use in our
machine learning training
 First, we put all the data together, and then randomize the ordering of the data
 This step can be further divided into two processes:
 Data exploration: It is used to understand the nature of the data that we have to work with.
We need to understand the characteristics, format, and quality of the data. A better
understanding of the data leads to an effective outcome
 Data pre-processing: The next step is preprocessing the data for analysis
3. Data Wrangling:
 Data wrangling is the process of cleaning and converting raw data into a usable format
 Cleaning of data is required to address quality issues
 In real-world applications, collected data may have various issues such as missing values,
duplicate data, invalid data, and noise
4. Analyse Data:
 This step involves the selection of analytical techniques, building models, and reviewing the result
 The aim of this step is to build a machine learning model that analyzes the data using various
analytical techniques, and to review the outcome
5. Train the model:
 In this step we train our model to improve its performance for a better outcome of the problem
 We use datasets to train the model using various machine learning algorithms
 Training a model is required so that it can understand the various patterns, rules, and features
6. Test the model:
 In this step, we check the accuracy of our model by providing a test dataset to it
 Testing the model determines the percentage accuracy of the model as per the requirements
of the project or problem
7. Deployment:

 Here, we deploy the model in the real-world system
 If the prepared model produces accurate results as per our requirements with acceptable speed,
we deploy the model in the real system
 But before deploying the project, we check whether it is improving its performance using the
available data or not

Data Preprocessing in Machine Learning:


Data preprocessing is the process of preparing raw data and making it suitable for a machine learning
model. It is the first and crucial step in creating a machine learning model. Data preprocessing comprises
the tasks required to clean the data and make it suitable for a machine learning model, which also increases
the accuracy and efficiency of the model. It involves the steps below (a short end-to-end sketch follows the
list):
1. Getting the dataset: The collected data for a particular problem in a proper format is known as the
dataset. Datasets may be in different formats for different purposes. To use a dataset in our code, we
usually put it into a CSV (Comma-Separated Values) file. Sometimes, we may also need to use an
HTML or xlsx file.
2. Importing libraries: In order to perform data preprocessing using Python, we need to import some
predefined Python libraries, such as numpy (used for any type of mathematical operation; supports
multidimensional arrays and matrices), pandas (used for importing and managing datasets),
matplotlib (used to plot any type of chart), etc.
3. Importing datasets: Before importing a dataset, we need to set the current directory as a working
directory
4. Finding missing data: If our dataset contains missing data, it may create a huge problem for our
machine learning model. Hence it is necessary to handle missing values present in the dataset. There
are two main ways to handle missing data:
 By deleting the particular row
 By calculating the mean and filling the missing value with it
To handle missing values, we use the Scikit-learn library in our code, which contains various tools
for building machine learning models.
5. Encoding categorical data: Machine learning models work entirely on mathematics and numbers,
so if our dataset has a categorical variable, it may create trouble while building the model. It is
therefore necessary to encode categorical variables into numbers.
6. Splitting the dataset into training and test sets:
 Training set: a subset of the dataset used to train the machine learning model, for which we
already know the output.
 Test set: a subset of the dataset used to test the machine learning model; using the test set, the
model predicts the output.
7. Feature scaling: a technique to standardize the independent variables of the dataset to a specific
range. In feature scaling, we put our variables in the same range and on the same scale so that no
variable dominates the others.
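
A short end-to-end sketch of steps 4-7, assuming a small hypothetical pandas DataFrame; the column names, imputation strategy, and split ratio are illustrative.

# Preprocessing sketch: impute missing values, encode a category, split, scale.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"age":     [25, 30, None, 40],
                   "salary":  [50000, 60000, 55000, None],
                   "country": ["India", "US", "India", "UK"]})

# Step 4: handle missing data by filling with the column mean
df[["age", "salary"]] = SimpleImputer(strategy="mean").fit_transform(df[["age", "salary"]])

# Step 5: encode the categorical variable into numbers
df["country"] = LabelEncoder().fit_transform(df["country"])

# Step 6: split into training and test sets
train, test = train_test_split(df, test_size=0.25, random_state=0)

# Step 7: feature scaling so that no variable dominates the others
train_scaled = StandardScaler().fit_transform(train)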

DEEP LEARNING
Deep learning is a branch of machine learning, which is itself a subset of artificial intelligence.
Just as neural networks imitate the human brain, so does deep learning: in deep learning, nothing is
programmed explicitly. Basically, it is a class of machine learning that makes use of numerous nonlinear
processing units to perform feature extraction as well as transformation, where the output of each preceding
layer is taken as input by the successive layer.
Deep learning algorithms are used especially when we have a huge number of inputs and outputs. Deep
learning is implemented with the help of neural networks, and the idea behind neural networks is the
biological neuron, which is nothing but a brain cell.

Definition: “Deep learning is a collection of statistical techniques of machine learning for learning feature
hierarchies that are actually based on artificial neural networks”.

 Here, we provide the raw image data to the first layer, the input layer.
 The input layer then determines patterns of local contrast, differentiating on the basis of colors,
luminosity, etc.
 The first hidden layer then determines the face features: it fixates on eyes, nose, lips, etc., and
matches those face features against the correct face template.
 The second hidden layer then determines the correct face, which is passed on to the output layer.
 Likewise, more hidden layers can be added to solve more complex problems, for example, finding a
particular kind of face with a dark or light complexion.
 So, as the hidden layers increase, we are able to solve more complex problems.

Types of Deep Learning Networks:


 Feed Forward Neural Network
 Recurrent Neural Network
 Convolutional Neural Network
 Restricted Boltzmann Machine
 Autoencoders

Feed Forward Neural Network:


 A feed-forward neural network is an artificial neural network in which the nodes do not form a cycle.
 In this kind of neural network, all the perceptrons are organized into layers, such that the input layer
takes the input and the output layer generates the output.
 Since the hidden layers do not link with the outside world, they are called hidden layers (a minimal
forward-pass sketch follows).
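
A minimal forward-pass sketch in NumPy, with randomly initialized weights standing in for learned ones; it only shows how signals flow from input to output without cycles.

# Feed-forward pass: input layer -> hidden layer -> output layer, no cycles.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x  = rng.normal(size=(3,))             # 3 input features enter at the input layer
W1 = rng.normal(size=(4, 3))           # weights: input layer -> hidden layer (4 units)
W2 = rng.normal(size=(1, 4))           # weights: hidden layer -> output layer (1 unit)

hidden = sigmoid(W1 @ x)               # hidden layer does not face the outside world
output = sigmoid(W2 @ hidden)          # output layer generates the output
print(output)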

Recurrent Neural Network:


 Recurrent neural networks are another variation of feed-forward networks
 Here each of the neurons present in the hidden layers receives an input with a specific delay in time
 The recurrent neural network mainly accesses information from preceding iterations

Convolutional Neural Network:


 Convolutional neural networks are a special kind of neural network mainly used for image
classification, clustering of images, and object recognition
 To achieve the best accuracy, deep convolutional neural networks are preferred over any other
neural network
 Applications of CNNs include identifying faces, street signs, and tumors, image recognition, video
analysis, NLP, anomaly detection, drug discovery, checkers, and time series forecasting

Restricted Boltzmann Machine:


 Here the neurons present in the input layer and the hidden layer have symmetric connections
between them
 However, there are no connections within a layer

Autoencoders:
 An autoencoder neural network is another kind of unsupervised machine learning algorithm
 Here the number of hidden cells is smaller than the number of input cells
 The number of input cells is equal to the number of output cells
 Autoencoders are mainly used to produce a smaller representation of the input
 They help in the reconstruction of the original data from compressed data

Applications of Deep Learning:


1. Self-driving cars: A self-driving car captures images of its surroundings by processing a huge
amount of data, and then decides which actions to take, turning left or right or stopping, so as to
reduce accidents.
2. Voice-controlled assistance: Siri is one example of voice-controlled assistance. We can tell Siri
whatever we want it to do, and it will search for and display it.
3. Automatic image caption generation: Whatever image is uploaded, the algorithm generates a
caption accordingly. Suppose we say "blue colored eye"; it will display a blue-colored eye with a
caption at the bottom of the image.
4. Automatic machine translation: With the help of automatic machine translation, we are able to
convert text in one language into another.

Advantages of Deep Learning:


 It lessens the need for feature engineering
 It eliminates needless costs
 It easily identifies difficult defects
 It results in best-in-class performance on problems

Disadvantages of Deep Learning:
 It requires an ample amount of data
 It is quite expensive to train
 It does not have a strong theoretical groundwork

Deep Learning Algorithms: The deep learning algorithms are as follows:


 Convolutional Neural Networks (CNNs)
 Long Short Term Memory Networks (LSTMs)
 Recurrent Neural Networks (RNNs)
 Generative Adversarial Networks (GANs)
 Radial Basis Function Networks (RBFNs)
 Multilayer Perceptrons (MLPs)
 Self Organizing Maps (SOMs)
 Deep Belief Networks (DBNs)
 Restricted Boltzmann Machines (RBMs)

DIFFERENCES BETWEEN AI, ML & DL

Definition:
 AI: It is the study which enables machines to mimic human behaviour through a particular algorithm.
 ML: It is the study that uses statistical methods to enable machines to improve with experience.
 DL: It is the study that makes use of neural networks to imitate functionality, just like a human brain.

Relationship:
 AI is the broader family, consisting of ML and DL as its components.
 ML is a subset of AI.
 DL is a subset of ML.

Nature:
 AI is a computer algorithm which exhibits intelligence through decision making.
 ML is an AI algorithm which allows a system to learn from data.
 DL is an ML algorithm that uses deep (more than one layer) neural networks to analyze data and
provide output accordingly.

Math involved:
 AI: Search trees and much complex math are involved.
 ML: If you have a clear idea about the logic (math) involved and you can visualize complex
functionalities like K-Means, Support Vector Machines, etc., then it defines the ML aspect.
 DL: If you are clear about the math involved but don't have an idea about the features, and you break
the complex functionalities into linear/lower-dimension features by adding more layers, then it
defines the DL aspect.

Aim:
 AI: The aim is basically to increase the chances of success, not accuracy.
 ML: The aim is to increase accuracy, not caring much about the success ratio.
 DL: It attains the highest rank in terms of accuracy when trained with a large amount of data.

Categories:
 AI: Three broad categories/types of AI are Artificial Narrow Intelligence (ANI), Artificial General
Intelligence (AGI), and Artificial Super Intelligence (ASI).
 ML: Three broad categories/types of ML are Supervised Learning, Unsupervised Learning, and
Reinforcement Learning.
 DL: DL can be considered as neural networks with a large number of parameters and layers, lying in
one of four fundamental network architectures: Unsupervised Pre-trained Networks, Convolutional
Neural Networks, Recurrent Neural Networks, and Recursive Neural Networks.

Efficiency:
 AI: The efficiency of AI is basically the efficiency provided by ML and DL respectively.
 ML: Less efficient than DL, as it can't work with higher dimensions or larger amounts of data.
 DL: More powerful than ML, as it can easily work with larger sets of data.

Examples:
 AI: Google's AI-powered predictions, ridesharing apps like Uber and Lyft, commercial flights using
an AI autopilot, etc.
 ML: Virtual personal assistants (Siri, Alexa, Google Assistant, etc.), email spam and malware
filtering.
 DL: Sentiment-based news aggregation, image analysis and caption generation, etc.

MAIN CHALLENGES OF MACHINE LEARNING


1. Inadequate training data: A major issue in using machine learning algorithms is the lack of quality as
well as quantity of data. Data quality can be affected by factors such as:
 Noisy data: responsible for inaccurate predictions that affect the decision as well as the
accuracy of classification tasks.
 Incorrect data: also responsible for faulty programming and results obtained from machine
learning models; incorrect data may therefore affect the accuracy of the results.
 Generalizing of output data: sometimes generalizing output data becomes complex, which
results in comparatively poor future actions.
2. Poor quality of data: Data plays a significant role in machine learning, and it must be of good quality.
Noisy, incomplete, inaccurate, and unclean data lead to lower accuracy in classification and
low-quality results.
3. Non-representative training data: To make sure our training model generalizes well, we have to
ensure that the sample training data is representative of the new cases that we need to generalize to.
The training data must cover all cases that have already occurred as well as those now occurring.
4. Overfitting and underfitting: When a machine learning model is trained with a huge amount of data,
it starts capturing the noise and inaccuracies in the training data set, which negatively affects the
performance of the model. Methods to reduce overfitting:
 Increase the training data in the dataset
 Reduce model complexity by selecting a model with fewer parameters
 Ridge regularization and Lasso regularization
 Early stopping during the training phase
 Reduce the noise
 Reduce the number of attributes in the training data
 Constrain the model
Underfitting is just the opposite of overfitting. When a machine learning model is trained with too
little data, it produces incomplete and inaccurate predictions and destroys the accuracy of the model.
Methods to reduce underfitting:
 Increase model complexity
 Remove noise from the data
 Train on more and better features
 Reduce the constraints
 Increase the number of epochs to get better results
5. Monitoring and maintenance: Different results for different actions require changes to the data; hence
editing the code, and resources for monitoring it, also become necessary.
6. Getting bad recommendations: A machine learning model operates in a specific context, which can
result in bad recommendations and concept drift. For example, at a specific time a customer is looking
for some gadgets; the customer's requirements change over time, but the machine learning model
keeps showing the same recommendations even though the customer's expectations have changed.
This is called data drift.
7. Lack of skilled resources: Although machine learning is continuously growing in the market, the
absence of skilled manpower is also an issue. We need people with in-depth knowledge of
mathematics, science, and technology for developing and managing scientific substance for machine
learning.
8. Customer segmentation: An important challenge is to identify the customers who act on the
recommendations shown by the model and those who don't even check them. An algorithm is
necessary to recognize customer behavior and trigger relevant recommendations for the user based on
past experience.
9. Process complexity of machine learning: The machine learning process is very complex. It includes
analyzing the data, removing data bias, training the data, applying complex mathematical
calculations, etc., making the procedure complicated and quite tedious.
10. Data bias: These errors exist when certain elements of the dataset are weighted more heavily or given
more importance than others. Biased data leads to inaccurate results, skewed outcomes, and other
analytical errors. Methods to remove data bias:
 Research customer segmentation more thoroughly
 Be aware of your general use cases and potential outliers
 Combine inputs from multiple sources to ensure data diversity
 Include bias testing in the development process
 Analyze data regularly and keep tracking errors to resolve them easily
 Review the collected and annotated data

11. Lack of explainability: The outputs cannot be easily comprehended, as the model is programmed
in specific ways to deliver for certain conditions. This lack of explainability in machine learning
algorithms reduces their credibility.
12. Slow implementations and results: Machine learning models are highly efficient at producing
accurate results but are time-consuming. Slow programming, excessive requirements, and overloaded
data take more time to provide accurate results than expected.
13. Irrelevant features: Although machine learning models are intended to give the best possible outcome,
if we feed garbage data as input, the result will also be garbage. Hence, we should use relevant
features in our training sample.

1. STATISTICAL LEARNING

TRAINING AND TEST LOSS


Train and test datasets are two key concepts of machine learning: the training dataset is used
to fit the model, and the test dataset is used to evaluate the model.

Training Dataset:
 The training data is the biggest (in size) subset of the original dataset, used to train or fit the
machine learning model.
 First, the training data is fed to the ML algorithm, which lets it learn how to make predictions for
the given task.
 The training data varies depending on whether we are using supervised or unsupervised
learning algorithms.
 The type of training data that we provide to the model is highly responsible for the model's accuracy
and prediction ability.
 Training data is approximately 60% or more of the total data for an ML project.

Test Dataset:
 Once we train the model with the training dataset, it's time to test the model with the test dataset.
 The test dataset is another subset of original data, which is independent of the training dataset.
 Usually, the test dataset is approximately 20-25% of the total original data for an ML project.

Need for splitting the dataset into train and test sets: Splitting the dataset into train and test sets is one of
the important parts of data pre-processing; by doing so, we can measure the performance of our model and
hence its predictive ability.

If we train our model on one dataset and test it with a completely different dataset, the model will not be
able to understand the correlations between the features, and its measured performance will decrease.
Hence it is important to split a single dataset into two parts, i.e., a train set and a test set.

In this way, we can easily evaluate the performance of our model: if it performs well on the
training data but does not perform well on the test dataset, the model is probably overfitted.
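
A short sketch of this check, assuming an illustrative dataset and model from scikit-learn; a large gap between the training and test scores signals overfitting.

# Compare performance on training data vs. held-out test data.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("train R2:", model.score(X_train, y_train))  # near 1.0: fits the training data
print("test  R2:", model.score(X_test, y_test))    # much lower: likely overfitted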
Loss is the penalty for a bad prediction: a number indicating how bad the model's prediction was on a
single example. If the model's prediction is perfect, the loss is zero; otherwise, the loss is greater. The goal
of training a model is to find a set of weights and biases that have low loss, on average, across all examples.

(Figure: two plots of the same data with fitted lines; the arrows represent loss, and the blue lines represent
predictions.)

The arrows in the left plot are much longer than their counterparts in the right plot; clearly, the line in the
right plot is a much better predictive model than the line in the left plot.

Mean square error (MSE): MSE is a popular loss function. It is the average squared loss per example over
the whole dataset. To calculate MSE, sum up all the squared losses for individual examples and then divide
by the number of examples:

MSE = (1/N) * Σ_{(x, y) ∈ D} (y − prediction(x))²

where
(x, y) is an example in which x is the set of features that the model uses to make predictions and y is the
example's label,
prediction(x) is a function of the weights and bias in combination with the set of features x,
D is a data set containing many labeled examples, which are (x, y) pairs, and
N is the number of examples in D.
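
Computing MSE exactly as defined above, on hypothetical labels and predictions:

# MSE: average squared loss per example over the dataset.
import numpy as np

y           = np.array([3.0, -0.5, 2.0, 7.0])      # actual labels
predictions = np.array([2.5,  0.0, 2.0, 8.0])      # model outputs

mse = np.mean((y - predictions) ** 2)              # sum of squared losses / N
print(mse)                                         # 0.375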

TRADEOFFS IN STATISTICAL LEARNING
Bias:
Bias is the difference between the values predicted by the ML model and the correct values. High bias
gives a large error on training as well as testing data. It is recommended that an algorithm always be
low-biased, to avoid the problem of underfitting.
With high bias, the predicted data takes a straight-line form that does not fit the data in the dataset
accurately. Such fitting is known as underfitting of the data. This happens when the hypothesis is too simple
or linear in nature, for example a linear hypothesis of the form h(x) = θ0 + θ1·x.
Variance:
The variability of model predictions for a given data point, which tells us the spread of our data, is called
the variance of the model. A model with high variance fits the training data in a very complex way and
thus is not able to fit accurately on data it hasn't seen before. As a result, such models perform very well
on training data but have high error rates on test data.
When a model has high variance, it is said to be overfitting the data. Overfitting means fitting the
training set accurately via a complex curve and a high-order hypothesis, for example
h(x) = θ0 + θ1·x + θ2·x² + … + θn·xⁿ, but this is not the solution, as the error on unseen data is high.
While training a model, variance should be kept low.

Bias Variance Tradeoff:

If the algorithm is too simple (a hypothesis with a linear equation) then it may have high bias and low
variance and thus be error-prone. If the algorithm fits too complex a hypothesis (one with a high-degree
equation) then it may have high variance and low bias; in that condition, new entries will not perform well.
There is something between both of these conditions, known as the trade-off, or bias-variance trade-off.
This tradeoff in complexity is why there is a tradeoff between bias and variance: an algorithm can't be
more complex and less complex at the same time.
The best fit is given by a hypothesis at the tradeoff point, which is the point chosen for training the
algorithm because it gives low error on training as well as testing data.
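
A sketch of the tradeoff, fitting polynomials of increasing degree to synthetic data; the degrees 1, 4, and 15 are illustrative stand-ins for an underfit, balanced, and overfit hypothesis.

# Low degree: high bias. High degree: high variance. Compare train vs. test error.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 100)    # noisy nonlinear target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    feats = PolynomialFeatures(degree)
    model = LinearRegression().fit(feats.fit_transform(X_tr), y_tr)
    train_err = mean_squared_error(y_tr, model.predict(feats.transform(X_tr)))
    test_err  = mean_squared_error(y_te, model.predict(feats.transform(X_te)))
    print(degree, train_err, test_err)             # test error is lowest near the tradeoff point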
ESTIMATING RISK STATISTICS
Evaluating the performance of a machine learning model is one of the important steps in building an
effective ML model. To evaluate the performance or quality of a model, different metrics are used; these
are known as performance metrics or evaluation metrics. They help us understand how well our model has
performed on the given data.
In machine learning, each task or problem is divided into classification and regression, and different
evaluation metrics are used for each.

Performance Metrics for Classification:

 Confusion Matrix: The confusion matrix is a matrix used to determine the performance of a
classification model for a given set of test data. It can only be determined if the true values for the
test data are known.
N=total Predictions Actual : Positive Actual : Negative
Predicted : Positive True Positive False Positive
Predicted : Negative False Negative True Negative

 True Negative: The model predicted No, and the real or actual value was also No.
 True Positive: The model predicted Yes, and the actual value was also Yes.
 False Negative: The model predicted No, but the actual value was Yes; this is also called a
Type-II error.
 False Positive: The model predicted Yes, but the actual value was No; this is also called a Type-I
error.
Example of Confusion Matrix is as follows:

N=165 Actual : Positive Actual : Negative


Predicted : Positive 100 10
Predicted : Negative 5 50

In this example, the total number of predictions is 165, out of which the model predicted Yes 110
times and No 55 times; in reality, there are 60 cases in which patients don't have the disease and
105 cases in which they do.

 Classification Accuracy: One of the important parameters to determine the accuracy of
classification problems. It defines how often the model predicts the correct output, and is calculated
as the ratio of the number of correct predictions made by the classifier to the total number of
predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

 Precision: The number of correct positive outputs provided by the model, i.e., out of all classes the
model predicted as positive, how many were actually true:

Precision = TP / (TP + FP)
 Recall (or Sensitivity): Out of all the actually positive classes, how many our model predicted
correctly. The recall should be as high as possible:

Recall = TP / (TP + FN)
 F-Score: If one model has low precision and high recall or vice versa, it is difficult to compare
models; for this purpose, we can use the F-score, which evaluates recall and precision at the same
time. The F-score is maximum when recall equals precision:

F-score = 2 × (Precision × Recall) / (Precision + Recall)

 AUC (Area Under the Curve)-ROC: Sometimes we need to visualize the performance of a
classification model on a chart; for this we can use the AUC-ROC curve. ROC stands for Receiver
Operating Characteristic; a ROC curve is a graph showing the performance of a classification
model at different threshold levels. The curve is plotted between two parameters:
 True Positive Rate: TPR is a synonym for recall, hence:

TPR = TP / (TP + FN)

 False Positive Rate:

FPR = FP / (FP + TN)
AUC stands for Area Under the ROC curve. As its name suggests, AUC measures the
two-dimensional area under the entire ROC curve.
AUC measures performance across all thresholds and provides an aggregate value. AUC ranges
from 0 to 1: a model whose predictions are 100% wrong has an AUC of 0.0, whereas a model whose
predictions are 100% correct has an AUC of 1.0.
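
The classification metrics above can be computed with scikit-learn; the labels and scores below are hypothetical. Note that scikit-learn lays out the confusion matrix as [[TN, FP], [FN, TP]], a mirror of the table shown earlier.

# Confusion matrix, accuracy, precision, recall, F1, and AUC on toy labels.
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.05]  # predicted probabilities

print(confusion_matrix(y_true, y_pred))           # [[TN, FP], [FN, TP]]
print(accuracy_score(y_true, y_pred))             # (TP + TN) / total
print(precision_score(y_true, y_pred))            # TP / (TP + FP)
print(recall_score(y_true, y_pred))               # TP / (TP + FN)
print(f1_score(y_true, y_pred))                   # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))             # area under the ROC curve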

Performance Metrics for Regression:


 Mean Absolute Error: MAE is one of the simplest metrics; it measures the absolute difference
between actual and predicted values, where absolute means taking the number as positive:

MAE = (1/N) × Σ |Y − Y'|

Here, Y is the actual outcome, Y' is the predicted outcome, and N is the total number of data points.

 Mean Squared Error: MSE is one of the most suitable metrics for regression evaluation. It measures
the average of the squared difference between the values predicted by the model and the actual
values:

MSE = (1/N) × Σ (Y − Y')²

 R2 Score: R squared error, also known as the Coefficient of Determination, is another popular
metric for regression model evaluation. The R-squared metric enables us to compare our model with
a constant baseline to determine the performance of the model. To select the constant baseline, we
take the mean of the data and draw a line at the mean. The R squared score is always less than or
equal to 1, regardless of whether the values are large or small:

R² = 1 − (sum of squared residual errors) / (total sum of squared errors from the mean)

 Adjusted R2: Adjusted R squared, as the name suggests, is an improved version of R squared error.
R squared has the limitation that the score improves on adding more terms, even when the model is
not actually improving, which may mislead data scientists.
To overcome this issue, adjusted R squared is used, which always shows a lower value than R². It
adjusts for the increasing number of predictors and only shows an improvement when there is a real
improvement:

Ra² = 1 − [(1 − R²)(n − 1) / (n − k − 1)]

Here, n is the number of observations, k denotes the number of independent variables, and Ra²
denotes the adjusted R².
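
The regression metrics above, computed with scikit-learn on hypothetical values; adjusted R² is derived from R² using the formula just given, with k = 1 assumed.

# MAE, MSE, R2, and adjusted R2 on toy values.
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

r2 = r2_score(y_true, y_pred)
n, k = len(y_true), 1                              # k = number of independent variables
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(mean_absolute_error(y_true, y_pred))         # MAE
print(mean_squared_error(y_true, y_pred))          # MSE
print(r2, adjusted_r2)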

SAMPLING DISTRIBUTION OF AN ESTIMATOR


In statistics, the sampling distribution of an estimator is the probability distribution of the given statistic
estimated on the basis of a random sample. It provides a generalized approach to statistical inference. An
estimator is the generalized mathematical rule used to calculate a sample statistic; an estimate is the result
of the estimation.
The sampling distribution of an estimator depends on the sample size, and the effect of a change in the
sample size has to be determined. An estimate has a single numerical value, and hence such estimates are
called point estimates. There are various estimators, such as the sample mean, sample standard deviation,
proportion, variance, range, etc.
Sampling distribution of the mean: This is the distribution of sample means for samples drawn from the
population. For all sample sizes, it is likely to be normal if the population distribution is normal, and the
population mean is equal to the mean of the sampling distribution of the mean. The sampling distribution
of the mean has standard deviation

σ_x̄ = σ / √n

where σ_x̄ is the standard deviation of the sampling mean, σ is the population standard deviation, and n is
the sample size.
As the size of the sample increases, the spread of the sampling distribution of the mean decreases, but the
mean of the distribution remains the same and is not affected by the sample size. The sampling distribution
of the standard deviation has a standard error, which for a normal population is approximately

σ_s = σ / √(2n)
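
A quick simulation sketch of the first result: the spread of the sample means shrinks as σ/√n while their mean stays at the population mean. The population parameters below are arbitrary.

# Standard deviation of sample means vs. the theoretical sigma / sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0                               # population mean and std. deviation
for n in (10, 100, 1000):                          # increasing sample size
    sample_means = [rng.normal(mu, sigma, n).mean() for _ in range(2000)]
    print(n, np.std(sample_means), sigma / np.sqrt(n))  # empirical vs. theoretical spread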

EMPIRICAL RISK MINIMIZATION
Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family of
learning algorithms and is used to give theoretical bounds on their performance. The idea is that we don't
know exactly how well an algorithm will work in practice, because we don't know the true distribution of
data that the algorithm will work on; as an alternative, we can measure its performance on a known set of
training data.
We assume that our samples come from this distribution and use our dataset as an approximation. If we
compute the loss using the data points in our dataset, it is called empirical risk. It is "empirical" and not
"true" because we are using a dataset that is a subset of the whole population.
When we build our learning model, we have to pick a function that minimizes the empirical risk, i.e., the
delta between predicted output and actual output over the data points in the dataset. This process of finding
such a function is called empirical risk minimization (ERM). We would like to minimize the true risk, but
we don't have the information that would allow us to do that, so we hope that the empirical risk will be
almost the same as the true risk.
However, we can compute an approximation, called empirical risk, by averaging the loss function on the
training set; more formally, computing the expectation with respect to the empirical measure:

R_emp(h) = (1/n) × Σ_{i=1..n} L(h(x_i), y_i)
For example, suppose we want to build a model that can differentiate between a male and a female based
on specific features. If we select 150 random people where the women are really short and the men are
really tall, then the model might incorrectly assume that height is the differentiating feature. To build a
truly accurate model, we would have to gather all the women and men in the world to extract the
differentiating features. Unfortunately, that is not possible! So we select a small number of people and hope
that this sample is representative of the whole population.
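
A minimal sketch of empirical risk: average a chosen loss over the available dataset. The hypothesis h and the squared-error loss here are illustrative; ERM would pick the h that minimizes this average.

# Empirical risk: (1/n) * sum of losses over the dataset.
def h(x):
    return 2.0 * x                                 # a candidate prediction function

def loss(y_pred, y_true):
    return (y_pred - y_true) ** 2                  # squared-error loss

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]        # (x, y) samples from the population

empirical_risk = sum(loss(h(x), y) for x, y in data) / len(data)
print(empirical_risk)                              # 0.02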

