IML unit 1 notes

The document provides an overview of Machine Learning, defining it as a branch of artificial intelligence that enables computers to learn from data and improve their accuracy over time. It outlines the machine learning process, including steps such as defining objectives, data gathering, preparation, exploratory analysis, model building, evaluation, and making predictions. Additionally, it discusses various applications of machine learning, types of machine learning systems, and the differences between supervised and unsupervised learning.


1.1 Introduction of Machine Learning


Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed. As the name suggests, it gives computers the ability that makes them more similar to humans: the ability to learn.

Machine learning is a branch of artificial intelligence and computer science that focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving in accuracy.

A subset of machine learning is closely related to computational statistics, which focuses on making predictions using computers; but not all machine learning is statistical learning. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Machine learning is an important component of the growing field of data science. Through the use of statistical methods, algorithms are trained to make classifications or predictions.

In 1997, Tom Mitchell gave a “well-posed” mathematical and relational definition for Machine Learning: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

Fig1: Definition of Machine Learning

Definitions in Machine Learning:



• Algorithm: A Machine Learning algorithm is a set of rules and statistical techniques used
to learn patterns from data and draw significant information from it. It is the logic behind
a Machine Learning model.
An example of a Machine Learning algorithm is the Linear Regression algorithm.

• Model: A model is the main component of Machine Learning. A model is trained by using
a Machine Learning Algorithm. An algorithm maps all the decisions that a model is
supposed to take based on the given input, in order to get the correct output.

• Predictor Variable: It is a feature (or set of features) of the data that can be used to predict the output.

• Response Variable: It is the feature or the output variable that needs to be predicted by
using the predictor variable(s).

• Training Data: The Machine Learning model is built using the training data. The training
data helps the model to identify key trends and patterns essential to predict the output.

• Testing Data: After the model is trained, it must be tested to evaluate how accurately it can predict an outcome. This is done using the testing data set.
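To make the split between training data and testing data concrete, here is a minimal sketch in Python, assuming the scikit-learn and pandas libraries (the notes do not prescribe any particular tool) and a small hypothetical weather data set:

# Minimal sketch of separating training and testing data (assumes scikit-learn and pandas).
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical weather data: predictor variables and a response variable.
data = pd.DataFrame({
    "temperature": [30, 25, 28, 22, 35, 20],
    "humidity":    [70, 85, 80, 90, 60, 95],
    "rain":        [0, 1, 1, 1, 0, 1],   # response variable (1 = rain, 0 = no rain)
})

X = data[["temperature", "humidity"]]   # predictor variables
y = data["rain"]                        # response variable

# Keep roughly 80% of the rows as training data and 20% as testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), "training rows,", len(X_test), "testing rows")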

Machine Learning Process:



Fig2: Process of Machine Learning

Step 1: Define the objective of the Problem Statement

At this step, we must understand what exactly needs to be predicted. In our case, the objective is
to predict the possibility of rain by studying weather conditions. At this stage, it is also essential
to take mental notes on what kind of data can be used to solve this problem or the type of approach
you must follow to get to the solution.

Step 2: Data Gathering

At this stage, you must be asking questions such as,

● What kind of data is needed to solve this problem?


● Is the data available?
● How can I get the data?

Once you know the types of data that are required, you must understand how you can obtain this data. Data collection can be done manually or by web scraping. However, if you're a beginner who is just looking to learn Machine Learning, you don't have to worry about getting the data. There are thousands of data resources on the web; you can simply download a data set and get going.

Coming back to the problem at hand, the data needed for weather forecasting includes measures
such as humidity level, temperature, pressure, locality, whether or not you live in a hill station, etc.
Such data must be collected and stored for analysis.

Step 3: Data Preparation

The data you collect is almost never in the right format. You will encounter a lot of inconsistencies in the data set, such as missing values, redundant variables, duplicate values, etc. Removing such inconsistencies is essential because they might lead to incorrect computations and predictions. Therefore, at this stage, you scan the data set for any inconsistencies and fix them then and there.

Step 4: Exploratory Data Analysis

Grab your detective glasses because this stage is all about diving deep into data and finding all the
hidden data mysteries. EDA or Exploratory Data Analysis is the brainstorming stage of Machine
Learning. Data Exploration involves understanding the patterns and trends in the data. At this
stage, all the useful insights are drawn and correlations between the variables are understood.

For example, in the case of predicting rainfall, we know that there is a strong possibility of rain if
the temperature has fallen low. Such correlations must be understood and mapped at this stage.

Step 5: Building a Machine Learning Model

All the insights and patterns derived during Data Exploration are used to build the Machine
Learning Model. This stage always begins by splitting the data set into two parts, training data,
and testing data. The training data will be used to build and analyze the model. The logic of the
model is based on the Machine Learning Algorithm that is being implemented.

In the case of predicting rainfall, since the output will be in the form of True (if it will rain
tomorrow) or False (no rain tomorrow), we can use a Classification Algorithm such as Logistic
Regression.
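As an illustration of this step, here is a minimal sketch in Python, assuming scikit-learn (the notes do not mandate a specific library); the weather readings and rain labels are hypothetical:

# Minimal sketch: building a classification model with Logistic Regression (assumes scikit-learn).
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical features [temperature, humidity] and rain labels (1 = rain tomorrow, 0 = no rain).
X = [[35, 60], [33, 65], [30, 70], [31, 68], [22, 90], [20, 95], [24, 88], [25, 85]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Split the data set into training data and testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

model = LogisticRegression()
model.fit(X_train, y_train)        # the model learns from the training data
print(model.predict(X_test))       # predicted rain labels for the testing data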

Step 6: Model Evaluation & Optimization

After building a model using the training data set, it is finally time to put the model to the test. The testing data set is used to check the efficiency of the model and how accurately it can predict the outcome. Once the accuracy is calculated, any further improvements in the model can be implemented at this stage. Methods like parameter tuning and cross-validation can be used to improve the performance of the model.
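A minimal sketch of this step in Python, again assuming scikit-learn; the small data set and the parameter grid are hypothetical and only illustrate cross-validation and parameter tuning:

# Minimal sketch of model evaluation and parameter tuning (assumes scikit-learn).
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Same hypothetical weather data as in the previous step.
X = [[35, 60], [33, 65], [30, 70], [31, 68], [22, 90], [20, 95], [24, 88], [25, 85]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

# Cross-validated search over a small grid of the regularization strength C.
grid = GridSearchCV(LogisticRegression(), param_grid={"C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X_train, y_train)

best_model = grid.best_estimator_
print("Best parameters:", grid.best_params_)
print("Accuracy on the testing data:", accuracy_score(y_test, best_model.predict(X_test)))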

Step 7: Predictions

Once the model is evaluated and improved, it is finally used to make predictions. The final output can be a categorical variable (e.g., True or False) or a continuous quantity (e.g., the predicted value of a stock).

In our case, for predicting the occurrence of rainfall, the output will be a categorical variable.

So that was the entire Machine Learning process. Now it’s time to learn about the different ways
in which Machines can learn.

1.2 Uses of Machine Learning


The need for machine learning is increasing day by day. The reason is that machine learning is capable of doing tasks that are too complex for a person to implement directly. As humans, we have limitations: we cannot process huge amounts of data manually, so we need computer systems, and this is where machine learning makes things easy for us.

We can train machine learning algorithms by providing them with huge amounts of data and letting them explore the data, construct models, and predict the required output automatically.

1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, digital images, etc. A popular use case of image recognition and face detection is automatic friend tagging suggestions:

Facebook provides a feature of automatic friend tagging suggestions. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names, and the technology behind this is machine learning's face detection and recognition algorithm.

It is based on the Facebook project named "DeepFace," which is responsible for face recognition and person identification in pictures.

Fig3: Machine Learning Applications


2. Speech Recognition

While using Google, we get an option of "Search by voice"; this comes under speech recognition, and it is a popular application of machine learning.
Speech recognition is a process of converting voice instructions into text, and it is also known as "Speech to text" or "Computer speech recognition." At present, machine learning algorithms are widely used in various speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

3. Traffic prediction:

If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.

4. Product recommendations:

Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendations to the user. Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet on the same browser, and this is because of machine learning. Google understands the user's interests using various machine learning algorithms and suggests products as per customer interest.

5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a significant role in self-driving cars. Tesla, one of the most popular car manufacturers, is working on self-driving cars. It uses machine learning methods to train the car models to detect people and objects while driving.

6. Email Spam and Malware Filtering:

Whenever we receive a new email, it is automatically filtered as important, normal, or spam. We receive important mail in our inbox marked with the important symbol and spam emails in our spam folder, and the technology behind this is machine learning.

7. Virtual Personal Assistant:

We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using our voice instructions. These assistants can help us in various ways just by our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.

8. Online Fraud Detection:

Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways in which a fraudulent transaction can take place, such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To detect this, a feed-forward neural network helps by checking whether a transaction is genuine or fraudulent.

1.3 Types of Machine Learning Systems

At a broad level, machine learning can be classified into four types:



Fig4: Types of Machine Learning Techniques

1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement learning

1) Supervised Learning

Supervised learning is a type of machine learning method in which we provide sample labeled data to the machine learning system in order to train it, and on that basis, it predicts the output. The system creates a model using labeled data to understand the data set and learn about each data point. Once the training and processing are done, we test the model by providing sample data to check whether it predicts the correct output or not.

For Example:

• Let us consider images that are labeled as a spoon or a knife. This known data is fed to the machine, which analyzes and learns the association of these images based on features such as shape, size, sharpness, etc.

• Now, when a new image is fed to the machine without any label, the machine is able to predict accurately that it is a spoon with the help of the past data.

Fig5: Example for Supervised learning algorithm

The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, much like a student learning under the supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be grouped further in two categories of algorithms:


o Classification
o Regression

Classification - Supervised Learning

• Classification is used when the output variable is categorical, i.e., with two or more classes.
• For example, yes or no, male or female, true or false, etc.

Example: Spam Filtering

Fig6: Example for classification

• In order to predict whether a mail is spam or not, we need to first teach the machine what
a spam mail is.
• This is done based on a number of spam filters: reviewing the content of the mail, reviewing the mail header, and then checking whether it contains any false information.
• Certain keyword filters and blacklist filters built from already blacklisted spammers are also used.
• All of these features are used to score the mail and give it a spam score. The lower the total spam score of the email, the more likely that it is not spam.
• Based on the content, label, and spam score of the new incoming mail, the algorithm decides whether it should land in the inbox or the spam folder, as sketched below.
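The scoring idea above can be approximated with a very small text classifier. Below is a minimal sketch in Python, assuming scikit-learn; the example emails, keywords, and labels are hypothetical:

# Minimal sketch of a spam classifier based on word counts (assumes scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free lottery prize now",        # spam
    "claim your free money reward",        # spam
    "meeting agenda for tomorrow",         # not spam
    "please review the attached report",   # not spam
]
labels = [1, 1, 0, 0]                      # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)       # keyword (word-count) features

classifier = MultinomialNB()
classifier.fit(X, labels)                  # learn which words indicate spam

new_mail = vectorizer.transform(["free prize waiting for you"])
print(classifier.predict(new_mail))        # likely [1], i.e., spam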

Regression - Supervised Learning


• Regression is used when the output variable is a real or continuous value. In this case,
there is a relationship between two or more variables i.e., a change in one variable is
associated with a change in the other variable.

• For example, salary based on work experience or weight based on height, etc.

Example: humidity and temperature

Fig7: Example for Regression

Let's consider two variables: humidity and temperature. Here, 'temperature' is the independent variable and 'humidity' is the dependent variable. If the temperature increases, then the humidity decreases.
These two variables are fed to the model and the machine learns the relationship between them. After the machine is trained, it can easily predict the humidity for a given temperature.
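A minimal sketch of this regression example in Python, assuming scikit-learn; the temperature and humidity readings are hypothetical and chosen so that humidity falls as temperature rises:

# Minimal sketch of regression: predicting humidity from temperature (assumes scikit-learn).
from sklearn.linear_model import LinearRegression

temperature = [[20], [24], [28], [32], [36], [40]]   # independent variable
humidity = [90, 82, 74, 66, 58, 50]                  # dependent variable

model = LinearRegression()
model.fit(temperature, humidity)       # the machine learns the relationship

print(model.predict([[30]]))           # predicted humidity at 30 degrees (about 70)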

Some of the supervised learning applications are:


● Sentiment analysis (Twitter, Facebook, Netflix, YouTube, etc)
● Natural Language Processing
● Image classification
● Predictive analysis
● Pattern recognition
● Spam detection
● Speech/Sequence processing
Supervised Learning: Uses

• Prediction of future cases: Use the rule to predict the output for future inputs
• Knowledge extraction: The rule is easy to understand
RKR21 IML II/I ALL BRANCHES

• Compression: The rule is simpler than the data it explains


• Outlier detection: Exceptions that are not covered by the rule, e.g., fraud

Supervised Learning: Limitations

● Slow - it requires human experts to manually label training examples one by one.
● Costly - a model must be trained on large volumes of hand-labeled data to provide accurate predictions.

2) Unsupervised Learning

Unsupervised learning is a learning method in which a machine learns without any supervision.
The training is provided to the machine with the set of data that has not been labeled, classified,
or categorized, and the algorithm needs to act on that data without any supervision. The goal of
unsupervised learning is to restructure the input data into new features or a group of objects with
similar patterns.

Example:
• Let's take a similar example as before, but this time we do not tell the machine whether it's
a spoon or a knife.
• The machine identifies patterns from the given set and groups them based on their patterns,
similarities, etc.

Fig8: Example for Unsupervised Learning



In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights from the huge amount of data. It can be further classified into two categories of algorithms:
o Clustering
o Association

Clustering - Unsupervised Learning

• Clustering is the method of dividing objects into clusters so that objects within the same cluster are similar to each other and dissimilar to objects belonging to other clusters.
• For example, finding out which customers made similar product purchases.

Example:

Fig9: Example for Clustering

• Suppose a telecom company wants to reduce its customer churn rate by providing
personalized call and data plans.
• The behavior of the customers is studied and the model segments the customers with
similar traits.
• Group A customers use more data and also have high call durations. Group B customers are heavy Internet users, while Group C customers have high call durations.
• So, Group B will be given more data benefit plans, while Group C will be given cheaper call rate plans, and Group A will be given the benefit of both.
Association - Unsupervised Learning
• Association is a rule-based machine learning method used to discover the probability of the co-occurrence of items in a collection.
• For example, finding out which products were purchased together.
Example:

Fig10: Example for Association

• Let’s say that a customer goes to a supermarket and buys bread, milk, fruits, and wheat.
Another customer comes and buys bread, milk, rice, and butter.
• Now, when another customer comes, it is highly likely that if he buys bread, he will buy
milk too.
• Hence, a relationship is established based on customer behavior and recommendations are
made.
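A minimal sketch of this association idea in plain Python; the shopping baskets are hypothetical, and the rule "bread -> milk" is scored by how often milk appears in baskets that contain bread:

# Minimal sketch of association by counting item co-occurrence (plain Python).
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk", "fruits", "wheat"},
    {"bread", "milk", "rice", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "fruits"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1            # count how often each pair of items co-occurs

# Confidence of the rule bread -> milk: fraction of bread baskets that also contain milk.
bread_baskets = sum(1 for basket in transactions if "bread" in basket)
both = pair_counts[("bread", "milk")]
print("confidence(bread -> milk) =", both / bread_baskets)   # 3 / 4 = 0.75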

Uses of Unsupervised Learning

• Unsupervised learning is used for more complex tasks as compared to supervised learning
because, in unsupervised learning, we don't have labeled input data.
• Unsupervised learning is preferable as it is easy to get unlabeled data in comparison to
labeled data.

Unsupervised Learning- Limitations



• It has a limited area of applications, mostly for clustering purposes.


• It provides less accurate results.
An example of a clustering algorithm is k-Means, where k refers to the number of clusters to discover in the data.
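A minimal sketch of k-Means in Python, assuming scikit-learn; the customer usage figures (monthly data in GB, call minutes) are hypothetical and loosely mirror the telecom example above:

# Minimal sketch of k-Means clustering (assumes scikit-learn).
from sklearn.cluster import KMeans

customers = [
    [50, 600], [55, 650], [48, 620],   # heavy data use and long call durations
    [60, 100], [65, 120], [58, 90],    # heavy data use, short call durations
    [5, 700], [8, 650], [6, 720],      # light data use, long call durations
]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)   # k = 3 clusters to discover
labels = kmeans.fit_predict(customers)
print(labels)   # cluster index assigned to each customer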

Unsupervised Learning applications are:

1. Similarity detection
2. Automatic labeling
3. Object segmentation (such as Person, Animal, Films)

The goal in such unsupervised learning problems may be to discover groups of similar examples
within the data, where it is called clustering, or to determine the distribution of data within the
input space, known as density estimation, or to project the data from a high-dimensional space
down to two or three dimensions for the purpose of visualization.

3) Semi-Supervised Learning

Semi-supervised learning is supervised learning where the training data contains very few labeled
examples and a large number of unlabeled examples.
The goal of a semi-supervised learning model is to make effective use of all of the available data,
not just the labeled data like in supervised learning.

Fig11: Example for Semi-supervised Learning



• Semi-supervised learning is an important category that lies between supervised and unsupervised machine learning.
• To overcome the drawbacks of supervised learning and unsupervised learning algorithms, the concept of semi-supervised learning was introduced.
• Labeled data exists in a very small amount, while unlabeled data exists in a huge amount.
• Initially, similar data is clustered using an unsupervised learning algorithm, and this clustering then helps to label the unlabeled data.
• This is because labeled data is comparatively more expensive to acquire than unlabeled data.

"Semi-supervised" learning attempts to improve the accuracy of supervised learning by exploiting information in unlabeled data. This sounds like magic, but it can work!
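A minimal sketch of one simple way this can work (self-training by pseudo-labeling), in Python assuming scikit-learn; the one-dimensional data is hypothetical:

# Minimal sketch of semi-supervised learning via self-training (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression

X_labeled = np.array([[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]])
y_labeled = np.array([0, 0, 0, 1, 1, 1])                      # the few labeled examples
X_unlabeled = np.array([[1.1], [0.8], [5.1], [4.9], [1.3]])   # the many unlabeled examples

# 1. Train an initial model on the small labeled set.
model = LogisticRegression().fit(X_labeled, y_labeled)

# 2. Use that model to assign pseudo-labels to the unlabeled data.
pseudo_labels = model.predict(X_unlabeled)

# 3. Retrain on the labeled and pseudo-labeled data combined.
X_all = np.vstack([X_labeled, X_unlabeled])
y_all = np.concatenate([y_labeled, pseudo_labels])
model = LogisticRegression().fit(X_all, y_all)
print(model.predict([[1.05], [5.05]]))   # likely [0 1]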

Real-world applications of Semi-supervised Learning


• Speech Analysis
• Web content classification
• Protein sequence classification
• Text document classifier

4) Reinforcement Learning

Reinforcement learning is learning what to do - how to map situations to actions - so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them.

Terms used in Reinforcement Learning

• Environment — Physical world in which the agent operates


• State — Current situation of the agent
• Reward — Feedback from the environment
• Policy — Method to map agent’s state to actions
• Value — Future reward that an agent would receive by taking an action in a particular
state

Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to get the most reward points, and hence, it improves its performance.
The robotic dog, which automatically learns the movement of its arms, is an example of Reinforcement learning.
An example of a reinforcement problem is playing a game where the agent has the goal of getting a high score, can make moves in the game, and receives feedback in terms of punishments or rewards.

Fig12: Example for Reinforcement learning

In many complex domains, reinforcement learning is the only feasible way to train a program to
perform at high levels. For example, in game playing, it is very hard for a human to provide
accurate and consistent evaluations of large numbers of positions, which would be needed to train
an evaluation function directly from examples. Instead, the program can be told when it has won
or lost, and it can use this information to learn an evaluation function that gives reasonably accurate
estimates of the probability of winning from any given position.

• Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond.
• The agent interacts with the environment by performing some actions, and based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
• The agent continues doing these three things (take action, change state or remain in the same state, and get feedback), and by doing these actions, it learns and explores the environment.
• The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalties.
• As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative
point.
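A minimal sketch of this loop with tabular Q-learning in plain Python; the "maze" is simplified to a hypothetical one-dimensional corridor of five states with the diamond in the last state:

# Minimal sketch of Q-learning in a tiny corridor environment (plain Python, hypothetical).
import random

n_states, actions = 5, [-1, +1]           # move left (-1) or move right (+1)
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount factor, exploration rate
Q = [[0.0, 0.0] for _ in range(n_states)] # Q-value table: one value per (state, action)

for episode in range(200):
    state = 0                                               # start at the left end
    while state != n_states - 1:                            # until the diamond is found
        if random.random() < epsilon:
            a = random.randrange(2)                         # explore: random action
        else:
            a = Q[state].index(max(Q[state]))               # exploit: best known action
        next_state = min(max(state + actions[a], 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0 # reward only at the goal
        # Q-learning update: move Q(s, a) toward reward + discounted best future value.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

# Greedy action per state after training; states 0-3 should prefer action 1 (move right).
print([q.index(max(q)) for q in Q])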

Parameters | Supervised Learning | Unsupervised Learning | Semi-Supervised Learning | Reinforcement Learning
Definition | Learns by using labeled data | Trained using unlabeled data without any guidance | Trained using both labeled & unlabeled data | Works by interacting with the environment
Type of data | Labeled data | Unlabeled data | Both labeled and unlabeled data | No predefined data
Type of problems | Regression and Classification | Association and Clustering | Classification and Regression | Exploitation or Exploration
Supervision | Yes | No | Partial | No supervision
Algorithms | Linear Regression, Logistic Regression, SVM, KNN, etc. | K-Means | Text document classifier | Q-Learning, SARSA
Aim | Calculate outcomes | Discover underlying patterns | Classify the data and also discover underlying patterns | Learn a series of actions
Application | Risk Evaluation, Forecast Sales | Recommendation System, Anomaly Detection | Text Classification | Self-Driving Cars, Gaming, Healthcare

Table1: Comparison of Machine Learning Techniques



1.4 Batch and Online Learning


Batch learning represents the training of machine learning models in a batch manner. In other
words, batch learning represents the training of the models at regular intervals such as weekly, bi-
weekly, monthly, quarterly, etc. The data gets accumulated over a period of time. The models then
get trained with the accumulated data from time to time at periodic intervals.

Batch learning is also called offline learning. The models trained using batch or offline learning
are moved into production only at regular intervals based on the performance of models trained
with new data.

Fig13: Process of Batch Learning

Building offline models or models trained in a batch manner requires training the models with the
entire training data set. Improving the model performance would require re-training all over again
with the entire training data set. These models are static in nature, which means that once they get trained, their performance will not improve until a new model gets retrained. Offline models or models trained using batch learning are deployed in the production environment by replacing the old model with the newly trained model.

There can be various reasons why we can choose to adopt batch learning for training the models.
Some of these reasons are the following:

● The business requirements do not require frequent learning of models.


● The data distribution is not expected to change frequently. Therefore, batch learning is
suitable.
● The software systems (big data) required for batch learning are not available due to various reasons, including cost. Because the model is trained with a lot of accumulated data, training takes a lot of time and resources (CPU, memory, disk space, disk I/O, network I/O, etc.).
● The expertise required for creating the system for incremental learning is not available.

If models trained using batch learning need to learn about new data, they must be retrained using the new data set and then swapped in for the model already in production, based on criteria such as model performance. The whole process of batch learning can be automated as well. The disadvantage of batch learning is that it takes a lot of time and resources to retrain the model.

The criteria for deciding whether to retrain machine learning models in a batch manner depend on model performance. Red-amber-green statuses can be used to determine the health of the model based on prediction accuracy or error rates. Accordingly, the models can be chosen for retraining or left as they are.

The following stakeholders can be involved in reviewing the model performance and leveraging
batch learning:

● Business/product owners
● Product managers
● Data scientists
● ML engineers
In online learning, training happens incrementally by continuously feeding the system data as it arrives, either individually or in small groups (mini-batches). Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.

Online learning is great for machine learning systems that receive data as a continuous flow (e.g.,
stock prices) and need to adapt to change rapidly or autonomously. It is also a good option if you
have limited computing resources: once an online learning system has learned about new data
instances, it does not need them anymore, so you can discard them (unless you want to be able to
roll back to a previous state and "replay" the data) or move the data to another form of storage (warm or cold storage) if you are using a data lake. This can save a huge amount of space and cost.

Fig14: Process of Online Learning

Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one
machine’s main memory (this is also called out-of-core learning). The algorithm loads part of the
data, runs a training step on that data, and repeats the process until it has run on all of the data.
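A minimal sketch of such incremental (out-of-core style) training in Python, assuming scikit-learn's SGDClassifier; the data chunks are hypothetical and generated on the fly:

# Minimal sketch of online learning with incremental updates (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()                   # linear classifier trained by stochastic gradient descent

rng = np.random.default_rng(0)
for step in range(10):                    # ten small chunks of data arriving over time
    X_chunk = rng.normal(size=(20, 2))
    y_chunk = (X_chunk[:, 0] + X_chunk[:, 1] > 0).astype(int)
    # partial_fit updates the model using only the new chunk; earlier chunks can be discarded.
    model.partial_fit(X_chunk, y_chunk, classes=[0, 1])

print(model.predict([[2.0, 2.0], [-2.0, -2.0]]))   # likely [1 0]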

One of the key aspects of online learning is the learning rate: the rate at which the system adapts to new data. A system with a high learning rate will adapt rapidly but will also tend to forget old data quickly. A system with a low learning rate will learn more slowly and behave more like batch learning.

One of the big disadvantages of an online learning system is that if it is fed bad data, the system's performance will degrade and the user will see the impact instantly. Thus, it is very important to come up with an appropriate data governance strategy to ensure that the data fed in is of high quality. In addition, it is very important to monitor the performance of the machine learning system very closely.

The following are some of the challenges for adopting an online learning method:

● Data governance
● Model governance includes appropriate algorithm and model selection on-the-fly

Online models require only a single deployment in the production setting, and they evolve over a period of time. The disadvantage of online models is that they don't have the entire dataset available for training. The models are trained incrementally based on assumptions made using the available data, and these assumptions can at times be sub-optimal.

Online learning vs Batch learning

Parameter | Online learning | Batch learning
Complexity | More complex because the model keeps evolving over time as more data becomes available. | Less complex because the model is fed with more consistent data sets periodically.
Computational power | More computational power is required because the continuous feed of data leads to continuous refinement. | Less computational power is needed because data is delivered in batches; the model isn't continuously refining itself.
Use in production | Harder to implement and control because the production model changes in real time according to its data feed. | Easier to implement because offline learning gives engineers more time to perfect the model before deployment.
Applications | Used in applications where new data patterns are constantly required (e.g., weather prediction tools, stock market predictions). | Used in applications where data patterns remain constant and don't have sudden concept drifts (e.g., image classification).

Table2: Comparison of Online learning vs Batch learning

1.5 Challenges in Machine Learning and Introduction to Probability

1. Inadequate Training Data

Noisy data - It is responsible for inaccurate predictions that affect decisions as well as accuracy in classification tasks.
Incorrect data - It is also responsible for faulty programs and results obtained from machine learning models. Hence, incorrect data may also affect the accuracy of the results.
Generalizing output data - Sometimes it is also found that generalizing output data becomes complex, which results in comparatively poor future actions.

2. Monitoring and maintenance

Regular monitoring and maintenance are compulsory. Different results for different actions require changes to the data; hence, editing the code as well as allocating resources for monitoring also become necessary.

3. Lack of skilled resources

Although Machine Learning and Artificial Intelligence are continuously growing in the market, these industries are still relatively young in comparison to others, and skilled professionals are in short supply.

4. Process Complexity of Machine Learning

Machine learning includes analyzing the data, removing data bias, training models, applying complex mathematical calculations, etc., which makes the procedure complicated and quite tedious.

5. Slow implementations and results

Machine learning models are highly efficient at producing accurate results but can be time-consuming. Slow programs, excessive requirements, and overloaded data take more time than expected to provide accurate results.

6. Overfitting and Underfitting

Overfitting:

• Whenever a machine learning model is trained with a huge amount of data, it can start capturing the noise and inaccurate data present in the training data set, which negatively affects the performance of the model.

• Let's understand this with a simple example where the training data set contains 1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. Then there is a considerable probability of identifying an apple as a papaya because we have a massive amount of biased data in the training data set.

Underfitting:
• Whenever a machine learning model is trained with too little data, it produces incomplete and inaccurate results, which destroys the accuracy of the machine learning model.
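The two effects can be seen in a minimal sketch, assuming only numpy; the noisy data and polynomial degrees are hypothetical:

# Minimal sketch contrasting underfitting and overfitting via polynomial degree (assumes numpy).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)   # noisy samples of a curve

for degree in (1, 3, 12):
    coeffs = np.polyfit(x, y, degree)                    # fit a polynomial of this degree
    train_error = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print("degree", degree, "-> training error", round(float(train_error), 3))

# Degree 1 underfits (high training error); degree 12 drives the training error very low
# by fitting the noise, which is the overfitting behaviour described above.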

Probability
Probability is a foundation stone of ML; it tells how likely an event is to occur. The value of a probability always lies between 0 and 1. A value closer to 1 indicates that the event is more likely to occur, and 0 indicates that the event will not occur.

Formula:

Probability of an event = (Number of ways an event can occur) / (Total number of outcomes)

Example 1: Tossing a Coin

When a coin is tossed, there are two possible outcomes: Heads (H) or Tails (T).
Number of ways Heads can happen: 1 (there is only 1 face with an "H" on the coin)
Total number of outcomes: 2 (there are 2 faces altogether)
So the probability of Heads (H) = 1/2, i.e., 50%; the same holds for Tails.

Example 2: Rolling a Die

What are the chances of rolling a "4" with a single die?

Number of ways it can happen: 1 (there is only 1 face with a "4" on it)
Total number of outcomes: 6 (there are 6 faces altogether)
So the probability = 1/6
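These two worked examples can be checked with a minimal sketch in plain Python that compares the formula with a simulation of many die rolls:

# Minimal sketch: probability formula versus a simulation of die rolls (plain Python).
import random

p_theory = 1 / 6                                   # ways to roll a "4" / total outcomes

rolls = [random.randint(1, 6) for _ in range(100000)]
p_estimate = rolls.count(4) / len(rolls)           # observed relative frequency of "4"

print("theoretical:", round(p_theory, 4), "simulated:", round(p_estimate, 4))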
