Unit 4
This machine learning tutorial gives you an introduction to machine learning along
with a wide range of machine learning techniques such as supervised, unsupervised,
and reinforcement learning. You will learn about regression and classification
models, clustering methods, hidden Markov models, and various sequential models.
Machine learning is a subset of artificial intelligence that is mainly concerned
with the development of algorithms that allow a computer to learn from data and
past experiences on its own. The term machine learning was first introduced by
Arthur Samuel in 1959. We can define it in a summarized way as:
With the help of sample historical data, known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or decisions
without being explicitly programmed. Machine learning brings computer science and
statistics together to create predictive models. Machine learning constructs or uses
algorithms that learn from historical data. The more information we provide, the
better the performance.
A machine has the ability to learn if it can improve its performance by gaining
more data.
We can train machine learning algorithms by providing them with huge amounts of
data and letting them explore the data, construct models, and predict the required
output automatically. The performance of a machine learning algorithm depends on
the amount of data, and it can be determined by the cost function. With the help of
machine learning, we can save both time and money.
The importance of machine learning can be easily understood by its use cases.
Currently, machine learning is used in self-driving cars, cyber fraud detection, face
recognition, friend suggestions by Facebook, etc. Various top companies such as
Netflix and Amazon have built machine learning models that use vast amounts of
data to analyze user interests and recommend products accordingly.
Classification of Machine Learning
At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide
sample labeled data to the machine learning system in order to train it, and on that
basis, it predicts the output.
The system creates a model using labeled data to understand the datasets and learn
about each data point. Once the training and processing are done, we test the model
by providing sample data to check whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised
learning is based on supervision, and it is the same as when a student learns things
under the supervision of a teacher. An example of supervised learning is spam
filtering. Supervised learning can be further grouped into two categories of
algorithms (a minimal code sketch follows the list below):
o Classification
o Regression
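The following is a minimal sketch of supervised classification using scikit-learn; the library, dataset, and model choice are illustrative assumptions, not part of this tutorial's own material:

```python
# A minimal supervised-learning sketch using scikit-learn (assumed installed).
# Labeled training data is used to fit a classifier, which then predicts
# the label of new samples.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)       # features (X) and labels (y)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)                         # learn the input-to-output mapping

print(model.predict(X[:5]))             # predict labels for sample inputs
```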
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.
The machine is trained with a set of data that has not been labeled, classified, or
categorized, and the algorithm needs to act on that data without any supervision.
The goal of unsupervised learning is to restructure the input data into new features
or groups of objects with similar patterns. Unsupervised learning can be further
classified into two categories of algorithms (a short clustering sketch follows the
list below):
o Clustering
o Association
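Here is a minimal, hedged clustering sketch with scikit-learn (the library and the made-up data are assumptions for illustration only):

```python
# A minimal unsupervised-learning sketch using scikit-learn (assumed installed).
# No labels are provided; k-means groups the points by similarity on its own.
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: two loose blobs of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)),
               rng.normal(5, 0.5, (20, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)        # cluster assignment discovered for each point
```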
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning
agent gets a reward for each right action and a penalty for each wrong action. The
agent learns automatically from this feedback and improves its performance. In
reinforcement learning, the agent interacts with the environment and explores it.
The goal of the agent is to collect the maximum reward points, and hence it
improves its performance.
A robotic dog that automatically learns the movement of its arms is an example of
reinforcement learning; a toy sketch follows below.
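The following toy sketch shows tabular Q-learning, a classic reinforcement learning algorithm, on a tiny corridor world; the environment, rewards, and parameters are illustrative assumptions, not the method behind any product mentioned here:

```python
# A toy reinforcement-learning sketch: tabular Q-learning on a 1-D corridor.
# The agent gets a reward (+1) for reaching the rightmost cell and a small
# penalty (-0.01) for every other step; it learns to walk right.
import random

N_STATES, ACTIONS = 5, [0, 1]           # actions: 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.1       # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda a: Q[s][a])
        s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        r = 1.0 if s2 == N_STATES - 1 else -0.01   # reward / penalty feedback
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Learned policy: the best action in every non-terminal state (all 1s = go right).
print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)])
```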
Note: We will learn about the above types of machine learning in detail in later chapters.
History of Machine Learning
About 40 to 50 years ago, machine learning was science fiction, but today it is part
of our daily life. Machine learning is making our day-to-day life easier, from
self-driving cars to Amazon's virtual assistant "Alexa". However, the idea behind
machine learning is quite old and has a long history. Below are some milestones in
the history of machine learning:
o 1959: The first neural network was applied to a real-world problem, removing
echoes over phone lines using an adaptive filter.
o 1974-1980: This duration was a tough time for AI and ML researchers and was
called the AI winter. In this period, machine translation failed, people lost
interest in AI, and government funding for research was reduced.
o 1985: Terry Sejnowski and Charles Rosenberg invented a neural network,
NETtalk, which was able to teach itself how to correctly pronounce 20,000
words in one week.
o 1997: IBM's Deep Blue intelligent computer won a chess match against the
chess expert Garry Kasparov and became the first computer to beat a human
chess expert.
o 2006: Computer scientist Geoffrey Hinton gave neural net research the new
name "deep learning," and nowadays it has become one of the most trending
technologies.
o 2012: Google created a deep neural network that learned to recognize images
of humans and cats in YouTube videos.
o 2014: The chatbot "Eugene Goostman" cleared the Turing test. It was the first
chatbot to convince 33% of human judges that it was not a machine.
o 2014: DeepFace was a deep neural network created by Facebook, which
claimed that it could recognize a person with the same precision as a human.
o 2016: AlphaGo beat the world's second-ranked Go player, Lee Sedol. In 2017
it beat the number one player of the game, Ke Jie.
o 2017: Alphabet's Jigsaw team built an intelligent system that was able to
learn about online trolling. It read millions of comments from different
websites to learn to stop online trolling.
Applications of Machine Learning
1. Image Recognition:
Image recognition is one of the most common applications of machine learning; it
is used to identify objects, persons, and places in digital images. A popular use case
is Facebook's automatic friend-tagging suggestion, which is based on the Facebook
project named "Deep Face," responsible for face recognition and person
identification in a picture.
2. Speech Recognition
While using Google, we get an option of "Search by voice"; this comes under speech
recognition, which is a popular application of machine learning.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the
correct path with the shortest route and predicts the traffic conditions. It does this
using two kinds of information:
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time
Everyone who uses Google Maps is helping the app get better. It takes information
from the user and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment
companies such as Amazon and Netflix for recommending products to the user.
Whenever we search for a product on Amazon, we start getting advertisements for
the same product while surfing the internet in the same browser; this is because of
machine learning.
Google understands the user's interest using various machine learning algorithms
and suggests products accordingly.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars, where
machine learning plays a significant role. Tesla, the most popular car manufacturer,
is working on self-driving cars, using an unsupervised learning method to train the
car models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal,
or spam. Important mail arrives in our inbox with the important symbol and spam
emails land in our spam box; the technology behind this is machine learning. Below
are some spam filters used by Gmail (a toy content-filter sketch follows the list):
o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
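As an illustration of how a content filter might work, here is a hedged sketch of a naive Bayes text classifier trained on a few made-up messages; it is not Gmail's actual filter, and the example texts are invented:

```python
# A toy content-filter sketch: a naive Bayes text classifier that learns to
# separate "spam" from "normal" messages (scikit-learn assumed installed).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["win a free prize now", "cheap pills online",
               "meeting at noon tomorrow", "project report attached"]
train_labels = ["spam", "spam", "normal", "normal"]

filter_model = make_pipeline(CountVectorizer(), MultinomialNB())
filter_model.fit(train_texts, train_labels)

print(filter_model.predict(["free prize online"]))   # -> ['spam']
```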
7. Virtual Personal Assistant:
We have various virtual personal assistants, such as Google Assistant, Alexa,
Cortana, and Siri. These assistants record our voice instructions, send them over
the server on a cloud, decode them using ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting
fraudulent transactions. For each genuine transaction, the output is converted into
hash values, and these values become the input for the next round. A genuine
transaction follows a specific pattern, which changes for a fraudulent transaction;
hence the system detects the fraud and makes our online transactions more secure.
9. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock market, there
is always a risk of ups and downs in share prices, so long short-term memory (LSTM)
neural networks are used for the prediction of stock market trends, as in the hedged
sketch below.
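The following is a hedged sketch of an LSTM fitted to a synthetic series; TensorFlow/Keras is assumed installed, the sine-wave "prices" are made up, and a real trading model would be far more involved:

```python
# A hedged LSTM sketch: learn to predict the next value of a synthetic series
# from a sliding window of past values (TensorFlow/Keras assumed installed).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

prices = np.sin(np.linspace(0, 20, 200))          # stand-in for a price series
window = 10
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]                               # next value after each window
X = X[..., np.newaxis]                            # shape: (samples, timesteps, 1)

model = Sequential([LSTM(32, input_shape=(window, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

print(model.predict(X[-1:], verbose=0))           # predicted next value
```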
Machine Learning Life Cycle
The machine learning life cycle involves seven major steps, which are given below:
o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment
The most important thing in the complete process is to understand the problem and
to know its purpose. Therefore, before starting the life cycle, we need to understand
the problem, because a good result depends on a good understanding of the
problem.
In the complete life cycle process, to solve a problem, we create a machine learning
system called a "model", and this model is created by providing "training". But to
train a model we need data; hence, the life cycle starts with collecting data.
1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this
step is to identify the different data sources and obtain the data, since data can be
collected from various sources such as files, databases, the internet, or mobile
devices. It is one of the most important steps of the life cycle: the quantity and
quality of the collected data determine the efficiency of the output. The more data
there is, the more accurate the prediction will be.
By performing the above task, we get a coherent set of data, also called a dataset.
It will be used in further steps.
2. Data preparation
After collecting the data, we need to prepare it for the further steps. Data
preparation is the step where we put our data into a suitable place and prepare it
for use in machine learning training.
In this step, we first put all the data together and then randomize the ordering of
the data. This step has two further parts:
o Data exploration:
It is used to understand the nature of the data we have to work with. We
need to understand the characteristics, format, and quality of the data.
A better understanding of the data leads to an effective outcome; here we
find correlations, general trends, and outliers.
o Data pre-processing:
The next step is pre-processing the data for its analysis.
3. Data Wrangling
Data wrangling is the process of cleaning and converting raw data into a usable
format. It is the process of cleaning the data, selecting the variables to use, and
transforming the data into a proper format to make it more suitable for analysis in
the next step. It is one of the most important steps of the complete process.
Cleaning the data is required to address quality issues.
The data we have collected is not necessarily always of use to us, as some of it may
not be useful. In real-world applications, collected data may have various issues,
including:
o Missing values
o Duplicate data
o Invalid data
o Noise
It is mandatory to detect and remove these issues because they can negatively
affect the quality of the outcome. A small pandas sketch below illustrates this kind
of cleaning.
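As a minimal illustration (assuming pandas is installed; the toy dataset is made up), the sketch below removes duplicates, drops invalid values, and fills in missing values:

```python
# A minimal data-wrangling sketch with pandas: handling missing values,
# duplicate rows, and invalid entries in a toy dataset.
import pandas as pd
import numpy as np

df = pd.DataFrame({"age": [25, np.nan, 40, 40, -3],
                   "income": [50000, 60000, np.nan, np.nan, 45000]})

df = df.drop_duplicates()                 # remove duplicate rows
df = df[df["age"] > 0]                    # drop invalid (and missing) ages
df["income"] = df["income"].fillna(df["income"].median())  # impute missing values

print(df)
```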
4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. The aim of
this step is to build a machine learning model that analyzes the data using various
analytical techniques and to review the outcome. It starts with determining the type
of problem, where we select machine learning techniques such as classification,
regression, cluster analysis, or association; we then build the model using the
prepared data and evaluate it.
Hence, in this step, we take the data and use machine learning algorithms to build
the model.
5. Train Model
Now the next step is to train the model. In this step, we train our model to improve
its performance for a better outcome of the problem.
We use datasets to train the model using various machine learning algorithms.
Training a model is required so that it can understand the various patterns, rules,
and features.
6. Test Model
Once our machine learning model has been trained on a given dataset, we test the
model. In this step, we check the accuracy of our model by providing a test dataset
to it.
Testing the model determines the percentage accuracy of the model as per the
requirements of the project or problem, as in the sketch below.
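Here is a minimal train/test sketch with scikit-learn; the dataset and model are illustrative assumptions:

```python
# Hold out part of the data, train on the rest, and report accuracy on the
# unseen test part (scikit-learn assumed installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)  # train on the train set
preds = model.predict(X_test)                           # predict on the test set
print(accuracy_score(y_test, preds))                    # percentage accuracy
```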
7. Deployment
The last step of the machine learning life cycle is deployment, where we deploy the
model in a real-world system.
Installing Anaconda and Python
In this topic, we will learn to install Python and an IDE with the help of the Anaconda
distribution.
Below are the steps for downloading and installing Anaconda and the IDE:
o To download Anaconda, open your favorite browser, type "Download
Anaconda Python", and click on the first link. Alternatively, you can directly
download it from https://www.anaconda.com/distribution/#download-section.
o After clicking on the first link, you will reach the download page of Anaconda.
o Since Anaconda is available for Windows, Linux, and Mac OS, you can
download it as per your OS type by clicking on the available options. It
provides Python 2.7 and Python 3.7 versions; since the latest version is 3.7,
we will download the Python 3.7 version. After clicking on the download
option, it will start downloading on your computer.
Note: In this topic, we are downloading Anaconda for Windows; you can choose it as per
your OS.
o In the next window, you will get two options for installation. Select the first
option (Just Me) and click Next.
o Now you will get a window for the installation location. Here you can leave it
as the default or change it by browsing to a location, and then click Next.
o Once the installation is completed, tick the checkbox if you want to learn
more about Anaconda and Anaconda Cloud. Click Finish to end the process.
Note: Here, we will use the Spyder IDE to run Python programs.
Difference between Artificial Intelligence and Machine Learning
o Goal: The goal of AI is to make a smart computer system, like humans, to
solve complex problems; the goal of ML is to allow machines to learn from
data so that they can give accurate output.
o Approach: In AI, we make intelligent systems to perform any task like a
human; in ML, we teach machines with data to perform a particular task and
give an accurate result.
o Subsets: Machine learning and deep learning are the two main subsets of AI;
deep learning is a main subset of machine learning.
o Scope: AI has a very wide range of scope; machine learning has a limited
scope.
o Types: On the basis of capabilities, AI can be divided into three types: Weak
AI, General AI, and Strong AI; machine learning can also be divided into three
main types: supervised learning, unsupervised learning, and reinforcement
learning.
o Processes: AI includes learning, reasoning, and self-correction; ML includes
learning and self-correction when introduced to new data.
o Data: AI deals with structured, semi-structured, and unstructured data;
machine learning deals with structured and semi-structured data.
Data Mining Techniques
In recent data mining projects, various major data mining techniques have been
developed and used, including association, classification, clustering, prediction,
sequential patterns, and regression.
1. Classification:
This technique is used to obtain important and relevant information about data and
metadata. It helps classify data into different classes.
2. Clustering:
Clustering is the division of information into groups of connected objects. Describing
the data by a few clusters loses certain fine details but achieves simplification; the
data is modeled by its clusters. Historically, clustering is rooted in statistics,
mathematics, and numerical analysis. From a machine learning point of view,
clusters correspond to hidden patterns, the search for clusters is unsupervised
learning, and the resulting framework represents a data concept. From a practical
point of view, clustering plays an extraordinary role in data mining applications such
as scientific data exploration, text mining, information retrieval, spatial database
applications, CRM, web analysis, computational biology, and medical diagnostics.
In other words, clustering analysis is a data mining technique for identifying similar
data. The technique helps recognize the differences and similarities between data.
Clustering is very similar to classification, but it involves grouping chunks of data
together based on their similarities.
3. Regression:
Regression analysis is the data mining process used to identify and analyze the
relationship between variables in the presence of other factors. It is used to define
the probability of a specific variable. Regression is primarily a form of planning and
modeling. For example, we might use it to project certain costs depending on other
factors such as availability, consumer demand, and competition. Primarily it gives
the exact relationship between two or more variables in the given data set; a hedged
sketch follows.
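The following is a hedged regression sketch with scikit-learn; the "demand" and "cost" numbers are made up for illustration:

```python
# A minimal regression sketch: fit a line relating one variable (demand) to
# another (cost), then project the cost at a new demand level.
import numpy as np
from sklearn.linear_model import LinearRegression

demand = np.array([[10], [20], [30], [40]])   # hypothetical consumer demand
cost = np.array([105, 202, 298, 401])         # hypothetical observed costs

reg = LinearRegression().fit(demand, cost)
print(reg.coef_, reg.intercept_)              # learned relationship
print(reg.predict([[25]]))                    # projected cost at demand = 25
```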
4. Association Rules:
This data mining technique helps discover links between two or more items and
finds hidden patterns in the data set.
Association rules are if-then statements that help show the probability of
interactions between data items within large data sets. Association rule mining has
several applications and is commonly used to help discover sales correlations in
transactional or medical data sets.
The way the algorithm works is that you start with data, for example a list of grocery
items bought over the last six months, and it calculates the percentage of items
being purchased together, using the following three measures (computed in the
short sketch after the list):
o Support:
This measure tells how often items A and B are purchased together, relative
to the entire dataset:
Support(A, B) = (transactions containing both A and B) / (entire dataset)
o Confidence:
This measure tells how often item B is purchased when item A is purchased:
Confidence(A => B) = (transactions containing both A and B) / (transactions containing A)
o Lift:
This measure compares the confidence to how often item B is purchased on
its own:
Lift(A => B) = Confidence(A => B) / Support(B)
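The following sketch computes the three measures for two hypothetical items ("bread" and "butter") over a toy transaction list; both the items and the transactions are invented:

```python
# Compute support, confidence, and lift for the rule bread => butter
# over a toy list of market-basket transactions.
transactions = [{"bread", "butter"}, {"bread"}, {"bread", "butter", "milk"},
                {"milk"}, {"bread", "butter"}]
n = len(transactions)

def freq(*items):
    # Number of transactions containing every one of the given items.
    return sum(1 for t in transactions if set(items) <= t)

support = freq("bread", "butter") / n                    # P(A and B)
confidence = freq("bread", "butter") / freq("bread")     # P(B | A)
lift = confidence / (freq("butter") / n)                 # confidence vs. chance

print(support, confidence, lift)   # -> 0.6 0.75 1.25
```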
5. Outlier Detection:
This type of data mining technique relates to observing data items in the data set
that do not match an expected pattern or expected behavior. The technique can be
used in various domains like intrusion detection, fraud detection, etc. It is also
known as outlier analysis or outlier mining. An outlier is a data point that diverges
too much from the rest of the dataset, and the majority of real-world datasets
contain outliers. Outlier detection plays a significant role in the data mining field
and is valuable in numerous areas like network intrusion identification, credit or
debit card fraud detection, detecting outliers in wireless sensor network data, etc.
A simple z-score sketch follows.
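As a minimal illustration, the sketch below flags points more than two standard deviations from the mean (a simple z-score rule; it is one of many outlier detection methods, and the data is made up):

```python
# Flag values whose z-score (distance from the mean in standard deviations)
# exceeds 2 as outliers.
import numpy as np

data = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 35.0, 10.2])  # 35.0 diverges
z = (data - data.mean()) / data.std()

print(data[np.abs(z) > 2])   # -> [35.]
```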
6. Sequential Patterns:
Sequential pattern mining is a data mining technique specialized for evaluating
sequential data to discover sequential patterns. It involves finding interesting
subsequences in a set of sequences, where the value of a sequence can be measured
in terms of different criteria like length, occurrence frequency, etc.
In other words, this data mining technique helps to discover or recognize similar
patterns in transaction data over time.
7. Prediction:
Prediction uses a combination of other data mining techniques such as trend
analysis, clustering, and classification. It analyzes past events or instances in the
right sequence to predict a future event.
Data Mining Process
1. Business Understanding:
It focuses on understanding the project objectives and requirements from a
business point of view.
Tasks:
o A business goal states the target in business terminology, for example,
increase catalog sales to existing customers.
o A data mining goal describes the project objectives in technical terms, for
example, predict how many items a customer will buy, given their
demographic details (age, salary, and city) and the price of the item over the
past three years.
o A project plan states the targeted plan to accomplish the business and data
mining goals.
o The project plan should define the expected set of steps to be performed
during the rest of the project, including the choice of techniques and tools.
2. Data Understanding:
Data understanding starts with initial data collection and proceeds with activities
to get familiar with the data, identify data quality issues, discover first insights into
the data, or detect interesting subsets to form hypotheses about hidden information.
Tasks:
o Collect initial data
o Describe data
o Explore data
o Verify data quality
3. Data Preparation:
Tasks:
o Select data
o Clean data
o Construct data
o Integrate data
o Format data
Select data:
o Decide on the data to be used for the analysis, based on relevance to the data
mining goals, quality, and technical constraints.
Clean data:
o Raise the data quality to the level required by the selected analysis
techniques.
Construct data:
o Perform constructive data preparation operations such as producing derived
attributes or new records.
Integrate data:
o Integrating data refers to the methods whereby data is combined from
various tables or documents to create new records or values.
Format data:
o Formatting refers to primarily syntactic modifications of the data that do not
change its meaning but may be required by the modeling tool.
4. Modeling:
In modeling, various modeling methods are selected and applied, and their
parameters are calibrated to optimal values. Some methods have particular
requirements on the form of the data; therefore, stepping back to the data
preparation phase may be necessary.
Tasks:
o Select the actual modeling technique to be used, for example, a decision tree
or a neural network.
o If various techniques are applied, perform this task separately for each
technique.
o Generate a procedure or mechanism for testing the validity and quality of the
model before constructing it. For example, in classification, error rates are
commonly used as quality measures for data mining models; therefore, we
typically separate the data set into a train set and a test set, build the model
on the train set, and assess its quality on the separate test set (a cross-
validation sketch follows below).
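As a minimal sketch of such a test design, the snippet below estimates model quality with k-fold cross-validation instead of a single train/test split (scikit-learn assumed; the dataset and model are illustrative):

```python
# Estimate model quality with 5-fold cross-validation: the data is split into
# 5 parts, and each part serves once as the test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print(scores.mean())   # mean cross-validated accuracy
```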
Build model:
o To create one or more models, we need to run the modeling tool on the
prepared data set.
Assess model:
o Interpret the models according to domain expertise, the data mining success
criteria, and the desired design.
o Assess the success of the modeling application and of the discovered methods
technically.
o Contact business analysts and domain specialists later to discuss the
outcomes of data mining in the business context.
5. Evaluation:
o It evaluates the model efficiently and reviews the steps executed to build the
model, to ensure that the business objectives are properly achieved.
o The main objective of the evaluation is to determine whether there is some
significant business issue that has not been regarded adequately.
o At the end of this phase, a decision on the use of the data mining results
should be reached.
Tasks:
o Evaluate results
o Review process
o Determine next steps
Evaluate results:
o Assess the degree to which the model meets the organization's business
objectives.
o Test the model on applications in the actual implementation, when time and
budget limitations permit, and also assess the other data mining results
produced.
o Unveil additional difficulties, suggestions, or information for future
directions.
Review process:
o The review process does a more detailed assessment of the data mining
engagement to determine whether there is any significant factor or task that
has somehow been overlooked.
o It reviews quality assurance problems.
6. Deployment:
Deployment refers to how the outcomes of the data mining need to be utilized.
Tasks:
o Plan deployment
o Plan monitoring and maintenance
o Produce final report
o Review project
Plan deployment:
o To deploy the data mining outcomes into the business, take the assessment
results and conclude a strategy for deployment.
o Document the process for later deployment.
Plan monitoring and maintenance:
o Monitoring and maintenance are important when the data mining results
become part of the day-to-day business and its environment.
o A careful maintenance plan helps to avoid unnecessarily long periods of
misuse of data mining results.
o It needs a detailed analysis of the monitoring process.
Produce final report:
o A final report can be drawn up by the project leader and the team.
o It may be only a summary of the project and its experiences, or it may be a
final and comprehensive presentation of the data mining results.
Review project:
o A review of the project evaluates what went right and what went wrong, what
was done well, and what needs to be improved.