
UNIT-4

AI & Its Application


Machine Learning

This Machine Learning tutorial provides basic and advanced concepts of machine learning, and it is designed for students and working professionals.

Machine learning is a growing technology that enables computers to learn automatically from past data. It uses various algorithms to build mathematical models and make predictions from historical data or information. Currently, it is used for various tasks such as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender systems, and many more.

This machine learning tutorial gives you an introduction to machine learning along with a wide range of machine learning techniques such as supervised, unsupervised, and reinforcement learning. You will learn about regression and classification models, clustering methods, hidden Markov models, and various sequential models.

What is Machine Learning


In the real world, we are surrounded by humans who can learn from their experiences, and we have computers and machines that simply follow our instructions. But can a machine also learn from experience or past data the way a human does? This is where machine learning comes in.

Machine learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms that allow a computer to learn on its own from data and past experience. The term machine learning was first introduced by Arthur Samuel in 1959. We can define it in a summarized way as:

Machine learning enables a machine to automatically learn from data, improve performance from experience, and predict things without being explicitly programmed.

With the help of sample historical data, known as training data, machine learning algorithms build a mathematical model that helps in making predictions or decisions without being explicitly programmed. Machine learning brings computer science and statistics together to create predictive models. It constructs or uses algorithms that learn from historical data: the more information we provide, the better the performance.

A machine has the ability to learn if it can improve its performance by gaining
more data.

How does Machine Learning work


A machine learning system learns from historical data, builds prediction models, and predicts the output whenever it receives new data. The accuracy of the predicted output depends on the amount of data: a large amount of data helps build a better model that predicts the output more accurately.

Suppose we have a complex problem that requires some predictions. Instead of writing code for it, we just feed the data to generic algorithms, and with the help of these algorithms, the machine builds the logic from the data and predicts the output. Machine learning has changed the way we think about such problems.
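The following is a minimal sketch of this data-in, prediction-out workflow. It assumes scikit-learn (a library this tutorial has not yet introduced), and the toy numbers are invented for illustration:

# A generic algorithm is fed historical data, builds a model, and
# predicts the output for new data (the workflow described above).
from sklearn.linear_model import LinearRegression

# Toy historical data (an assumption): hours studied -> exam score.
X_train = [[1], [2], [3], [4], [5]]
y_train = [52, 57, 61, 68, 74]

model = LinearRegression()     # the "generic algorithm"
model.fit(X_train, y_train)    # the machine builds the logic from data

print(model.predict([[6]]))    # predicted output for unseen data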

Features of Machine Learning:


o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is much like data mining, as both deal with huge amounts of data.

Need for Machine Learning


The need for machine learning is increasing day by day, because machine learning can perform tasks that are too complex for a person to implement directly. As humans, we have limitations: we cannot manually process huge amounts of data, so we need computer systems, and machine learning makes this easy for us.

We can train machine learning algorithms by providing them with huge amounts of data and letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be assessed with a cost function. With the help of machine learning, we can save both time and money.

The importance of machine learning can easily be understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, etc. Top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interests and recommend products accordingly. Following are some key points that show the importance of machine learning:

o Rapid increase in the production of data
o Solving complex problems that are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data

Classification of Machine Learning


At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample labeled data to the machine learning system in order to train it, and on that basis, it predicts the output.
The system creates a model from the labeled data to understand the dataset and learn about each example. Once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, just as a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be further grouped into two categories of algorithms:

o Classification
o Regression
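A minimal sketch of the supervised workflow, assuming scikit-learn and its bundled iris dataset (both assumptions for illustration):

# Labeled samples train a classifier, which then predicts the label
# of new data -- learning "under supervision".
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)        # features and their labels
clf = LogisticRegression(max_iter=200)   # a classification algorithm
clf.fit(X, y)                            # training on labeled data

print(clf.predict(X[:1]))                # predicted class for one sample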

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.

The machine is trained with a set of data that has not been labeled, classified, or categorized, and the algorithm must act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result; the machine tries to find useful insights from the huge amount of data. It can be further classified into two categories of algorithms:

o Clustering
o Association
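A minimal sketch of unsupervised clustering, assuming scikit-learn (the six toy points are invented for illustration):

# Unlabeled points are grouped purely by similarity -- no labels,
# no supervision.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0],        # no labels are provided;
     [10, 2], [10, 4], [10, 0]]     # the algorithm must find structure

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)               # cluster assignment for each point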

3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the maximum reward points, and in doing so it improves its performance.

A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.
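A minimal sketch of this reward-and-penalty loop, using tabular Q-learning on a 5-cell corridor (the environment and parameters are invented for illustration):

# The agent earns +1 for reaching the rightmost cell and learns,
# from that feedback alone, to always move right.
import random

n_states, actions = 5, [-1, +1]           # move left or move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1     # learning rate, discount, exploration

for episode in range(500):
    s = 0                                 # start at the left end
    while s != n_states - 1:              # goal: reach the right end
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0         # reward signal
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                              - Q[(s, a)])
        s = s2

# After training, the greedy action in each non-goal state is +1 (right).
print([max(actions, key=lambda a: Q[(s, a)]) for s in range(n_states)])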

Note: We will learn about the above types of machine learning in detail in later chapters.
History of Machine Learning
Some 40 to 50 years ago, machine learning was science fiction, but today it is part of our daily life, from self-driving cars to Amazon's virtual assistant "Alexa". The idea behind machine learning, however, is quite old and has a long history. Some milestones in the history of machine learning are given below:

The early history of Machine Learning (Pre-1940):

o 1834: Charles Babbage, the father of the computer, conceived a device that could be programmed with punch cards. The machine was never built, but all modern computers rely on its logical structure.
o 1936: Alan Turing gave a theory of how a machine can determine and execute a set of instructions.

The era of stored program computers:


o 1943: A human neural network was modeled with an electrical circuit. In 1950, scientists started applying this idea and analyzing how human neurons might work.
o 1945: "ENIAC", the first electronic general-purpose computer, was completed. After that, stored-program computers such as EDSAC in 1949 and EDVAC in 1951 were invented.

Computing machinery and intelligence:

o 1950: Alan Turing published a seminal paper, "Computing Machinery and Intelligence," on the topic of artificial intelligence. In his paper, he asked, "Can machines think?"

Machine intelligence in Games:

o 1952: Arthur Samuel, a pioneer of machine learning, created a program that helped an IBM computer play checkers. The program performed better the more it played.
o 1959: The term "machine learning" was first coined by Arthur Samuel.

The first "AI" winter:

o The period from 1974 to 1980 was a tough time for AI and ML researchers, and it became known as the "AI winter."
o During this period, machine translation failed, people lost interest in AI, and government funding for research was reduced.

Machine Learning from theory to reality

o 1959: The first neural network was applied to a real-world problem, using an adaptive filter to remove echoes over phone lines.
o 1985: Terry Sejnowski and Charles Rosenberg invented NETtalk, a neural network that taught itself how to correctly pronounce 20,000 words in one week.
o 1997: IBM's Deep Blue computer won a chess match against the grandmaster Garry Kasparov, becoming the first computer to beat a human chess expert.

Machine Learning in the 21st century

o 2006: Computer scientist Geoffrey Hinton gave neural-network research the new name "deep learning," which has since become one of the most trending technologies.
o 2012: Google created a deep neural network that learned to recognize images of humans and cats in YouTube videos.
o 2014: The chatbot "Eugene Goostman" passed a Turing Test: it was the first chatbot to convince 33% of the human judges that it was not a machine.
o 2014: DeepFace, a deep neural network created by Facebook, was claimed to recognize a person with the same precision as a human.
o 2016: AlphaGo beat the world's second-ranked Go player, Lee Sedol. In 2017, it beat the number one player, Ke Jie.
o 2017: Alphabet's Jigsaw team built an intelligent system to learn about online trolling. It read millions of comments from different websites in order to learn to stop online trolling.

Applications of Machine learning


Machine learning is a buzzword in today's technology, and it is growing very rapidly. We use machine learning in our daily life, often without knowing it, in Google Maps, Google Assistant, Alexa, etc. Below are some of the most trending real-world applications of machine learning:
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, digital images, etc. A popular use case of image recognition and face detection is automatic friend tagging suggestions:

Facebook provides a feature of automatic friend tagging suggestions. Whenever we upload a photo with our Facebook friends, we automatically get tagging suggestions with names, and the technology behind this is machine learning's face detection and recognition algorithm.

It is based on Facebook's "DeepFace" project, which is responsible for face recognition and person identification in pictures.


2. Speech Recognition
When using Google, we get a "Search by voice" option; this is speech recognition, a popular application of machine learning.

Speech recognition is the process of converting voice instructions into text, and it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

3. Traffic prediction:
If we want to visit a new place, we can take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.

It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, with the help of two things:

o the real-time location of vehicles from the Google Maps app and sensors;
o the average time taken on past days at the same time of day.

Everyone who uses Google Maps is helping to make the app better. It takes information from the user and sends it back to its database to improve its performance.

4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment
companies such as Amazon, Netflix, etc., for product recommendation to the user.
Whenever we search for a product on Amazon, we start seeing advertisements for the same product while surfing the internet in the same browser, and this is because of machine learning.

Google understands user interest via various machine learning algorithms and suggests products according to customer interest.

Similarly, when we use Netflix, we see recommendations for series, movies, etc., and this is also done with the help of machine learning.

5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars, where machine learning plays a significant role. Tesla, the best-known company in this space, is working on self-driving cars and uses machine learning methods to train its car models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always receive important mail in our inbox, marked with the important symbol, and spam emails in our spam box; the technology behind this is machine learning. Below are some spam filters used by Gmail:

o Content filter
o Header filter
o General blacklist filter
o Rules-based filter
o Permission filter

Machine learning algorithms such as the Multi-Layer Perceptron, Decision Tree, and Naïve Bayes classifier are used for email spam filtering and malware detection.
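A minimal sketch of Naïve Bayes spam filtering, assuming scikit-learn (the four toy messages and labels are invented for illustration):

# Word counts are the features; a Naive Bayes classifier learns which
# words indicate spam, then labels a new message.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win money now", "lowest price guaranteed",
            "meeting at noon", "see you at lunch"]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)    # text -> word-count features
clf = MultinomialNB().fit(X, labels)      # train the spam filter

print(clf.predict(vectorizer.transform(["win a free lunch"])))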

7. Virtual Personal Assistant:


We have various virtual personal assistants, such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using our voice instructions. These assistants can help us in various ways just through voice instructions, such as playing music, calling someone, opening an email, or scheduling an appointment.

Machine learning algorithms are an important part of these virtual assistants. The assistants record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.

8. Online Fraud Detection:


Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can occur in various ways, such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.

For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Genuine transactions follow a specific pattern that changes for fraudulent transactions; the network detects the change and so makes our online transactions more secure.
9. Stock Market trading:
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of shares going up and down, so machine learning's long short-term memory (LSTM) neural network is used for the prediction of stock market trends.

10. Medical Diagnosis:


In medical science, machine learning is used for disease diagnosis. With it, medical technology is growing fast and is able to build 3D models that can predict the exact position of lesions in the brain.

This helps in finding brain tumors and other brain-related diseases easily.

11. Automatic Language Translation:


Nowadays, if we visit a new place and are not familiar with the language, it is not a problem at all, as machine learning helps us by converting text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature: it is a neural machine translation system that translates text into our familiar language, and this is called automatic translation.

The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used together with image recognition to translate text from one language to another.

Machine learning Life cycle


Machine learning has given computer systems the ability to learn automatically without being explicitly programmed. But how does a machine learning system work? It can be described using the machine learning life cycle, a cyclic process for building an efficient machine learning project. The main purpose of the life cycle is to find a solution to the problem or project.

Machine learning life cycle involves seven major steps, which are given below:

o Gathering Data
o Data preparation
o Data Wrangling
o Analyse Data
o Train the model
o Test the model
o Deployment

The most important thing in the complete process is to understand the problem and to know its purpose. Therefore, before starting the life cycle, we need to understand the problem, because a good result depends on a good understanding of the problem.

In the complete life cycle, to solve a problem we create a machine learning system called a "model," and this model is created by "training" it. But to train a model we need data; hence, the life cycle starts with collecting data.
1. Gathering Data:
Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify the data sources and obtain the data the problem requires.

In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle: the quantity and quality of the collected data determine the efficiency of the output. The more data there is, the more accurate the prediction will be.

This step includes the tasks below:

o Identify various data sources
o Collect data
o Integrate the data obtained from different sources

By performing these tasks, we get a coherent set of data, also called a dataset, which will be used in the further steps.

2. Data preparation
After collecting the data, we need to prepare it for the further steps. Data preparation is the step where we put our data into a suitable place and prepare it for use in machine learning training.

In this step, we first put all the data together and then randomize its ordering.

This step can be further divided into two processes:

o Data exploration:
It is used to understand the nature of the data we have to work with: its characteristics, format, and quality. A better understanding of the data leads to a more effective outcome. Here we find correlations, general trends, and outliers (see the pandas sketch after this list).
o Data pre-processing:
The next step is preprocessing the data for analysis.
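A minimal data-exploration sketch, assuming pandas (the toy frame is invented for illustration):

# Inspect characteristics, correlations, and outlier candidates
# before preprocessing.
import pandas as pd

df = pd.DataFrame({"age": [22, 25, 47, 52, 46],
                   "income": [21000, 24000, 52000, 58000, 150000]})

print(df.describe())    # format, ranges, and general trends
print(df.corr())        # correlations between columns
print(df[df["income"] > df["income"].quantile(0.95)])   # outlier candidates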

3. Data Wrangling
Data wrangling is the process of cleaning raw data and converting it into a usable format: cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process. Cleaning the data is required to address quality issues.

The data we have collected is not necessarily all of use to us, as some of it may not be useful. In real-world applications, collected data may have various issues, including:

o Missing Values
o Duplicate data
o Invalid data
o Noise

So, we use various filtering techniques to clean the data; a small pandas sketch follows below.

It is mandatory to detect and remove these issues, because they can negatively affect the quality of the outcome.
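A minimal data-wrangling sketch, assuming pandas (the toy frame and its defects are invented for illustration):

# Handle the issues listed above: duplicates, invalid values, and
# missing values.
import pandas as pd

df = pd.DataFrame({"age": [25, None, 30, 30, -1],
                   "city": ["Pune", "Delhi", "Delhi", "Delhi", "Pune"]})

df = df.drop_duplicates()                         # duplicate data
df = df[df["age"].isna() | (df["age"] > 0)]       # drop invalid ages
df["age"] = df["age"].fillna(df["age"].mean())    # fill missing values
print(df)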
4. Data Analysis
Now the cleaned and prepared data is passed on to the analysis step. This step involves:

o Selection of analytical techniques
o Building models
o Reviewing the result

The aim of this step is to build a machine learning model that analyzes the data using various analytical techniques, and then to review the outcome. It starts with determining the type of problem, where we select a machine learning technique such as classification, regression, cluster analysis, or association; we then build the model using the prepared data and evaluate it.

Hence, in this step, we take the data and use machine learning algorithms to build the model.

5. Train Model
The next step is to train the model. In this step, we train our model to improve its performance and get a better outcome for the problem.

We use datasets to train the model with various machine learning algorithms. Training the model is required so that it can learn the various patterns, rules, and features.

6. Test Model
Once our machine learning model has been trained on a given dataset, we test it. In this step, we check the accuracy of our model by providing a test dataset to it.

Testing the model determines the percentage accuracy of the model as per the requirements of the project or problem.
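A minimal sketch of the train and test steps together, assuming scikit-learn and its bundled iris dataset (both assumptions for illustration):

# Hold out part of the dataset, train on the rest, and report the
# percentage accuracy on the held-out part.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)         # held-out test set

model = DecisionTreeClassifier().fit(X_train, y_train)     # step 5: train
accuracy = accuracy_score(y_test, model.predict(X_test))   # step 6: test
print(f"accuracy: {accuracy:.0%}")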

7. Deployment
The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.

Installing Anaconda and Python


To learn machine learning, we will use the Python programming language in this tutorial. In order to use Python for machine learning, we need to install it on our computer system together with a compatible IDE (Integrated Development Environment).

In this topic, we will learn to install Python and an IDE with the help of Anaconda
distribution.

The Anaconda distribution is a free and open-source platform for the Python and R programming languages. It can be easily installed on any OS, such as Windows, Linux, or macOS. It provides more than 1,500 Python/R data science packages suitable for developing machine learning and deep learning models.

The Anaconda distribution installs Python along with various IDEs such as Jupyter Notebook, Spyder, and the Anaconda Prompt. Hence it is a very convenient packaged solution that you can easily download and install on your computer. It will automatically install Python and some basic IDEs and libraries with it.

The steps below show the process of downloading and installing Anaconda and an IDE:

Step-1: Download Anaconda Python:

o To download Anaconda, open your favorite browser, search for "Download Anaconda Python", and click the first result. Alternatively, you can download it directly from https://www.anaconda.com/distribution/#download-section.
o This takes you to the Anaconda download page.
o Since Anaconda is available for Windows, Linux, and macOS, download it as per your OS type from the available options. It offers Python 2.7 and Python 3.7 versions; since the latest version is 3.7, we will download the Python 3.7 version. After clicking the download option, the download starts on your computer.
Note: In this topic, we are downloading Anaconda for Windows; you can choose it as per your OS.

Step-2: Install Anaconda Python (Python 3.7 version):

Once the download completes, go to Downloads and double-click the ".exe" file (Anaconda3-2019.03-Windows-x86_64.exe). A setup window for the Anaconda installation opens; click Next.
o A license agreement window opens; click "I Agree" to move further.
o In the next window, you will get two installation options. Select the first option (Just Me) and click Next.
o Next you will get a window for the installation location. You can leave it as the default or change it by browsing to a location, then click Next.
o Now select the second option and click Install.
o Once the installation completes, click Next.
o Installation is now complete. Tick the checkbox if you want to learn more about Anaconda and Anaconda Cloud, and click Finish to end the process.
Note: Here, we will use the Spyder IDE to run Python programs.

Step-3: Open Anaconda Navigator

o After successful installation of Anaconda, use Anaconda Navigator to launch a Python IDE such as Spyder or Jupyter Notebook.
o To open Anaconda Navigator, press the Windows key, search for "Anaconda Navigator", and click on it.
o After opening the Navigator, launch the Spyder IDE by clicking the Launch button given below Spyder. It will install the Spyder IDE on your system.

Run your Python program in the Spyder IDE:

o Open the Spyder IDE.
o Write your first program and save it with the .py extension.
o Run the program using the triangular Run button.
o You can check the program's output in the console pane at the bottom right.

Step- 4: Close the Spyder IDE.

Key differences between Artificial Intelligence (AI) and Machine Learning (ML):

o Artificial intelligence is a technology that enables a machine to simulate human behavior, whereas machine learning is a subset of AI that allows a machine to learn automatically from past data without being explicitly programmed.
o The goal of AI is to make a smart computer system, like a human, to solve complex problems; the goal of ML is to allow machines to learn from data so that they can give accurate output.
o In AI, we make intelligent systems to perform any task like a human; in ML, we teach machines with data to perform a particular task and give an accurate result.
o Machine learning and deep learning are the two main subsets of AI; deep learning is the main subset of machine learning.
o AI has a very wide scope; machine learning has a limited scope.
o AI works to create intelligent systems that can perform various complex tasks; machine learning works to create machines that can perform only the specific tasks for which they are trained.
o An AI system is concerned with maximizing the chances of success; machine learning is mainly concerned with accuracy and patterns.
o The main applications of AI are Siri, customer support chatbots, expert systems, online game playing, intelligent humanoid robots, etc.; the main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
o On the basis of capabilities, AI can be divided into three types: Weak AI, General AI, and Strong AI. Machine learning can also be divided into three main types: supervised learning, unsupervised learning, and reinforcement learning.
o AI includes learning, reasoning, and self-correction; ML includes learning and self-correction when introduced to new data.
o AI deals with structured, semi-structured, and unstructured data; machine learning deals with structured and semi-structured data.

Data Mining Techniques


Data mining involves the use of refined data analysis tools to find previously unknown, valid patterns and relationships in huge data sets. These tools can incorporate statistical models, machine learning techniques, and mathematical algorithms such as neural networks or decision trees. Thus, data mining incorporates both analysis and prediction.

Drawing on methods and technologies from the intersection of machine learning, database management, and statistics, professionals in data mining have devoted their careers to better understanding how to process and draw conclusions from huge amounts of data. But what are the methods they use to make it happen?

In recent data mining projects, various major data mining techniques have been
developed and used, including association, classification, clustering, prediction,
sequential patterns, and regression.

1. Classification:
This technique is used to obtain important and relevant information about data and metadata, and it helps classify data into different classes.


Data mining techniques can be classified by different criteria, as follows:

i. Classification of data mining frameworks by the type of data source mined:
This classification is as per the type of data handled, for example multimedia, spatial data, text data, time-series data, World Wide Web data, and so on.
ii. Classification of data mining frameworks by the database involved:
This classification is based on the data model involved, for example object-oriented, transactional, or relational databases.
iii. Classification of data mining frameworks by the kind of knowledge discovered:
This classification depends on the types of knowledge discovered or the data mining functionalities, for example discrimination, classification, clustering, characterization, etc. Some frameworks are extensive frameworks offering several data mining functionalities together.
iv. Classification of data mining frameworks by the data mining techniques used:
This classification is as per the data analysis approach used, such as neural networks, machine learning, genetic algorithms, visualization, statistics, and data-warehouse-oriented or database-oriented approaches.
The classification can also take into account the level of user interaction involved in the data mining procedure, such as query-driven systems, autonomous systems, or interactive exploratory systems.

2. Clustering:
Clustering is a division of information into groups of connected objects. Describing the data by a few clusters loses certain fine details but achieves simplification: the data is modeled by its clusters. Data modeling puts clustering in a historical perspective rooted in statistics, mathematics, and numerical analysis. From a machine learning point of view, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting framework represents a data concept. From a practical point of view, clustering plays an extraordinary role in data mining applications, for example scientific data exploration, text mining, information retrieval, spatial database applications, CRM, web analysis, computational biology, medical diagnostics, and much more.

In other words, we can say that clustering analysis is a data mining technique for identifying similar data. This technique helps recognize the differences and similarities between data items. Clustering is very similar to classification, but it involves grouping chunks of data together based on their similarities.

3. Regression:
Regression analysis is the data mining process used to identify and analyze the relationship between variables in the presence of other factors. It is used to estimate the likelihood of a specific variable. Regression is primarily a form of planning and modeling. For example, we might use it to project certain costs, depending on other factors such as availability, consumer demand, and competition. Primarily, it gives the exact relationship between two or more variables in the given data set.
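A minimal regression sketch, assuming scikit-learn (the demand and cost figures are invented for illustration):

# Fit a linear relationship between two variables and use it to
# project a cost, the kind of planning described above.
import numpy as np
from sklearn.linear_model import LinearRegression

demand = np.array([[100], [150], [200], [250]])   # explanatory variable
cost = np.array([1000, 1400, 1850, 2300])         # variable to predict

reg = LinearRegression().fit(demand, cost)
print(reg.coef_, reg.intercept_)    # the fitted relationship
print(reg.predict([[300]]))         # projected cost at demand 300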

4. Association Rules:
This data mining technique helps discover links between two or more items by finding hidden patterns in the data set.

Association rules are if-then statements that help show the probability of interactions between data items within large data sets, across different types of databases. Association rule mining has several applications and is commonly used to help find sales correlations in transactional data or in medical data sets.

The way the algorithm works is that you have a body of data, for example a list of grocery items that you have been buying for the last six months, and it calculates the percentage of items being purchased together.

These are the three major measures (a small worked example follows the list):

o Support:
This measures how often items are purchased together, relative to the overall dataset:
      Support(A, B) = (transactions containing both A and B) / (all transactions)
o Confidence:
This measures how often item B is purchased when item A is purchased:
      Confidence(A -> B) = (transactions containing A and B) / (transactions containing A)
o Lift:
This measures how much more often A and B are purchased together than expected if B were purchased independently:
      Lift(A -> B) = Confidence(A -> B) / Support(B)
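A minimal sketch computing the three measures for the rule bread -> butter over a list of grocery baskets (the baskets are invented for illustration):

# Support, confidence, and lift computed directly from transactions.
baskets = [{"bread", "butter"}, {"bread", "milk"},
           {"bread", "butter", "milk"}, {"milk"}]

def support(*items):
    hits = sum(1 for b in baskets if set(items) <= b)
    return hits / len(baskets)

support_ab = support("bread", "butter")       # (A and B) / all baskets
confidence = support_ab / support("bread")    # (A and B) / A
lift = confidence / support("butter")         # confidence / Support(B)
print(support_ab, confidence, lift)           # -> 0.5, ~0.67, ~1.33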

5. Outlier detection:
This type of data mining technique relates to the observation of data items in the data set that do not match an expected pattern or expected behavior. The technique can be used in various domains, such as intrusion detection, fraud detection, etc. It is also known as outlier analysis or outlier mining. An outlier is a data point that diverges too much from the rest of the dataset, and the majority of real-world datasets have outliers. Outlier detection plays a significant role in the data mining field and is valuable in numerous areas, such as network intrusion identification, credit or debit card fraud detection, and detecting outlying values in wireless sensor network data.

6. Sequential Patterns:
The sequential pattern is a data mining technique specialized for evaluating sequential data to discover sequential patterns. It consists of finding interesting subsequences in a set of sequences, where the value of a sequence can be measured in terms of different criteria such as length, occurrence frequency, etc.

In other words, this data mining technique helps to discover or recognize similar patterns in transaction data over time.

7. Prediction:
Prediction uses a combination of the other data mining techniques, such as trends, clustering, classification, etc. It analyzes past events or instances in the right sequence to predict a future event.

Data Mining Implementation Process


Many different sectors are taking advantage of data mining to boost their business efficiency, including manufacturing, chemicals, marketing, aerospace, etc. Therefore, the need for a standard data mining process has grown: data mining techniques must be reliable and repeatable by company individuals with little or no knowledge of the data mining context. As a result, the Cross-Industry Standard Process for Data Mining (CRISP-DM) was introduced in the late 1990s, after going through many workshops and with contributions from more than 300 organizations.

Data mining is described as a process of finding hidden, precious data by evaluating the huge quantities of information stored in data warehouses, using multiple data mining techniques such as artificial intelligence (AI), machine learning, and statistics. Let's examine the implementation process for data mining in detail:

The Cross-Industry Standard Process for Data Mining (CRISP-DM)

The Cross-Industry Standard Process for Data Mining (CRISP-DM) comprises six phases arranged as a cycle:
1. Business understanding:
It focuses on understanding the project goals and requirements from a business point of view, converting this knowledge into a data mining problem definition, and then designing a preliminary plan to accomplish the targets.

Tasks:

o Determine business objectives
o Assess the situation
o Determine data mining goals
o Produce a project plan

Determine business objectives:

o Understand the project targets and prerequisites from a business point of view.
o Thoroughly understand what the customer wants to achieve.
o Reveal significant factors that, from the start, can influence the outcome of the project.

Assess the situation:

o This requires a more detailed analysis of facts about all the resources, constraints, assumptions, and other factors that ought to be considered.

Determine data mining goals:

o A business goal states the target in business terminology, for example, increase catalog sales to existing customers.
o A data mining goal describes the project objective in technical terms, for example, predict how many items a customer will buy, given their demographic details (age, salary, and city) and the price of the item over the past three years.

Produce a project plan:

o It states the intended plan for accomplishing the business and data mining goals.
o The project plan should define the expected set of steps to be performed during the rest of the project, including the choice of techniques and the selection of tools.

2. Data Understanding:
Data understanding starts with initial data collection and proceeds with activities to become familiar with the data, identify data quality issues, discover first insights into the data, and detect interesting subsets to form hypotheses about hidden information.

Tasks:

o Collect initial data
o Describe data
o Explore data
o Verify data quality

Collect initial data:

o It acquires the data listed in the project resources.
o It includes data loading, if necessary for data understanding.
o It may lead to initial data preparation steps.
o If multiple data sources are acquired, integration is an additional issue, either here or at the later data preparation stage.
Describe data:

o It examines the "gross" or "surface" characteristics of the acquired data.
o It reports on the outcomes.

Explore data:

o It addresses data mining questions that can be resolved by querying, visualizing, and reporting, including:
o the distribution of key attributes and the results of simple aggregations;
o relationships between small numbers of attributes;
o the characteristics of important sub-populations, and simple statistical analyses.
o It may refine the data mining objectives.
o It may contribute to or refine the data description and quality reports.
o It may feed into the transformation and other necessary data preparation steps.

Verify data quality:

o It examines the quality of the data and addresses quality questions.

3. Data Preparation:

o It usually takes more than 90 percent of the project time.
o It covers all operations needed to build the final data set from the original raw information.
o Data preparation tasks are likely to be performed several times and not in any prescribed order.

Tasks:

o Select data
o Clean data
o Construct data
o Integrate data
o Format data
Select data:

o It decides which data will be used for evaluation.
o The data selection criteria include relevance to the data mining objectives, quality, and technical constraints such as limits on data volume or data types.
o It covers the selection of attributes (columns) as well as records (rows) in a table.

Clean data:

o It may involve the selection of clean subsets of the data, the insertion of suitable defaults, or more ambitious methods, such as estimating missing data by modeling.

Construct data:

o It comprises constructive data preparation operations, such as generating derived attributes, entirely new records, or transformed values of existing attributes.

Integrate data:

o Integrating data refers to the methods whereby data is combined from multiple tables or records to create new records or values.

Format data:

o Formatting data refers mainly to syntactic changes made to the data that do not alter its meaning but may be required by the modeling tool.

4. Modeling:
In modeling, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Some techniques have particular requirements on the form of the data, so stepping back to the data preparation phase may be necessary.

Tasks:

o Select modeling technique
o Generate test design
o Build model
o Assess model

Select modeling technique:

o It selects the actual modeling technique to be used, for example, a decision tree or a neural network.
o If multiple techniques are applied, this task is performed separately for each technique.

Generate test design:

o Generate a procedure or mechanism for testing the validity and quality of the model before constructing it. For example, in classification, error rates are commonly used as quality measures for data mining models; therefore, we typically separate the data set into a training set and a test set, build the model on the training set, and assess its quality on the separate test set.

Build model:

o To create one or more models, we need to run the modeling tool on the
prepared data set.

Assess model:

o It interprets the models according to domain expertise, the data mining success criteria, and the desired test design.
o It assesses the success of the modeling and discovery techniques on technical grounds.
o It then contacts business analysts and domain specialists to discuss the outcomes of data mining in the business context.

5. Evaluation:

o It evaluates the model thoroughly and reviews the steps executed to build it, to ensure that the business objectives are properly achieved.
o The main objective of the evaluation is to determine whether any significant business issue has not been adequately considered.
o At the end of this phase, a decision on the use of the data mining results should be reached.

Tasks:

o Evaluate results
o Review process
o Determine next steps

Evaluate results:

o It assesses the degree to which the model meets the organization's business objectives.
o It tests the model on real applications, when time and budget limitations permit, and also assesses the other data mining results produced.
o It unveils additional difficulties, suggestions, or information for future directions.

Review process:

o The review process does a more detailed evaluation of the data mining engagement to determine whether any significant factor or task has somehow been overlooked.
o It also reviews quality assurance issues.

Determine next steps:

o It decides how to proceed at this stage.
o It decides whether to complete the project and move on to deployment, to initiate further iterations, or to set up new data mining initiatives. This includes an analysis of the remaining resources and budget, which influences the decisions.

6. Deployment:
Deployment determines how the outcomes of the data mining project need to be utilized.

Deploy data mining results by:

o scoring a database, utilizing the results as company guidelines, or interactive internet scoring.
o The knowledge acquired will need to be organized and presented in a way that the client can use. Depending on the demands, the deployment phase may be as simple as generating a report or as complicated as implementing a repeatable data mining process across the organization.

Tasks:

o Plan deployment
o Plan monitoring and maintenance
o Produce final report
o Review project

Plan deployment:

o To deploy the data mining outcomes into the business, it takes the assessment results and concludes a strategy for deployment.
o It refers to documenting the process for later deployment.

Plan monitoring and maintenance:

o This is important when the data mining results become part of the day-to-day business and its environment.
o It helps avoid unnecessarily long periods of incorrect use of data mining results.
o It needs a detailed analysis of the monitoring process.

Produce final report:

o A final report can be drawn up by the project leader and their team.
o It may be only a summary of the project and its experiences.
o It may be a final and comprehensive presentation of the data mining results.

Review project:
o Review projects evaluate what went right and what went wrong, what was done well, and what needs to be improved.
