Unit-5 DS Notes

5 ds module

Uploaded by

krishjain531

UNIT-5

Syllabus: Importance of Machine learning in Data Science

Machine Learning: Introduction to machine learning, applications for machine
learning in data science, Python tools used in machine learning, the modeling process,
types of machine learning.

Introduction to machine learning:

 What is Machine Learning: In the real world, we are surrounded by humans
who can learn from their experiences, and we have computers or machines
that work on our instructions. But can a machine also learn from experience
or past data like a human does? This is where machine learning comes in.

 Features of Machine Learning: Machine learning uses data to detect various
patterns in a given dataset. It can learn from past data and improve
automatically. It is a data-driven technology. Machine learning is similar
to data mining, as it also deals with huge amounts of data.
 Need for Machine Learning: The demand for machine learning is steadily
rising. Because it is able to perform tasks that are too complex for a person to
directly implement, machine learning is required.

N.KOTESWARA RAO, NEC, GUDUR. Page 1

 Humans are constrained by our inability to manually access vast amounts of
data; as a result, we require computer systems, which is where machine
learning comes in to simplify our lives.
 By providing them with a large amount of data and allowing them to
automatically explore the data, build models, and predict the required output,
we can train machine learning algorithms. We can save both time and money
by using machine learning.
 The importance of machine learning can be easily understood from its use
cases. Currently, machine learning is used in self-driving cars, cyber fraud
detection, face recognition, friend suggestions by Facebook, and so on.
Various top companies, such as Netflix and Amazon, have built machine
learning models that use a vast amount of data to analyze user interest
and recommend products accordingly.
 Following are some key points which show the importance of Machine
Learning:
1. Rapid increment in the production of data
2. Solving complex problems, which are difficult for a human
3. Decision making in various sectors, including finance
4. Finding hidden patterns and extracting useful information from data.
Applications for machine learning in data science:
 Machine learning is a buzzword in the technology world right now, and for
good reason: it represents a major step forward in how computers can learn.
 The following are the Applications of Machine Learning:
1. Traffic Alerts
2. Social Media
3. Transportation and Commuting
4. Product Recommendations
5. Virtual Personal Assistants
6. Self Driving Cars
7. Dynamic Pricing
8. Google Translate
9. Online Video Streaming
10. Fraud Detection
 1. Traffic Alerts (Maps): Google Maps is probably the app we use most
whenever we go out and need help with directions and traffic. The other
day I was traveling to another city on the expressway, and Maps
suggested: “Despite the heavy traffic, you are on the fastest route”. But how
does it know that?
 It is a combination of people currently using the service, historic data for
that route collected over time, and a few tricks acquired from other companies.
Everyone using Maps provides their location, average speed, and the route
they are traveling, which in turn helps Google collect massive amounts of data
about the traffic. This lets it predict upcoming traffic and adjust your
route accordingly.
 2. Social Media (Facebook): One of the most common applications of machine
learning is automatic friend tagging suggestions on Facebook or any other
social media platform.
 Facebook uses face detection and image recognition to automatically find a
face that matches one in its database, and then suggests that we tag that
person. This is based on DeepFace.
 Facebook’s deep learning project DeepFace is responsible for recognizing
faces and identifying which person is in a picture. Facebook also provides alt tags
(alternative tags) for images already uploaded to Facebook. For example, if we
inspect an image on Facebook, its alt tag contains a description.
 3. Transportation and Commuting (Uber): If you have used an app to book a
cab, you are already using machine learning to an extent. It provides a
personalized application that is unique to you.
 It automatically detects your location and offers options to go home, to the
office, or to any other frequent place, based on your history and patterns.
 It uses a machine learning algorithm layered on top of historic trip data to
make more accurate ETA predictions. With the implementation of machine
learning, they saw a 26% accuracy in delivery and pickup.

 4. Product Recommendations: Suppose you check an item on Amazon but
do not buy it then and there. The next day, you are watching videos on
YouTube and suddenly see an ad for the same item. You switch to
Facebook, and you see the same ad there too. So how does this happen?
 This happens because Google tracks your search history and recommends
ads based on it. This is one of the coolest applications of
machine learning. In fact, 35% of Amazon’s revenue is generated by product
recommendations.
 5. Virtual Personal Assistants: As the name suggests, virtual personal assistants
help find useful information when asked via text or voice. A few of the
major applications of machine learning here are:
1. Speech Recognition
2. Speech to Text Conversion
3. Natural Language Processing
4. Text to Speech Conversion
 All you need to do is ask a simple question like “What is my schedule for
tomorrow?” or “Show my upcoming flights”. To answer, your personal
assistant searches for information or recalls your related queries to collect info.
Recently, personal assistants have been used in chatbots, which are being
implemented in various food ordering apps, online training websites, and
commuting apps.
 6. Self Driving Cars: This is one of the coolest applications of machine learning.
It’s here and people are already using it. Machine learning plays a very
important role in self-driving cars, and I’m sure you have heard
about Tesla, the leader in this business. Its current artificial
intelligence is driven by the hardware manufacturer NVIDIA and is based on an
unsupervised learning algorithm.
 NVIDIA stated that they didn’t train their model to detect people or any object
as such. The model works on deep learning and crowdsources data from all
of its vehicles and drivers. It uses internal and external sensors, which are a
part of IoT. According to data gathered by McKinsey, automotive data
will hold a tremendous value of $750 billion.
 7. Dynamic Pricing: Setting the right price for a good or service is an old
problem in economic theory. There is a vast number of pricing strategies, and
the right one depends on the objective sought. Be it a movie ticket, a plane
ticket, or a cab fare, everything is dynamically priced. In recent years,
artificial intelligence has enabled pricing solutions to track buying trends and
determine more competitive product prices.
 How does Uber determine the price of your ride?
 One of Uber’s biggest uses of machine learning comes in the form of surge
pricing, using a machine learning model nicknamed “Geosurge”. If you are
running late for a meeting and need to book an Uber in a crowded area, get
ready to pay twice the normal fare. Even for flights, if you are traveling in the
festive season, chances are that prices will be twice the original price.
 8. Google Translate: Remember the time when you traveled to a new place and
found it difficult to communicate with the locals or to find local spots because
everything was written in a different language?
 Well, those days are gone now. Google’s GNMT (Google Neural Machine
Translation) is a neural machine translation system that works with thousands of
languages and dictionaries and uses natural language processing to provide the
most accurate translation of any sentence or words. Since the tone of the words
also matters, it uses other techniques such as POS tagging, NER (Named Entity
Recognition), and chunking. It is one of the best and most used applications of
machine learning.
 9. Online Video Streaming (Netflix): With over 100 million subscribers, there
is no doubt that Netflix is the daddy of the online streaming world. Netflix’s
speedy rise has all movie industrialists taken aback – forcing them to ask, “How
on earth could one single website take on Hollywood?”. The answer is Machine
Learning.
 The Netflix algorithm constantly gathers massive amounts of data about users’
activities, such as:
1. When you pause, rewind, or fast forward
2. What day you watch content (TV shows on weekdays and movies on weekends)
3. The date and time you watch
4. When you pause and leave content (and if you ever come back)
5. The ratings given (about 4 million per day)
6. Searches (about 3 million per day)
7. Browsing and scrolling behavior, and a lot more
 They collect this data for each subscriber and use it in their recommender
system and many other machine learning applications. That’s why they have
such a high customer retention rate.
 10. Fraud Detection: Experts predict online credit card fraud to soar to a
whopping $32 billion in 2020. That’s more than the profit made by Coca Cola
and JP Morgan Chase combined. That’s something to worry about.
 Fraud Detection is one of the most necessary Applications of Machine Learning.
The number of transactions has increased due to a plethora of payment
channels – credit/debit cards, smart phones, numerous wallets, UPI and much
more.
 At the same time, criminals have become adept at finding
loopholes. Whenever a customer carries out a transaction, the machine
learning model thoroughly x-rays their profile, searching for suspicious
patterns.
 In Machine Learning, problems like fraud detection are usually framed as
classification problems.
Python tools used in machine learning:
 Python has an overwhelming number of packages that can be used in a machine
learning setting. The Python machine learning ecosystem can be divided into
three main types of packages, as shown in the figure below.
 The first type of package shown in the figure is mainly used for simple tasks
and when data fits into memory. The second type is used to optimize your code
when you’ve finished prototyping and run into speed or memory issues.
 The third type is specific to using Python with big data technologies.

Figure. Overview of Python packages used during the machine-learning phase
Packages for working with data in memory:
 When prototyping, the following packages can get you started by providing
advanced functionalities with a few lines of code:
 SciPy is a library that integrates fundamental packages often used in scientific
computing such as NumPy, matplotlib, Pandas, and SymPy.
 NumPy gives you access to powerful array functions and linear algebra
functions.
 Matplotlib is a popular 2D plotting package with some 3D functionality.

 Pandas is a high-performance, but easy-to-use, data-wrangling package. It
introduces dataframes to Python, a type of in-memory data table. It’s a concept
that should sound familiar to regular users of R.
 SymPy is a package used for symbolic mathematics and computer algebra.
 StatsModels is a package for statistical methods and algorithms.
 Scikit-learn is a library filled with machine learning algorithms.
 RPy2 allows you to call R functions from within Python. R is a popular open
source statistics program.
 NLTK (Natural Language Toolkit) is a Python toolkit with a focus on text
analytics.
 These libraries are good to get started with, but once you make the decision to
run a certain Python program at frequent intervals, performance comes into
play.
Optimizing operations:
 Once your application moves into production, the libraries listed here can help
you deliver the speed you need. Sometimes this involves connecting to big data
infrastructures such as Hadoop and Spark.
 Numba and NumbaPro —These use just-in-time compilation to speed up
applications written directly in Python and a few annotations. NumbaPro also
allows you to use the power of your graphics processor unit (GPU).
 PyCUDA —This allows you to write code that will be executed on the GPU
instead of your CPU and is therefore ideal for calculation-heavy applications. It
works best with problems that lend themselves to being parallelized and need
little input compared to the number of required computing cycles. An example is
studying the robustness of your predictions by calculating thousands of different
outcomes based on a single start state.
 Cython, or C for Python: This brings the C programming language to Python. C
is a lower-level language, so the code is closer to what the computer eventually
executes (machine code). The closer code is to bits and bytes, the faster it executes. A
computer is also faster when it knows the type of a variable (called static typing).
Python wasn’t designed to do this, and Cython helps you to overcome this
shortfall.
 Blaze —Blaze gives you data structures that can be bigger than your computer’s
main memory, enabling you to work with large data sets.
 Dispy and IPCluster —These packages allow you to write code that can be
distributed over a cluster of computers.
 PP —Python is executed as a single process by default. With the help of PP you
can parallelize computations on a single machine or over clusters.
 Pydoop and Hadoopy —These connect Python to Hadoop, a common big data
framework.
 PySpark —This connects Python and Spark, an in-memory big data framework.

The modeling process:

 The modeling phase consists of four steps:
1. Feature engineering and model selection
2. Training the model
3. Model validation and selection
4. Applying the trained model to unseen data
 Before you find a good model, you will probably iterate among the first three
steps. The last step isn’t always present because sometimes the goal isn’t
prediction but explanation (root cause analysis).
 For instance, you might want to find out the causes of species’ extinctions but not
necessarily predict which one is next in line to leave our planet.
 It’s possible to chain or combine multiple techniques. When you chain multiple
models, the output of the first model becomes an input for the second model.
 When you combine multiple models, you train them independently and combine
their results. This last technique is also known as ensemble learning.
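The difference between chaining and combining can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three spam rules, the word list, and the 0.25 threshold are made-up stand-ins for trained models and do not come from any library.

```python
from collections import Counter

SPAM_WORDS = {"free", "win", "prize"}  # hypothetical word list


# --- Chaining: the output of the first model is the input of the second ---
def spam_score(message):
    """First model: fraction of words that appear in the spam word list."""
    words = [w.strip("!.,?").lower() for w in message.split()]
    return sum(w in SPAM_WORDS for w in words) / max(len(words), 1)


def score_to_label(score, threshold=0.25):
    """Second model: turn the score into a label (1 = spam, 0 = not spam)."""
    return 1 if score >= threshold else 0


def chained_predict(message):
    return score_to_label(spam_score(message))


# --- Combining (ensemble): independent models vote on the result ---
def by_keyword(message):
    return 1 if "free" in message.lower() else 0


def by_length(message):
    return 1 if len(message) > 40 else 0


def by_exclamation(message):
    return 1 if message.count("!") >= 2 else 0


def ensemble_predict(message, models=(by_keyword, by_length, by_exclamation)):
    """Each model predicts on its own; the majority label wins."""
    votes = Counter(model(message) for model in models)
    return votes.most_common(1)[0][0]


spam = "Win a FREE prize now!!"
ham = "See you at lunch"
chained = [chained_predict(spam), chained_predict(ham)]
voted = [ensemble_predict(spam), ensemble_predict(ham)]
```

With an odd number of voters and two labels, the majority vote can never tie, which is one reason small ensembles often use three or five models.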
 A model consists of constructs of information called features or predictors and
a target or response variable. The best models are those that accurately represent
reality, preferably while staying concise and interpretable.

 To achieve this, feature engineering is the most important and arguably most
interesting part of modeling.
1. Engineering features and selecting a model:
 With engineering features, you must come up with and create possible
predictors for the model. This is one of the most important steps in the process
because a model recombines these features to achieve its predictions.
 Often you may need to consult an expert or the appropriate literature to come up
with meaningful features. When the initial features are created, a model can
be trained on the data.
2. Training your model
 With the right predictors in place and a modeling technique in mind, you can
progress to model training. In this phase you present to your model data from
which it can learn.
 The most common modeling techniques have industry-ready implementations in
almost every programming language, including Python.
 These enable you to train your models by executing a few lines of code. For more
state-of-the-art data science techniques, you’ll probably end up doing heavy
mathematical calculations and implementing them with modern computer
science techniques.
 Once a model is trained, it’s time to test whether it can be extrapolated to reality:
model validation.
3. Validating a model
 Data science has many modeling techniques, and the question is which one is the
right one to use. A good model has two properties: it has good predictive power
and it generalizes well to data it hasn’t seen. To achieve this you define an error
measure (how wrong the model is) and a validation strategy.
 Two common error measures in machine learning are the classification error
rate for classification problems and the mean squared error for regression
problems.
 The classification error rate is the percentage of observations in the test data set
that your model mislabeled; lower is better. The mean squared error measures
how big the average error of your prediction is. Squaring each error before
averaging has two consequences: you can’t cancel out a wrong prediction in one
direction with a faulty prediction in the other direction, and bigger errors get
even more weight than they otherwise would.
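Both error measures are short enough to write by hand. The sketch below uses plain Python and invented example values; the function names are chosen here for illustration only.

```python
def classification_error_rate(y_true, y_pred):
    """Fraction of observations whose predicted label is wrong."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification: one of four labels is wrong.
labels_true = ["spam", "ham", "ham", "spam"]
labels_pred = ["spam", "spam", "ham", "spam"]
error_rate = classification_error_rate(labels_true, labels_pred)

# Regression: one prediction is 5 too high, the other 5 too low.
actual = [100, 200]
predicted = [105, 195]
mean_error = sum(p - a for a, p in zip(actual, predicted)) / len(actual)
mse = mean_squared_error(actual, predicted)
```

In the regression example the plain average error is 0 because the two mistakes cancel, while the mean squared error is 25, so the squared measure still reports that the model was wrong.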
 Many validation strategies exist, including the following common ones:
 Dividing your data into a training set with X% of the observations and keeping
the rest as a holdout data set (a data set that’s never used for model creation)—
This is the most common technique.
 K-folds cross validation —This strategy divides the data set into k parts and uses
each part one time as a test data set while using the others as a training data set.
This has the advantage that you use all the data available in the data set.
 Leave-one-out: This approach is the same as k-folds but with k equal to the
number of observations. You always leave one observation out and train on the
rest of the data. This is used only on small data sets, so it’s more valuable to
people evaluating laboratory experiments than to big data analysts.
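The three strategies differ only in how row indices are split into training and test parts. A minimal sketch in plain Python follows; the function names, the 80% default, and the fixed seed are assumptions made for this example.

```python
import random


def holdout_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and keep the last part aside as a holdout set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of observations")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def leave_one_out_indices(n):
    """Leave-one-out is k-fold with k equal to the number of observations."""
    return k_fold_indices(n, n)


train_rows, holdout_rows = holdout_split(range(10))
folds = list(k_fold_indices(10, 3))
loo_folds = list(leave_one_out_indices(4))
```

In practice the rows are usually shuffled before k-fold splitting as well; that step is left out here to keep the index logic visible.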
 Validation is extremely important because it determines whether your model
works in real-life conditions.
 Once you have constructed a good model, you can (optionally) use it to predict
the future.
4. Predicting new observations:
 If you have implemented the first three steps successfully, you now have a
performant model that generalizes to unseen data. The process of applying your
model to new data is called model scoring.
 In fact, model scoring is something you implicitly did during validation, only
now you don’t know the correct outcome.
 Model scoring involves two steps. First, you prepare a data set that has features
exactly as defined by your model. This boils down to repeating the data
preparation you did in step one of the modeling process but for a new data set.
Then you apply the model on this new data set, and this results in a prediction.
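The two scoring steps can be sketched as follows. Everything in this example is hypothetical: the record fields, the two derived features, and the fixed scoring rule that stands in for a model trained earlier.

```python
def prepare_features(record):
    """Step 1: the same data preparation that was used before training."""
    return {
        "amount_ratio": record["amount"] / record["avg_amount"],
        "is_night": 1 if record["hour"] < 6 else 0,
    }


def trained_model(features):
    """Stand-in for a trained model: a fixed weighted rule (assumption)."""
    score = 0.6 * min(features["amount_ratio"] / 10, 1.0) + 0.4 * features["is_night"]
    return "suspicious" if score >= 0.5 else "normal"


def score_new_data(records):
    """Step 2: apply the model to the prepared features of unseen records."""
    return [trained_model(prepare_features(r)) for r in records]


new_transactions = [
    {"amount": 900, "avg_amount": 100, "hour": 3},   # large amount at night
    {"amount": 50, "avg_amount": 100, "hour": 14},   # small amount in daytime
]
predictions = score_new_data(new_transactions)
```

Keeping the preparation in one function that both training and scoring call is a simple way to make sure the features are defined exactly as the model expects.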

Types of machine learning:
 Machine learning is a subset of AI, which enables the machine to automatically
learn from data, improve performance from past experiences, and make
predictions.
 Machine learning contains a set of algorithms that work on a huge amount of
data. Data is fed to these algorithms to train them, and on the basis of training,
they build the model & perform a specific task.
 These ML algorithms help to solve different business problems like Regression,
Classification, Forecasting, Clustering, and Associations, etc.
 Based on the methods and way of learning, machine learning is divided into
mainly four types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning:
 As its name suggests, supervised machine learning is based on supervision. It
means in the supervised learning technique, we train the machines using the
"labelled" dataset, and based on the training, the machine predicts the output.
 Here, labelled data means that the inputs are already mapped to
their outputs. More precisely, we first train the machine with the
inputs and corresponding outputs, and then we ask the machine to predict the
output using the test dataset.
 Let's understand supervised learning with an example. Suppose we have an input
dataset of cat and dog images. First, we train the machine to understand the
images using features such as the shape and size of the tail, the shape of the
eyes, colour, height (dogs are taller, cats are smaller), etc.
 After completion of training, we input the picture of a cat and ask the machine to
identify the object and predict the output. Now, the machine is well trained, so it
will check all the features of the object, such as height, shape, colour, eyes, ears,
tail, etc., and find that it's a cat. So, it will put it in the Cat category. This is the
process of how the machine identifies the objects in Supervised Learning.
 The main goal of the supervised learning technique is to map the input
variable(x) with the output variable(y). Some real-world applications of
supervised learning are Risk Assessment, Fraud Detection, Spam filtering, etc.
Categories of Supervised Machine Learning:
 Supervised machine learning can be classified into two types of problems, which
are given below:
1. Classification
2. Regression
a) Classification:
 Classification algorithms are used to solve classification problems in which
the output variable is categorical, such as "Yes" or "No", Male or Female, Red or
Blue, etc. The classification algorithms predict the categories present in the
dataset. Some real-world examples of classification algorithms are Spam
Detection, Email filtering, etc.

Some popular classification algorithms are given below:
1. Random Forest Algorithm
2. Decision Tree Algorithm
3. Logistic Regression Algorithm
4. Support Vector Machine Algorithm
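As an illustration of one algorithm from this list, here is a minimal logistic regression with a single feature, trained by gradient descent in plain Python. It is a teaching sketch, not a library implementation: the data set (number of suspicious words per email), the learning rate, and the number of epochs are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit weight w and bias b on the mean-centred feature by gradient descent."""
    mean_x = sum(xs) / len(xs)
    centred = [x - mean_x for x in xs]
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error (predicted probability minus true label) per example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(centred, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, centred)) / n
        b -= learning_rate * sum(errors) / n
    return w, b, mean_x


def predict(model, x):
    """Return 1 (spam) if the predicted probability is at least 0.5, else 0."""
    w, b, mean_x = model
    return 1 if sigmoid(w * (x - mean_x) + b) >= 0.5 else 0


# Labelled training data: input feature and the known category for each email.
suspicious_words = [1, 2, 3, 4, 6, 7, 8, 9]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]

model = train_logistic(suspicious_words, is_spam)
predictions = [predict(model, x) for x in (2, 8)]
```

The output is categorical (0 or 1) even though the model computes a probability internally, which is what makes this a classification algorithm despite the word "regression" in its name.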
b) Regression
 Regression algorithms are used to solve regression problems, in which the
output is a continuous value that depends on the input variables. The
relationship may be linear, as in simple linear regression, but it does not have
to be. Typical uses are predicting market trends, the weather, etc.
Some popular Regression algorithms are given below:
1. Simple Linear Regression Algorithm
2. Multivariate Regression Algorithm
3. Decision Tree Algorithm
4. Lasso Regression
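Simple linear regression, the first algorithm in this list, fits a straight line y = slope * x + intercept by least squares. The sketch below uses the standard closed-form solution in plain Python; the data points are invented so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope and intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Continuous prediction for a new input value."""
    return slope * x + intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]   # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
forecast = predict(slope, intercept, 6)
```

Unlike the classification case, the prediction here is a number on a continuous scale rather than a category.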
Advantages and Disadvantages of Supervised Learning:
Advantages:
1. Since supervised learning works with a labelled dataset, we can have an
exact idea about the classes of objects.
2. These algorithms are helpful in predicting the output on the basis of prior
experience.
Disadvantages:
1. These algorithms are not able to solve complex tasks.
2. It may predict the wrong output if the test data is different from the
training data.
3. It requires lots of computational time to train the algorithm.
2. Unsupervised Machine Learning:
 Unsupervised learning is different from the Supervised learning technique; as
its name suggests, there is no need for supervision. It means, in unsupervised
machine learning, the machine is trained using the unlabeled dataset, and the
machine predicts the output without any supervision.

 In unsupervised learning, the models are trained with the data that is neither
classified nor labelled, and the model acts on that data without any supervision.
 The main aim of an unsupervised learning algorithm is to group or categorize
the unsorted dataset according to similarities, patterns, and
differences. Machines are instructed to find the hidden patterns in the input
dataset.
 Let's take an example to understand it more precisely. Suppose there is a basket
of fruit images, and we input it into the machine learning model. The images
are totally unknown to the model, and the task of the machine is to find the
patterns and categories of the objects.
 The machine will discover patterns and differences, such as differences in
colour and shape, and predict the output when it is tested with the
test dataset.
Categories of Unsupervised Machine Learning:
 Unsupervised Learning can be further classified into two types, which are given
below:
1. Clustering
2. Association
1) Clustering:
 The clustering technique is used when we want to find the inherent groups
from the data. It is a way to group the objects into a cluster such that the objects
with the most similarities remain in one group and have fewer or no similarities
with the objects of other groups. An example of the clustering algorithm is
grouping the customers by their purchasing behaviour.
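Some popular clustering algorithms are given below, together with two related
unsupervised techniques (items 4 and 5) that are used for dimensionality
reduction and signal separation rather than for clustering:
1. K-Means Clustering algorithm
2. Mean-shift algorithm
3. DBSCAN Algorithm
4. Principal Component Analysis
5. Independent Component Analysis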
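K-means, the first algorithm in this list, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a plain-Python illustration; the sample points and the choice of starting centroids are assumptions made for the example (real implementations usually pick the starting centroids at random).

```python
import math


def k_means(points, centroids, max_iter=100):
    """Return (labels, centroids) after iterating until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Unlabelled data: two visibly separate groups of 2D points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
```

No labels are supplied anywhere; the grouping comes only from the distances between the points, which is what makes this unsupervised.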

2) Association
 Association rule learning is an unsupervised learning technique, which finds
interesting relations among variables within a large dataset. The main aim of
this learning algorithm is to find the dependency of one data item on another
data item and map those variables accordingly so that it can generate maximum
profit. This algorithm is mainly applied in Market Basket analysis, Web usage
mining, continuous production, etc.
 Some popular algorithms of Association rule learning are Apriori Algorithm,
Eclat, FP-growth algorithm.
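Algorithms such as Apriori are built on two basic measures: support (how often an itemset occurs) and confidence (how often the rule holds when its left-hand side occurs). The sketch below computes both for a tiny, invented set of shopping baskets; it does not implement the full Apriori search.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions)


def support(transactions, itemset):
    """Share of all transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also
    contain the consequent (the rule antecedent -> consequent)."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

support_bread_butter = support(baskets, {"bread", "butter"})
confidence_bread_to_butter = confidence(baskets, {"bread"}, {"butter"})
```

Here bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets with bread also contain butter, so a shop might place the two items near each other.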
Advantages and Disadvantages of Unsupervised Learning Algorithm
Advantages:
1. These algorithms can be used for complicated tasks compared to the
supervised ones because these algorithms work on the unlabeled
dataset.
2. Unsupervised algorithms are preferable for various tasks as getting the
unlabeled dataset is easier as compared to the labelled dataset.
Disadvantages:
1. The output of an unsupervised algorithm can be less accurate as the
dataset is not labelled, and algorithms are not trained with the exact
output in prior.
2. Working with Unsupervised learning is more difficult as it works with
the unlabelled dataset that does not map with the output.
3. Semi-Supervised Learning:
 Semi-Supervised learning is a type of Machine Learning algorithm that lies
between Supervised and Unsupervised machine learning. It represents the
intermediate ground between Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.
 Although semi-supervised learning is the middle ground between supervised
and unsupervised learning and operates on data that contains a few
labels, the data mostly consists of unlabeled examples. Labels are costly, so for
corporate purposes only a few labels may be available. Supervised and
unsupervised learning are defined by the presence or absence of labels;
semi-supervised learning differs from both because it uses the two kinds of
data together.
 To overcome the drawbacks of supervised learning and unsupervised learning
algorithms, the concept of Semi-supervised learning is introduced. The main
aim of semi-supervised learning is to effectively use all the available data,
rather than only labelled data like in supervised learning.
 Initially, similar data is clustered using an unsupervised learning
algorithm, and the clusters then help to turn the unlabeled data into labelled
data. This is done because labelled data is comparatively more expensive to
acquire than unlabeled data.
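The idea of spreading a few labels over many unlabeled points can be sketched as follows. This is a deliberately simplified stand-in for the clustering step: each unlabeled value receives the label of the nearest class centre computed from the few labelled values. The data and the function name are assumptions made for the example.

```python
def pseudo_label(labelled, unlabelled):
    """labelled: list of (value, label) pairs; unlabelled: list of values.
    Returns a list of (value, inferred_label) pairs."""
    # Centre (mean) of each class, computed from the few labelled points.
    totals = {}
    for value, label in labelled:
        total, n = totals.get(label, (0.0, 0))
        totals[label] = (total + value, n + 1)
    centres = {label: total / n for label, (total, n) in totals.items()}

    # Give every unlabelled point the label of the closest class centre.
    return [
        (value, min(centres, key=lambda label: abs(value - centres[label])))
        for value in unlabelled
    ]


few_labels = [(1.0, "small"), (10.0, "large")]
many_unlabelled = [1.5, 2.0, 9.0, 11.0, 4.0]
pseudo = pseudo_label(few_labels, many_unlabelled)
```

The newly labelled points could then be added to the training set of an ordinary supervised model, which is how the approach makes use of all the available data.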
 We can picture these algorithms with an example. Supervised learning is
where a student is under the supervision of an instructor at home and at college.
If that student analyses the same concept on their own, without any help
from the instructor, it comes under unsupervised learning. Under
semi-supervised learning, the student revises the concept on their own after
analyzing it under the guidance of an instructor at college.
Advantages and disadvantages of Semi-supervised Learning
Advantages:
1. It is simple and easy to understand the algorithm.
2. It is highly efficient.
3. It is used to solve drawbacks of Supervised and Unsupervised Learning
algorithms.
Disadvantages:
1. Iteration results may not be stable.
2. We cannot apply these algorithms to network-level data.
3. Accuracy is low.

4. Reinforcement Learning:
 Reinforcement learning works on a feedback-based process, in which an AI
agent (a software component) automatically explores its surroundings by trial
and error: taking actions, learning from experience, and improving its
performance. The agent is rewarded for each good action and punished for
each bad action; hence the goal of a reinforcement learning agent is to maximize
the rewards.
 In reinforcement learning, there is no labelled data like supervised learning,
and agents learn from their experiences only.
 The reinforcement learning process is similar to how a human being learns; for
example, a child learns various things through experience in day-to-day life. An
example of reinforcement learning is playing a game, where the game is the
environment, the moves of the agent at each step define the states, and the goal
of the agent is to get a high score. The agent receives feedback in the form of
punishments and rewards.
 Due to its way of working, reinforcement learning is employed in different
fields such as game theory, operations research, information theory, and
multi-agent systems.
 A reinforcement learning problem can be formalized using Markov Decision
Process(MDP). In MDP, the agent constantly interacts with the environment
and performs actions; at each action, the environment responds and generates a
new state.
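The agent, environment, action, reward, and state loop of an MDP can be made concrete with tabular Q-learning, one common reinforcement learning algorithm (the notes do not name a specific one, so this choice is an assumption). The environment is a made-up corridor of five cells in which the agent starts on the left and is rewarded only for reaching the rightmost cell; the hyperparameters are illustrative.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal and ends the episode
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Environment: respond to an action with (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table q[state][action] of expected discounted rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Move the estimate towards reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
```

No labelled examples are given: the agent finds out that moving right is the better action in every cell purely from the rewards it collects while interacting with the environment.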
Categories of Reinforcement Learning
Reinforcement learning is categorized mainly into two types of methods/algorithms:
 Positive Reinforcement Learning: Positive reinforcement learning specifies
increasing the tendency that the required behaviour would occur again by
adding something. It enhances the strength of the behaviour of the agent and
positively impacts it.
 Negative Reinforcement Learning: Negative reinforcement learning works
exactly opposite to the positive RL. It increases the tendency that the specific
behaviour would occur again by avoiding the negative condition.

Real-world Use cases of Reinforcement Learning
1. Video Games: RL algorithms are very popular in gaming applications, where they
are used to achieve super-human performance. Well-known game-playing programs
that use RL algorithms are AlphaGo and AlphaGo Zero.
2. Resource Management: The "Resource Management with Deep Reinforcement
Learning" paper showed how to use RL in computer systems to automatically learn
to allocate and schedule resources for waiting jobs in order to minimize average
job slowdown.
3. Robotics:
RL is widely being used in Robotics applications. Robots are used in the
industrial and manufacturing area, and these robots are made more powerful
with reinforcement learning. There are different industries that have their
vision of building intelligent robots using AI and Machine learning technology.
4. Text Mining:
Text mining, one of the great applications of NLP, is now being implemented
with the help of reinforcement learning by Salesforce.
Advantages and Disadvantages of Reinforcement Learning
Advantages
1. It helps in solving complex real-world problems which are difficult to
solve with general techniques.
2. The learning model of RL is similar to the learning of human beings;
hence accurate results can be obtained.
3. Helps in achieving long term results.
Disadvantages
1. RL algorithms are not preferred for simple problems.
2. RL algorithms require huge data and computations.
3. Too much reinforcement learning can lead to an overload of states
which can weaken the results.
