Unit 1
Machine learning (ML) is a discipline of artificial intelligence (AI) that provides machines
with the ability to automatically learn from data and past experiences while identifying
patterns to make predictions with minimal human intervention.
Machine Learning helps to build automated systems that can learn by themselves; such systems then enhance their performance by learning from experience, without any human intervention.
i. Automotive Industry
The automotive industry is one of the areas where Machine Learning is excelling by
changing the definition of ‘safe’ driving. Major companies such as Google, Tesla,
Mercedes-Benz, and Nissan have invested heavily in Machine Learning to come up
with novel innovations.
ii. Robotics
Robotics is one of the fields that always gains the interest of researchers as well as the
general public. Researchers all over the world are still working on creating robots that mimic the
human brain. They are using neural networks, AI, ML, computer vision, and many other
technologies in this research.
iii. Computer Vision
As the name suggests, computer vision gives a vision to a computer or a machine. Giving
the ability to a machine to recognize and analyze images, videos, graphics, etc. is the goal of
computer vision.
iv. Healthcare
Machine learning is being increasingly adopted in the healthcare industry, thanks to wearable
devices and sensors such as fitness trackers and smart health watches. All such
devices monitor users’ health data to assess their health in real time.
v. Finance sector
Today, several financial organizations and banks use machine learning technology to tackle
fraudulent activities and draw essential insights from vast volumes of data.
Regression
Regression is a statistical approach for finding the relationship between variables. In
machine learning, it is used to predict the outcome of an event based on the relationships
between variables learned from the dataset.
1. Linear Regression
Linear regression is the simplest and most popular technique for predicting a continuous
variable. It assumes a linear relationship between the outcome and the predictor variables.
In linear regression, the objective is to fit a hyperplane (a line for 2D data points) by
minimizing the sum of squared errors over the data points.
The linear regression equation can be written as y = b0 + b*x + e, where
b0 is the intercept,
b is the slope (also called the regression weight or coefficient) associated with the predictor variable x, and
e is the residual error.
Technically, the linear regression coefficients are determined so that the error in predicting
the outcome value is minimized. This method of computing the beta coefficients is called the
Ordinary Least Squares (OLS) method.
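A minimal sketch of these closed-form OLS estimates in NumPy, using b = cov(x, y) / var(x) and b0 = mean(y) − b * mean(x) (the data points are invented purely for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b * x.mean()
print(f"y = {b0:.3f} + {b:.3f} * x")  # the fitted regression line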
2. Multiple Linear Regression
If there is more than one predictor available, the model is known as Multiple Linear Regression (MLR).
The equation for MLR is:
y = b0 + b1*x1 + b2*x2 + … + bn*xn + e
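A minimal sketch of MLR with two predictors, solved by least squares via np.linalg.lstsq (the data are invented for illustration):

import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([5.0, 4.5, 10.2, 9.8, 14.1])

A = np.column_stack([np.ones(len(X)), X])  # prepend a ones column for the intercept b0
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("b0, b1, b2 =", coef)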
3. Polynomial Regression
Linear regression assumes that the relationship between the dependent (y) and independent
(x) variables is linear, so it fails to fit the data points when the relationship between them is
not linear. Polynomial regression expands the fitting capabilities of linear regression by
fitting a polynomial of degree n to the data points instead.
The polynomial equation takes the form:
y = a0 + a1*x + a2*x^2 + … + an*x^n
For lower degrees, the relationship has a specific name (i.e., n = 2 is called quadratic, n = 3
is called cubic, and so on).
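A minimal sketch using np.polyfit, which fits a degree-n polynomial by least squares (the noisy quadratic data are generated purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 20)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(0, 0.5, x.size)

coeffs = np.polyfit(x, y, deg=2)  # returns [a2, a1, a0], highest degree first
y_hat = np.polyval(coeffs, x)     # fitted values on the same grid
print(coeffs)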
4. Logistic Regression
Logistic Regression is a Machine Learning algorithm used for classification
problems; it is a predictive analysis algorithm based on the concept of probability.
Logistic regression is generally used where we have to classify the data into two or more
classes: binary logistic regression handles two classes, while multi-class logistic regression handles more than two.
Logistic Regression is a classification algorithm for categorical variables like Yes/No,
True/False, 0/1, etc.
The logistic (inverse-logit) function is a way to map the infinitely stretching space (-inf, inf) to a
probability space of (0, 1).
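As a minimal sketch (the sample values of z are chosen arbitrarily), the logistic function sigmoid(z) = 1 / (1 + e^(-z)) can be written in a few lines of Python:

import numpy as np

def sigmoid(z):
    # maps any real number z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

for z in (-10, -1, 0, 1, 10):
    print(z, sigmoid(z))  # tends to 0 for large negative z and to 1 for large positive z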
1. LINEAR ALGEBRA
Linear algebra is a sub-field of mathematics concerned with vectors, matrices, and linear
transforms.
A vector is an array of numbers. A vector has magnitude and direction.
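For example, the magnitude and direction of a vector can be computed with NumPy (the vector [3, 4] is chosen arbitrarily):

import numpy as np

v = np.array([3.0, 4.0])        # a vector: an array of numbers
magnitude = np.linalg.norm(v)   # its magnitude (Euclidean length) = 5.0
direction = v / magnitude       # its direction as a unit vector [0.6, 0.8]
print(magnitude, direction)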
Whenever we work on a project that uses a machine learning algorithm, there are two
significant steps involved. The first is to understand the dataset, and this is where you
require knowledge of statistics. The second is predicting the probability of an event, for
example, estimating how likely it is that a patient has diabetes based on the information
from their medical tests. This suggests how significant probability and
statistics are for machine learning.
Probability
Probability denotes the possibility of something happening. It simply expresses
how likely an event is to occur, and its value always lies between 0 and 1.
Probability Distributions
Distributions are an integral part of machine learning, as they help us analyze the data.
Probability distributions are simply a collection of data (or scores) of a particular random
variable. Usually, these collections of data are arranged in some order and can be presented
graphically.
A probability distribution is a statistical function that describes all the possible values
and probabilities for a random variable within a given range. This range is bounded
by the minimum and maximum possible values, but where a possible value is
plotted on the probability distribution is determined by a number of characteristics.
Distribution Characteristics
Data distributions have different shapes; the data set used to draw the distribution defines
the distribution’s shape. We can describe each distribution using three characteristics: the
mean, the variance and the standard deviation. These characteristics can tell us different
things about the distribution’s shape and behaviour.
i. Mean
The mean (μ) is simply the average of a data set. For example, if we have the set of discrete
data {4, 7, 6, 3, 1}, the mean is 4.2.
ii. Variance
The variance (var(X)) is the average of the squared differences from the mean. For example,
for the same data set as before, {4, 7, 6, 3, 1}, the sample variance (dividing the sum of
squared differences by n − 1 = 4 rather than n) is 5.7.
iii. Standard Deviation
The standard deviation (σ) is a measure of how spread out the numbers in a data set are. A
small standard deviation indicates that the values are close to each other, while a large
standard deviation indicates that the data set values are spread out.
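These three characteristics can be checked directly in Python; the following minimal sketch reproduces the worked numbers for {4, 7, 6, 3, 1}:

import numpy as np

data = np.array([4, 7, 6, 3, 1])
print(data.mean())       # 4.2, the mean
print(data.var(ddof=1))  # 5.7, the sample variance (divides by n - 1)
print(data.std(ddof=1))  # ~2.39, the sample standard deviation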
1. Normal Distribution
The Gaussian distribution, or normal distribution, is famous for its bell-like shape, and it is one of
the most commonly used distributions in ML and data science.
The curve is symmetric about the centre, which means it can be divided into two even
sections around the mean.
Because the normal distribution is a probability distribution, the area under the
distribution curve is equal to one.
The probability density function of the normal distribution is:
f(x) = (1 / (σ√(2π))) * e^(−(x − μ)² / (2σ²))
where μ = mean and σ = standard deviation.
For the standard normal distribution (μ = 0, σ = 1), this equation becomes, ignoring the
constant terms, f(x) ∝ e^(−x²/2).
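A minimal numerical check of the formula above (the grid of x values is arbitrary): evaluating the density directly agrees with scipy.stats.norm, and the area under the curve integrates to one.

import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 1.0
x = np.linspace(-10, 10, 10001)
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

print(np.allclose(pdf, norm.pdf(x, mu, sigma)))  # True: matches the library PDF
print(np.trapz(pdf, x))                          # ~1.0: total area under the curve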
2. Binomial Distribution
The binomial distribution models the number of successes in a sequence of n trials, where:
1. The experiment consists of a fixed number n of independent, repeated trials.
2. Each trial can be classified as either success or failure, where the probability of
success is p while the probability of failure is 1 − p.
The probability of observing exactly k successes is:
P(X = k) = C(n, k) * p^k * (1 − p)^(n − k)
where the k successes occur with probability p^k, the n − k failures occur with probability
(1 − p)^(n − k), and the binomial coefficient C(n, k) = n! / (k!(n − k)!) counts the ways to
choose which k of the n trials are successes.
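A minimal sketch (n, p, and k are invented for illustration) comparing the formula with scipy.stats.binom:

from math import comb
from scipy.stats import binom

n, p, k = 10, 0.5, 3
manual = comb(n, k) * p**k * (1 - p)**(n - k)  # C(n, k) p^k (1 - p)^(n - k)
print(manual, binom.pmf(k, n, p))              # both ~0.1172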
3. Uniform Distribution
In statistics, uniform distribution refers to a type of probability distribution in which all
outcomes are equally likely. A deck of cards exhibits a uniform distribution because the
likelihood of drawing a heart, a club, a diamond, or a spade is the same.
4. Poisson Distribution
The Poisson distribution is a discrete distribution that measures the probability of a given
number of events happening in a specified time period:
P(X = x) = e^(−μ) * μ^x / x!
where μ is the mean number of events and x is the number of events in that interval.
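A minimal sketch checking the formula against scipy.stats.poisson (the values of μ and x are invented for illustration):

from math import exp, factorial
from scipy.stats import poisson

mu, x = 4.0, 2
manual = exp(-mu) * mu**x / factorial(x)  # e^(-mu) * mu^x / x!
print(manual, poisson.pmf(x, mu))         # both ~0.1465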
HYPOTHESIS
Hypothesis (h)
A hypothesis is some event which may or may not happen.
A hypothesis is a suggested explanation for an observation which does not fit into
current theory.
A hypothesis is a provisional idea that requires evaluation.
Types
Null Hypothesis (H0): there is no difference between the ‘results’ and the ‘assumption’.
Alternative Hypothesis (HA): the results disprove the ‘assumption’.
Level of Significance:
It refers to the degree of significance at which we accept or reject the null hypothesis. It is
the value on which the choice between the null and alternative hypothesis is based.
Since 100% certainty is not possible when accepting or rejecting a hypothesis, we select
a level of significance, usually 5%.
It is normally denoted by alpha (α) and is generally 0.05 or 5%, which means we want to be
95% confident that each sample would give a similar result.
P-value: The P value, or calculated probability, is the probability of obtaining the observed
results when the null hypothesis (H0) of a study question is true.
If the P value is less than the chosen significance level, you reject the null hypothesis.
Hypothesis Testing
Hypothesis testing is a statistical method used to make statistical decisions from
experimental data. A hypothesis test is built around an assumption that we make about a
population parameter.
Every such assumption needs a statistical way to be verified; we need a mathematical
conclusion about whether what we are assuming is true.
Z-statistic – Z Test
The Z-statistic is used when the sample follows a normal distribution; it is calculated from
population parameters such as the mean and standard deviation.
A one-sample Z test is used when we want to compare a sample mean with a population mean.
A two-sample Z test is used when we want to compare the means of two samples.
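A minimal sketch of a one-sample Z test (the sample, hypothesized mean, and known population standard deviation are all invented for illustration):

import numpy as np
from scipy.stats import norm

sample = np.array([52.1, 48.3, 50.9, 53.4, 49.8, 51.2, 50.5, 52.8])
mu0, sigma = 50.0, 2.0                # hypothesized population mean and known sd

z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-tailed p-value
print(z, p_value)                     # reject H0 if p_value < 0.05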
T-statistic – T-Test
The T-statistic is used when the sample follows a t distribution and the population parameters are
unknown. The t distribution is similar to the normal distribution, but its peak is lower and its
tails are heavier.
If the sample size is less than 30 and the population parameters are not known, we use the t
distribution. Here also, we can use a one-sample T-test or a two-sample T-test.
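A minimal sketch of both tests using scipy (the two small samples are invented for illustration):

import numpy as np
from scipy import stats

a = np.array([5.1, 4.9, 5.4, 5.0, 5.2, 4.8])
b = np.array([5.6, 5.8, 5.3, 5.9, 5.7, 5.5])

t1, p1 = stats.ttest_1samp(a, popmean=5.0)  # one-sample: compare mean of a to 5.0
t2, p2 = stats.ttest_ind(a, b)              # two-sample: compare means of a and b
print(p1, p2)                               # reject H0 where p < 0.05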
F-statistic – F test
For samples involving three or more groups, we prefer the F test, because performing
T-tests on multiple groups increases the chance of a Type-1 error; ANOVA is used in such cases.
Analysis of variance (ANOVA) can determine whether the means of three or more groups are
different. ANOVA uses F-tests to statistically test the equality of means.
The F-statistic follows an F distribution, which is always positive and skewed right.
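A minimal sketch of a one-way ANOVA F test with scipy (the three groups are invented for illustration):

from scipy import stats

g1 = [23, 25, 21, 22, 24]
g2 = [30, 31, 29, 32, 28]
g3 = [24, 26, 23, 25, 27]

f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f_stat, p_value)  # a small p-value suggests at least one group mean differs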
Chi-Square Test
For categorical variables, we perform a Chi-Square test.
Chi-Square Test example: In an election survey, voters might be classified by gender (male
or female) and voting preference (Democrat, Republican, or Independent). We could use a
chi-square test for independence to determine whether gender is related to voting preference.
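A minimal sketch of that test with scipy (the counts in the contingency table are invented for illustration):

import numpy as np
from scipy.stats import chi2_contingency

# Rows: male, female. Columns: Democrat, Republican, Independent.
table = np.array([[120, 150, 30],
                  [170, 100, 30]])

chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value, dof)  # a small p-value suggests gender and preference are related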
CONVEX OPTIMIZATION
Convex Optimization is one of the most important techniques in the field of mathematical programming,
which has many applications. It also has much broader applicability beyond mathematics to disciplines like
Machine learning, data science, economics, medicine, and engineering.
The objective function is subject to equality constraints and inequality constraints. An inequality
constraint indicates that the solution should lie in some range, whereas an equality constraint requires it to lie
exactly at a given point.
Convexity plays an important role in convex optimization. A function is convex if the line segment joining
any two points on its graph lies on or above the graph. This guarantees that a convex optimization problem
has no spurious local minima, which is what makes methods such as gradient descent reliable on it.
For convexity, convex sets are the most important. A convex set is a set that, together with any two of its
points, contains the entire line segment between them; equivalently, it contains all convex combinations of
its points. Simply speaking, a convex function has a shape like a bowl, and the region above its graph is a
convex set.
A convex optimization problem is thus to find the global minimum of a convex function over a convex set;
for such problems, any local minimum is also the global minimum. Convex sets are often used in convex
optimization techniques because they can be manipulated through certain types of operations while
preserving convexity, which keeps minimizing a convex function tractable.
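As a minimal sketch, gradient descent on the convex quadratic f(x) = (x − 3)² + 2 (the function, starting point, and learning rate are chosen arbitrarily) converges to the global minimum at x = 3:

def grad(x):
    return 2 * (x - 3)  # derivative of f(x) = (x - 3)**2 + 2

x, lr = 10.0, 0.1       # arbitrary starting point and learning rate
for _ in range(100):
    x -= lr * grad(x)   # step downhill along the negative gradient
print(x)                # ~3.0: the global minimum, since f is convex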