Unit 1
Machine learning (ML) is a discipline of artificial intelligence (AI) that provides machines
with the ability to automatically learn from data and past experiences while identifying
patterns to make predictions with minimal human intervention.
Machine Learning helps to build automated systems that can learn by themselves; such systems then enhance their performance by learning from experience, without any human intervention.
i. Automotive Industry
The automotive industry is one of the areas where Machine Learning is excelling by
changing the definition of ‘safe’ driving. Major companies such as Google, Tesla,
Mercedes-Benz, and Nissan have invested heavily in Machine Learning to come up
with novel innovations.
ii. Robotics
Robotics is one of the fields that always gains the interest of researchers as well as the
general public. Researchers all over the world are still working on creating robots that mimic the
human brain. They are using neural networks, AI, ML, computer vision, and many other
technologies in this research.
iii. Computer Vision
As the name suggests, computer vision gives a vision to a computer or a machine. Giving
the ability to a machine to recognize and analyze images, videos, graphics, etc. is the goal of
computer vision.
iv. Healthcare
Machine learning is being increasingly adopted in the healthcare industry, thanks to wearable
devices and sensors such as fitness trackers and smart health watches. All such
devices monitor users’ health data to assess their health in real time.
v. Finance sector
Today, several financial organizations and banks use machine learning technology to tackle
fraudulent activities and draw essential insights from vast volumes of data.
Regression
Regression is a statistical approach for finding the relationship between variables. In
machine learning, it is used to predict the outcome of an event based on the relationships
between variables learned from the dataset.
1. Linear Regression
Linear regression is the simplest and most popular technique for predicting a continuous
variable. It assumes a linear relationship between the outcome and the predictor variables.
In linear regression, the objective is to fit a hyperplane (a line for 2D data points) by
minimizing the sum of squared errors over the data points.
The linear regression equation can be written as y = b0 + b*x + e, where
b0 is the intercept,
b is the slope (also called the regression weight or coefficient) associated with the predictor variable x, and
e is the residual error.
Technically, the linear regression coefficients are determined so that the error in predicting
the outcome value is minimized. This method of computing the beta coefficients is called the
Ordinary Least Squares (OLS) method.
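A minimal sketch of these closed-form OLS estimates in NumPy, using b = cov(x, y) / var(x) and b0 = mean(y) − b * mean(x) (the data points are invented purely for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b * x.mean()
print(f"y = {b0:.3f} + {b:.3f} * x")  # the fitted regression line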
2. Multiple Linear Regression
If there is more than one predictor available, the model is known as Multiple Linear Regression (MLR).
The equation for MLR is:
y = b0 + b1*x1 + b2*x2 + … + bn*xn + e
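A minimal sketch of MLR with two predictors, solved by least squares via np.linalg.lstsq (the data are invented for illustration):

import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([5.0, 4.5, 10.2, 9.8, 14.1])

A = np.column_stack([np.ones(len(X)), X])  # prepend a ones column for the intercept b0
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("b0, b1, b2 =", coef)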
3. Polynomial Regression
Linear regression assumes that the relationship between the dependent (y) and independent
(x) variables is linear, so it fails to fit the data points when the relationship between them is
not linear. Polynomial regression expands the fitting capabilities of linear regression by
fitting a polynomial of degree n to the data points instead.
The polynomial equation takes the form:
y = a0 + a1*x + a2*x^2 + … + an*x^n
For lower degrees, the relationship has a specific name (i.e., n = 2 is called quadratic, n = 3
is called cubic, and so on).
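A minimal sketch using np.polyfit, which fits a degree-n polynomial by least squares (the noisy quadratic data are generated purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 20)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(0, 0.5, x.size)

coeffs = np.polyfit(x, y, deg=2)  # returns [a2, a1, a0], highest degree first
y_hat = np.polyval(coeffs, x)     # fitted values on the same grid
print(coeffs)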
4. Logistic Regression
Logistic Regression is a Machine Learning algorithm used for classification
problems; it is a predictive analysis algorithm based on the concept of probability.
Logistic regression is generally used where we have to classify the data into two or more
classes: binary logistic regression handles two classes, while multi-class logistic regression handles more than two.
Logistic Regression is a classification algorithm for categorical variables like Yes/No,
True/False, 0/1, etc.
The logistic (inverse-logit) function is a way to map the infinitely stretching space (-inf, inf) to a
probability space of (0, 1).
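As a minimal sketch (the sample values of z are chosen arbitrarily), the logistic function sigmoid(z) = 1 / (1 + e^(-z)) can be written in a few lines of Python:

import numpy as np

def sigmoid(z):
    # maps any real number z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

for z in (-10, -1, 0, 1, 10):
    print(z, sigmoid(z))  # tends to 0 for large negative z and to 1 for large positive z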
1. LINEAR ALGEBRA
Linear algebra is a sub-field of mathematics concerned with vectors, matrices, and linear
transforms.
A vector is an array of numbers. A vector has magnitude and direction.
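For example, the magnitude and direction of a vector can be computed with NumPy (the vector [3, 4] is chosen arbitrarily):

import numpy as np

v = np.array([3.0, 4.0])        # a vector: an array of numbers
magnitude = np.linalg.norm(v)   # its magnitude (Euclidean length) = 5.0
direction = v / magnitude       # its direction as a unit vector [0.6, 0.8]
print(magnitude, direction)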
Whenever we work on a project that uses a machine learning algorithm, there are two
significant steps involved. The first is to understand the dataset, and this is where you
require knowledge of statistics. The second is predicting the probability of an event, for
example, estimating how likely it is that a patient has diabetes based on the information
from their medical tests. This suggests how significant probability and
statistics are for machine learning.
Probability
Probability denotes the possibility of something happening. It simply expresses
how likely an event is to occur, and its value always lies between 0 and 1.
Probability Distributions
Distributions are an integral part of machine learning, as they help us analyze the data.
Probability distributions are simply a collection of data (or scores) of a particular random
variable. Usually, these collections of data are arranged in some order and can be presented
graphically.
A probability distribution is a statistical function that describes all the possible values
and probabilities for a random variable within a given range. This range is bounded
by the minimum and maximum possible values, but where a possible value is
plotted on the probability distribution is determined by a number of characteristics.
Distribution Characteristics
Data distributions have different shapes; the data set used to draw the distribution defines
the distribution’s shape. We can describe each distribution using three characteristics: the
mean, the variance and the standard deviation. These characteristics can tell us different
things about the distribution’s shape and behaviour.
i. Mean
The mean (μ) is simply the average of a data set. For example, if we have the set of discrete
data {4, 7, 6, 3, 1}, the mean is 4.2.
ii. Variance
The variance (var(X)) is the average of the squared differences from the mean. For example,
for the same data set as before, {4, 7, 6, 3, 1}, the sample variance (dividing the sum of
squared differences by n − 1 = 4 rather than n) is 5.7.
iii. Standard Deviation
The standard deviation (σ) is a measure of how spread out the numbers in a data set are. A
small standard deviation indicates that the values are close to each other, while a large
standard deviation indicates that the data set values are spread out.
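These three characteristics can be checked directly in Python; the following minimal sketch reproduces the worked numbers for {4, 7, 6, 3, 1}:

import numpy as np

data = np.array([4, 7, 6, 3, 1])
print(data.mean())       # 4.2, the mean
print(data.var(ddof=1))  # 5.7, the sample variance (divides by n - 1)
print(data.std(ddof=1))  # ~2.39, the sample standard deviation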
1. Normal Distribution
The Gaussian distribution, or normal distribution, is famous for its bell-like shape, and it is one of
the most commonly used distributions in ML and data science.
The curve is symmetric about the centre, which means it can be divided into two even
sections around the mean.
Because the normal distribution is a probability distribution, the area under the
distribution curve is equal to one.
The probability density function of the normal distribution is:
f(x) = (1 / (σ√(2π))) * e^(−(x − μ)² / (2σ²))
where μ = mean and σ = standard deviation.
For the standard normal distribution (μ = 0, σ = 1), this equation becomes, ignoring the
constant terms, f(x) ∝ e^(−x²/2).
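A minimal numerical check of the formula above (the grid of x values is arbitrary): evaluating the density directly agrees with scipy.stats.norm, and the area under the curve integrates to one.

import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 1.0
x = np.linspace(-10, 10, 10001)
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

print(np.allclose(pdf, norm.pdf(x, mu, sigma)))  # True: matches the library PDF
print(np.trapz(pdf, x))                          # ~1.0: total area under the curve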
2. Binomial Distribution
The binomial distribution models the number of successes in a sequence of n trials, where:
1. The experiment consists of a fixed number n of independent, repeated trials.
2. Each trial can be classified as either success or failure, where the probability of
success is p while the probability of failure is 1 − p.
The probability of observing exactly k successes is:
P(X = k) = C(n, k) * p^k * (1 − p)^(n − k)
where the k successes occur with probability p^k, the n − k failures occur with probability
(1 − p)^(n − k), and the binomial coefficient C(n, k) = n! / (k!(n − k)!) counts the ways to
choose which k of the n trials are successes.
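A minimal sketch (n, p, and k are invented for illustration) comparing the formula with scipy.stats.binom:

from math import comb
from scipy.stats import binom

n, p, k = 10, 0.5, 3
manual = comb(n, k) * p**k * (1 - p)**(n - k)  # C(n, k) p^k (1 - p)^(n - k)
print(manual, binom.pmf(k, n, p))              # both ~0.1172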
3. Uniform Distribution
In statistics, uniform distribution refers to a type of probability distribution in which all
outcomes are equally likely. A deck of cards exhibits a uniform distribution because the
likelihood of drawing a heart, a club, a diamond, or a spade is the same.
4. Poisson Distribution
The Poisson distribution is a discrete distribution that measures the probability of a given
number of events happening in a specified time period:
P(X = x) = e^(−μ) * μ^x / x!
where μ is the mean number of events and x is the number of events in that interval.
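A minimal sketch checking the formula against scipy.stats.poisson (the values of μ and x are invented for illustration):

from math import exp, factorial
from scipy.stats import poisson

mu, x = 4.0, 2
manual = exp(-mu) * mu**x / factorial(x)  # e^(-mu) * mu^x / x!
print(manual, poisson.pmf(x, mu))         # both ~0.1465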
HYPOTHESIS
Hypothesis (h)
A hypothesis is some event which may or may not happen.
A hypothesis is a suggested explanation for an observation which does not fit into
current theory.
A hypothesis is a provisional idea that requires evaluation.
Types
Null Hypothesis (H0): there is no difference between the ‘results’ and the ‘assumption’.
Alternative Hypothesis (HA): the results disprove the ‘assumption’.
Level of Significance:
It refers to the degree of significance at which we accept or reject the null hypothesis. It is
the value on which the choice between the null and alternative hypothesis is based.
Since 100% certainty is not possible when accepting or rejecting a hypothesis, we select
a level of significance, usually 5%.
It is normally denoted by alpha (α) and is generally 0.05 or 5%, which means we want to be
95% confident that each sample would give a similar result.
P-value: The P value, or calculated probability, is the probability of obtaining the observed
results when the null hypothesis (H0) of a study question is true.
If the P value is less than the chosen significance level, you reject the null hypothesis.
Hypothesis Testing
Hypothesis testing is a statistical method used to make statistical decisions from
experimental data. A hypothesis test is built around an assumption that we make about a
population parameter.
Every such assumption needs a statistical way to be verified; we need a mathematical
conclusion about whether what we are assuming is true.
Z-statistic – Z Test
The Z-statistic is used when the sample follows a normal distribution; it is calculated from
population parameters such as the mean and standard deviation.
A one-sample Z test is used when we want to compare a sample mean with a population mean.
A two-sample Z test is used when we want to compare the means of two samples.
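A minimal sketch of a one-sample Z test (the sample, hypothesized mean, and known population standard deviation are all invented for illustration):

import numpy as np
from scipy.stats import norm

sample = np.array([52.1, 48.3, 50.9, 53.4, 49.8, 51.2, 50.5, 52.8])
mu0, sigma = 50.0, 2.0                # hypothesized population mean and known sd

z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-tailed p-value
print(z, p_value)                     # reject H0 if p_value < 0.05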
T-statistic – T-Test
The T-statistic is used when the sample follows a t distribution and the population parameters are
unknown. The t distribution is similar to the normal distribution, but its peak is lower and its
tails are heavier.
If the sample size is less than 30 and the population parameters are not known, we use the t
distribution. Here also, we can use a one-sample T-test or a two-sample T-test.
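A minimal sketch of both tests using scipy (the two small samples are invented for illustration):

import numpy as np
from scipy import stats

a = np.array([5.1, 4.9, 5.4, 5.0, 5.2, 4.8])
b = np.array([5.6, 5.8, 5.3, 5.9, 5.7, 5.5])

t1, p1 = stats.ttest_1samp(a, popmean=5.0)  # one-sample: compare mean of a to 5.0
t2, p2 = stats.ttest_ind(a, b)              # two-sample: compare means of a and b
print(p1, p2)                               # reject H0 where p < 0.05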
F-statistic – F test
For samples involving three or more groups, we prefer the F test, because performing
T-tests on multiple groups increases the chance of a Type-1 error; ANOVA is used in such cases.
Analysis of variance (ANOVA) can determine whether the means of three or more groups are
different. ANOVA uses F-tests to statistically test the equality of means.
The F-statistic follows an F distribution, which is always positive and skewed right.
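A minimal sketch of a one-way ANOVA F test with scipy (the three groups are invented for illustration):

from scipy import stats

g1 = [23, 25, 21, 22, 24]
g2 = [30, 31, 29, 32, 28]
g3 = [24, 26, 23, 25, 27]

f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f_stat, p_value)  # a small p-value suggests at least one group mean differs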
Chi-Square Test
For categorical variables, we perform a Chi-Square test.
Chi-Square Test example: In an election survey, voters might be classified by gender (male
or female) and voting preference (Democrat, Republican, or Independent). We could use a
chi-square test for independence to determine whether gender is related to voting preference.
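A minimal sketch of that test with scipy (the counts in the contingency table are invented for illustration):

import numpy as np
from scipy.stats import chi2_contingency

# Rows: male, female. Columns: Democrat, Republican, Independent.
table = np.array([[120, 150, 30],
                  [170, 100, 30]])

chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value, dof)  # a small p-value suggests gender and preference are related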
CONVEX OPTIMIZATION
Convex Optimization is one of the most important techniques in the field of mathematical programming,
which has many applications. It also has much broader applicability beyond mathematics to disciplines like
Machine learning, data science, economics, medicine, and engineering.
The objective function is subject to equality constraints and inequality constraints. An inequality
constraint indicates that the solution should lie in some range, whereas an equality constraint requires it to lie
exactly at a given point.
Convexity plays an important role in convex optimization. A function is convex if the line segment joining
any two points on its graph lies on or above the graph. This guarantees that a convex optimization problem
has no spurious local minima, which is what makes methods such as gradient descent reliable on it.
For convexity, convex sets are the most important. A convex set is a set that, together with any two of its
points, contains the entire line segment between them; equivalently, it contains all convex combinations of
its points. Simply speaking, a convex function has a shape like a bowl, and the region above its graph is a
convex set.
A convex optimization problem is thus to find the global minimum of a convex function over a convex set;
for such problems, any local minimum is also the global minimum. Convex sets are often used in convex
optimization techniques because they can be manipulated through certain types of operations while
preserving convexity, which keeps minimizing a convex function tractable.
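As a minimal sketch, gradient descent on the convex quadratic f(x) = (x − 3)² + 2 (the function, starting point, and learning rate are chosen arbitrarily) converges to the global minimum at x = 3:

def grad(x):
    return 2 * (x - 3)  # derivative of f(x) = (x - 3)**2 + 2

x, lr = 10.0, 0.1       # arbitrary starting point and learning rate
for _ in range(100):
    x -= lr * grad(x)   # step downhill along the negative gradient
print(x)                # ~3.0: the global minimum, since f is convex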