Data Science Unit - 3 - 31.8.23

Uploaded by

rishavsingh7478

UNIT III
MATHEMATICAL FOUNDATION
SYLLABUS
Linear Algebra:
• Vectors,
• Matrices

Statistics:
• Describe single set of data
• Correlation
• Bayes’s Theorem
• Random Variables
• Continuous Distributions and Normal Distribution.

Linear Algebra
 Linear Algebra is a branch of mathematics that is extremely useful in data science and
machine learning.
 Linear algebra is one of the most important mathematical skills in machine learning.
 Most machine learning models can be expressed in matrix form.
 A dataset itself is often represented as a matrix.
 Linear algebra is used in data preprocessing, data transformation, and model evaluation.

What is optimization in data science?


 Data optimization is the process by which organizations extract, analyze, and store data for
maximum efficiency. A wide variety of tools, including automation solutions, can be used to
optimize data.
 Mathematical optimization is at the heart of almost all machine learning and statistical
techniques used in data science. It finds the minimum error or the best solution for a problem.
For example, in regression the error is measured by a loss function, and optimization finds
the parameter values that minimize it.
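The idea of minimizing a loss function can be sketched in a few lines of Python. This is only an illustration: it assumes the loss is the mean squared error (MSE) of a straight line y = w*x + b, uses plain gradient descent as the optimizer, and runs on made-up data. The function names, learning rate, and step count are hypothetical choices, not something the slides specify.

```python
# Fit y ≈ w*x + b by minimizing the mean squared error (MSE) loss.
# The data, learning rate and step count are illustrative assumptions.

def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize the MSE with plain gradient descent, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # made-up points lying on y = 2x + 1

w, b = gradient_descent(xs, ys)
print(w, b, mse(w, b, xs, ys))
```

Because the sample points lie exactly on a line, the fitted w and b should approach 2 and 1 and the loss should approach zero.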

Do you need algebra for data science?


 Linear algebra is very helpful in machine learning and data science. The
most crucial mathematical ability in machine learning is linear algebra. Most
artificial intelligence models can be written as matrices. A dataset is
frequently shown as a matrix.
What kind of math is used in data science?
 Calculus, linear algebra, and statistics are the three Math subjects needed
for data science.
Do you need to know linear algebra for data science?
 In data science and machine learning, linear algebra is a crucial tool.
Beginners who are interested in data science should therefore become
familiar with it.

How is linear algebra used in data science?


 Linear algebra is used to solve systems of linear equations, which are
used to model many real-world problems. It is also used to represent
and manipulate large sets of data, such as images, videos, and audio.

 Loss functions, regularization, support vector classification, image
recognition, dimensionality reduction and many other machine
learning techniques are all applications of linear algebra.

Key linear algebra concepts:


 Vectors
 Matrices
 Transpose of a matrix
 Inverse of a matrix
 Determinant of a matrix
 Trace of a matrix
 Dot product
 Eigenvalues
 Eigenvectors

Some of the key applications of linear algebra in data science are:
 Linear Regression
 Principal Component Analysis (PCA)
 Computer Vision
 Singular Value Decomposition (SVD)
 Natural Language Processing (NLP)
 Eigenvalue Decomposition
 Clustering
 Matrix Factorization
 Image Processing
 Latent Semantic Analysis (LSA)
 Optimization
 Recommender Systems
 Signal Processing
 Neural Networks
 Control Systems.

Elementary Linear Algebra


 Elementary linear algebra introduces students to the basics of linear algebra. This
includes simple matrix operations, various computations that can be done on a system
of linear equations, and certain aspects of vectors. Some important terms associated
with elementary linear algebra are given below:

 Scalars - A scalar is a quantity that has only magnitude and no direction. It is an
element of the field over which a vector space is defined. In linear algebra, scalars are
usually real numbers.

 Vectors - A vector is an element of a vector space. It is a quantity that has
both magnitude and direction.

 Vector Space - The vector space consists of vectors that may be added
together and multiplied by scalars.

 Matrix - A matrix is a rectangular array wherein the information is
organized in the form of rows and columns. Most linear algebra
properties can be expressed in terms of a matrix.

 Matrix Operations - These are simple arithmetic operations such as
addition, subtraction, and multiplication that can be conducted on
matrices.

Advanced Linear Algebra

 Advanced linear algebra covers more advanced concepts related to linear
equations, vectors, and matrices. Some important terms used in advanced
linear algebra are as follows:

 Linear Transformations - A linear transformation is a function from
one vector space to another that preserves the linear structure
(vector addition and scalar multiplication).

 Inverse of a Matrix - When the inverse of a matrix is multiplied by the
original matrix, the result is the identity matrix. Thus,
A⁻¹A = AA⁻¹ = I.

 Eigenvector - An eigenvector is a non-zero vector that changes only by a
scalar factor (its eigenvalue) when a linear transformation is applied to it.

 Linear Map - It is a mapping between vector spaces that preserves vector
addition and scalar multiplication.
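The three definitions above can be checked numerically on a small example. The sketch below is only an illustration: the 2×2 matrix A, the vectors, and the helper names (matvec, matmul, inverse_2x2) are chosen here and do not come from the slides. It uses exact fractions so that the comparisons are not affected by rounding, and it treats a linear map as one that preserves vector addition and scalar multiplication.

```python
from fractions import Fraction


def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(A):
    """Inverse of a 2x2 matrix; raises ValueError if the determinant is zero."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [1, 2]]            # illustrative matrix (an assumption)

# Inverse of a matrix: A^-1 A should equal the identity matrix.
A_inv = inverse_2x2(A)
identity = matmul(A_inv, A)

# Eigenvector: A v = 3 v for v = (1, 1), so v is an eigenvector with eigenvalue 3.
v = [1, 1]
Av = matvec(A, v)

# Linear map: T(x + y) = T(x) + T(y) and T(c x) = c T(x) for T(x) = A x.
x, y, c = [1, 2], [3, -1], 5
additive = (matvec(A, [p + q for p, q in zip(x, y)])
            == [p + q for p, q in zip(matvec(A, x), matvec(A, y))])
homogeneous = matvec(A, [c * p for p in x]) == [c * p for p in matvec(A, x)]

print(identity, Av, additive, homogeneous)
```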

Vectors

Definition
 Vectors are mathematical objects that contain both
magnitude and direction, and they can be represented by the
directed line segments (lines having directions) whose
lengths are their magnitude.

 A vector can be used to describe the movement of an object from one
point to another.

 The direction of a vector is from its tail to its head.
 Notation: The vector between two points A and B can be written as:
 a or A or AB
 Standard Form: A = ai + bj + ck, where a, b, c are real numbers, and i, j, and k are the
unit vectors along the x-, y-, and z-axes.
 Example: Velocity of a car.
 In simple terms, a car’s velocity is its speed together with the direction in which it is moving.
 A vector can also be defined as a tuple of one or more scalar values.
 Example: V = (a, b, c); here, a, b, and c are scalars (real values).

 Magnitude
 The magnitude of a vector is the square root of the sum of the squares of its
components, i.e., if
 A = ai + bj + ck, then,
 Magnitude of A = |A| = sqrt(a² + b² + c²)
 Example: a = 3i + 4j – 7k. Find the magnitude of a.
 Answer: |a| = sqrt(3² + 4² + (–7)²) = sqrt(9 + 16 + 49) = sqrt(74) ≈ 8.60
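The magnitude calculation for the example vector a = 3i + 4j – 7k can be reproduced with a short Python sketch. The function name magnitude is a hypothetical helper chosen here; the vector is stored as a tuple of its components.

```python
import math


def magnitude(v):
    """Length of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))


a = (3, 4, -7)          # components of a = 3i + 4j - 7k
print(magnitude(a))     # sqrt(74), roughly 8.60
```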

Types of Vectors

There are many types of vectors, but here we discuss seven types that are commonly
used:
Zero Vector
 A vector is said to be a zero vector if the magnitude of the vector is zero.
 It is denoted by: O = (0, 0, 0).
 Also known as Additive Identity
 i.e., A + O = A = O + A

Unit Vector
 A vector is said to be a unit vector if the magnitude of the vector is one.
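Any non-zero vector can be turned into a unit vector by dividing each component by the vector's magnitude. The sketch below illustrates this; the function name normalize and the sample vector are assumptions made for the example.

```python
import math


def normalize(v):
    """Return the unit vector pointing in the same direction as v."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("the zero vector has no direction and cannot be normalized")
    return tuple(c / length for c in v)


v = (3.0, 4.0, 0.0)                       # magnitude 5
u = normalize(v)
unit_length = math.sqrt(sum(c * c for c in u))
print(u, unit_length)
```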

Negative Vector
 A vector is said to be a negative vector of a given vector if it has the same magnitude but
points in the opposite direction.
 In simple terms, multiplying any vector by -1 reverses its direction while keeping
its magnitude.
 i.e., (-1)v = -v

Parallel Vectors or Collinear Vectors


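 Two non-zero vectors a and b are said to be parallel (collinear) if one is a scalar
multiple of the other, i.e., a = kb for some non-zero scalar k. They may point in the same
or in opposite directions, and their magnitudes may differ.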
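One way to test whether two 3-D vectors are parallel is to check that their cross product is the zero vector. The sketch below assumes the common convention that parallel (collinear) vectors are non-zero scalar multiples of each other, so vectors pointing in opposite directions also count as parallel. The helper names and sample vectors are hypothetical, and integer components are used so that the comparison with zero is exact (floating-point data would need a tolerance).

```python
def cross(a, b):
    """Cross product of two 3-D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def are_parallel(a, b):
    """True if the non-zero 3-D vectors a and b are scalar multiples of each other."""
    return cross(a, b) == (0, 0, 0)


a = (1, 2, 3)
same_direction = are_parallel(a, (2, 4, 6))         # b = 2a
opposite_direction = are_parallel(a, (-1, -2, -3))  # b = -a
not_parallel = are_parallel(a, (1, 0, 0))
print(same_direction, opposite_direction, not_parallel)
```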

Equal Vectors
 Two vectors (a and b) are said to be equal if they have the same
magnitude and direction.
 If a = x1i + y1j + z1k, and b = x2i + y2j + z2k, then
 a = b if and only if x1 = x2, y1 = y2, and z1 = z2
 i.e., two vectors are equal if their corresponding components are equal.

Orthogonal Vectors
 Two vectors, a, and b, are orthogonal if and only if they are
perpendicular to each other.
 The angle between them is a right angle.
 Mathematically, vectors are orthogonal if the dot product of vectors is
zero.
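The dot-product test for orthogonality can be sketched as follows. The two sample vectors are chosen for illustration so that their dot product is exactly zero; the helper names dot and angle_degrees are assumptions.

```python
import math


def dot(a, b):
    """Dot product: sum of the products of corresponding components."""
    return sum(x * y for x, y in zip(a, b))


def angle_degrees(a, b):
    """Angle between two non-zero vectors, in degrees."""
    cos_theta = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding error
    return math.degrees(math.acos(cos_theta))


a = (1, 2, 3)
b = (3, 0, -1)
print(dot(a, b))             # 1*3 + 2*0 + 3*(-1) = 0, so a and b are orthogonal
print(angle_degrees(a, b))   # a right angle
```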

Co-Initial Vectors
 Two or more vectors are said to be co-initial if they have the same starting point.
For example, vectors AB and AC are co-initial because they both start at
point A.

Applications of Vectors in Data Science:

Vectors are used in various machine learning algorithms
and operations such as
• regression
• classification
• clustering
• dimensionality reduction.

Matrices

 The algebra of matrices is the branch of mathematics that deals with linear maps between
vector spaces of different dimensions.
 Matrix algebra arose from the need to describe n-dimensional planes in
coordinate space.
 A matrix (plural: matrices) is an arrangement of numbers, expressions or symbols in a
rectangular array of horizontal rows and vertical columns. Its order is written as
(number of rows) × (number of columns).
 Every pair of points in three-dimensional space represents a unique equation with one or
more solutions.
 Linear algebra is central to much of applied mathematics.

Algebra of Matrix
 The algebra of matrices involves operations on matrices, such as
addition, subtraction, multiplication, etc.
 Addition/Subtraction of Matrices
 Two matrices can be added/subtracted iff (if and only if) they have the
same number of rows and the same number of columns,
i.e., the orders of the matrices are equal.
 For addition/subtraction, each element of the 2nd matrix is
added to/subtracted from the corresponding element of the 1st matrix.
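Element-wise addition and subtraction, including the check that both matrices have the same order, can be sketched as below. Matrices are represented as lists of rows; the function names and sample matrices are assumptions made for this example.

```python
def shape(M):
    """Order of a matrix as (rows, columns)."""
    return (len(M), len(M[0]))


def matrix_add(A, B, sign=1):
    """Add (sign=1) or subtract (sign=-1) two matrices of the same order."""
    if shape(A) != shape(B):
        raise ValueError("matrices must have the same order")
    # Combine corresponding elements row by row.
    return [[a + sign * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
total = matrix_add(A, B)
difference = matrix_add(A, B, sign=-1)
print(total)        # [[6, 8], [10, 12]]
print(difference)   # [[-4, -4], [-4, -4]]
```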

Matrix Multiplication
 A matrix can be multiplied in two ways:
 (i) Scalar Multiplication
 (ii) Multiplication with another matrix
 Scalar Multiplication – It involves multiplying a matrix by a scalar quantity. Every element
of the matrix is multiplied by the scalar to form a new matrix.
 For example, multiplying the matrix with rows (1, 2) and (3, 4) by the scalar 2 gives the
matrix with rows (2, 4) and (6, 8).
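Scalar multiplication can be illustrated with a small sketch. The matrix, the scalar, and the function name scalar_multiply are chosen here for illustration and are not taken from the slides.

```python
def scalar_multiply(k, M):
    """Multiply every element of matrix M (a list of rows) by the scalar k."""
    return [[k * element for element in row] for row in M]


M = [[1, 2], [3, 4]]
scaled = scalar_multiply(2, M)
print(scaled)   # [[2, 4], [6, 8]]
```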

Multiplication with another matrix or Matrix Multiplication


 Consider two matrices M1 and M2, of order m1 × n1 and m2 × n2 respectively.

 The matrices can be multiplied if and only if n1 = m2, i.e., the number of columns of M1
equals the number of rows of M2.

 When this condition is satisfied, the product M1M2 is defined.

 The resultant matrix has order m1 × n2, where
m1 is the number of rows of the 1st matrix and n2 is the number of columns of the 2nd matrix.
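The compatibility condition and the order of the result can be seen in a short sketch. The matrices M1 (order 2 × 3) and M2 (order 3 × 2) and the function name matmul are illustrative assumptions; the product is therefore of order 2 × 2.

```python
def matmul(M1, M2):
    """Multiply M1 (m1 x n1) by M2 (m2 x n2); requires n1 == m2."""
    m1, n1 = len(M1), len(M1[0])
    m2, n2 = len(M2), len(M2[0])
    if n1 != m2:
        raise ValueError("columns of M1 must equal rows of M2")
    # Entry (i, j) is the dot product of row i of M1 and column j of M2.
    return [[sum(M1[i][k] * M2[k][j] for k in range(n1)) for j in range(n2)]
            for i in range(m1)]


M1 = [[1, 2, 3],
      [4, 5, 6]]          # order 2 x 3
M2 = [[7, 8],
      [9, 10],
      [11, 12]]           # order 3 x 2

product = matmul(M1, M2)  # order 2 x 2
print(product)            # [[58, 64], [139, 154]]
```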

Rule of Matrix Algebra


The algebra of matrices follows certain rules for addition and multiplication. Let A, B
and C be three square matrices of the same order. A’ is the transpose and A⁻¹ is the inverse of A. I is
the identity matrix and R is a real number.

According to the laws of matrix algebra:

 A+B = B+A → Commutative law of addition
 A+B+C = A+(B+C) = (A+B)+C → Associative law of addition
 ABC = A(BC) = (AB)C → Associative law of multiplication
 A(B+C) = AB + AC → Distributive law of matrix algebra
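These laws can be spot-checked on concrete matrices. The sketch below uses three arbitrary 2×2 integer matrices chosen for illustration; a check on one example is not a proof, but it shows how each law reads in code. It also shows that multiplication, unlike addition, is in general not commutative (AB ≠ BA), a well-known fact that the list above does not state.

```python
def add(A, B):
    """Element-wise sum of two matrices of the same order."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Arbitrary example matrices (assumptions for illustration).
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]

commutative_add = add(A, B) == add(B, A)
associative_add = add(add(A, B), C) == add(A, add(B, C))
associative_mul = matmul(matmul(A, B), C) == matmul(A, matmul(B, C))
distributive = matmul(A, add(B, C)) == add(matmul(A, B), matmul(A, C))
commutative_mul = matmul(A, B) == matmul(B, A)   # False for these matrices

print(commutative_add, associative_add, associative_mul, distributive, commutative_mul)
```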

Also, see here the rules for transposition of matrices:
 (A’)’ = A
 (A+B)’ = A’+B’
 (AB)’ = B’A’
 (ABC)’ = C’B’A’

The inverse rules of matrices are as follows:
 AI = IA = A
 AA⁻¹ = A⁻¹A = I
 (A⁻¹)⁻¹ = A
 (AB)⁻¹ = B⁻¹A⁻¹
 (ABC)⁻¹ = C⁻¹B⁻¹A⁻¹
 (A’)⁻¹ = (A⁻¹)’
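A few of the transposition and inverse rules can be verified on an example. The sketch below assumes two invertible 2×2 matrices chosen for illustration and uses exact fractions, so the equalities hold exactly rather than up to rounding. The helper names are hypothetical, and the 2×2 inverse formula is used only to keep the example short.

```python
from fractions import Fraction


def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(M):
    """Inverse of a 2x2 matrix; raises ValueError if the determinant is zero."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


# Invertible example matrices (assumptions for illustration).
A = [[1, 2], [3, 4]]     # determinant -2
B = [[2, 0], [1, 3]]     # determinant 6

rule_transpose_product = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
rule_inverse_product = inverse_2x2(matmul(A, B)) == matmul(inverse_2x2(B), inverse_2x2(A))
rule_inverse_transpose = inverse_2x2(transpose(A)) == transpose(inverse_2x2(A))
rule_double_inverse = inverse_2x2(inverse_2x2(A)) == A

print(rule_transpose_product, rule_inverse_product,
      rule_inverse_transpose, rule_double_inverse)
```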

Applications of Matrices
 Matrices have many applications in diverse
fields of science, commerce and social
science. Matrices are used in:
 (i) Computer Graphics
 (ii) Optics
 (iii) Cryptography
 (iv) Economics
 (v) Chemistry
 (vi) Geology
 (vii) Robotics and animation
 (viii) Wireless communication and signal processing
 (ix) Finance
 (x) Mathematics

Statistics

 What is Statistics?
 Statistics is the visual and mathematical portrayal of information.
 Data science is all about making calculations with data.
 We make decisions based on that data using mathematical conditions
known as models.
 Numerous fields, including data science, machine learning, business
intelligence, computer science, and many others have become
increasingly dependent on statistics.

 Variable: A variable is any characteristic that can be measured or counted, be it a number,
a property, or another type of quantity. It is sometimes also called a data point.
 Population: A population is the entire group of individuals or items from which data can be gathered.
 Statistical Parameter: A statistical (or population) parameter is a quantity, such as the
mean, median, or mode of a population, that summarizes the population or indexes a family of
probability distributions.
 Probability Distribution: A probability distribution is a mathematical function that
gives the probabilities of the possible outcomes of an experiment.
 Sample: A sample is a subset of the population from which data are collected and which is
used to make inferences about the population with inferential statistics.

Why is statistics important in data?


 Statistics is an important field because it helps us understand the
general trends and patterns in a given data set.
 Statistics can be used for analysing data and drawing conclusions from
it. It can also be used for making predictions about future events and
behaviours.
 There are two main data types: numerical and categorical.
 Numerical data is quantitative and can be represented by numbers.
 Categorical data is qualitative and can be represented by labels or
names.

What is numerical data in data science?
 Numerical data, also known as quantitative data, is data that is typically presented in
number form and does not include any language or descriptive form.
 It is always measurable, and it can be added together.

What is Categorical Data?

 Categorical data is a type of data that is used to group information with similar
characteristics, while numerical data is a type of data that expresses information in the
form of numbers.

Describe single set of data



What is a single set of data?


 A data set is an ordered collection of data.
 A collection of information obtained through observations, measurements, study, or analysis
is referred to as data.
 It could include information such as facts, numbers, figures, names, or even basic
descriptions of objects.

 A dataset is a set or collection of data.
 It is normally presented in tabular form: every column describes a particular variable,
and each row corresponds to a given member of the data set.
 Organizing data in this way is a part of data management.
 Data sets describe the values of each variable for quantities such
as height, weight, temperature, volume, etc., of an object, or the values of
random numbers.

• A data set, or a dataset or data collection, refers to a structured
collection of data points or observations organized and
stored together for analysis or processing.

• These are used in many fields and applications, including scientific
research, business analytics, and machine learning.

 Data sets provide a structured representation of real-world data and
serve as the basis for conducting statistical analyses, deriving
insights, and building predictive models.
 A data set typically consists of individual data elements, often
called data points or samples, representing specific entities,
events, or instances.
 The attributes of each data point may be numerical (e.g., age, height) or
categorical (e.g., gender, color), depending on the nature of
the data.

Some essential properties of data sets are:


 Size: The size of a data set refers to the number of
data points or observations it contains. It can
range from small sets with a few samples to large-scale
data with millions or billions of data points.
The data size impacts computational requirements,
storage capacity, and analysis techniques.
 Dimensions: The dimensions of a data set refer to
the number of variables or features associated
with each data point. They represent the structure of
the tabular representation, where each column
corresponds to a specific attribute. The number of
dimensions can vary, and high-dimensional data
sets pose challenges in visualization and analysis.

 Granularity: The granularity of data refers to the
level of detail or specificity. It determines the
precision and resolution of the observations. For
example, a sales data set could be recorded at a
daily, monthly, or yearly level, with each level of
granularity providing different insights and analysis
possibilities.
 Data Types: Data sets consist of data points with
different data types. Common data types include
numerical (e.g., integers, decimals), categorical
(e.g., labels, categories), text, dates, or binary
values. Understanding the data types in a data set
is essential for appropriate data handling,
preprocessing, and analysis.
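The properties of size, dimensions, and data types can be read off a small tabular data set. The sketch below uses an entirely made-up table stored as a list of rows (dictionaries); the column names, values, and the simple rule for labelling a column as numerical or categorical are assumptions for illustration only.

```python
# A tiny, made-up data set: each dict is one row (data point),
# each key is one column (variable).
rows = [
    {"name": "A", "age": 23, "height_cm": 170.5, "city": "Pune"},
    {"name": "B", "age": 31, "height_cm": 162.0, "city": "Delhi"},
    {"name": "C", "age": 27, "height_cm": 175.2, "city": "Pune"},
    {"name": "D", "age": 45, "height_cm": 168.9, "city": "Mumbai"},
]

size = len(rows)                 # number of data points
columns = list(rows[0])          # variable names
dimensions = len(columns)        # number of variables per data point


def column_kind(values):
    """Label a column numerical if every value is an int or float, else categorical."""
    is_number = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                    for v in values)
    return "numerical" if is_number else "categorical"


column_types = {col: column_kind([row[col] for row in rows]) for col in columns}

print(size, dimensions)
print(column_types)
```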

Structure:
 Data sets can have structured or unstructured formats.
 Structured data sets have a well-defined schema or
organization, typically represented in tabular form.
Unstructured data sets, such as text documents, images, or
social media posts, lack a predefined structure.
 As a result, unstructured data requires specialized
extraction, transformation, and analysis techniques.

Correlation

What is a correlation in statistics?


 Correlation is a statistical measure (expressed as a number) that
describes the size and direction of a relationship between two or more
variables. It measures the extent
to which two variables are linearly related.

 For example, the height and weight of a person are related: taller
people tend to be heavier than shorter people.

There are three types of correlation:


 Positive Correlation: The linear relationship is positive, and the two
variables move in the same direction, increasing or decreasing together.
 Negative Correlation: The opposite case. The relationship line has a negative
slope, and the variables change in opposite directions, i.e., one variable decreases while the other increases.
 No Correlation: The variables have no linear relationship; a change in one says nothing
about a linear change in the other.

What is Correlation Coefficient?

 The correlation coefficient measures the strength and direction of the linear
relationship between two variables.
 It is denoted by the letter r, and it ranges
between -1 and +1.
 If r < 0, it implies negative correlation
 If r > 0, it implies positive correlation
 If r = 0, it implies no linear correlation

Types of Correlation Coefficient
 There are mainly two types of correlation
coefficients.
Pearson’s Product Moment Correlation
 The Pearson correlation coefficient is defined in
statistics as a measure of the strength
of the linear relationship between two variables.
It is denoted by r.
 The correlation coefficient can be calculated
using the formula below:

\[ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} \]

where
r = coefficient of correlation
\bar{x} = mean of the x-variable
\bar{y} = mean of the y-variable
x_i, y_i = samples of the variables x and y
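Pearson's coefficient can be computed directly from its standard definition: the sum of the products of the deviations of x and y from their means, divided by the square root of the product of the two sums of squared deviations. The sketch below assumes that standard form; the function name pearson_r and the sample data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                            * sum((y - y_bar) ** 2 for y in ys))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # perfectly increasing line
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # perfectly decreasing line
r_mixed = pearson_r(xs, [2, 1, 4, 3, 5])       # noisy upward trend
print(r_positive, r_negative, r_mixed)
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.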

Spearman’s Rank Correlation


 Spearman’s rank correlation measures the strength and direction of
association between two ranked variables.
 It measures the monotonicity of the relation between two
variables, i.e., how well the relationship between two variables can be
represented by a monotonic function.
 When there are no tied ranks, it can be calculated using the formula below:

\[ \rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)} \]

where
\rho = Spearman rank correlation
d_i = difference between the ranks of corresponding variables
n = number of observations
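Spearman's coefficient can be sketched with the usual rank-difference formula, ρ = 1 − 6 Σ d² / (n(n² − 1)), where d is the difference between the ranks of corresponding values and n is the number of observations. This form is exact only when there are no tied values, so the sketch below rejects ties rather than handling them. The function names and sample data are assumptions for illustration.

```python
def ranks(values):
    """Rank of each value (1 = smallest), assuming all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation via the rank-difference formula (no ties allowed)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("this simple formula assumes there are no tied values")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


xs = [10, 20, 30, 40, 50]
rho_monotonic = spearman_rho(xs, [1, 4, 9, 16, 25])   # increasing but not linear
rho_reversed = spearman_rho(xs, [5, 4, 3, 2, 1])      # strictly decreasing
rho_partial = spearman_rho(xs, [1, 3, 2, 5, 4])       # mostly increasing
print(rho_monotonic, rho_reversed, rho_partial)
```

The first case illustrates the point about monotonicity: the relationship is not linear, yet the rank correlation is still at its maximum because the ranks agree perfectly.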
