Data Science | Solving Linear Equations
Last Updated: 29 Jul, 2024
Linear Algebra is a fundamental part of Data Science, where data is usually represented in matrix form. When a dataset contains several variables of interest, two natural questions arise: which of these variables are the most important, and, if relationships exist between them, how can those relationships be uncovered?
You can go through the Introduction to Data Science: Skills Required article for a basic understanding of what Data Science is.
Linear algebraic tools allow us to answer these questions, so a Data Science enthusiast needs a good understanding of these concepts before moving on to complex machine learning algorithms.
Matrices and Linear Algebra
There are many ways to represent data; matrices provide a convenient way to organize it.
- Matrices can be used to represent samples with multiple attributes in a compact form
- Matrices can also be used to represent linear equations in a compact and simple fashion
- Linear algebra provides tools to understand and manipulate matrices to derive useful knowledge from data
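As a small sketch of this compact representation (using NumPy, which the article does not name but is the standard Python choice), the pair of equations x1 + 3x2 = 7 and 2x1 + 4x2 = 10 from Example 1.1 becomes a coefficient matrix A and a right-hand-side vector b:

```python
import numpy as np

# The system  x1 + 3*x2 = 7
#             2*x1 + 4*x2 = 10
# stored compactly as Ax = b.
A = np.array([[1.0, 3.0],
              [2.0, 4.0]])   # coefficient matrix (m x n)
b = np.array([7.0, 10.0])    # right-hand side (m x 1)

print(A.shape)  # (2, 2): m equations, n unknowns
print(b.shape)  # (2,)
```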
Identification of Linear Relationships Among Attributes
We identify linear relationships among attributes using the concepts of null space and nullity. Before proceeding further, go through Null Space and Nullity of a Matrix.
Preliminaries
A general system of linear equations is represented as:
Ax=b
Where,
- A is an m x n matrix of coefficients
- x is an n x 1 vector of variables or unknowns
- b is an m x 1 vector of dependent values
- m and n are the numbers of equations and variables, respectively
In general, there are three cases one needs to understand:
- m = n: the number of equations equals the number of variables
- m > n: more equations than variables
- m < n: fewer equations than variables

We will consider these three cases independently.
Full row rank and full column rank
For a matrix A (m x n)
| Full Row Rank | Full Column Rank |
|---|---|
| All the rows of the matrix are linearly independent. | All the columns of the matrix are linearly independent. |
| Linearly independent rows mean no row can be expressed as a linear combination of the other rows; every row contributes unique information. A matrix has full row rank when the number of linearly independent rows equals the number of rows, i.e. rank = m. | Linearly independent columns mean no column can be expressed as a linear combination of the other columns; each column carries unique information. A matrix has full column rank when the number of linearly independent columns equals the number of columns, i.e. rank = n. |
| The data samples present no linear relationship; the samples are independent. | The attributes are linearly independent. |
Note: For any matrix, whatever its size, the row rank always equals the column rank; if a matrix has a certain number of independent rows, it has the same number of independent columns.
Consequently, for an m x n matrix the rank can be at most the lesser of m and n; for example, if m is smaller than n, the maximum possible rank is m.
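This rank behaviour can be checked numerically; the sketch below assumes NumPy is available:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row = 2 * first row, so rank 1
B = np.array([[1.0, 3.0],
              [2.0, 4.0]])   # rows (and columns) are independent, so rank 2

print(np.linalg.matrix_rank(A))  # 1
print(np.linalg.matrix_rank(B))  # 2

# For an m x n matrix, rank <= min(m, n): a 2 x 5 matrix has rank at most 2.
C = np.array([[1.0, 0.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 3.0, 0.0]])
print(np.linalg.matrix_rank(C))  # 2
```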
Case 1: m = n
The solution for this type of linear equation, when A is a full-rank matrix (i.e. the determinant of A is not equal to 0), is:
[Tex]\begin{aligned} Ax&=b \\ x &= A^{-1}b \end{aligned}[/Tex]

Example 1.1:
Consider the given matrix equation:
[Tex]\begin{bmatrix} 1&3\\ 2&4\\ \end{bmatrix} % \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} 7\\ 10\\ \end{bmatrix}[/Tex]
- |A| is not equal to zero
- rank(A) = 2 = number of columns, which implies that A is full rank
[Tex]\begin{aligned} \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix}
&= \begin{bmatrix} 1&3\\ 2&4\\ \end{bmatrix}^{-1}
\begin{bmatrix} 7\\ 10\\ \end{bmatrix}
\\&= \begin{bmatrix} -2&1.5\\ 1&-0.5\\ \end{bmatrix}
\begin{bmatrix} 7\\ 10\\ \end{bmatrix} \\&= \begin{bmatrix} 1\\ 2\\ \end{bmatrix} \end{aligned}[/Tex]
Therefore, the solution for the given example is [Tex](x_1, x_2) = (1, 2) [/Tex]
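Assuming NumPy is available, Example 1.1 can be reproduced with `np.linalg.solve`, which is preferable to forming the inverse explicitly:

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 4.0]])
b = np.array([7.0, 10.0])

# A is square with |A| != 0, so the solution x = A^{-1} b is unique.
x = np.linalg.solve(A, b)
print(x)  # [1. 2.]
```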
Example 1.2:
Consider the given matrix equation:
[Tex]\begin{bmatrix} 1&2\\ 2&4\\ \end{bmatrix} % \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} 5\\ 10\\ \end{bmatrix}[/Tex]
- |A| is equal to zero
- rank(A) = 1
- nullity = 1
Checking consistency
[Tex]\begin{bmatrix} x_1 + 2x_2\\ 2x_1 + 4x_2\\ \end{bmatrix} = \begin{bmatrix} 5\\ 10\\ \end{bmatrix}[/Tex]
- Row (2) = 2 x Row (1)
- The equations are consistent, with only one linearly independent equation
- The solution set for (x_1, x_2) is infinite because we have only one linearly independent equation in two variables
Explanation: In the above example we have only one linearly independent equation, i.e. [Tex]x_1+2x_2 = 5 [/Tex]. So, if we take [Tex]x_2 = 0 [/Tex], then we have [Tex]x_1 = 5 [/Tex]; if we take [Tex]x_2 = 1 [/Tex], then we have [Tex]x_1 = 3 [/Tex]. In a similar fashion, we can take any value of [Tex]x_2 [/Tex] (we have infinite choices), and for each value of [Tex]x_2 [/Tex] we get one corresponding [Tex]x_1 [/Tex]. Hence, this equation has infinitely many solutions.
Example 1.3:
Consider the given matrix equation:
[Tex]\begin{bmatrix} 1&2\\ 2&4\\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} 5\\ 9\\ \end{bmatrix} [/Tex]
- |A| is equal to zero
- rank(A) = 1
- nullity = 1
Checking consistency
[Tex]\begin{bmatrix} x_1 + 2x_2\\ 2x_1 + 4x_2\\ \end{bmatrix} = \begin{bmatrix} 5\\ 9\\ \end{bmatrix} [/Tex]
2 x Row (1) gives [Tex]2x_1 + 4x_2 = 10 \neq 9 [/Tex]
Therefore, the equations are inconsistent
We cannot find a solution for ([Tex]x_1, x_2 [/Tex])
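A practical way to distinguish Examples 1.2 and 1.3 programmatically is the rank test: a system Ax = b is consistent exactly when rank(A) equals the rank of the augmented matrix [A | b]. A minimal sketch, assuming NumPy (the helper name `is_consistent` is ours, not from the article):

```python
import numpy as np

def is_consistent(A, b):
    """Ax = b has at least one solution iff rank(A) == rank([A | b])."""
    augmented = np.column_stack([A, b])
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(is_consistent(A, np.array([5.0, 10.0])))  # True  (Example 1.2: infinitely many solutions)
print(is_consistent(A, np.array([5.0, 9.0])))   # False (Example 1.3: no solution)
```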
Case 2: m > n
- In this case, the number of variables or attributes is less than the number of equations.
- Here, not all the equations can be satisfied simultaneously.
- So, it is sometimes termed a case of no solution.
- But we can identify an appropriate approximate solution by viewing this case from an optimization perspective.
An optimization perspective
- Rather than finding an exact solution of the system Ax = b, we can find an [Tex]x [/Tex] such that (Ax-b) is minimized.
- Here, Ax-b is a vector, so there are as many error terms as there are equations.
- Denote Ax-b = e (m x 1); there are m errors e_i, i = 1, ..., m.
- We can minimize all the errors collectively by minimizing [Tex]\sum_{i=1}^{m} e_i^{2} [/Tex], which is the same as minimizing (Ax-b)^{T}(Ax-b).
So, the optimization problem becomes minimizing f(x), where
[Tex]\begin{aligned} f(x) &= (Ax-b)^{T}(Ax-b) \\&= (x^{T}A^{T}-b^{T})(Ax-b) \\&= x^{T}A^{T}Ax-b^{T}Ax-x^{T}A^{T}b+b^{T}b \end{aligned}[/Tex]
Here, we can notice that the optimization problem is a function of x. When we solve this optimization problem, it will give us the solution for x. We can obtain the solution to this optimization problem by differentiating [Tex]f(x) [/Tex]with respect to x and setting the differential to zero.
[Tex] \nabla f(x) = 0 [/Tex]
- Now, differentiating f(x) and setting the differential to zero results in
[Tex]\begin{aligned} \nabla f(x) &= 0 \\ 2(A^{T}A)x - 2A^{T}b &= 0 \\A^{T}Ax &= A^{T}b \end{aligned}[/Tex]
- Assuming that all the columns of A are linearly independent,
[Tex]x = (A^{T}A)^{-1}A^{T}b [/Tex]
Note: While this solution x might not satisfy all the equations, it ensures that the errors in the equations are collectively minimized.
Example 2.1:
Consider the given matrix equation:
[Tex]\begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix} % \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} 1\\ -0.5\\ 5\\ \end{bmatrix}[/Tex]
Using the optimization concept
[Tex]\begin{aligned} x
&= (A^{T}A)^{-1}A^{T}b
\\\begin{bmatrix} x_1\\ x_2\\ \end{bmatrix}
&= \begin{bmatrix} 0.2&-0.6\\ -0.6&2.8
\\ \end{bmatrix}
\begin{bmatrix} 15\\ 5\\ \end{bmatrix}
\\ &= \begin{bmatrix} 0\\ 5\\ \end{bmatrix}
\end{aligned}[/Tex]
Therefore, the solution for the given linear equation is [Tex](x_1, x_2) = (0, 5)[/Tex]
Substituting in the equation shows
[Tex]\begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix} % \begin{bmatrix} 0\\ 5\\ \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 5\\ \end{bmatrix} \neq \begin{bmatrix} 1\\ -0.5\\ 5\\ \end{bmatrix}[/Tex]
Example 2.2:
Consider the given matrix equation:
[Tex]\begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix} % \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 5\\ \end{bmatrix}[/Tex]
Using the optimization concept
[Tex]\begin{aligned} x &= (A^{T}A)^{-1}A^{T}b
\\ \begin{bmatrix} x_1\\ x_2\\ \end{bmatrix}
&= \begin{bmatrix} 0.2&-0.6\\ -0.6&2.8\\ \end{bmatrix}
\begin{bmatrix} 20\\ 5\\ \end{bmatrix}
\\ &= \begin{bmatrix} 1\\ 2\\ \end{bmatrix}
\end{aligned}[/Tex]
Therefore, the solution for the given linear equation is [Tex](x_1, x_2) = (1, 2)[/Tex]
Substituting in the equation shows:
[Tex]\begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix} % \begin{bmatrix} 1\\ 2\\ \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 5\\ \end{bmatrix} = \begin{bmatrix} 1\\ 2\\ 5\\ \end{bmatrix}[/Tex]
So, the important point to notice in Case 2 is that when we have more equations than variables, we can always use the least-squares solution [Tex]x = (A^{T}A)^{-1}A^{T}b [/Tex].
One thing to keep in mind is that [Tex](A^{T}A)^{-1} [/Tex] exists only if the columns of A are linearly independent.
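The least-squares formula can be checked numerically on Example 2.1. The sketch below (assuming NumPy) also uses `np.linalg.lstsq`, which solves the same minimization more stably than explicitly forming [Tex](A^{T}A)^{-1} [/Tex]:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [3.0, 1.0]])
b = np.array([1.0, -0.5, 5.0])

# Normal-equations formula from the text: x = (A^T A)^{-1} A^T b
x_normal = np.linalg.inv(A.T @ A) @ A.T @ b

# NumPy's least-squares routine solves the same problem more stably.
x_lstsq, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)  # [0. 5.]
print(x_lstsq)   # [0. 5.]
```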
Case 3: m < n
- This case deals with more attributes or variables than equations
- Here, we can obtain multiple solutions for the attributes
- This is an infinite-solutions case
- We will see how to choose one solution from the set of infinitely many possible solutions
In this case also, we take an optimization perspective, using the Lagrange multiplier method.
- Given below is the optimization problem [Tex]min\left[ \frac{1}{2}x^{T}x \right] [/Tex]
such that Ax=b
- We can define a Lagrangian function
[Tex]min[ f(x, \lambda)] =min\left[ \frac{1}{2}x^{T}x + \lambda^{T}(Ax-b) \right] [/Tex]
- Differentiating the Lagrangian with respect to x and setting it to zero, we get
[Tex]\begin{aligned} x + A^{T}\lambda &= 0 \\ x &= -A^{T}\lambda \end{aligned}[/Tex]
Pre-multiplying by A:
[Tex]\begin{aligned} Ax&=b \\A(-A^{T}\lambda) &= b \\ -AA^{T}\lambda&=b \\ \lambda &= -(AA^{T})^{-1}b \end{aligned}[/Tex]
Assuming that all the rows of A are linearly independent,
[Tex]\begin{aligned} x &= -A^{T}\lambda \\ &= A^{T}(AA^{T})^{-1}b \end{aligned}[/Tex]
Example 3.1:
Consider the given matrix equation:
[Tex]\begin{bmatrix} 1&2&3\\ 0&0&1\\ \end{bmatrix} % \begin{bmatrix} x_1\\ x_2\\ x_3\\ \end{bmatrix} = \begin{bmatrix} 2\\ 1\\ \end{bmatrix}[/Tex]
Using the optimization concept
[Tex]\begin{aligned} x &= A^{T}(AA^{T})^{-1}b
\\ &= \begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix}
\left( \begin{bmatrix} 1&2&3\\ 0&0&1\\ \end{bmatrix}
\begin{bmatrix} 1&0\\ 2&0\\ 3&1\\
\end{bmatrix} \right )^{-1}
\begin{bmatrix} 2\\ 1\\ \end{bmatrix}
\\ &= \begin{bmatrix} 1&0\\ 2&0\\ 3&1\\ \end{bmatrix}
\begin{bmatrix} -0.2\\ 1.6\\ \end{bmatrix}
\\ \begin{bmatrix} x_1\\ x_2\\ x_3\\ \end{bmatrix}
&= \begin{bmatrix} -0.2\\ -0.4\\ 1\\ \end{bmatrix}
\end{aligned}[/Tex]
The solution for the given sample is ([Tex]x_1, x_2, x_3 [/Tex]) = (-0.2, -0.4, 1)
You can easily verify that
[Tex]\begin{bmatrix} 1&2&3\\ 0&0&1\\ \end{bmatrix} \begin{bmatrix} -0.2\\ -0.4\\ 1\\ \end{bmatrix} = \begin{bmatrix} 2\\ 1\\ \end{bmatrix}[/Tex]
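The minimum-norm formula [Tex]x = A^{T}(AA^{T})^{-1}b [/Tex] from Example 3.1 can be checked numerically, assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 1.0]])
b = np.array([2.0, 1.0])

# Minimum-norm solution for the underdetermined case: x = A^T (A A^T)^{-1} b
x = A.T @ np.linalg.inv(A @ A.T) @ b

print(x)                      # [-0.2 -0.4  1. ]
print(np.allclose(A @ x, b))  # True: the constraints hold exactly
```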
Generalization
- The above-described cases cover all the possible scenarios that one may encounter while solving linear equations.
- The concept used to generalize the solutions for all the above cases is called the Moore-Penrose pseudoinverse of a matrix.
- Singular Value Decomposition (SVD) can be used to calculate the pseudoinverse, or generalized inverse ([Tex]A^+ [/Tex]).
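As an illustration (assuming NumPy), `np.linalg.pinv` computes [Tex]A^+ [/Tex] via SVD, and multiplying b by it reproduces the solutions found in all three cases above:

```python
import numpy as np

# np.linalg.pinv computes the Moore-Penrose pseudoinverse A^+ via SVD.
# x = A^+ b reproduces the case-specific solutions derived above.
A_square = np.array([[1.0, 3.0], [2.0, 4.0]])              # Case 1 (Example 1.1)
A_tall = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 1.0]])    # Case 2 (Example 2.1)
A_wide = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]])      # Case 3 (Example 3.1)

print(np.linalg.pinv(A_square) @ np.array([7.0, 10.0]))     # [1. 2.]
print(np.linalg.pinv(A_tall) @ np.array([1.0, -0.5, 5.0]))  # [0. 5.]
print(np.linalg.pinv(A_wide) @ np.array([2.0, 1.0]))        # [-0.2 -0.4  1. ]
```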
7 min read
Limits, Continuity and Differentiability
Limits, Continuity, and Differentiation are fundamental concepts in calculus. They are essential for analyzing and understanding function behavior and are crucial for solving real-world problems in physics, engineering, and economics. Table of Content LimitsKey Characteristics of LimitsExample of Li
10 min read
Implicit Differentiation
Implicit Differentiation is the process of differentiation in which we differentiate the implicit function without converting it into an explicit function. For example, we need to find the slope of a circle with an origin at 0 and a radius r. Its equation is given as x2 + y2 = r2. Now, to find the s
6 min read
Calculus for Machine Learning
Partial Derivatives in Engineering Mathematics
Partial derivatives are a basic concept in multivariable calculus. They convey how a function would change when one of its input variables changes, while keeping all the others constant. This turns out to be particularly useful in fields such as physics, engineering, economics, and computer science,
10 min read
Advanced Differentiation
Derivatives are used to measure the rate of change of any quantity. This process is called differentiation. It can be considered as a building block of the theory of calculus. Geometrically speaking, the derivative of any function at a particular point gives the slope of the tangent at that point of
8 min read
How to find Gradient of a Function using Python?
The gradient of a function simply means the rate of change of a function. We will use numdifftools to find Gradient of a function. Examples: Input : x^4+x+1 Output :Gradient of x^4+x+1 at x=1 is 4.99 Input :(1-x)^2+(y-x^2)^2 Output :Gradient of (1-x^2)+(y-x^2)^2 at (1, 2) is [-4. 2.] Approach: For S
2 min read
Optimization techniques for Gradient Descent
Gradient Descent is a widely used optimization algorithm for machine learning models. However, there are several optimization techniques that can be used to improve the performance of Gradient Descent. Here are some of the most popular optimization techniques for Gradient Descent: Learning Rate Sche
4 min read
Higher Order Derivatives
Higher order derivatives refer to the derivatives of a function that are obtained by repeatedly differentiating the original function. The first derivative of a function, fâ²(x), represents the rate of change or slope of the function at a point.The second derivative, fâ²â²(x), is the derivative of the
6 min read
Taylor Series
A Taylor series represents a function as an infinite sum of terms, calculated from the values of its derivatives at a single point. Taylor series is a powerful mathematical tool used to approximate complex functions with an infinite sum of terms derived from the function's derivatives at a single po
8 min read
Application of Derivative - Maxima and Minima
Derivatives have many applications, like finding rate of change, approximation, maxima/minima and tangent. In this section, we focus on their use in finding maxima and minima. Note: If f(x) is a continuous function, then for every continuous function on a closed interval has a maximum and a minimum
6 min read
Absolute Minima and Maxima
Absolute Maxima and Minima are the maximum and minimum values of the function defined on a fixed interval. A function in general can have high values or low values as we move along the function. The maximum value of the function in any interval is called the maxima and the minimum value of the funct
12 min read
Optimization for Data Science
From a mathematical foundation viewpoint, it can be said that the three pillars for data science that we need to understand quite well are Linear Algebra , Statistics and the third pillar is Optimization which is used pretty much in all data science algorithms. And to understand the optimization con
5 min read
Unconstrained Multivariate Optimization
Wikipedia defines optimization as a problem where you maximize or minimize a real function by systematically choosing input values from an allowed set and computing the value of the function. That means when we talk about optimization we are always interested in finding the best solution. So, let sa
4 min read
Lagrange Multipliers | Definition and Examples
In mathematics, a Lagrange multiplier is a potent tool for optimization problems and is applied especially in the cases of constraints. Named after the Italian-French mathematician Joseph-Louis Lagrange, the method provides a strategy to find maximum or minimum values of a function along one or more
8 min read
Lagrange's Interpolation
What is Interpolation? Interpolation is a method of finding new data points within the range of a discrete set of known data points (Source Wiki). In other words interpolation is the technique to estimate the value of a mathematical function, for any intermediate value of the independent variable. F
7 min read
Linear Regression in Machine learning
Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It provides valuable insights for prediction and data analysis. This article will explore its types, assumptions, implementation, advantages and evaluation met
15+ min read
Ordinary Least Squares (OLS) using statsmodels
Ordinary Least Squares (OLS) is a widely used statistical method for estimating the parameters of a linear regression model. It minimizes the sum of squared residuals between observed and predicted values. In this article we will learn how to implement Ordinary Least Squares (OLS) regression using P
3 min read
Regression in Machine Learning