Python statistics | variance()
Last Updated :
29 Jul, 2024
The statistics module provides functions for calculating mathematical statistics of numeric data. variance() is one such function. It calculates the variance of a sample of data (a sample is a subset of a population).
variance() should only be used when the variance of a sample needs to be calculated. Another function, pvariance(), calculates the variance of an entire population.
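The difference between the two functions can be seen on a small data-set. The following is a minimal sketch; the values in data are made up for illustration.

```python
from statistics import pvariance, variance

# Illustrative values only; the mean is 5 and the
# sum of squared deviations from the mean is 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample variance divides the sum of squared deviations by n - 1
sample_var = variance(data)

# Population variance divides the same sum by n
population_var = pvariance(data)

print("Sample variance     :", sample_var)
print("Population variance :", population_var)
```

Because the divisor is smaller, the sample variance is always at least as large as the population variance of the same data.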
In statistics, variance is the expected value of the squared deviation of a variable from its mean. It measures how far the values in a data-set are spread out from their mean. A low variance indicates that the data are clustered closely around the mean, whereas a high variance indicates that the data are spread widely around it.
Variance is an important tool in the sciences, where statistical analysis of data is common. It is the square of the standard deviation of the given data-set and is also known as the second central moment of a distribution. It is usually represented by s^{2}, \sigma^{2} or \operatorname{Var}(X).
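The relationship between variance and standard deviation can be checked with stdev() from the same module. This is a small sketch that reuses the sample from Code #1 below; the comparison uses math.isclose() because the two results can differ by floating point rounding.

```python
import math
from statistics import stdev, variance

data = [2.74, 1.23, 2.63, 2.22, 3, 1.98]

var = variance(data)
sd = stdev(data)

# The standard deviation is the square root of the variance,
# so squaring it recovers the variance up to rounding error
print(math.isclose(sd ** 2, var))
```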
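Variance is defined by the following formula :
\operatorname{Var}(X) = \operatorname{E}\left[(X-\mu)^{2}\right]
It can equivalently be calculated as the mean of the square minus the square of the mean :
\operatorname{Var}(X) = \operatorname{E}\left[X^{2}\right] - \left(\operatorname{E}[X]\right)^{2}
For a sample of n values with sample mean \bar{x}, the sample variance divides the sum of squared deviations by n - 1 :
s^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}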
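To make the calculation concrete, the sample variance can be computed by hand and compared with variance(). This is only a sketch: sample_variance is a hypothetical helper written for this illustration, not part of the statistics module, and the library's internal computation may differ in how it handles rounding.

```python
import math
from statistics import mean, variance


def sample_variance(data):
    """Hypothetical helper: sum of squared deviations divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two data points are required")
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (n - 1)


data = [1, 2, 5, 4, 8, 9, 12]

manual = sample_variance(data)
library = variance(data)

# Both values should agree up to floating point rounding
print(manual, library, math.isclose(manual, library))
```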
Syntax : variance(data, xbar=None)
Parameters :
data : A sequence or iterable of real-valued numbers, containing at least two values.
xbar (Optional) : The mean of data. If it is omitted or None, the mean is calculated automatically.
Return type : Returns the sample variance of the values passed as parameter.
Exceptions :
StatisticsError is raised if data contains fewer than two values.
No exception is raised when the value provided as xbar doesn't match the actual mean of the data-set, but the result can be invalid or impossible.
Code #1 :
Python3
# Python code to demonstrate the working of
# variance() function of Statistics Module
# Importing Statistics module
import statistics
# Creating a sample of data
sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]
# Prints variance of the sample set
# Function will automatically calculate
# its mean and use it as xbar
print("Variance of sample set is %s" % statistics.variance(sample))
Output :
Variance of sample set is 0.40924
Code #2 : Demonstrates variance() on a range of data-types
Python3
# Python code to demonstrate variance()
# function on varying range of data-types
# importing statistics module
from statistics import variance
# importing fractions as parameter values
from fractions import Fraction as fr
# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, 0, 2, 1, 3, 4, 19)
# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
           fr(5, 6), fr(7, 8))
# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
# Print the variance of each sample
print("Variance of Sample 1 is %s" % variance(sample1))
print("Variance of Sample 2 is %s" % variance(sample2))
print("Variance of Sample 3 is %s" % variance(sample3))
print("Variance of Sample 4 is %s" % variance(sample4))
print("Variance of Sample 5 is %s" % variance(sample5))
Output :
Variance of Sample 1 is 15.80952380952381
Variance of Sample 2 is 3.5
Variance of Sample 3 is 61.125
Variance of Sample 4 is 1/45
Variance of Sample 5 is 0.17613000000000006
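Decimal values are supported as well, and the result keeps the Decimal type. The sketch below uses the same values as the example in the Python documentation for variance(); the alias D is only a local shorthand.

```python
from decimal import Decimal as D
from statistics import variance

# All values share one type; the result is returned as a Decimal
prices = [D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]

result = variance(prices)
print(result)        # Decimal value 31.01875
print(type(result))  # <class 'decimal.Decimal'>
```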
Code #3 : Demonstrates the use of xbar parameter
Python3
# Python code to demonstrate
# the use of xbar parameter
# Importing statistics module
import statistics
# creating a sample tuple
sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)
# calculating the mean of sample set
m = statistics.mean(sample)
# calculating the variance of sample set,
# passing the precomputed mean as xbar
print("Variance of Sample set is %s" % statistics.variance(sample, xbar=m))
Output :
Variance of Sample set is 0.3656666666666667
Code #4 : Demonstrates the incorrect result obtained when the value of xbar is not the same as the mean
Python3
# Python code to demonstrate the incorrect result
# produced when a garbage value of xbar is entered
# Importing statistics module
import statistics
# creating a sample tuple
sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)
# calculating the mean of sample set
m = statistics.mean(sample)
# Actual value of mean after calculation
# comes out to 1.6833333333333333
# But to demonstrate xbar error let's enter
# -100 as the value for xbar parameter
print(statistics.variance(sample, xbar = -100))
Output :
0.3656666666663053
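Note : No exception is raised, but the result is not reliable because xbar does not match the actual mean. The value above differs in precision from the output in Code #3, and the exact value printed may differ between Python versions.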
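Since variance() does not validate xbar, a caller who wants protection against a wrong value has to check it separately. The following is one possible sketch, not a library feature: checked_variance is a hypothetical helper, and the tolerances passed to math.isclose() are arbitrary choices. Recomputing the mean removes the speed benefit of xbar, so a check like this is mainly useful while debugging.

```python
import math
from statistics import mean, variance


def checked_variance(data, xbar):
    """Hypothetical helper: reject an xbar that is not the mean of data."""
    actual = mean(data)
    if not math.isclose(xbar, actual, rel_tol=1e-9, abs_tol=1e-12):
        raise ValueError(
            "xbar=%s does not match the sample mean %s" % (xbar, actual)
        )
    return variance(data, xbar=xbar)


sample = (1, 1.3, 1.2, 1.9, 2.5, 2.2)

# A correct xbar is accepted
print(checked_variance(sample, mean(sample)))

# A garbage xbar is rejected instead of silently producing a bad result
try:
    checked_variance(sample, -100)
except ValueError as err:
    print("Rejected:", err)
```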
Code #5 : Demonstrates StatisticsError
Python3
# Python code to demonstrate StatisticsError
# importing Statistics module
import statistics
# creating an empty data-set
sample = []
# will raise StatisticsError
print(statistics.variance(sample))
Output :
Traceback (most recent call last):
File "/home/64bf6d80f158b65d2b75c894d03a7779.py", line 10, in <module>
print(statistics.variance(sample))
File "/usr/lib/python3.5/statistics.py", line 555, in variance
raise StatisticsError('variance requires at least two data points')
statistics.StatisticsError: variance requires at least two data points
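In practice the exception is usually caught rather than left to end the program. Here is a minimal sketch; safe_variance is a hypothetical helper, and returning None for too little data is just one possible convention.

```python
from statistics import StatisticsError, variance


def safe_variance(data):
    """Hypothetical helper: return None when there are fewer than two values."""
    try:
        return variance(data)
    except StatisticsError:
        return None


print(safe_variance([]))      # no data points
print(safe_variance([4.2]))   # a single data point is still too few
print(safe_variance([1, 3]))  # two data points are enough
```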
Applications :
Variance is an important tool in statistics and in handling large amounts of data. When the population mean is unknown and the sample mean is used in its place, the sample variance serves as an unbiased estimator of the population variance, provided the observations are independent and identically distributed. Real-world observations, such as the rise and fall of a company's share price throughout the day, can never cover all possible observations. As such, variance is calculated from a finite set of data. It will not exactly match the variance of the whole population, but it gives an estimate that is good enough for further calculations.
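The idea of estimating a population variance from a sample can be simulated. The sketch below is illustrative only: the population is generated artificially with random.gauss(), and the seed, the population size and the sample size are arbitrary assumptions.

```python
import random
from statistics import pvariance, variance

random.seed(42)  # fixed seed so that the run is repeatable

# Hypothetical population: 10,000 simulated values with
# mean 0 and standard deviation 2 (true variance 4)
population = [random.gauss(0, 2) for _ in range(10000)]

# Only a small part of the population is actually observed
sample = random.sample(population, 50)

population_var = pvariance(population)
sample_estimate = variance(sample)

print("Population variance :", population_var)
print("Sample estimate     :", sample_estimate)
```

The two numbers will generally not be equal, but the estimate tends to move closer to the population value as the sample size grows.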