Descriptive Analysis in R Programming

Exploring Statistical Measures in R: Average, Variance, and Standard Deviation Explained

Last Updated : 06 Sep, 2024

Statistical measures like average, variance, and standard deviation are crucial in data analysis. These metrics help summarize data and understand its distribution. In this article, we'll explore how to calculate these measures in R using the built-in functions mean(), var(), and sd().

Table of Content

Average in R Programming
Variance in R Programming Language
Standard Deviation in R Programming Language

Average in R Programming

An average is a number expressing the central or typical value in a set of data, in particular the mode, median, or (most commonly) the mean, which is calculated by dividing the sum of the values in the set by their number.

\mu = \frac{1}{n} \sum_{i=1}^{n} x_i

where:

x_i represents the data points.
n represents the total number of data points.

Suppose there are 8 data points: 2, 4, 4, 4, 5, 5, 7, 9. The average of these 8 data points is,

\mu = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5
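The same arithmetic can be reproduced by hand in R. This is a minimal sketch; the variable names x and manual_mean are illustrative and not part of the original article.

```r
# The eight data points from the worked example
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Mean = sum of the values divided by the number of values
manual_mean <- sum(x) / length(x)

print(manual_mean)  # [1] 5
```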

Computing Average in R Programming

To compute the average of values, R provides the built-in function mean(). This function takes a numeric vector as an argument and returns the average (mean) of that vector. The basic syntax is:

mean(x, na.rm = FALSE)
Parameters:
x: numeric vector
na.rm: logical value; if TRUE, NA values are removed before the mean is computed (default FALSE)
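To see what na.rm changes, here is a small sketch with one missing value. The vector scores is made-up data for illustration and does not come from the article.

```r
# A vector with one missing value (illustrative data)
scores <- c(2, 4, NA, 6)

# By default the NA propagates, so the result is NA
default_mean <- mean(scores)

# With na.rm = TRUE the NA is dropped first: (2 + 4 + 6) / 3
clean_mean <- mean(scores, na.rm = TRUE)

print(default_mean)  # [1] NA
print(clean_mean)    # [1] 4
```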

Let's look at an example of computing the average in R.

R

# R program to get the average of a numeric vector

# Define the values (named `values` to avoid masking base R's list())
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Calculate the average using mean()
print(mean(values))

Output:

[1] 5

Example 2: Calculating the average of a different set of values using mean()

R

# R program to get the average of a numeric vector

# Define the values (named `values` to avoid masking base R's list())
values <- c(2, 40, 2, 502, 177, 7, 9)

# Calculate the average using mean()
print(mean(values))

Output:

[1] 105.5714

Variance in R Programming Language

Variance is the average of the squared differences between each value and the mean. The mathematical formula for the (population) variance is as follows,
\sigma^{2} = \frac{\sum_{i=1}^{N} (x_{i}-\mu)^{2}}{N}

where,

x_i represents the data points,
\mu is the mean,
N is the total number of data points.

Let's consider the same dataset that we have taken in average. First, calculate the deviations of each data point from the mean, and square the result of each,

\begin{array}{lll} (2-5)^2 = (-3)^2 = 9 && (5-5)^2 = 0^2 = 0 \\ (4-5)^2 = (-1)^2 = 1 && (5-5)^2 = 0^2 = 0 \\ (4-5)^2 = (-1)^2 = 1 && (7-5)^2 = 2^2 = 4 \\ (4-5)^2 = (-1)^2 = 1 && (9-5)^2 = 4^2 = 16. \\ \end{array}

variance = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = 4
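The hand calculation above can be checked step by step in R. This is a sketch only; squared_dev and pop_variance are illustrative names, and the result is the population variance (division by N), matching the formula in this section.

```r
# The eight data points from the worked example
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Squared deviation of each data point from the mean
squared_dev <- (x - mean(x))^2
print(squared_dev)

# Population variance: sum of squared deviations divided by N
pop_variance <- sum(squared_dev) / length(x)
print(pop_variance)  # [1] 4
```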

Computing Variance in R Programming

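We can calculate the variance using the var() function in R. The basic syntax is:

var(x)
Where,
x: numeric vector

Note that var() computes the sample variance, which divides the sum of squared deviations by n - 1 instead of N. For the 8 data points above it therefore returns 32/7 ≈ 4.571429 rather than the population variance of 4.

Let's look at an example of computing the variance in R.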

R

# R program to get the variance of a numeric vector

# Define the values (named `values` to avoid masking base R's list())
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Calculate the sample variance using var()
print(var(values))

Output:

[1] 4.571429
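The output differs from the hand-computed value of 4 because var() divides by n - 1. If the population variance is what you need, one possible approach is to rescale the result, as in this sketch (the names sample_var and pop_var are illustrative):

```r
# The eight data points from the worked example
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# var() returns the sample variance: 32 / (n - 1) = 32 / 7
sample_var <- var(x)

# Rescale by (n - 1) / n to get the population variance: 32 / 8
pop_var <- sample_var * (n - 1) / n

print(sample_var)
print(pop_var)
```

For large n the two values are nearly identical; the difference matters mainly for small datasets like this one.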

Standard Deviation in R Programming Language

Standard deviation is the square root of the variance. It measures the extent to which the data vary from the mean, in the same units as the data. The mathematical formula for calculating standard deviation is as follows,
\sigma = \sqrt{\sigma^{2}}

Using the population variance of 4 computed for the data above, the standard deviation is,
\sigma = \sqrt{4} = 2

Computing Standard Deviation in R

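One can calculate the standard deviation using the sd() function in R. The basic syntax is:

sd(x)
Parameters:
x: numeric vector

Note that sd(x) is the square root of var(x), and both use the sample formula with n - 1 in the denominator. For the data above, sd() therefore returns \sqrt{32/7} ≈ 2.13809 rather than 2.

Let's look at an example of computing the standard deviation in R.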

R

# R program to get the
# standard deviation of a numeric vector

# Define the values (named `values` to avoid masking base R's list())
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Calculate the sample standard
# deviation using sd()
print(sd(values))

Output:

[1] 2.13809
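As a quick check of how sd() relates to the formulas in this section, the sketch below compares it with the square root of var() and computes the population standard deviation by hand. The names sample_sd, same_as_sqrt_var, and pop_sd are illustrative.

```r
# The eight data points from the worked example
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample standard deviation (n - 1 in the denominator)
sample_sd <- sd(x)

# sd(x) should agree with sqrt(var(x)) up to floating-point tolerance
same_as_sqrt_var <- isTRUE(all.equal(sample_sd, sqrt(var(x))))

# Population standard deviation: square root of the mean squared deviation
pop_sd <- sqrt(mean((x - mean(x))^2))

print(same_as_sqrt_var)  # [1] TRUE
print(pop_sd)            # [1] 2
```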

Calculating All Three Metrics for a Dataset

Let’s calculate the mean, variance, and standard deviation for the following dataset:

R

# Define the dataset
data <- c(12, 15, 18, 22, 30, 35)

# Calculate the mean
mean_value <- mean(data)
print(paste("Mean:", mean_value))

# Calculate the variance
variance_value <- var(data)
print(paste("Variance:", variance_value))

# Calculate the standard deviation
sd_value <- sd(data)
print(paste("Standard Deviation:", sd_value))

Output:

[1] "Mean: 22"

[1] "Variance: 79.6"

[1] "Standard Deviation: 8.92188320927819"
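If you prefer a compact result over three separate print() calls, the three metrics can also be collected in one named vector. This is just one possible layout; the names values and summary_stats are illustrative.

```r
# Same dataset as in the example above
values <- c(12, 15, 18, 22, 30, 35)

# Collect the three metrics in a single named numeric vector
summary_stats <- c(
  mean     = mean(values),
  variance = var(values),
  sd       = sd(values)
)

# Round for display only; summary_stats keeps full precision
print(round(summary_stats, 2))
```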

Visualizing Mean, Variance, and Standard Deviation Together

We will generate a dataset, create a density plot, and overlay the mean, standard deviation, and variance on the plot.

R

# Load the ggplot2 package
library(ggplot2)

# Define a dataset
set.seed(123)  # for reproducibility
data <- rnorm(100, mean = 50, sd = 10)  # 100 random values, mean 50, sd 10

# Calculate the mean, standard deviation, and variance
mean_value <- mean(data)
sd_value <- sd(data)
variance_value <- var(data)

# Create the plot with variance annotation
# (linewidth needs ggplot2 >= 3.4.0; on older versions use size instead)
ggplot(data.frame(data), aes(x = data)) +
  geom_density(fill = "lightblue", alpha = 0.5) +  # Density plot
  geom_vline(xintercept = mean_value, color = "red", linetype = "dashed", linewidth = 1.2) +  # Mean line
  geom_vline(xintercept = mean_value + sd_value, color = "green", linetype = "dotted", linewidth = 1) +  # SD line (right)
  geom_vline(xintercept = mean_value - sd_value, color = "green", linetype = "dotted", linewidth = 1) +  # SD line (left)
  labs(title = "Visualization of Mean, Variance, and Standard Deviation",
       x = "Data Values",
       y = "Density") +
  theme_minimal() +
  annotate("text", x = mean_value, y = 0.03, label = paste("Mean =", round(mean_value, 2)), color = "red", vjust = -1) + 
  annotate("text", x = mean_value + sd_value, y = 0.02, label = paste("Mean + SD =", round(mean_value + sd_value, 2)), color = "green", vjust = -1) +
  annotate("text", x = mean_value - sd_value, y = 0.02, label = paste("Mean - SD =", round(mean_value - sd_value, 2)), color = "green", vjust = -1) +
  annotate("text", x = mean_value + 20, y = 0.04, label = paste("Variance =", round(variance_value, 2)), color = "blue", vjust = -1)

Output:

Figure: Visualizing Mean, Variance, and Standard Deviation

The plot shows:

The mean as a red dashed line in the center of the distribution.
The standard deviation lines as green dotted lines on both sides of the mean, indicating the spread of the data.
The variance as a blue text annotation, placed by the annotate() function to the right of the mean.

This visualization shows how the data are distributed around the mean and how spread out they are, as measured by the standard deviation. The variance is the square of the standard deviation and is expressed in squared units, so it is not drawn as a line on the x-axis; it appears only as the text annotation.
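The spread between the two standard deviation lines can also be quantified numerically. The sketch below regenerates the same simulated data and measures the share of points that fall within one standard deviation of the mean; for normally distributed data this share is expected to be roughly 68%, though the exact value depends on the sample. The name within_one_sd is illustrative.

```r
# Recreate the simulated dataset used for the plot
set.seed(123)  # for reproducibility
data <- rnorm(100, mean = 50, sd = 10)

mean_value <- mean(data)
sd_value <- sd(data)

# Share of points lying between (mean - sd) and (mean + sd)
within_one_sd <- mean(abs(data - mean_value) <= sd_value)
print(within_one_sd)

# The variance is the square of the standard deviation
print(isTRUE(all.equal(var(data), sd_value^2)))  # [1] TRUE
```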

Conclusion

In R, calculating the average, variance, and standard deviation is simple and efficient using the mean(), var(), and sd() functions. These metrics provide valuable insights into the central tendency and dispersion of your data, helping to summarize and understand the distribution.

AmiyaRanjanRout
