
How to Calculate Confidence Intervals in Python?

Last Updated : 28 Apr, 2025

A confidence interval (CI) is a range, computed from sample data, that is likely to contain the true value of a population parameter such as the population mean. The confidence level (e.g., 95%) describes how reliable the procedure is: if the sampling were repeated many times, about 95% of the intervals built this way would contain the true value. When the population standard deviation is unknown, the interval for the mean is computed with the following formula:

$$\text{CI} = \bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$$
  • \bar{x}: sample mean
  • t: t-value that corresponds to the confidence level
  • s: sample standard deviation
  • n: sample size
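
As a worked instance of the formula, take the ten-value sample used in the code examples of this article, for which the sample mean is 14, the sample standard deviation is 2 and the sample size is 10. The critical value t ≈ 2.262 (95% confidence, 9 degrees of freedom) is rounded here, so the bounds are approximate:

```latex
\begin{aligned}
\text{CI} &= \bar{x} \pm t \cdot \frac{s}{\sqrt{n}} \\
          &\approx 14 \pm 2.262 \cdot \frac{2}{\sqrt{10}} \\
          &\approx 14 \pm 2.262 \cdot 0.632 \\
          &\approx 14 \pm 1.431 \\
          &\approx (12.569,\ 15.431)
\end{aligned}
```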

Using scipy.stats.t.interval

stats.t.interval() takes the confidence level, the degrees of freedom (n - 1), the sample mean (loc) and the standard error of the mean (scale), and returns the lower and upper bounds of the interval. Because it is based on Student's t-distribution, it is well suited to small samples where the population standard deviation is unknown and has to be estimated from the data.

Python
import numpy as np
import scipy.stats as stats

d = [12, 15, 14, 10, 13, 17, 14, 15, 16, 14]
cl = 0.95 # confidence level

# confidence interval: loc is the sample mean, scale is the standard error
ci = stats.t.interval(cl, df=len(d)-1, loc=np.mean(d), scale=np.std(d, ddof=1) / np.sqrt(len(d)))
print(ci)

Output

(np.float64(12.56928618802332), np.float64(15.43071381197668))

Explanation:

  • np.mean(d) computes the sample mean of the data.
  • np.std(d, ddof=1) calculates the sample standard deviation, using ddof=1 for sample standard deviation (Bessel's correction).
  • len(d) determines the number of data points (sample size).
  • stats.t.interval() calculates the confidence interval from the confidence level, the degrees of freedom (n - 1), the sample mean and the standard error (sample standard deviation divided by the square root of the sample size).
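
To see how the confidence level affects the result, the following sketch evaluates the same interval at 90%, 95% and 99%. It reuses the sample from this section; the choice of levels and the `intervals` dictionary are illustrative assumptions, and `stats.sem` is used as a shortcut for the standard error (it applies ddof=1 by default).

```python
import numpy as np
import scipy.stats as stats

d = [12, 15, 14, 10, 13, 17, 14, 15, 16, 14]  # same sample as above
mean = np.mean(d)
sem = stats.sem(d)  # standard error of the mean, ddof=1 by default

# Compute the t-interval at several confidence levels
intervals = {}
for level in (0.90, 0.95, 0.99):
    low, high = stats.t.interval(level, df=len(d) - 1, loc=mean, scale=sem)
    intervals[level] = (float(low), float(high))
    print(f"{level:.0%}: ({low:.3f}, {high:.3f})")
```

A higher confidence level gives a wider interval: being more certain of capturing the true mean requires a larger range.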

Using statsmodels

This approach fits an Ordinary Least Squares (OLS) regression with sm.OLS and reads the confidence intervals of its coefficients with conf_int(). For a single sample, an intercept-only model (a design matrix containing just a column of ones) has one coefficient whose estimate is the sample mean, so its confidence interval is the confidence interval of the mean. The same conf_int() call also works for models with predictors, where it quantifies the uncertainty of each estimated coefficient.

Python
import numpy as np
import statsmodels.api as sm

d = [12, 15, 14, 10, 13, 17, 14, 15, 16, 14]

# Intercept-only design matrix: a single column of ones
X = np.ones((len(d), 1))
model = sm.OLS(d, X)
res = model.fit()

# 95% confidence interval for the intercept, i.e. the sample mean
ci = res.conf_int(alpha=0.05)
print(ci)

Output

[[12.56928619 15.43071381]]

Explanation:

  • np.ones((len(d), 1)) builds the design matrix for an intercept-only model.
  • sm.OLS(d, X) creates an OLS regression model with d as the dependent variable and the constant column X as the only regressor.
  • model.fit() fits the regression model to the data; the fitted intercept equals the sample mean (14.0).
  • res.conf_int(alpha=0.05) computes the 95% confidence interval for the model's coefficients, one row per coefficient. Here the single row matches the t-interval for the mean.
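
When the quantity of interest is simply the mean of a sample rather than a regression coefficient, statsmodels also offers a more direct route. The sketch below assumes the `DescrStatsW` class from `statsmodels.stats.weightstats` and its `tconfint_mean` method, applied to the same sample with default (equal) weights.

```python
from statsmodels.stats.weightstats import DescrStatsW

d = [12, 15, 14, 10, 13, 17, 14, 15, 16, 14]

# tconfint_mean returns the (lower, upper) bounds of a
# t-based confidence interval for the mean; alpha=0.05 means 95%
low, high = DescrStatsW(d).tconfint_mean(alpha=0.05)
print(low, high)
```

The bounds should agree with the scipy.stats.t.interval result for the same data, since both rely on the t-distribution with n - 1 degrees of freedom.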

Using numpy and scipy

This method computes the confidence interval by hand, following the formula step by step: calculate the sample mean, the sample standard deviation and the standard error, look up the t-value, and multiply the t-value by the standard error to get the margin of error. Subtracting the margin from the sample mean gives the lower bound, and adding it gives the upper bound. This approach is useful when you want every intermediate quantity to be visible.

Python
import numpy as np
import scipy.stats as stats

d = [12, 15, 14, 10, 13, 17, 14, 15, 16, 14] 
m, s, n = np.mean(d), np.std(d, ddof=1), len(d)  # Mean, SD, Size
t = stats.t.ppf(0.975, df=n-1)  # t-value

e = t * (s / np.sqrt(n))  # Margin
print(m - e, m + e) 

Output

12.56928618802332 15.43071381197668

Explanation:

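  • stats.t.ppf(0.975, df=n-1) calculates the t-value for a 95% confidence level with n-1 degrees of freedom. The interval is two-sided, so the remaining 5% is split into 2.5% per tail, which is why the quantile is 0.975 (1 - 0.05/2) rather than 0.95.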
  • t * (s / np.sqrt(n)) computes the margin of error by multiplying the t-value with the standard error of the mean.
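
The manual steps can be wrapped in a small helper so that the quantile is derived from the confidence level instead of being hard-coded. This is a minimal sketch; the function name `mean_confidence_interval` and its `confidence` parameter are illustrative choices, not part of NumPy or SciPy.

```python
import numpy as np
import scipy.stats as stats

def mean_confidence_interval(data, confidence=0.95):
    """Return (lower, upper) for a two-sided t-interval around the mean."""
    data = np.asarray(data, dtype=float)
    n = data.size
    mean = data.mean()
    sem = data.std(ddof=1) / np.sqrt(n)            # standard error of the mean
    alpha = 1 - confidence                         # total tail probability
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # two-sided critical value
    margin = t_crit * sem                          # margin of error
    return float(mean - margin), float(mean + margin)

d = [12, 15, 14, 10, 13, 17, 14, 15, 16, 14]
low, high = mean_confidence_interval(d, confidence=0.95)
print(low, high)
```

With confidence=0.95 the helper evaluates the 0.975 quantile, so it reproduces the interval computed above; other levels only require changing the argument.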

Using pandas

This approach is the same calculation as the previous one, but the sample mean and standard deviation are taken from a pandas DataFrame column. It is convenient when the data already lives in a DataFrame, for example after loading it from a file. Note that pandas' std() uses ddof=1 by default, so it returns the sample standard deviation even when ddof is not passed explicitly.

Python
import pandas as pd
import numpy as np
import scipy.stats as stats

d = [12, 15, 14, 10, 13, 17, 14, 15, 16, 14]  # Data
df = pd.DataFrame(d, columns=['data'])

m, s, n = df['data'].mean(), df['data'].std(ddof=1), len(df)
t = stats.t.ppf(0.975, df=n-1)  # t-value
e = t * (s / np.sqrt(n))  # Margin
print(m - e, m + e) 

Output

12.56928618802332 15.43071381197668

Explanation:

  • stats.t.ppf(0.975, df=n-1) finds the two-sided t-value for a 95% confidence level (2.5% in each tail) with n-1 degrees of freedom.
  • t * (s / np.sqrt(n)) calculates the margin of error as the t-value multiplied by the standard error of the mean.
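
pandas can also return the standard error directly through Series.sem(), which removes the manual division by the square root of the sample size. The sketch below is one possible variant of the calculation above, using the same data; the variable names are illustrative.

```python
import pandas as pd
import scipy.stats as stats

df = pd.DataFrame({'data': [12, 15, 14, 10, 13, 17, 14, 15, 16, 14]})

m = df['data'].mean()
se = df['data'].sem()    # standard error of the mean, ddof=1 by default
n = df['data'].count()   # number of non-missing values

t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value
low, high = m - t_crit * se, m + t_crit * se
print(low, high)
```

Using count() instead of len() means missing values (NaN) would not inflate the sample size, which matches how mean() and sem() skip them by default.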
