Sem-6_Statistical-Analysis_26089
Date: __ - __ - 2025
(Arvind Kumar Patel)
Practical_01
Aim: Generate sequences of N random numbers, M (at least 10000) times, from
different distributions (e.g. Binomial, Poisson, Normal). Compute the arithmetic mean of each
random vector (of size N) and plot the distribution of the arithmetic means. Verify the
Central Limit Theorem (CLT) for each distribution. Show that the CLT is violated for the
Cauchy-Lorentz distribution.
The Central Limit Theorem (CLT) is a fundamental concept in statistics. It states that:
When we take a large number of samples (each of size N) of independent and
identically distributed (i.i.d.) random variables drawn from any population with finite mean
and variance, the distribution of their sample means tends to a Normal (Gaussian)
distribution, regardless of the original distribution.
Works for: Binomial, Poisson, Normal (because they have finite mean and variance).
Fails for: Cauchy-Lorentz distribution (mean and variance are undefined → CLT doesn’t
apply).
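Formally, if X₁, X₂, …, X_N are i.i.d. with mean μ and variance σ², then the standardized
mean √N (X̄ − μ)/σ converges in distribution to N(0, 1) as N → ∞. The Cauchy-Lorentz
distribution has no finite μ or σ²; in fact, the mean of N Cauchy variables is itself
Cauchy-distributed, so the sample means never concentrate, no matter how large N is.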
This program verifies this concept by generating sample means from different distributions and
plotting them.
Algorithm:
Step 1: Import necessary libraries:
- numpy for random number generation and statistics
- matplotlib for plotting
- scipy's norm for the Gaussian curve
Step 2: For each distribution, generate M random vectors of size N and compute the
arithmetic mean of each vector.
Step 3: Plot a histogram of the M sample means.
Step 4: Overlay a Gaussian curve (using scipy's norm) to check agreement; repeat for the
Cauchy-Lorentz distribution, where no such agreement appears.
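Code:
A minimal sketch of the procedure above; the sample sizes (M = 10000, N = 1000), the seed,
and the distribution parameters are assumed example values.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Assumed sample sizes (not specified above)
M, N = 10000, 1000
rng = np.random.default_rng(0)

# Distribution samplers; parameters are assumed example values
distributions = {
    "Binomial(10, 0.5)": lambda: rng.binomial(10, 0.5, (M, N)),
    "Poisson(4)": lambda: rng.poisson(4, (M, N)),
    "Normal(0, 1)": lambda: rng.normal(0, 1, (M, N)),
    "Cauchy": lambda: rng.standard_cauchy((M, N)),  # CLT should fail here
}

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, (name, sampler) in zip(axes.ravel(), distributions.items()):
    means = sampler().mean(axis=1)  # M arithmetic means, one per random vector
    if name == "Cauchy":
        # Clip the heavy tails so the histogram is readable
        means = means[np.abs(means) < 10]
    ax.hist(means, bins=50, density=True, alpha=0.7)
    # Overlay a Gaussian fitted to the sample means for comparison
    x = np.linspace(means.min(), means.max(), 200)
    ax.plot(x, norm.pdf(x, means.mean(), means.std()), "r-")
    ax.set_title(name)
plt.tight_layout()
plt.show()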
Output:
Project_02
Aim: Simulate tosses of a possibly biased coin and use a binomial hypothesis test to decide
whether the observed tosses support the claim that the coin is fair.
Theory:
Hypothesis testing is used to decide whether data support a certain claim or assumption (called a
hypothesis).
We use a Binomial Test to compute the p-value — the probability of getting the observed
result (or more extreme) under the assumption H₀ is true.
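For a fair coin, P(X = k) = C(n, k) (0.5)ⁿ, where X is the number of heads in n tosses.
The two-sided p-value sums P(X = i) over every outcome i that is no more probable than the
observed count, which is how binomtest computes the 'two-sided' alternative.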
Algorithm:
Step 1: Import Required Libraries
- numpy: for simulating random coin tosses.
- scipy.stats.binomtest: for performing the binomial hypothesis test.
Step 2: Initialize Parameters
- n: number of coin tosses (e.g., 100).
- q: actual probability of getting heads (e.g., 0.6).
- Set a random seed using np.random.seed() to ensure reproducibility.
Step 3: Simulate Coin Tosses
- Use np.random.binomial(1, q, size=n) to simulate n tosses.
- Each toss returns 1 for head and 0 for tail.
- Store the toss results in a variable (e.g., `tosses`).
Step 4: Count Number of Heads
- Use np.sum(tosses) to count how many times head appeared.
- Store this in a variable (e.g., `heads`).
Step 5: Perform Binomial Test
- Use binomtest(k=heads, n=n, p=0.5, alternative='two-sided') to test:
H₀: The coin is fair (q = 0.5)
H₁: The coin is biased (q ≠ 0.5)
- Store the result in a variable (e.g., `test`).
Step 6: Print Results
- Display the number of heads and total tosses.
- Print the p-value of the test.
- If p-value < 0.05 (5% significance level), conclude that we "Reject H₀".
- Otherwise, conclude "Fail to Reject H₀" (not enough evidence to say it's biased).
Code:
import numpy as np
from scipy.stats import binomtest
# Parameters
n = 100 # number of coin tosses
q = 0.6 # actual probability of heads
np.random.seed(1)
# Simulate coin tosses (1=head, 0=tail)
tosses = np.random.binomial(1, q, n)
heads = np.sum(tosses)
# Perform two-sided binomial test
test = binomtest(heads, n, 0.5, alternative='two-sided')
print(f"Heads: {heads}/{n}")
print(f"p-value: {test.pvalue:.4f}")
print("Result:", "Reject H₀" if test.pvalue < 0.05 else "Fail to
Reject H₀")
print("Conclusion: The coin is biased" if test.pvalue < 0.05
else "Conclusion: The coin is fair")
Output:
PS D:/B.Sc_6th_sem/Core/stat-analysis/unit2.1/Hypothesis_testing.py
Heads: 49/100
p-value: 0.9204
Result: Fail to Reject H₀
Conclusion: The coin is fair
PS D:/B.Sc_6th_sem/Core/stat-analysis/unit2.1/Hypothesis_testing.py
Heads: 61/100
p-value: 0.0352
Result: Reject H₀
Conclusion: The coin is biased
Project_03
Aim: Write a code to generate a Markov chain by defining a finite number M (say 2) of states.
Encode each state as a number and assign the probability of changing from state i to
state j. Compute the transition matrix for 1, 2, …, N steps. Following this rule, write a code
for the Markovian Brownian motion of a particle.
Theory:
Markov Chain:
A Markov Chain is a stochastic process that moves through a set of discrete states with
transition probabilities. The key property is:
Markov Property: The future state depends only on the present state, not on the sequence of
events that preceded it.
The transition matrix (P) defines the probability of moving from one state to another:
P[i][j] = probability of moving from state i to state j
● The next state depends only on the current state and the transition matrix P.
Over time, this stochastic process resembles a random walk (Brownian motion), but with
memory entering only through the transition probabilities, hence "Markovian".
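By the Chapman-Kolmogorov equation, the probability of going from state i to state j in n
steps is obtained by summing over all intermediate states, which is exactly matrix
multiplication. The n-step transition matrix is therefore the matrix power Pⁿ, which is why
the code below uses np.linalg.matrix_power(P, n).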
Algorithm:
Step 1: Import necessary libraries
- numpy for matrix operations and random number generation
- matplotlib.pyplot for plotting motion
Step 2: Define the 2-state transition matrix P.
Step 3: Compute and print Pⁿ for n = 1, 2, …, N using np.linalg.matrix_power.
Step 4: Simulate the particle's motion: at each time step, draw the next state from the
current row of P, move the position left (state 0) or right (state 1), and record it;
finally, plot the trajectory.
Code:
import numpy as np
import matplotlib.pyplot as plt
# Transition matrix for 2 states
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
# Compute transition matrices up to N steps
N = 5
for n in range(1, N + 1):
    print(f"Step {n}:\n{np.linalg.matrix_power(P, n)}\n")
# Simulate Markovian Brownian motion (state 0: step left, state 1: step right)
state, pos, motion = 0, 0, [0]
for _ in range(100):
    state = 0 if np.random.rand() < P[state][0] else 1  # transition using current row of P
    pos += -1 if state == 0 else 1
    motion.append(pos)
# Plot the motion
plt.plot(motion)
plt.title("Markovian Brownian Motion")
plt.xlabel("Time")
plt.ylabel("Position")
plt.grid(True)
plt.show()
Output:
Step 1:
[[0.7 0.3]
[0.4 0.6]]
Step 2:
[[0.61 0.39]
[0.52 0.48]]
Step 3:
[[0.583 0.417]
[0.556 0.444]]
Step 4:
[[0.5749 0.4251]
[0.5668 0.4332]]
Step 5:
[[0.57247 0.42753]
[0.57004 0.42996]]
Result: The transition matrices for steps 1 through N correctly show the evolving
state-to-state probabilities over time.
The simulated Markovian Brownian motion reflects a random walk where the particle’s
direction depends on state transitions. The motion plot shows fluctuating position over time,
confirming the behavior of a Markov-driven stochastic process.
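This agrees with theory: the rows of Pⁿ converge to the stationary distribution π satisfying
πP = π. For this P, solving 0.3π₀ = 0.4π₁ with π₀ + π₁ = 1 gives π = (4/7, 3/7) ≈
(0.5714, 0.4286), matching the Step 5 matrix above.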
Project_04
Aim: Write a code to minimize the cost function (mean squared error) in the linear
regression using gradient descent (an iterative optimization algorithm, which finds the
minimum of a differentiable function) with at least two independent variables. Determine
the correlation matrix for the regression parameters.
Theory:
Linear Regression using Gradient Descent
Linear regression estimates the relationship between a dependent variable Y and independent
variables X₁, X₂, …. With two independent variables, the model is:
Y = θ₀ + θ₁X₁ + θ₂X₂
Gradient descent minimizes the mean squared error (MSE) cost by repeatedly applying the update
θ = θ - α * ∇J(θ)
where α is the learning rate and ∇J(θ) is the gradient of the MSE cost.
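In matrix form, with m samples and a design matrix X whose first column is all ones (for the
intercept θ₀), the cost and its gradient are:
J(θ) = (1/m) ‖Xθ − Y‖²
∇J(θ) = (2/m) Xᵀ(Xθ − Y)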
Algorithm:
Step 1: Import numpy for calculations.
Step 2: Generate synthetic data with two independent variables (X1, X2) and one dependent
variable Y.
Step 3: Stack a column of ones with X1 and X2 to form the design matrix X (the ones column
multiplies the intercept θ₀).
Step 4: Initialize the parameter vector theta = [θ₀, θ₁, θ₂] with zeros.
Step 5: Define the learning rate and number of iterations for gradient descent.
Step 6: In each iteration, compute the predictions X·θ and the gradient of the MSE cost.
Step 7: Update θ = θ - α·∇J(θ) and repeat until the iterations are exhausted.
Step 8: Calculate and display the correlation matrix using np.corrcoef() for [X1, X2, Y].
Code:
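A minimal sketch consistent with the algorithm above; the true parameters (1, 3, 2), the
noise level, the learning rate, and the iteration count are assumed values chosen to produce
output of the reported kind.
import numpy as np

# Generate synthetic data: Y = 1 + 3*X1 + 2*X2 + noise (assumed true parameters)
np.random.seed(0)
m = 1000
X1 = np.random.rand(m)
X2 = np.random.rand(m)
Y = 1 + 3 * X1 + 2 * X2 + 0.1 * np.random.randn(m)

# Design matrix with a bias column of ones
X = np.column_stack([np.ones(m), X1, X2])

# Gradient descent on the MSE cost
theta = np.zeros(3)
alpha, iterations = 0.1, 10000
for _ in range(iterations):
    gradient = (2 / m) * X.T @ (X @ theta - Y)
    theta -= alpha * gradient

print("Estimated Parameters (theta):", theta)
print("Correlation Matrix:")
print(np.corrcoef([X1, X2, Y]))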
Output:
Estimated Parameters (theta): [1.02250584 2.98536193 1.98830098]
Correlation Matrix:
[[ 1. -0.0248661 0.80470196]
[-0.0248661 1. 0.56574719]
[ 0.80470196 0.56574719 1. ]]
Result:
After training, the model estimates the parameters close to the true values used for
generating data (approximately θ ≈ [1, 3, 2]).
The correlation matrix shows a strong linear relationship between the dependent variable Y
and the independent variables X₁ and X₂, validating the effectiveness of the regression model.
– Thank You, Sir
___________