
Atma Ram Sanatan Dharma College, University of Delhi

Name: Arvind Kumar Patel


College-Roll-No: 22/26089
University-Roll-No: 22003567041
Course: B.Sc Hons Physics
Paper: Statistical Analysis in Physics

Date: __ - __ - 2025
(Arvind Kumar Patel)
Practical_01

Aim: Generate sequences of N random numbers, M times each (with M at least 10,000), from
different distributions (e.g. Binomial, Poisson, Normal). Compute the arithmetic mean of each
random vector (of size N) and plot the distribution of the arithmetic means. Verify the
Central Limit Theorem (CLT) for each distribution. Show that the CLT is violated for the
Cauchy-Lorentz distribution.

Theory: Central Limit Theorem (CLT)

The Central Limit Theorem (CLT) is a fundamental concept in statistics. It states that:

When we take a large number of samples (each of size N) from any independent and
identically distributed (i.i.d.) population with finite mean and variance, the distribution
of their sample means tends to follow a Normal (Gaussian) distribution — regardless of
the original distribution.

Works for: Binomial, Poisson, Normal (because they have finite mean and variance).

Fails for: the Cauchy-Lorentz distribution (its mean and variance are undefined, so the CLT
does not apply).

This program verifies the theorem by generating sample means from different distributions and
plotting them.
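
As a quick numerical check (a minimal sketch, not part of the practical's plotting code below),
the CLT also predicts the spread of the sample means: they should have standard deviation
σ/√N. For Binomial(10, 0.5), μ = 5 and σ² = 2.5, so with N = 1000 the means should cluster
around 5 with spread √(2.5/1000) ≈ 0.05:

import numpy as np

# CLT prediction for Binomial(10, 0.5): the sample means should have
# mean mu = 10*0.5 = 5 and std sigma/sqrt(N) = sqrt(2.5/1000) ~ 0.05
N, M = 1000, 10000
means = [np.random.binomial(10, 0.5, N).mean() for _ in range(M)]
print(np.mean(means), np.std(means))   # expect roughly 5.0 and 0.05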

Algorithm:
Step 1: Import necessary libraries:
- numpy for random number generation and statistics
- matplotlib for plotting
- scipy's norm for Gaussian curve

Step 2: Define constants:


- N = size of each random sequence
- M = number of sequences (i.e., how many times we repeat sampling)

Step 3: Create a dictionary of distributions with generator functions:


- Binomial, Poisson, Normal (CLT should hold)
- Cauchy (CLT should fail)

Step 4: Loop through each distribution:


a) Generate M random sequences (each of size N)
b) Compute the mean of each sequence
c) (For Cauchy) Remove extreme outliers for better plot visibility
d) Plot histogram of sample means
e) Overlay a fitted Gaussian curve to observe convergence

Step 5: Show all plots together to compare behavior.


Code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

N, M = 1000, 10000
np.random.seed(42)

# Generator for each distribution; each call returns a sequence of N samples
d = {"Binomial": lambda: np.random.binomial(10, 0.5, N),
     "Poisson":  lambda: np.random.poisson(4, N),
     "Normal":   lambda: np.random.normal(0, 1, N),
     "Cauchy":   lambda: np.random.standard_cauchy(N)}

f, a = plt.subplots(2, 2, figsize=(10, 8))
for ax, (k, v) in zip(a.ravel(), d.items()):
    m = [np.mean(v()) for _ in range(M)]       # M sample means
    if k == "Cauchy":
        m = [i for i in m if -50 < i < 50]     # trim extreme outliers for plotting
    ax.hist(m, 50, density=True, alpha=0.6)
    mu, std = np.mean(m), np.std(m)
    x = np.linspace(mu - 4*std, mu + 4*std, 500)
    ax.plot(x, norm.pdf(x, mu, std), 'r--')    # fitted Gaussian overlay
    ax.set_title(k)

plt.suptitle("Central Limit Theorem Demonstration", fontsize=16)
plt.tight_layout()
plt.show()

Output: (plot) A 2×2 grid of histograms of the sample means with fitted Gaussian overlays;
the Binomial, Poisson, and Normal panels match the Gaussian curve, while the Cauchy panel
does not.
Project_02

Aim: Hypothesis testing


Make a random number generator to simulate the tossing of a coin n times with the
probability of a head being q. Write a code for a Binomial test of the null hypothesis
H₀ (q = 0.5) against the alternative hypothesis H₁ (q ≠ 0.5).

Theory: Hypothesis Testing (Binomial Test)

Hypothesis testing is used to decide whether data supports a certain claim or assumption (called a
hypothesis).

We simulate tossing a biased coin n times with actual probability of heads q.

Null Hypothesis (H₀): q = 0.5 (fair coin)

Alternative Hypothesis (H₁): q ≠ 0.5 (biased coin)

We use a Binomial test to compute the p-value: the probability of obtaining the observed
result, or one more extreme, under the assumption that H₀ is true.

If the p-value < 0.05, we reject H₀ (the coin is biased).
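
As an illustration of what the test computes, here is a minimal sketch (using the
scipy.stats.binom distribution, not part of the practical's code): for a fair coin,
Binomial(n, 0.5) is symmetric, so the two-sided p-value is simply twice the upper tail.

from scipy.stats import binom

k, n = 61, 100                         # e.g. 61 heads in 100 tosses
p_value = 2 * binom.sf(k - 1, n, 0.5)  # 2 * P(X >= k) under H0: q = 0.5
print(f"{p_value:.4f}")                # ~0.0352, matching binomtest below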

Algorithm:
Step 1: Import Required Libraries
- numpy: for simulating random coin tosses.
- scipy.stats.binomtest: for performing the binomial hypothesis test.
Step 2: Initialize Parameters
- n: number of coin tosses (e.g., 100).
- q: actual probability of getting heads (e.g., 0.6).
- Set a random seed using np.random.seed() to ensure reproducibility.
Step 3: Simulate Coin Tosses
- Use np.random.binomial(1, q, size=n) to simulate n tosses of a single coin.
- Each toss returns 1 for head and 0 for tail.
- Store the toss results in a variable (e.g., `tosses`).
Step 4: Count Number of Heads
- Use np.sum(tosses) to count how many times head appeared.
- Store this in a variable (e.g., `heads`).
Step 5: Perform Binomial Test
- Use binomtest(k=heads, n=n, p=0.5, alternative='two-sided') to test:
H₀: The coin is fair (q = 0.5)
H₁: The coin is biased (q ≠ 0.5)
- Store the result in a variable (e.g., `test`).
Step 6: Print Results
- Display the number of heads and total tosses.
- Print the p-value of the test.
- If p-value < 0.05 (5% significance level), conclude that we "Reject H₀".
- Otherwise, conclude "Fail to Reject H₀" (not enough evidence to say it's biased).

Code:
import numpy as np
from scipy.stats import binomtest

# Parameters
n = 100                 # number of coin tosses
q = 0.6                 # actual probability of heads
np.random.seed(1)

# Simulate coin tosses (1 = head, 0 = tail)
tosses = np.random.binomial(1, q, n)
heads = np.sum(tosses)

# Perform two-sided binomial test against H0: q = 0.5
test = binomtest(heads, n, 0.5, alternative='two-sided')

print(f"Heads: {heads}/{n}")
print(f"p-value: {test.pvalue:.4f}")
print("Result:", "Reject H₀" if test.pvalue < 0.05 else "Fail to Reject H₀")
print("Conclusion: The coin is biased" if test.pvalue < 0.05
      else "Conclusion: The coin is fair")

Output:

PS D:/B.Sc_6th_sem/Core/stat-analysis/unit2.1/Hypothesis_testing.py
Heads: 49/100
p-value: 0.9204
Result: Fail to Reject H₀
Conclusion: The coin is fair

PS D:/B.Sc_6th_sem/Core/stat-analysis/unit2.1/Hypothesis_testing.py
Heads: 61/100
p-value: 0.0352
Result: Reject H₀
Conclusion: The coin is biased
Project_03

Aim: Write a code to generate a Markov chain by defining a finite number of states, M (say 2).
Encode the states as numbers and assign the probabilities of changing from state i to
state j. Compute the transition matrix for 1, 2, …, N steps. Following this rule, write a
code for the Markovian Brownian motion of a particle.

Theory:
Markov Chain:
A Markov Chain is a stochastic process that moves through a set of discrete states with
transition probabilities. The key property is:
Markov Property: The future state depends only on the present state, not on the sequence of
events that preceded it.
The transition matrix (P) defines the probability of moving from one state to another:

P[i][j] = probability of moving from state i to state j

Markovian Brownian Motion:


Here, the particle moves left or right in 1D space:
● If it is in state 0, it moves left (−1).
● If it is in state 1, it moves right (+1).
● The next state depends only on the current state and the transition matrix P.

Over time, this stochastic process resembles a random walk (Brownian motion), but with
memory via the transition probabilities, hence "Markovian".

Algorithm:
Step 1: Import necessary libraries
- numpy for matrix operations and random number generation
- matplotlib.pyplot for plotting motion

Step 2: Define the transition matrix P for M=2 states


- Example:
P = [[0.7, 0.3],
[0.4, 0.6]]
- This means:
From state 0: 70% chance to stay, 30% to go to state 1
From state 1: 40% to go to state 0, 60% to stay

Step 3: Compute transition matrices for multiple steps


- For steps n = 1 to N:
Use numpy’s matrix power: np.linalg.matrix_power(P, n)
- These matrices show the probability of transitioning between states over multiple steps.
Step 4: Simulate Markovian Brownian motion
- Start at initial state = 0 and position = 0
- For a fixed number of steps (e.g., 100):
- Use np.random.rand() to randomly decide next state based on P
- Update the state (0 or 1)
- Move position:
- If state is 0: move left (−1)
- If state is 1: move right (+1)
- Record the position at each step in a list

Step 5: Plot the motion


- Use matplotlib to plot position vs time
- This shows how the particle’s position evolves over time

Code:
import numpy as np
import matplotlib.pyplot as plt

# Transition matrix for 2 states
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Compute transition matrices up to N steps
N = 5
for n in range(1, N + 1):
    print(f"Step {n}:\n{np.linalg.matrix_power(P, n)}\n")

# Simulate Markovian Brownian motion
state, pos, motion = 0, 0, [0]
for _ in range(100):
    state = 0 if np.random.rand() < P[state][0] else 1   # next state from row of P
    pos += -1 if state == 0 else 1                       # state 0: left, state 1: right
    motion.append(pos)

# Plot the motion
plt.plot(motion)
plt.title("Markovian Brownian Motion")
plt.xlabel("Time")
plt.ylabel("Position")
plt.grid(True)
plt.show()

Output:
Step 1:
[[0.7 0.3]
[0.4 0.6]]

Step 2:
[[0.61 0.39]
[0.52 0.48]]

Step 3:
[[0.583 0.417]
[0.556 0.444]]

Step 4:
[[0.5749 0.4251]
[0.5668 0.4332]]

Step 5:
[[0.57247 0.42753]
[0.57004 0.42996]]

Result: The transition matrices for steps 1 through N correctly show the evolving
state-to-state probabilities over time.
The simulated Markovian Brownian motion is a random walk whose step direction depends on
the state transitions. The motion plot shows the position fluctuating over time,
confirming the behavior of a Markov-driven stochastic process.
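
The convergence of the rows of Pⁿ seen above is toward the chain's stationary distribution π,
which satisfies πP = π; for this P, π = (4/7, 3/7) ≈ (0.5714, 0.4286), in agreement with the
Step 5 matrix. A minimal sketch of this check (not part of the practical's code):

import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# The stationary distribution is the left eigenvector of P for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()
print(pi)   # ~[0.5714 0.4286], i.e. (4/7, 3/7)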
Project_04

Aim: Write a code to minimize the cost function (mean squared error) in the linear
regression using gradient descent (an iterative optimization algorithm, which finds the
minimum of a differentiable function) with at least two independent variables. Determine
the correlation matrix for the regression parameters.

Theory:
Linear Regression using Gradient Descent
Linear Regression estimates the relationship between a dependent variable Y and independent
variables X₁, X₂, …. The model is:

Y = θ₀ + θ₁X₁ + θ₂X₂ + ... + ε

Mean Squared Error (MSE) is the cost function to minimize:

MSE = (1/n) * Σ (Yᵢ - Ŷᵢ)²

Gradient Descent updates parameters iteratively:

θ = θ - α * ∇(Cost)

Where α is the learning rate and ∇(Cost) is the gradient of the MSE.
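
Because the MSE cost is quadratic in θ, gradient descent should converge to the closed-form
least-squares solution θ = (XᵀX)⁻¹XᵀY. A minimal cross-check on the same synthetic data (an
illustrative sketch, not part of the practical's code):

import numpy as np

# Same synthetic data as in the gradient-descent code below
np.random.seed(1)
n = 100
X1, X2 = np.random.rand(n), np.random.rand(n)
Y = 3*X1 + 2*X2 + 1 + np.random.randn(n)*0.1
X = np.c_[np.ones(n), X1, X2]

# Closed-form least-squares fit; should agree with gradient descent
theta_closed, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(theta_closed)   # expected to be close to [1, 3, 2]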

Algorithm:
Step 1: Import numpy for calculations.

Step 2: Generate synthetic data with two independent variables (X1, X2) and one dependent
variable Y.

Step 3: Add a bias term (column of ones) to the feature matrix X.

Step 4: Initialize parameter vector theta = [θ₀, θ₁, θ₂] with zeros.

Step 5: Define learning rate and number of iterations for gradient descent.

Step 6: For each iteration:


- Compute predictions: Ŷ = X @ θ
- Calculate error: (Ŷ - Y)
- Compute gradient: grad = (2/n) * Xᵗ @ error
- Update parameters: θ -= learning_rate * grad

Step 7: After convergence, print the estimated parameters.

Step 8: Calculate and display the correlation matrix using np.corrcoef() for [X1, X2, Y].
Code:

import numpy as np

# Generate synthetic data
np.random.seed(1)
n = 100
X1 = np.random.rand(n)
X2 = np.random.rand(n)
Y = 3*X1 + 2*X2 + 1 + np.random.randn(n)*0.1   # true relation + noise

# Add bias term and stack features
X = np.c_[np.ones(n), X1, X2]   # shape: (n, 3)
theta = np.zeros(3)             # initialize weights [bias, w1, w2]

# Gradient descent on the MSE cost
lr, epochs = 0.1, 1000
for _ in range(epochs):
    preds = X @ theta
    error = preds - Y
    grad = (2/n) * X.T @ error
    theta -= lr * grad

print(f"Estimated Parameters (theta): {theta}")

# Correlation matrix of the variables [X1, X2, Y]
params_matrix = np.corrcoef([X1, X2, Y])
print("\nCorrelation Matrix:\n", params_matrix)

Output:
Estimated Parameters (theta): [1.02250584 2.98536193 1.98830098]

Correlation Matrix:
[[ 1.         -0.0248661   0.80470196]
 [-0.0248661   1.          0.56574719]
 [ 0.80470196  0.56574719  1.        ]]

Result:
After training, the model estimates parameters close to the true values used to generate
the data (θ ≈ [1, 3, 2]).
The correlation matrix shows a strong linear relationship between the dependent variable Y
and the independent variables X₁ and X₂, validating the effectiveness of the regression model.
– Thank You, Sir
___________
