Fresco

The document discusses preprocessing steps for a CIFAR-10 image classification dataset including normalization, ZCA whitening, and singular value decomposition. The CIFAR-10 dataset is downloaded and split into training and test sets. The images are normalized to have zero mean and unit variance. ZCA whitening is applied to reduce data redundancy and correlation between features. Singular value decomposition is used to extract features from the images.


<audio id=""demo"" src=""audio.

mp3""></audio>
<div>
<button onclick=""document.getElementById('demo').play()"">
Play the Audio </button>
</div>

from itertools import combinations, permutations
import numpy as np

arr = np.array(['H', 'O', 'R', 'S', 'E'])
print(len(list(combinations(arr, 2))))   # 5C2 = 10
print(len(list(permutations(arr, 2))))   # 5P2 = 20
import numpy as np
from scipy import stats
import statistics

def measures(arr):
    '''
    Input: arr : numpy array
    Return : mean, median, std_deviation, variance, mode, iqr : float

    Note:
    1. Assign the values to designated variables
    2. Round off to 2 decimal places
    '''
    mean = round(np.mean(arr), 2)
    median = round(np.median(arr), 2)
    std_deviation = round(np.std(arr), 2)
    variance = round(np.var(arr), 2)
    mode = int(stats.mode(arr)[0])  # fixed: the original referenced an undefined name 'numbers'
    iqr = round(np.percentile(arr, 75, interpolation='midpoint')
                - np.percentile(arr, 25, interpolation='midpoint'), 2)

    return mean, median, std_deviation, variance, mode, iqr

if __name__ == '__main__':
    array1 = []
    n = int(input())
    for i in range(n):
        array1.append(float(input()))
    narray1 = np.array(array1)
    print(measures(narray1))

import math
red = math.factorial(10) / (math.factorial(7) * math.factorial(3))   # 10C3 = 120
blue = math.factorial(8) / (math.factorial(3) * math.factorial(5))   # 8C3 = 56
print(red * blue)   # 6720
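
Equivalently, the binomial coefficients can be computed directly with math.comb (available in Python 3.8+); a minimal check:

import math
print(math.comb(10, 3) * math.comb(8, 3))   # 6720, same as the factorial version above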
from scipy import stats
import numpy as np
import statistics
data ={"A": 90,"B": 86,"C":70,"D":95,"E":95,"F":95,"G":95}
values = list(data.values())
print("Mean")
print(np.mean(values))
print("Median")
print(np.median(values))
print("Mode")
print(statistics.mode(values))
print("Standard Deviation")
print(np.std(values))
print("Variance")
print(np.var(values))
print("range")
print(stats.iqr(values))

Find

P(getting even or prime) = P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

prob_getting_even = 3.0/6
prob_getting_prime = 3.0/6
prob_even_and_prime = 1.0/6
prob_even_or_prime = prob_getting_even + prob_getting_prime - prob_even_and_prime
print(prob_even_or_prime)   # 5/6 ≈ 0.833
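
As a quick exact check (not in the original notes), the same union probability can be computed with Python's fractions module:

from fractions import Fraction
print(Fraction(1, 2) + Fraction(1, 2) - Fraction(1, 6))   # 5/6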

prob_getting_king = 1/13.0
prob_getting_queen = 1/13.0
prob_king_or_queen = prob_getting_king + prob_getting_queen   # mutually exclusive events, so probabilities add
print(prob_king_or_queen)

Result

2/13 ≈ 0.1538

Question:

80% of people who purchase pet insurance are women. If 9 pet insurance owners are
randomly selected, find the probability that precisely 6 are women.

Solution:

#n=9
#p=0.80
#k=6
from scipy import stats
probability = stats.binom.pmf(6, 9, 0.80)
print(probability)   # ≈ 0.176
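
The same value can be verified directly from the binomial formula P(X = k) = C(n, k) p^k (1-p)^(n-k); a short check (math.comb requires Python 3.8+):

import math
print(math.comb(9, 6) * 0.80**6 * 0.20**3)   # ≈ 0.176, matching stats.binom.pmf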

Question:

If the number of vehicles that pass through a junction on a busy road is at an
average rate of 300 per hour, find the probability that no vehicle passes in a
given minute.

Python Code

from scipy import stats

averagepass = 300 / 60.0   # 300 vehicles per hour = 5 per minute
probability = stats.poisson.pmf(0, averagepass)
print(probability)   # ≈ 0.0067
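
Since the Poisson pmf is P(X = k) = e^(-λ) λ^k / k!, the zero-vehicle probability reduces to e^(-5); a quick check:

import math
print(math.exp(-5))   # ≈ 0.0067, matching stats.poisson.pmf(0, 5)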

from scipy.stats import chi2_contingency
from scipy.stats import chi2

table = [[30, 10], [15, 25], [15, 5]]
stat, p, dof, expected = chi2_contingency(table)
prob = 0.95   # 95th percentile of the chi-square distribution, i.e. a 0.05 significance level
critical = chi2.ppf(prob, dof)
if abs(stat) >= critical:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')
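
The same conclusion can be reached from the p-value returned by chi2_contingency (this variant is not in the original snippet):

alpha = 0.05
if p <= alpha:
    print('Dependent (reject H0)')
else:
    print('Independent (fail to reject H0)')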

Output

[[30, 10], [15, 25], [15, 5]]


-----------------------
Expected Values
[[24. 16.]
[24. 16.]
[12. 8.]]
-------------------------------------
Chi-Square Statistic = 14.062
Degrees of Freedom = 2
P value = 0.001
-------------------------------------
Significance level = 0.050, P-Value = 0.001
Dependent: Reject the null hypothesis

from scipy.stats import chi2_contingency
from scipy.stats import chi2

def chi_test():
    '''
    Output
    1. stat: Float
    2. dof : Integer
    3. p_val: Float
    4. res: String
    '''
    #Note: Round off the Float values to 2 decimal places.

    table = [[18, 36, 21, 9, 6], [12, 36, 45, 36, 21], [6, 9, 9, 3, 3], [3, 9, 9, 6, 3]]

    stat, p_val, dof, expected = chi2_contingency(table)
    stat = round(stat, 2)
    p_val = round(p_val, 2)

    if p_val <= 0.05:
        res = 'Reject the Null Hypothesis'
    else:
        res = 'Failed to reject the Null Hypothesis'

    return stat, dof, p_val, res

if __name__ == '__main__':
    print(chi_test())

Dataset Download

The dataset is available at the location: http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

Open the terminal and type the below command to download it:

curl http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz>cifar-10-python.tar.gz

This downloads the data and saves it as cifar-10-python.tar.gz.

Extract the tar.gz file using the subsequent command:

tar -xzvf cifar-10-python.tar.gz

import os
import numpy as np

def _load_cifar10_batch(file):
    import cPickle
    fo = open(file, 'rb')
    dict = cPickle.load(fo)
    fo.close()
    return dict['data'].reshape(-1, 32, 32, 3), dict['labels']  # reshaping the data to 32 x 32 x 3

print('Loading...')
batch_fns = [os.path.join("./", 'cifar-10-batches-py', 'data_batch_' + str(i)) for i in range(1, 6)]
data_batches = [_load_cifar10_batch(fn) for fn in batch_fns]

Data Stacking

The batches loaded are stacked into a big array.

data_all = np.vstack([data_batches[i][0] for i in range(len(data_batches))]).astype('float')
labels_all = np.vstack([data_batches[i][1] for i in range(len(data_batches))]).flatten()

Subset Generation

As explained in the dataset description, we use only a subset of the CIFAR-10 dataset.

The dataset with 50,000 samples is split in the ratio 92:8. This split is done to take a smaller portion of the 50,000 samples (i.e., the 8% part contains only 4,000 images).

These 4,000 samples are used for generating the train and test sets for classification.

Here, StratifiedShuffleSplit is used to split the dataset. It splits the data by taking an equal number of samples from each class in a random manner.

#Splitting the whole training set into 92:8
seed = 7
from sklearn.cross_validation import StratifiedShuffleSplit
data_split = StratifiedShuffleSplit(labels_all, 1, test_size=0.08, random_state=seed)  # creating data_split object with 8% test size
for train_index, test_index in data_split:
    split_data_92, split_data_8 = data_all[train_index], data_all[test_index]
    split_label_92, split_label_8 = labels_all[train_index], labels_all[test_index]

Data Splitting

The 4,000 samples are split in the ratio 7:3 (i.e., 2,800 for training and 1,200 for testing) using StratifiedShuffleSplit.

#Splitting the 4000-sample set into 70 and 30
train_test_split = StratifiedShuffleSplit(split_label_8, 1, test_size=0.3, random_state=seed)  # test_size=0.3 denotes that 30% of the dataset is used for testing.
for train_index, test_index in train_test_split:
    train_data_70, test_data_30 = split_data_8[train_index], split_data_8[test_index]
    train_label_70, test_label_30 = split_label_8[train_index], split_label_8[test_index]
train_data = train_data_70  # assigning to variable train_data
train_labels = train_label_70  # assigning to variable train_labels
test_data = test_data_30
test_labels = test_label_30

You can see the size of the above variables using:

print 'train_data : ', train_data.shape
print 'train_labels : ', train_labels.shape
print 'test_data : ', test_data.shape
print 'test_labels : ', test_labels.shape

Normalization

Normalization is the process of rescaling the pixel intensity values so that they follow a standard normal distribution.

A normalized image has mean = 0 and variance = 1.

# definition of normalization function
def normalize(data, eps=1e-8):
    data -= data.mean(axis=(1, 2, 3), keepdims=True)
    std = np.sqrt(data.var(axis=(1, 2, 3), ddof=1, keepdims=True))  # calculating standard deviation
    std[std < eps] = 1.
    data /= std
    return data

# calling the function
train_data = normalize(train_data)
test_data = normalize(test_data)

# prints the shape of train data and test data
print 'train_data: ', train_data.shape
print 'test_data: ', test_data.shape

You can also try out the same using the scikit-learn preprocessing module.

ZCA Whitening

Normalization is followed by a ZCA whitening process.

The main aim of whitening is to reduce data redundancy, which means the features become less correlated and have the same variance.

ZCA stands for zero-phase component analysis. ZCA-whitened images still resemble the original images.

# Flattening the data before computing the whitening matrix
train_data_flat = train_data.reshape(train_data.shape[0], -1).T
test_data_flat = test_data.reshape(test_data.shape[0], -1).T
print('train_data_flat: ', train_data_flat.shape)
print('test_data_flat: ', test_data_flat.shape)
train_data_flat_t = train_data_flat.T
test_data_flat_t = test_data_flat.T
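
The snippet above only flattens the images; the whitening matrix itself is not shown in these notes. A minimal sketch of how it could be computed and applied, assuming the variable names above (the epsilon value and the use of np.cov/np.linalg.svd are assumptions, not from the original):

cov = np.cov(train_data_flat)                          # covariance of the flattened features, shape (3072, 3072)
U, S, _ = np.linalg.svd(cov)                           # eigenvectors/eigenvalues of the covariance via SVD
eps_zca = 1e-5                                         # small constant to avoid dividing by near-zero eigenvalues
zca_matrix = U.dot(np.diag(1.0 / np.sqrt(S + eps_zca))).dot(U.T)
train_data_zca = zca_matrix.dot(train_data_flat).T     # one whitened row per training image
test_data_zca = zca_matrix.dot(test_data_flat).T       # apply the same matrix to the test set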


Singular Value Decomposition (SVD)

The below code for SVD may not work in the available online cloud playground due to package issues. So, it is better to try this out in a local Python environment.

from skimage import color

# definition for SVD
def svdFeatures(input_data):
    svdArray_input_data = []
    size = input_data.shape[0]
    for i in range(0, size):
        img = color.rgb2gray(input_data[i])      # convert each image to grayscale
        U, s, V = np.linalg.svd(img, full_matrices=False)
        S = [s[k] for k in range(30)]            # keep the 30 largest singular values as features
        svdArray_input_data.append(S)
    svdMatrix_input_data = np.matrix(svdArray_input_data)
    return svdMatrix_input_data

# apply SVD for train and test data
train_data_svd = svdFeatures(train_data)
test_data_svd = svdFeatures(test_data)


Support Vector Machine (SVM)

Support Vector Machine (SVM) is effective in:

High-dimensional spaces.

Cases where the number of dimensions > the number of samples.

Cases with a clear margin of separation.

Given below is the code snippet for training an SVM:

from sklearn import svm  # Creating an SVM classifier model
clf = svm.SVC(gamma=.001, probability=True)  # Model training
clf.fit(train_data_flat_t, train_labels)  # After being fitted, the model can then be used to predict the output.

Here, train_data_flat_t can be replaced with train_data_pca or train_data_svd for PCA and SVD respectively.
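
As a follow-up (not in the original notes), the fitted classifier can be evaluated on the held-out test set; a minimal sketch, assuming test_data_flat_t and test_labels from the earlier splitting step:

predictions = clf.predict(test_data_flat_t)       # predicted class labels for the test images
accuracy = np.mean(predictions == test_labels)    # fraction of correct predictions
print('test accuracy: ', accuracy)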

***********************************************************************************
***********************************************************

DNN and SNN

Normalising the Data

Since the features are measured as lengths on different scales, we need to rescale the data to a common range before training.

For this, we take the maximum and minimum of each feature.

Subtract each feature by its minimum value.

Finally, divide the result by the difference of the maximum and minimum values.

import numpy as np

def normalize(data):
    col_max = np.max(data, axis=0)
    col_min = np.min(data, axis=0)
    return np.divide(data - col_min, col_max - col_min)

X_norm = normalize(X)

Dot product Example

Sample code in Python for a dot product

import numpy as np

#matrix a
a = np.array([[1,2],[3,4]])

print("matrix a dimension ", a.shape)

#matrix b

b = np.array([[5,6,7],[8,9, 10]])

print("matrix b dimension ", b.shape)

#matrix c = a.b

c = np.dot(a,b)

print("dot product of a and b: ", c)

print("matrix c dimension ", c.shape)

output:

matrix a dimension (2, 2)

matrix b dimension (2, 3)

dot product of a and b: [[21 24 27]

[47 54 61]]

matrix c dimension (2, 3)

Element wise product

Sample code for element-wise product in Python

import numpy as np

#matrix a

a = np.array([[1,2],[3,4]])

print("matrix a dimension ", a.shape)

#matrix b

b = np.array([[5,6],[8,9]])

print("matrix b dimension ", b.shape)


#matrix c = a * b

c = np.multiply(a,b)

print("element wise product of matrices a and b: ", c)

print("matrix c dimension ", c.shape)

output:

matrix a dimension (2, 2)

matrix b dimension (2, 2)

element wise product of matrices a and b: [[ 5 12]

[24 36]]

matrix c dimension (2, 2)

Broadcasting Examples

1. Multiplication of a matrix and a row vector

a = np.array([[10, 10, 10], [20, 20, 20], [30, 30, 30]])

b = np.array([1, 2, 3])

c = a * b

print(c)

output:

[[10 20 30]

[20 40 60]

[30 60 90]]

2. Addition of a matrix and a scalar

a = np.array([[10, 10, 10], [20, 20, 20], [30, 30, 30]])

b = 1

c = a + b
print(c)

output:

[[11 11 11]

[21 21 21]

[31 31 31]]

3. Element-wise function call

def exp(x, n):
    return x ** n

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(exp(a, 2))

output:

[[ 1 4 9]

[16 25 36]

[49 64 81]]

#Note that each element of array **a** has been raised to power 2

Representation of Array Using numpy

array1d = np.array([1,2,3,4])
print("shape of array1d before reshaping: ", array1d.shape)
array1d = array1d.reshape(1,4)
print("shape of array1d after reshaping: ", array1d.shape)
#rank of matrix can be found using np.linalg.matrix_rank() function
print("array1d is a matrix of rank {}".format(np.linalg.matrix_rank(array1d)))
output:
shape of array1d before reshaping: (4,)
shape of array1d after reshaping: (1, 4)
array1d is a matrix of rank 1

The shape (4,) just represents that the array has 4 elements.

The shape (1, 4) represents that the array has 4 elements arranged in one row and four columns.

Importing the Data

The following code snippet shows how to extract the iris data from Scikit-learn.

from sklearn import datasets

iris = datasets.load_iris()
#extracting first 100 samples pertaining to iris setosa and versicolor
X = iris.data[:100, :4]
#actual output
Y = iris.target[:100]

Getting Dimensions Right

Now the data X_norm is of shape (100, 4).

But for the SNN we need the data to be of shape (no_of_features x no_of_samples), so take a transpose of X_norm.

The Y data is an array of shape (100,). Convert it to a vector of shape (1, 100) using the reshape() function.

X_norm = X.reshape(100, 4)  # note: X is already (100, 4); this reassigns X_norm to the raw data

X_data = X_norm.T           # shape (4, 100): features x samples

Y_data = Y.reshape(1, 100)

print(X_data.shape)

print(Y_data.shape)

output:

(4, 100)

(1, 100)

Initialising Weights and Bias

Before we start the forward propagation, we need to initialize the weights and bias to some starting values.

Since we have four features, we need a weight vector of shape (4, 1) and one bias term of shape (1, 1).

In this case, we initialize all our weights and the bias to zero.

def initialiseNetwork(num_features):
    W = np.zeros((num_features, 1))
    b = 0
    parameters = {"W": W, "b": b}
    return parameters

Defining Activation Function

Before going ahead with the forward propagation, we need to define an activation function for the neuron.

Since this is a binary classification, let's consider a sigmoid function that maps any linear input to values between 0 and 1.

The sigmoid activation function is implemented as shown in the below code snippet.

def sigmoid(z):
    return 1/(1 + np.exp(-z))

Forward Propagation

You have seen the theoretical implementation of forward propagation in the previous topic.

The same is implemented in Python as follows:

def forwardPropagation(X, Y, parameters):
    W = parameters["W"]
    b = parameters["b"]
    Z = np.dot(W.T, X) + b
    A = sigmoid(Z)
    return A

Calculating the cost function for a given number of samples:

def cost(A, Y, num_samples):
    return -1/num_samples * np.sum(Y*np.log(A) + (1-Y)*(np.log(1-A)))

Defining Backpropagation

From forward propagation, you know the output A.

Using this output, you need to find the derivatives of the weights and bias:

def backPropagation(X, Y, A, num_samples):
    dZ = A - Y
    dW = np.dot(X, dZ.T) / num_samples
    db = np.sum(dZ) / num_samples
    return dW, db

Updating Parameters

Once we have the derivatives, we need to subtract them from the original weights and bias.

While subtracting, we multiply the derivatives by a learning rate to have control over the gradient at each step of the iteration.

def updateParameters(parameters, dW, db, learning_rate):
    W = parameters["W"] - (learning_rate * dW)
    b = parameters["b"] - (learning_rate * db)
    return {"W": W, "b": b}

Defining the Model

Using all the functions defined so far, let's define the model to initialize and train the SNN.

def model(X, Y, num_iter, learning_rate):
    num_features = X.shape[0]
    num_samples = float(X.shape[1])
    parameters = initialiseNetwork(num_features)
    for i in range(num_iter):
        A = forwardPropagation(X, Y, parameters)
        if (i % 100 == 0):
            print("cost after {} iteration: {}".format(i, cost(A, Y, num_samples)))
        dW, db = backPropagation(X, Y, A, num_samples)
        parameters = updateParameters(parameters, dW, db, learning_rate)
    return parameters

Training the Model

Train the model on the iris dataset with a learning rate of 0.1 and the number of iterations equal to 1000.

parameters = model(X_data, Y_data, 1000, 0.1)

output:

cost after 0 iteration: 0.69314718056


cost after 100 iteration: 0.361323737602

cost after 200 iteration: 0.234699727458

cost after 300 iteration: 0.171513661704

cost after 400 iteration: 0.134598207666

cost after 500 iteration: 0.110633930302

cost after 600 iteration: 0.0938999866314

cost after 700 iteration: 0.0815813584762

cost after 800 iteration: 0.0721454431571

cost after 900 iteration: 0.0646909798272

You can see that at every iteration the cost decreases, approaching zero.
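
To use the trained parameters, a small prediction helper can be added (a sketch, not part of the original notes; the 0.5 threshold is an assumption):

def predict(X, parameters):
    # forward pass with the trained weights, then threshold the sigmoid output at 0.5
    A = sigmoid(np.dot(parameters["W"].T, X) + parameters["b"])
    return (A > 0.5).astype(int)

train_preds = predict(X_data, parameters)
print("training accuracy: ", np.mean(train_preds == Y_data))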
***********************************************************************************
***********************************************************************
Now, you know that the input to a CNN is an N-dimensional array. If it is a digital image, then its dimension is

(number_of_rows x number_of_columns x number_of_channels)

An additional dimension is added to the input, representing the number of samples, so the final dimension would be

(num_samples x number_of_rows x number_of_columns x number_of_channels)

usually represented as m x n_h x n_w x n_c.
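
For instance (an illustrative example, not from the original notes), a batch of 100 RGB images of size 32 x 32 would be held in an array of shape (m, n_h, n_w, n_c):

import numpy as np
X_batch = np.zeros((100, 32, 32, 3))  # m=100 samples, 32 rows, 32 columns, 3 channels
print(X_batch.shape)                  # (100, 32, 32, 3)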

***********************************************************************************
**********************************************************
Implementation using NumPy

A = np.random.randint(0, 10, size=(3, 3, 3))
W = np.random.randint(0, 10, size=(3, 3, 3))

scalar = np.sum(np.multiply(A, W))  ### numpy implementation of a single convolution step: elementwise product, then sum
print(scalar)

Sample output (the value varies per run, since no random seed is set):
688
***********************************************************************************
***********************************************************
Padding - NumPy
def zero_pad(data, pad):
    # pad the two spatial dimensions with zeros; leave the sample and channel dimensions untouched
    data_pad = np.pad(array=data, pad_width=((0, 0), (pad, pad), (pad, pad), (0, 0)), mode='constant', constant_values=0)
    return data_pad
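
A quick shape check (illustrative, not from the original notes):

x = np.random.randn(2, 5, 5, 3)   # 2 samples, 5 x 5 spatial size, 3 channels
print(zero_pad(x, 1).shape)       # (2, 7, 7, 3): each spatial dimension grows by 2*pad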
***********************************************************************************
***********************************************************************
Strided Convolution Using NumPy
A detailed explanation of this code is provided in the next card.

def conv_forward(A_prev, W, b, hparams):
    stride = hparams["stride"]
    pad = hparams["pad"]

    m, h_prev, w_prev, c_prev = A_prev.shape  # c_prev is the number of channels in the input

    f, f, c_prev, n_c = W.shape

    n_h = int((h_prev - f + 2*pad)/stride) + 1
    n_w = int((w_prev - f + 2*pad)/stride) + 1

    Z = np.zeros((m, n_h, n_w, n_c))

    A_prev_pad = zero_pad(A_prev, pad)
    for i in range(m):
        for h in range(n_h):
            for w in range(n_w):
                for c in range(n_c):
                    w_start = w * stride
                    w_end = w_start + f
                    h_start = h * stride
                    h_end = h_start + f

                    Z[i, h, w, c] = conv_single_step(A_prev_pad[i, h_start:h_end, w_start:w_end, :], W[:, :, :, c], b[:, :, :, c])
    return Z
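
conv_forward relies on a helper conv_single_step that is not shown in these notes. A minimal sketch consistent with how it is called above (elementwise product of the input slice and the filter, summed, plus the bias):

def conv_single_step(a_slice_prev, W, b):
    # a_slice_prev and W have shape (f, f, c_prev); b is a single bias value (a 1x1x1 array in the call above)
    return np.sum(np.multiply(a_slice_prev, W)) + float(b)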
***********************************************************************************
************************************************************************
Pooling - NumPy Implementation
The below method performs max pooling on the given input using the filter size and stride provided by hparam.

def max_pool(input, hparam):
    m, h_prev, w_prev, c_prev = input.shape
    f = hparam["f"]  ## f is the filter size to use for pooling
    stride = hparam["stride"]
    h_out = int(((h_prev - f)/stride) + 1)
    w_out = int(((w_prev - f)/stride) + 1)
    output = np.zeros((m, h_out, w_out, c_prev))
    for i in range(m):
        for c in range(c_prev):
            for h in range(h_out):
                for w in range(w_out):
                    w_start = w * stride
                    w_end = w_start + f
                    h_start = h * stride
                    h_end = h_start + f
                    output[i, h, w, c] = np.max(input[i, h_start:h_end, w_start:w_end, c])
    print(output.shape)
    assert output.shape == (m, h_out, w_out, c_prev)
    return output
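
A quick shape check (illustrative, not from the original notes): a 4 x 4 input pooled with a 2 x 2 filter and stride 2 halves each spatial dimension.

x = np.random.randn(2, 4, 4, 3)
pooled = max_pool(x, {"f": 2, "stride": 2})   # prints (2, 2, 2, 3)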
***********************************************************************************
********************************************************************************
