Unit I

Unit-1: INTRODUCTION TO DATA SCIENCE 9 hours

Introduction to Python Libraries: Numpy, creating array, attributes,


Numpy Arrays objects: Creating Arrays, basic operations (Array Join,
split, search, sort), Indexing, Slicing and iterating, copying arrays,
Arrays shape manipulation, Identity array, eye function, Universal
function, Linear algebra with Numpy, eigen values and eigen vectors
with Numpy, Numpy Random: Data Distribution, Normal, Exponential,
Binomial, Poisson, Uniform and Chi-Square distributions.

T1: Using Numpy implement Array Indexing and slicing


T2: Using Numpy implement Array basic operations
T3: Using Numpy implement Linear algebra and Random package
Introduction to
Python Libraries
NumPy Arrays
NumPy

•Numerical Python
• General-purpose array-processing package.
• High-performance multidimensional array object, and tools for
working with these arrays.
• Fundamental package for scientific computing with Python.
• It is open-source software.
NumPy - Features

• A powerful N-dimensional array object


• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number
capabilities
Choosing NumPy over Python list
Array

• An array is a data type used to store multiple values using a single
identifier (variable name).
• An array contains an ordered collection of data elements where each
element is of the same type and can be referenced by its index
(position).
Array

• Similar to the indexing of lists


• Zero-based indexing
• [10, 9, 99, 71, 90 ]
NumPy Array

• Store lists of numerical data, vectors and matrices


• Large set of routines (built-in functions) for creating, manipulating,
and transforming NumPy arrays.
• NumPy array is officially called ndarray but commonly known as array
Creation of NumPy Arrays from List

• First we need to import the NumPy library


import numpy as np
Creation of Arrays
1. Using the NumPy functions

a. Creating one-dimensional array in NumPy


import numpy as np
array=np.arange(20)
array

Output:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,12, 13, 14, 15, 16, 17, 18, 19])
1. Using the NumPy functions

a. Check the dimensions by using array.shape.

array.shape

Output:
(20,)
1. Using the NumPy functions

b. Creating two-dimensional arrays in NumPy


array=np.arange(20).reshape(4,5)

Output:
array([[ 0, 1, 2, 3, 4],
       [ 5, 6, 7, 8, 9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
1. Using the NumPy functions

c. Using other NumPy functions


np.zeros((2,4))
np.ones((3,6))
np.full((2,2), 3)

Output:
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.]])
array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])
array([[3, 3],
       [3, 3]])
1. Using the NumPy functions
c. Using other NumPy functions
import numpy as np
a=np.zeros((2,4))
b=np.ones((3,6))
c=np.empty((2,3))   # uninitialized: the values printed are arbitrary
d=np.full((2,2), 3)
e= np.eye(3,3)
f=np.linspace(0, 10, num=4)

print(a)
print(b)
print(c)
print(d)
print(e)
print(f)

Output:
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1.]]
[[1.14137702e-316 0.00000000e+000 6.91583610e-310]
 [6.91583609e-310 6.91583601e-310 6.91583601e-310]]
[[3 3]
 [3 3]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[ 0.          3.33333333  6.66666667 10.        ]
1. Using the NumPy functions
Sr No. Function Description

1 empty_like() Return a new array with the same shape and type as a given array
2 ones_like() Return an array of ones with the same shape and type as a given array
3 zeros_like() Return an array of zeros with the same shape and type as a given array
4 full_like() Return a full array with the same shape and type as a given array
5 asarray() Convert the input to an array
6 geomspace() Return numbers spaced evenly on a log scale (a geometric progression)
7 copy() Return an array copy of the given object
8 diag() Extract a diagonal or construct a diagonal array
9 frombuffer() Interpret a buffer as a 1-D array
10 fromfile() Construct an array from data in a text or binary file
11 bmat() Build a matrix object from a string, nested sequence, or array
12 mat() Interpret the input as a matrix (asmatrix() in NumPy 2.0 and later)
13 vander() Generate a Vandermonde matrix
14 triu() Upper triangle of an array
15 tril() Lower triangle of an array
16 tri() An array with ones at and below the given diagonal and zeros elsewhere
17 diagflat() Create a two-dimensional array with the flattened input as a diagonal
18 fromfunction() Construct an array by executing a function over each coordinate
19 logspace() Return numbers spaced evenly on a log scale
20 meshgrid() Return coordinate matrices from coordinate vectors


2. Conversion from Python structure like lists

import numpy as np
array=np.array([4,5,6])
print(array)
my_list=[4,5,6]   # named my_list so the built-in list type is not shadowed
print(my_list)

Output:
[4 5 6]
[4, 5, 6]
Working with Ndarray

• np.ndarray(shape, dtype)
• Creates an uninitialized array of the given shape; its contents are arbitrary, not random numbers.
• np.array(array_object)
• Creates an array from a list or tuple.
• np.zeros(shape)
• Creates an array of the given shape with all zeros.
• np.ones(shape)
• Creates an array of the given shape with all ones.
• np.full(shape, fill_value, dtype)
• Creates an array of the given shape with every element set to fill_value.
• np.arange(start, stop, step)
• Creates an array of evenly spaced values within the specified range.
NumPy Basic Array Operations
There is a vast range of built-in operations that we can perform on
these arrays.
1. ndim – It returns the number of dimensions of the array.
2. itemsize – It gives the size of each element in bytes.
3. dtype – It gives the data type of the elements.
4. reshape – It gives the array a new shape without changing its data.
5. slicing – It extracts a particular set of elements.
6. linspace – Returns evenly spaced elements over an interval.
7. max/min, sum, sqrt – Maximum, minimum, sum and element-wise square root.
8. ravel – It flattens the array into one dimension.
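The operations listed above can be tried together on one small array. This is only an illustrative sketch; the array contents and the explicit int64 dtype are assumptions chosen so that the printed values are predictable.

```python
import numpy as np

a = np.arange(12, dtype=np.int64)   # 0..11, fixed dtype so itemsize is predictable

print(a.ndim)        # 1  (number of dimensions)
print(a.itemsize)    # 8  (bytes per int64 element)
print(a.dtype)       # int64

m = a.reshape(3, 4)  # same data, new shape
print(m.shape)       # (3, 4)
print(m[1, 1:3])     # slicing: row 1, columns 1 and 2 -> [5 6]

print(m.max(), m.min(), m.sum())   # 11 0 66
print(np.sqrt(m[0]))               # element-wise square root of the first row
print(m.ravel())                   # flattened back to 1-D
print(np.linspace(0, 1, 5))        # [0.   0.25 0.5  0.75 1.  ]
```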
Arrays in NumPy
Checking Array Dimensions in NumPy

import numpy as np
a = np.array(10)
b = np.array([1,1,1,1])
c = np.array([[1, 1, 1], [2,2,2]])
d = np.array([[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]])
print(a.ndim) #0
print(b.ndim) #1
print(c.ndim) #2
print(d.ndim) #3
Higher Dimensional Arrays in NumPy

import numpy as np
arr = np.array([1, 1, 1, 1, 1], ndmin=10)
print(arr)
print('number of dimensions :', arr.ndim)

[[[[[[[[[[1 1 1 1 1]]]]]]]]]]
number of dimensions : 10
Indexing and Slicing in NumPy
Indexing & Slicing
Indexing
import numpy as np
arr=np.array([1,2,5,6,7])
print(arr[3]) #6

Slicing
import numpy as np
arr=np.array([1,2,5,6,7])
print(arr[2:5]) #[5 6 7]
Indexing and Slicing
Indexing and Slicing in 2-D
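A 2-D array is indexed with one index (or slice) per axis, written as arr[row, column]. The sketch below is an illustration only; the array values are made up for the example.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

print(arr[0, 1])      # single element: row 0, column 1 -> 2
print(arr[1])         # whole row 1 -> [5 6 7 8]
print(arr[:, 2])      # whole column 2 -> [ 3  7 11]
print(arr[0:2, 1:3])  # rows 0-1, columns 1-2 -> [[2 3] [6 7]]
print(arr[-1, -1])    # negative indices count from the end -> 12
print(arr[::2, ::2])  # every second row and column -> [[ 1  3] [ 9 11]]
```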
Copying Arrays
Copy from one array to another
• Method 1: Using np.empty_like() function
• Method 2: Using np.copy() function
• Method 3: Using Assignment Operator
Using np.empty_like( )
• This function returns a new array with the same shape and type as a
given array.
Syntax:
• numpy.empty_like(a, dtype = None, order = ‘K’, subok = True)
Using np.empty_like( )
import numpy as np
ary = np.array([13, 99, 100, 34, 65, 11, 66, 81, 632, 44])

print("Original array: ")
# printing the Numpy array
print(ary)

# Creating an empty Numpy array with the same shape and type as ary
copy = np.empty_like(ary)

# Now copy the elements of ary into it; writing "copy = ary" would only
# rebind the name to the original array instead of copying the data
copy[:] = ary

print("\nCopy of the given array: ")
# printing the copied array
print(copy)
Using np.copy() function
• This function returns an array copy of the given object.
Syntax :
• numpy.copy(a, order='K', subok=False)

# importing Numpy package


import numpy as np
org_array = np.array([1.54, 2.99, 3.42, 4.87, 6.94, 8.21, 7.65, 10.50,
77.5])
print("Original array: ")
print(org_array)
# Now copying the org_array to copy_array using np.copy() function
copy_array = np.copy(org_array)
print("\nCopied array: ")
# printing the copied Numpy array
print(copy_array)
Using Assignment Operator

import numpy as np
org_array = np.array([[99, 22, 33],[44, 77, 66]])
# "Copying" org_array to copy_array using the assignment operator
copy_array = org_array

# modifying org_array
org_array[1, 2] = 13

# checking if copy_array has remained the same: it has not, because
# assignment makes both names refer to the same array
# printing original array
print('Original Array: \n', org_array)

# printing copied array
print('\nCopied Array: \n', copy_array)
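The difference between plain assignment, a view and a real copy can be checked directly. The following is a minimal sketch; the variable names alias, view and deep are illustrative and not part of the original slides.

```python
import numpy as np

org = np.array([[99, 22, 33], [44, 77, 66]])

alias = org            # assignment: a second name for the same array
view = org.view()      # new array object that shares the same data
deep = org.copy()      # independent copy with its own data

org[1, 2] = 13         # modify the original

print(alias is org)                  # True
print(np.shares_memory(view, org))   # True
print(np.shares_memory(deep, org))   # False
print(alias[1, 2], view[1, 2], deep[1, 2])   # 13 13 66
```

Only the array made with copy() keeps the old value 66, so copy() (or np.copy()) is the option to use when the original must stay independent.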
Iterating Arrays

• Iterating means going through elements one by one.


• As we deal with multi-dimensional arrays in numpy, we can do this using
basic for loop of python.
• If we iterate on a 1-D array it will go through each element one by one.
• Iterate on the elements of the following 1-D array:
import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)
Output:
1
2
3
Iterating Arrays

• Iterating 2-D Arrays


• In a 2-D array it will go through all the rows.
• If we iterate over an n-D array, it goes through its (n-1)-D sub-arrays one by one.

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    print(x)
Output:
[1 2 3]
[4 5 6]
Iterating Arrays

• To return the actual values, the scalars, we have to iterate the arrays
in each dimension.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
    for y in x:
        print(y)

Output:
1
2
3
4
5
6
Iterating Arrays

• Iterating 3-D Arrays


• In a 3-D array it will go through all the 2-D arrays.

import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
    print(x)

Output:
[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
Iterating Arrays

• Iterating 3-D Arrays


• To return the actual values, the scalars, we have to iterate the arrays in each
dimension.

import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
    for y in x:
        for z in y:
            print(z)

Output:
1
2
3
4
5
6
7
8
9
10
11
12

Iterating Arrays Using nditer()

• The function nditer() is a helper function that can be used for
iterations ranging from very basic to very advanced.
• Iterating on Each Scalar Element
• In basic for loops, iterating through each scalar of an array requires
n nested for loops, which can be difficult to write for arrays
with very high dimensionality.

import numpy as np

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for x in np.nditer(arr):
    print(x)

Output:
1
2
3
4
5
6
7
8
Identity array

• The identity array is a square array with ones on the main diagonal and zeros elsewhere.
• The identity() function returns the identity array.
Identity

• numpy.identity(n, dtype = None) : Returns an identity matrix, i.e. a
square matrix with ones on the main diagonal.

• Parameters:
• n : [int] Dimension n x n of output array
• dtype : [optional, float (by default)] Data type of returned array
Identity array

import numpy as np

# 2x2 matrix with 1's on main diagonal
b = np.identity(2, dtype = float)
print("Matrix b : \n", b)
# 4x4 matrix with 1's on main diagonal
a = np.identity(4)
print("\nMatrix a : \n", a)

Output:
Matrix b :
[[ 1. 0.]
[ 0. 1.]]
Matrix a :
[[ 1. 0. 0. 0.]
[ 0. 1. 0. 0.]
[ 0. 0. 1. 0.]
[ 0. 0. 0. 1.]]
eye( )
• numpy.eye(N, M = None, k = 0, dtype = float) : Returns a 2-D array
having 1's on the k-th diagonal and 0's elsewhere.
• N : Number of rows
M : [optional] Number of columns; by default M = N
k : [int, optional, 0 by default]
Index of the diagonal; k = 0 is the main diagonal, k > 0 selects a
diagonal above it and k < 0 a diagonal below it.
dtype : [optional, float (by default)] Data type of returned array.
eye( )
Identity( ) vs eye( )

• np.identity returns a square matrix (a special case of a 2-D array) which
is an identity matrix with 1's on the main diagonal (i.e. 'k=0') and
0's everywhere else. You can't change the diagonal k here.
• np.eye returns a 2-D array with 1's on the diagonal 'k', which can
be set, and 0's elsewhere.
• So the choice depends on the requirement. If you want an
identity matrix, you can use identity right away, or call
np.eye leaving the rest to defaults.
• But if you need a 1's and 0's matrix of a particular shape/size, or
control over the diagonal, use the eye function.
Identity( ) vs eye( )

import numpy as np
print(np.eye(3,5,1))
print(np.eye(8,4,0))
print(np.eye(8,4,-1))
print(np.eye(8,4,-2))
print(np.identity(4))
Shape of an Array

import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)

Output: (2, 4)
Reshaping arrays

• Reshaping means changing the shape of an array.


• The shape of an array is the number of elements in each dimension.
• By reshaping we can add or remove dimensions or change number of
elements in each dimension.
Reshape From 1-D to 2-D

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)

Output:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Reshape From 1-D to 3-D

• The outermost dimension will have 2 arrays, each containing 3 arrays of 2 elements.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)

Output:
[[[ 1 2]
[ 3 4]
[ 5 6]]

[[ 7 8]
[ 9 10]
[11 12]]]
Can we Reshape into any Shape?

• Yes, as long as both shapes contain the same total number of elements.
• We can reshape an 8-element 1-D array into a 2-D array of 2 rows with
4 elements each, but we cannot reshape it into a 2-D array of 3 rows with
3 elements each, as that would require 3x3 = 9 elements.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
newarr = arr.reshape(3, 3)
print(newarr)

Output:
Traceback (most recent call last):
File "demo_numpy_array_reshape_error.py", line 5, in <module>
ValueError: cannot reshape array of size 8 into shape (3,3)
Flattening the arrays

• Flattening an array means converting a multidimensional array into a 1-D
array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
newarr = arr.reshape(-1)
print(newarr)

Output: [1 2 3 4 5 6]
• NumPy has many more functions for changing the shape of arrays
(flatten, ravel) and for rearranging the elements (rot90, flip, fliplr,
flipud, etc.). These fall under the intermediate to advanced part of NumPy.
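The -1 used in reshape(-1) can also stand in for a single unknown dimension, and flatten and ravel differ in whether they copy the data. The sketch below illustrates both points with made-up values.

```python
import numpy as np

arr = np.arange(1, 13)          # 12 elements: 1..12

# -1 asks NumPy to infer that dimension from the total size: 12 / 3 = 4
m = arr.reshape(3, -1)
print(m.shape)                  # (3, 4)

flat_copy = m.flatten()         # always returns a copy
flat_view = m.ravel()           # returns a view when the memory layout allows it

m[0, 0] = 100
print(flat_copy[0])             # 1   (the copy is unaffected)
print(flat_view[0])             # 100 (the view sees the change)
```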
Operations on NumPy
1.NumPy Arithmetic Operations
import numpy as np
a = np.array([1, 2, 3, 4])
# add 5 to every element
print ( a+5)
# subtract 2 from each element
print ( a-2)
# multiply each element by 10
print (a*10)
# divide each element by 2
print ( a/2)

Output:
[6 7 8 9]
[-1 0 1 2]
[10 20 30 40]
[0.5 1. 1.5 2. ]
2. NumPy Unary Operators
import numpy as np
arr = np.array([[1,5, 12], [2,32, 20], [3, 40, 13]])
print(arr.max(axis = 1))
print(arr.max(axis = 0))
print (arr.min(axis = 0))
print(arr.min(axis = 1))
print (arr.sum( ))
print ( arr.sum(axis=0))
print( arr.sum(axis=1))

[12 32 40]
[ 3 40 20]
[ 1 5 12]
[1 2 3]
128
[6 77 45]
[18 54 56]
3. NumPy Binary Operators
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])
print (a + b)
print (a*b)

[[5 5]
[5 5]]

[[4 6]
[6 4]]
NumPy Universal Functions
import numpy as np
a = np.array([0, np.pi/2, np.pi])
print ( np.sin(a))
a = np.array([0, 1, 2, 3])
print ( np.exp(a))
print ( np.sqrt(a))

[0.0000000e+00 1.0000000e+00 1.2246468e-16]

[ 1. 2.71828183 7.3890561 20.08553692]

[0. 1. 1.41421356 1.73205081]


Arithmetic Operators & Functions
Mathematical Functions
NumPy functions

import numpy as np
a = np.array([7,3,4,5,1])
b = np.array([3,4,5,6,7])
np.add(a, b) #[10, 7, 9, 11, 8]
np.subtract(a,b) #[ 4,-1,-1,-1,-6]
np.multiply(a, b) #[21, 12, 20, 30, 7]
np.divide(a, b) #[2.33333333, 0.75 , 0.8 , 0.83333333, 0.14285714]
np.remainder(a,b) #[1, 3, 4, 5, 1]
np.mod(a,b) #[1, 3, 4, 5, 1]
np.power(a,b) #[ 343, 81, 1024, 15625, 1]
np.reciprocal(a) #[0, 0, 0, 0, 1] (integer input gives truncated integer results)
NumPy Add Operator

import numpy as np
a = np.array([10,20,100,200,500])
b = np.array([3,4,5,6,7])
print(a+b) #[ 13 24 105 206 507]
print(a-b)
print(a*b)
print(a/b)
Types of Array
Creating a 1-D Array
Creation of a multidimensional array (ndarray)
Program to illustrate Indexing in Ndarrays
Program to illustrate Indexing in a 3D array
Operations on Ndarray
Add two 1d arrays element-wise
import numpy as np

# create numpy arrays x1 and x2


x1 = np.array([1, 3, 0, 7])
x2 = np.array([2, 0, 1, 1])
# elementwise sum with np.add()
x3 = np.add(x1, x2)
# display the arrays
print("x1:", x1)
print("x2:", x2)
print("x3:", x3)
Add two 1d arrays element-wise
import numpy as np

# create numpy arrays x1 and x2
x1 = np.array([1, 2, 0, 3])
x2 = np.array([4, 1, 2, 2])
# elementwise sum with np.add()
x3 = np.add(x1, x2)
# display the arrays
print("x1:", x1)
print("x2:", x2)
print("x3:", x3)
# the + operator gives the same elementwise sum
x3 = x1 + x2
# display the arrays
print("x1:", x1)
print("x2:", x2)
print("x3:", x3)

Output (the same for both versions):
x1: [1 2 0 3]
x2: [4 1 2 2]
x3: [5 3 2 5]
Add two 2d arrays elementwise
import numpy as np

# create 2d arrays x1 and x2
x1 = np.array([[1, 0, 1], [2, 1, 1], [3, 0, 3]])
x2 = np.array([[2, 2, 0], [1, 0, 1], [0, 1, 0]])
# elementwise sum with np.add()
x3 = np.add(x1, x2)
# display the arrays
print("x1:\n", x1)
print("x2:\n", x2)
print("x3:\n", x3)

Output:
x1:
 [[1 0 1]
 [2 1 1]
 [3 0 3]]
x2:
 [[2 2 0]
 [1 0 1]
 [0 1 0]]
x3:
 [[3 2 1]
 [3 1 2]
 [3 1 3]]
Add more than two arrays elementwise
import numpy as np

# create numpy arrays x1, x2, and x3
x1 = np.array([1, 3, 0, 7])
x2 = np.array([2, 0, 1, 1])
x3 = np.array([0, 1, 3, 1])
# elementwise sum with +
x4 = x1+x2+x3
# display the arrays
print("x1:", x1)
print("x2:", x2)
print("x3:", x3)
print("x4:", x4)

Output:
x1: [1 3 0 7]
x2: [2 0 1 1]
x3: [0 1 3 1]
x4: [3 4 4 9]
NumPy Array attributes
1. ndarray.flags - Information about the memory layout of the array
2. ndarray.shape - Tuple of array dimensions
3. ndarray.strides - Number of bytes to step in each dimension while
traversing the array
4. ndarray.ndim - Number of array dimensions
5. ndarray.data - Buffer object pointing to the start of the array's data
6. ndarray.size - Number of array elements
7. ndarray.itemsize - Size of one array element in bytes
8. ndarray.base - The base object, if the array is a view of another object
9. ndarray.nbytes - Total bytes consumed by the elements of the
array
10. ndarray.T - The transposed array
11. ndarray.real - The real part of the array
12. ndarray.imag - The imaginary part of the array
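Most of these attributes can be inspected on a small array. The sketch below is an illustration only; the int32 dtype and the sample values are assumptions chosen so that the byte counts are easy to follow.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(a.shape)     # (2, 3)
print(a.ndim)      # 2
print(a.size)      # 6
print(a.itemsize)  # 4 bytes per int32 element
print(a.nbytes)    # 24 = size * itemsize
print(a.strides)   # (12, 4): bytes to step to the next row / next column
print(a.T.shape)   # (3, 2)
print(a.T.base is a)            # True: the transpose is a view of a
print(a.flags['C_CONTIGUOUS'])  # True

z = np.array([1 + 2j, 3 - 4j])
print(z.real)      # [1. 3.]
print(z.imag)      # [ 2. -4.]
```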
Join, split, search and sort
Joining

• Joining means putting contents of two or more arrays in a single
array.
• In SQL we join tables based on a key, whereas in NumPy we join
arrays by axes.
• We pass a sequence of arrays that we want to join to
the concatenate() function, along with the axis. If axis is not explicitly
passed, it is taken as 0.
Join
Join two arrays

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr)

Output:[1 2 3 4 5 6]
Join two 2-D arrays along rows (axis=1)

import numpy as np

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)

print(arr)

Output:[[1 2 5 6] [3 4 7 8]]
stack( )

• Stacking is the same as concatenation; the only difference is that stacking
is done along a new axis.
• We can stack two 1-D arrays along a new first axis, which puts them
one over the other, or along a new second axis, which pairs up their
elements.
• We pass a sequence of arrays that we want to join to
the stack() function along with the axis. If axis is not explicitly passed,
it is taken as 0.
stack( )

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.stack((arr1, arr2), axis=1)

print(arr)

Output: [[1 4] [2 5] [3 6]]


hstack( ) - Stacking Along Rows
• NumPy provides a helper function: hstack() to stack along rows.

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.hstack((arr1, arr2))

print(arr)

Output: [1 2 3 4 5 6]
vstack( ) - Stacking Along Columns
• NumPy provides a helper function: vstack() to stack along columns.

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.vstack((arr1, arr2))

print(arr)

Output: [[1 2 3] [4 5 6]]
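The join functions shown above differ mainly in the shape of the result. The following sketch compares them on the same inputs; the 2-D arrays m1 and m2 are illustrative values.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.concatenate((a, b)).shape)    # (6,)   joined along the existing axis
print(np.stack((a, b)).shape)          # (2, 3) new axis 0
print(np.stack((a, b), axis=1).shape)  # (3, 2) new axis 1
print(np.hstack((a, b)).shape)         # (6,)
print(np.vstack((a, b)).shape)         # (2, 3)

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
print(np.concatenate((m1, m2), axis=0))  # m2 placed below m1, shape (4, 2)
print(np.concatenate((m1, m2), axis=1))  # m2 placed to the right of m1, shape (2, 4)
```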


Split
Splitting

• Splitting is reverse operation of Joining.


• Joining merges multiple arrays into one and Splitting breaks one
array into multiple.
Split

• split( )

• array_split( )
Split

• hsplit( )

• vsplit( )
Splitting

• We use array_split() for splitting arrays; we pass it the array we want
to split and the number of splits.
• Note: The return value is a list containing the sub-arrays (three of
them when the number of splits is 3).
• If the array has fewer elements than required, it will adjust from the
end accordingly.

Note: We also have the function split() available, but it will not adjust
the elements when the source array cannot be divided equally; in that
case array_split() works properly but split() fails.
array_split( )

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr)

Output:
[array([1, 2]), array([3, 4]), array([5, 6])]
split( )

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.split(arr, 3)

print(newarr)

Output:
[array([1, 2]), array([3, 4]), array([5, 6])]
array_split( )

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 4)

print(newarr)

Output:[array([1, 2]), array([3, 4]), array([5]), array([6])]


split( )

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.split(arr, 4)

print(newarr)

Output:
ValueError: array split does not result in an equal division
Split Into Arrays

• The return value of the array_split() function is a list containing
each of the splits as an array.

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr[0])
print(newarr[1])
print(newarr[2])

Output:[1 2] [3 4] [5 6]
Splitting 2-D Arrays

• Use the same syntax when splitting 2-D arrays.


• Use the array_split() method, pass in the array you want to split and
the number of splits you want to do.

import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
print(newarr)

Output:[array([[1, 2], [3, 4]]), array([[5, 6], [7, 8]]), array([[ 9, 10], [11,
12]])]
hsplit( ) and vsplit( )

• The hsplit() function is used to split an array into multiple sub-arrays
horizontally (column-wise).

• hsplit is equivalent to split with axis=1; the array is always split along
the second axis, except for 1-D arrays, which are split along axis 0.

• The vsplit() function is used to split an array into multiple sub-arrays
vertically (row-wise).
• Note: vsplit is equivalent to split with axis=0 (default); the array is
always split along the first axis regardless of the array dimension.
hsplit
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
                [13, 14, 15], [16, 17, 18]])

newarr = np.hsplit(arr, 3)

print(newarr)

Output:
[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]
vsplit
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12],
                [13, 14, 15], [16, 17, 18]])

newarr = np.vsplit(arr, 3)

print(newarr)

Output:
[array([[1, 2, 3], [4, 5, 6]]),
array([[ 7, 8, 9], [10, 11, 12]]),
array([[13, 14, 15], [16, 17, 18]])]
Searching

• You can search an array for a certain value, and return the indexes
that get a match.
• To search an array, use the where() method.

Find the indexes where the value is 4


import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)

Output:(array([3, 5, 6]),)
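where() accepts any boolean condition, and its three-argument form selects values instead of returning indexes. A short sketch with made-up data:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

even_idx = np.where(arr % 2 == 0)[0]   # [0] unpacks the one-element tuple
print(even_idx)                         # [1 3 5 7]
print(arr[even_idx])                    # [2 4 6 8]

# Three-argument form: take arr where the condition holds, otherwise 0
replaced = np.where(arr > 4, arr, 0)
print(replaced)                         # [0 0 0 0 5 6 7 8]
```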
Sorting
• Sorting means putting elements in an ordered sequence.
• Ordered sequence is any sequence that has an order corresponding
to elements, like numeric or alphabetical, ascending or
descending.
• NumPy has a function called sort() that will sort a specified array.
• Note: np.sort() returns a sorted copy of the array, leaving the original
array unchanged. The ndarray method arr.sort() sorts the array in place
instead.

import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))

Output:[0 1 2 3]
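The copy versus in-place behaviour, and sorting of 2-D and string arrays, can be sketched as follows. The sample values are illustrative only.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

s = np.sort(arr)         # returns a sorted copy
print(s, arr)            # [0 1 2 3] [3 2 0 1]

order = np.argsort(arr)  # indices that would sort arr
print(order)             # [2 3 1 0]

arr.sort()               # method form sorts in place and returns None
print(arr)               # [0 1 2 3]

m = np.array([[3, 2, 4], [5, 0, 1]])
rows_sorted = np.sort(m) # sorts each row separately (last axis)
print(rows_sorted)       # [[2 3 4] [0 1 5]]

names = np.sort(np.array(['banana', 'cherry', 'apple']))
print(names)             # alphabetical order
```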
Search
Array shape manipulation
Reshaping Array
List and Array
Linear Algebra with Numpy
Linear Algebra Module

• The Linear Algebra module of NumPy offers various methods to apply
linear algebra on any NumPy array.
One can find:
• rank, determinant, trace, etc. of an array
• eigenvalues of matrices
• matrix and vector products (dot, inner, outer, etc.), matrix
exponentiation
• solutions of linear or tensor equations, and much more!
Example
# Importing numpy as np
import numpy as np

A = np.array([[6, 1, 1],
[4, -2, 5],
[2, 8, 7]])

# Rank of a matrix
print("Rank of A:", np.linalg.matrix_rank(A))

# Trace of matrix A
print("\nTrace of A:", np.trace(A))

# Determinant of a matrix
print("\nDeterminant of A:", np.linalg.det(A))

# Inverse of matrix A
print("\nInverse of A:\n", np.linalg.inv(A))

print("\nMatrix A raised to power 3:\n",
      np.linalg.matrix_power(A, 3))
Output

Rank of A: 3
Trace of A: 11
Determinant of A: -306.0
Inverse of A:
[[ 0.17647059 -0.00326797 -0.02287582]
[ 0.05882353 -0.13071895 0.08496732]
[-0.11764706 0.1503268 0.05228758]]

Matrix A raised to power 3:


[[336 162 228]
[406 162 469]
[698 702 905]]
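The vector products and equation solving named in the Linear Algebra Module list can be sketched in the same way. The right-hand side b below is a hypothetical choice (the row sums of A), picked so that the solution is x = [1, 1, 1].

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

print(np.dot(u, v))     # 32 = 1*4 + 2*5 + 3*6
print(np.inner(u, v))   # 32, same as dot for 1-D arrays
print(np.outer(u, v))   # 3x3 matrix with entries u[i] * v[j]

A = np.array([[6, 1, 1],
              [4, -2, 5],
              [2, 8, 7]])
b = np.array([8, 7, 17])      # hypothetical right-hand side: the row sums of A

x = np.linalg.solve(A, b)     # solves A @ x = b
print(x)                      # approximately [1. 1. 1.]
print(np.allclose(A @ x, b))  # True
```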
Eigen values and Eigen
vectors
Eigen values and vectors

• The Python NumPy linear algebra package can find the eigenvalues
and eigenvectors of a matrix.
• We calculate the eigenvalues and eigenvectors of the matrix A:
import numpy as np
from numpy import linalg as LA
A = np.array([[1,2,3],[3,2,1],[1,0,-1]])
w, v = LA.eig(A)
print(w)
print(v)

Output:
[ 4.31662479e+00 -2.31662479e+00 3.43699053e-17]
[[ 0.58428153 0.73595785 0.40824829]
 [ 0.80407569 -0.38198836 -0.81649658]
 [ 0.10989708 -0.55897311 0.40824829]]
Eigen values and vectors

• The numpy.linalg.eig function returns a tuple consisting of a vector
and an array.
• The vector (here w) contains the eigenvalues.
• The array (here v) contains the corresponding eigenvectors, one
eigenvector per column.
• The eigenvalue w[0] goes with the 0th column of v. The
eigenvalue w[1] goes with column 1, etc.
• To extract the ith column vector, we use
• u = v[:,i]
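The pairing of w[i] with column i of v can be checked against the defining relation A u = w[i] u. This is a small verification sketch using the same matrix A as in the example above.

```python
import numpy as np

A = np.array([[1, 2, 3], [3, 2, 1], [1, 0, -1]])
w, v = np.linalg.eig(A)

for i in range(len(w)):
    u = v[:, i]                              # i-th eigenvector (a column of v)
    print(i, np.allclose(A @ u, w[i] * u))   # True for every i

# The eigenvectors returned by eig are normalized to unit length
print(np.allclose(np.linalg.norm(v, axis=0), 1.0))  # True
```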
Eigen values
# importing numpy library
import numpy as np

# create numpy 2d-array


m = np.array([[1, 2], [2, 3]])

print("Printing the Original square array:\n",m)

# finding eigenvalues and eigenvectors


w, v = np.linalg.eig(m)

# printing eigen values


print("Printing the Eigen values of the given square array:\n", w)

# printing eigen vectors


print("Printing Right eigenvectors of the given square array:\n", v)
Eigen values
Printing the Original square array:
[[1 2]
[2 3]]

Printing the Eigen values of the given square array:


[-0.23606798 4.23606798]

Printing Right eigenvectors of the given square array:


[[-0.85065081 -0.52573111]
[ 0.52573111 -0.85065081]]
Eigen vectors

# importing numpy library


import numpy as np

# create numpy 2d-array


m = np.array([[1, 2, 3],[2, 3, 4],[4, 5, 6]])

print("Printing the Original square array:\n",m)

# finding eigenvalues and eigenvectors


w, v = np.linalg.eig(m)

# printing eigen values


print("Printing the Eigen values of the given square array:\n",w)

# printing eigen vectors


print("Printing Right eigenvectors of the given square array:\n",v)
Eigen vectors

Printing the Original square array:


[[1 2 3]
[2 3 4]
[4 5 6]]
Printing the Eigen values of the given square array:
[ 1.08309519e+01 -8.30951895e-01 1.01486082e-16]
Printing Right eigenvectors of the given square array:
[[ 0.34416959 0.72770285 0.40824829]
[ 0.49532111 0.27580256 -0.81649658]
[ 0.79762415 -0.62799801 0.40824829]]
Numpy Random
Data Distribution, Normal,
Exponential, Binomial, Poisson,
Uniform and ChiSquare
distributions.
Random

• Random number does NOT mean a different number every time.
Random means something that cannot be predicted logically.

• Pseudorandom
• Computers work on programs, and programs are a definite set of
instructions. So there must be some algorithm to generate a
random number as well.
• If there is a program to generate a random number, it can be predicted,
thus it is not truly random.
• Random numbers generated through a generation algorithm are
called pseudorandom.
• True random
• In order to generate a truly random number on our computers we
need to get the random data from some outside source. This outside
source is generally our keystrokes, mouse movements, data on the
network, etc.
• We do not need truly random numbers unless it is related to security
(e.g. encryption keys) or randomness is the basis of the application
(e.g. digital roulette wheels).
Generate Random number

from numpy import random

x = random.randint(100)

print(x)

Output:45
Generate Random Float

from numpy import random

x = random.rand()

print(x)

Output:0.20589891226659818
Generate Random Array

• In NumPy we work with arrays, and you can use the two methods from the
above examples to make random arrays.

Integers
• The randint() method takes a size parameter where you can specify the
shape of an array.

from numpy import random


x=random.randint(100, size=(5))
print(x)

Output:[61 66 32 13 16]
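Both functions also produce 2-D arrays, and because the numbers are pseudorandom, fixing the seed repeats the same sequence. The seed value 42 below is arbitrary and the call to random.seed() is an addition for illustration; it does not appear in the examples above.

```python
from numpy import random

random.seed(42)                          # fix the seed so the sequence repeats

ints = random.randint(100, size=(3, 5))  # 3x5 integers from 0 to 99
floats = random.rand(3, 5)               # 3x5 floats in [0, 1)
print(ints)
print(floats)

random.seed(42)                          # same seed, same numbers again
again = random.randint(100, size=(3, 5))
print((ints == again).all())             # True
```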
Data Distribution

• Data Distribution is a list of all possible values, and how often each
value occurs.
• Such lists are important when working with statistics and data
science.
• The random module offers methods that return randomly generated
data distributions.
Random Distribution

• A random distribution is a set of random numbers that follow a
certain probability density function.
• Probability Density Function: A function that describes a continuous
probability distribution, i.e. the relative likelihood of each possible value.
• We can generate random numbers based on defined probabilities
using the choice() method of the random module.
• The choice() method allows us to specify the probability for each
value.
• The probability is set by a number between 0 and 1, where 0 means
that the value will never occur and 1 means that the value will always
occur. The probabilities of all values must add up to 1.
Example

• Generate a 1-D array containing 100 values, where each value has to
be 3, 5, 7 or 9.
• The probability for the value to be 3 is set to be 0.1
• The probability for the value to be 5 is set to be 0.3
• The probability for the value to be 7 is set to be 0.6
• The probability for the value to be 9 is set to be 0

from numpy import random

x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(100))

print(x)
Example

[3 7 5 7 7 7 5 7 3 7 3 7 7 7 7 5 7 5 7 7 7 7 7 5 3 7 5 7 7 7 3 5 3 7 5 7 7 5
7 5 7 5 7 7 5 7 7 3 5 3 5 7 7 5 7 7 7 7 5 7 5 3 5 5 7 7 7 3 7 7 7 7 5 7 7 5 7
7 7 7 7 7 5 5 3 5 5 7 5 5 7 7 5 3 3 7 7 5 7 7]
Example

• You can return arrays of any shape and size by specifying the shape
in the size parameter.
• Same example as above, but return a 2-D array with 3 rows, each
containing 5 values.

from numpy import random


x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(3, 5))
print(x)

[[7 7 7 7 7]
[5 3 5 7 5]
[5 7 5 7 5]]
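With a larger sample, the observed share of each value approaches the probabilities passed in p. The sketch below counts the outcomes; the sample size of 10,000 and the seed are arbitrary choices made for the illustration.

```python
import numpy as np
from numpy import random

random.seed(0)   # arbitrary seed, only to make the run repeatable
x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=10_000)

values, counts = np.unique(x, return_counts=True)
for val, cnt in zip(values, counts):
    print(val, cnt / x.size)   # observed shares, close to 0.1, 0.3 and 0.6
```

The value 9 never appears because its probability is 0.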
Normal Distribution

• The Normal Distribution is one of the most important distributions.


• It is also called the Gaussian Distribution after the German
mathematician Carl Friedrich Gauss.
• It fits the probability distribution of many events, e.g. IQ scores,
heart rate, etc.
• Use the random.normal() method to get a Normal Data Distribution.
• It has three parameters:
• loc - (Mean) where the peak of the bell exists.
• scale - (Standard Deviation) how flat (spread out) the bell curve should be.
• size - The shape of the returned array.
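The loc and scale parameters can be illustrated with a short sketch. The numbers below (mean 100, standard deviation 15, 10000 samples) and the seed are hypothetical values picked for illustration, not taken from the slides.

```python
from numpy import random

random.seed(1)  # assumption: fixed seed for repeatability

# Hypothetical scores centered at 100 with a standard deviation of 15.
scores = random.normal(loc=100, scale=15, size=10000)

print(scores.mean())  # close to loc
print(scores.std())   # close to scale
```

With a large sample, the sample mean and standard deviation should land near the loc and scale that were requested.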
Normal Distribution

from numpy import random


x = random.normal(size=(2, 3))
print(x)

Output:
Run1:
[[ 0.15001821 -1.31355388 -1.35020654]
 [-1.31067087 -0.48537757 -0.02052509]]
Run2:
[[-2.0610908  -0.3081812   0.99886608]
 [ 0.56001902  0.38363428 -0.07954767]]
Visualization of Normal Distribution
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
# distplot() is deprecated in recent seaborn releases; kdeplot() draws the same density curve
sns.kdeplot(random.normal(size=1000))
plt.show()
Exponential Distribution

• The exponential distribution describes the time until the next event,
e.g. a failure or a success.
• It has two parameters:
• scale - inverse of rate (see lam in the Poisson distribution); defaults to 1.0.
• size - The shape of the returned array.
Exponential Distribution

Time Between Customers


• The number of minutes between customers who enter a certain shop can be
modeled by the exponential distribution.
• For example, suppose a new customer enters a shop every two minutes, on
average. After a customer arrives, find the probability that a new customer
arrives in less than one minute.

To solve this, note that the average time between customers is μ = 2 minutes.
The rate is therefore:
• λ = 1/μ
• λ = 1/2
• λ = 0.5
• We can plug λ = 0.5 and x = 1 into the formula for the CDF:
• P(X ≤ x) = 1 - e^(-λx)
• P(X ≤ 1) = 1 - e^(-0.5(1))
• P(X ≤ 1) = 0.3935
The probability that we’ll have to wait less than one minute for the next customer
to arrive is 0.3935.
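The calculation above can be reproduced in a few lines, and checked against simulated waiting times. This is a sketch under stated assumptions: the variable names, the seed and the sample size of 100000 are illustrative choices, not part of the slides.

```python
import math
from numpy import random

mean_gap = 2.0        # average minutes between customers (from the example)
rate = 1 / mean_gap   # lambda = 0.5

# Closed-form CDF: P(X <= 1) = 1 - e^(-lambda * 1)
p_exact = 1 - math.exp(-rate * 1)
print(round(p_exact, 4))   # 0.3935

# Simulation check: numpy's scale parameter is 1/lambda, i.e. the mean gap.
random.seed(0)  # assumption: fixed seed for repeatability
gaps = random.exponential(scale=mean_gap, size=100000)
p_simulated = (gaps < 1).mean()
print(p_simulated)         # close to p_exact
```

Note that random.exponential() takes the scale (mean waiting time), not the rate, so the value passed is 2, not 0.5.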
Exponential Distribution

• Draw a 2x3 sample from an exponential distribution with scale 2.0:

from numpy import random

x = random.exponential(scale=2, size=(2, 3))

print(x)

Output:
[[0.16401759 5.71219287 1.20149124]
 [2.51527074 2.13596927 1.04229153]]
Binomial Distribution

• A binomial distribution can be thought of as simply the probability of
a SUCCESS or FAILURE outcome in an experiment or survey that is
repeated multiple times.
• The binomial is a type of distribution that has two possible
outcomes (the prefix “bi” means two, or twice).
• For example, a coin toss has only two possible outcomes: heads or tails and
taking a test could have two possible outcomes: pass or fail.
Binomial Distribution
Example

• For example, suppose you want to know the probability of getting a 1
on a die roll. The probability of rolling a one on any throw is 1/6.
Roll the die twenty times and the number of ones follows a binomial
distribution with (n=20, p=1/6). SUCCESS would be “roll a one” and
FAILURE would be “roll anything else.”
• If the outcome in question was the probability of the die landing on
an even number, the binomial distribution would then become
(n=20, p=1/2). That’s because your probability of throwing an even
number is one half.
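The two die-rolling cases can be simulated with random.binomial(). The sketch below is illustrative only: the seed and the number of repetitions (100000) are assumptions, and the comparison uses the standard fact that a binomial distribution has mean n*p.

```python
from numpy import random

random.seed(0)  # assumption: fixed seed for repeatability

# Each entry is the number of ones seen in 20 rolls of a fair die.
ones = random.binomial(n=20, p=1/6, size=100000)

# Each entry is the number of even results seen in 20 rolls.
evens = random.binomial(n=20, p=1/2, size=100000)

print(ones.mean())    # close to n*p = 20/6, about 3.33
print(evens.mean())   # close to n*p = 10
```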

Binomial Distribution

• Binomial Distribution is a Discrete Distribution.


• It describes the outcome of binary scenarios, e.g. the toss of a coin, which
will be either heads or tails.
• It has three parameters:
• n - number of trials.
• p - probability of success in each trial (e.g. 0.5 for heads in a coin toss).
• size - The shape of the returned array.
• Discrete Distribution: The distribution is defined over a separate set of
events, e.g. a coin toss's result is discrete as it can only be heads or
tails, whereas the height of people is continuous as it can be 170, 170.1,
170.11, and so on.
Binomial Distribution

• Given 10 trials for coin toss generate 10 data points:

from numpy import random

x = random.binomial(n=10, p=0.5, size=10)

print(x)

Output:[5 7 6 5 4 7 5 4 6 5]

[5 3 6 4 3 3 3 5 5 5]
Visualization of Binomial Distribution

from numpy import random


import matplotlib.pyplot as plt
import seaborn as sns
# distplot() is deprecated in recent seaborn releases; histplot() draws the histogram
sns.histplot(random.binomial(n=10, p=0.5, size=1000), discrete=True)
plt.show()
Poisson Distribution

• Poisson Distribution is a Discrete Distribution.


• It estimates how many times an event can happen in a specified
time, e.g. if someone eats twice a day on average, what is the probability
that they will eat three times?
• It has two parameters:
• lam - rate or known number of occurrences, e.g. 2 for the above problem.
• size - The shape of the returned array.
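The eating example can be worked through numerically. The sketch below uses the standard Poisson probability mass function, P(X = k) = lam^k * e^(-lam) / k!, which is not written out in the slides; the variable names, seed and sample size are assumptions made for illustration.

```python
import math
from numpy import random

lam = 2   # average number of meals per day (from the example)
k = 3     # number of meals we are asking about

# Poisson probability mass function: P(X = k) = lam^k * e^(-lam) / k!
p_exact = lam**k * math.exp(-lam) / math.factorial(k)
print(round(p_exact, 4))   # 0.1804

# Simulation check: draw many days and count how often exactly 3 meals occur.
random.seed(0)  # assumption: fixed seed for repeatability
days = random.poisson(lam=lam, size=100000)
p_simulated = (days == k).mean()
print(p_simulated)         # close to p_exact
```

So a person who eats twice a day on average eats exactly three times on roughly 18% of days.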
Poisson Distribution

• Generate a 1x10 sample for an occurrence rate of 2:

from numpy import random

x = random.poisson(lam=2, size=10)

print(x)
Uniform Distribution

• Used to describe probability where every event has an equal chance of
occurring.
• E.g. Generation of random numbers.
• It has three parameters:
• low - lower bound - default 0.0.
• high - upper bound - default 1.0.
• size - The shape of the returned array.
Uniform Distribution

• Create a 2x3 uniform distribution sample:

from numpy import random

x = random.uniform(size=(2, 3))

print(x)

Output:
[[0.21295952 0.57512648 0.39384297]
 [0.7543237  0.80233051 0.53264002]]
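The example above relies on the default bounds. NumPy's uniform() names its bounds low and high; the sketch below sets them explicitly. The bounds 5 and 10 and the seed are hypothetical values chosen for illustration.

```python
from numpy import random

random.seed(0)  # assumption: fixed seed for repeatability

# Hypothetical bounds: values from 5 (inclusive) up to 10 (exclusive).
x = random.uniform(low=5, high=10, size=(2, 3))

print(x)
print(x.min() >= 5, x.max() < 10)
```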
Chi Square Distribution

• The Chi Square distribution is used as a basis for hypothesis testing.
• It has two parameters:
• df - (degrees of freedom).
• size - The shape of the returned array.
Chi Square Distribution

• Draw a 2x3 sample from a chi-squared distribution with 2 degrees of
freedom:

from numpy import random

x = random.chisquare(df=2, size=(2, 3))

print(x)

Output:
[[0.01738909 9.73650152 0.87953635]
 [0.14366152 0.98102103 2.72668685]]
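A larger sample gives a feel for what df controls. The sketch below relies on the standard fact that a chi-squared distribution has a mean equal to its degrees of freedom and never produces negative values; the seed and sample size are assumptions for illustration.

```python
from numpy import random

random.seed(0)  # assumption: fixed seed for repeatability

df = 2
x = random.chisquare(df=df, size=100000)

print(x.mean())        # close to df
print(x.min() >= 0)    # True: chi-squared values are never negative
```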
