NumPy: Arrays and Vectorized Computation
Prof. Gheith Abandah
Reference
• Wes McKinney, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O’Reilly Media, 2nd Edition, 2018.
• Material: [Link]
Outline
Introduction
4.1 The NumPy ndarray: A Multidimensional Array Object
4.2 Universal Functions: Fast Element-Wise Array Functions
4.3 Array-Oriented Programming with Arrays
4.4 File Input and Output with Arrays
4.5 Linear Algebra
4.6 Pseudorandom Number Generation
NumPy: Numerical Python
• One of the most important foundational packages for fast numerical computing in Python.
• Most computational packages providing scientific functionality use NumPy’s array objects for data exchange.
• NumPy internally stores data in a contiguous block of memory.
• NumPy’s library of algorithms, written in the C language, can operate on this memory without any type checking or other overhead.
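The contiguous-memory layout described above can be inspected directly through standard ndarray attributes (flags, itemsize, nbytes). A minimal sketch; the variable names are illustrative:

```python
import numpy as np

# One million 64-bit floats stored in a single buffer
arr = np.zeros(1_000_000, dtype=np.float64)

# The data occupies one C-contiguous block of memory
print(arr.flags['C_CONTIGUOUS'])   # True

# Each element takes itemsize bytes; nbytes = size * itemsize
print(arr.itemsize)                # 8
print(arr.nbytes)                  # 8000000

# Taking every other element gives a strided view that is not contiguous
every_other = arr[::2]
print(every_other.flags['C_CONTIGUOUS'])   # False
```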
NumPy is Fast
In [7]: import numpy as np
In [8]: my_arr = np.arange(1000000)
In [9]: my_list = list(range(1000000))
In [10]: %time for _ in range(10): my_arr2 = my_arr * 2
CPU times: user 20 ms, sys: 50 ms, total: 70 ms
Wall time: 72.4 ms
In [11]: %time for _ in range(10): my_list2 = [x * 2 for x in my_list]
CPU times: user 760 ms, sys: 290 ms, total: 1.05 s
Wall time: 1.05 s
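%time is an IPython magic command. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module; this mirrors the example above, and the absolute timings will depend on the machine:

```python
import timeit

import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Time 10 repetitions of each way of doubling the data
numpy_time = timeit.timeit(lambda: my_arr * 2, number=10)
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy: {numpy_time:.3f} s, list comprehension: {list_time:.3f} s")

# Both approaches compute the same values
same = (my_arr * 2).tolist() == [x * 2 for x in my_list]
print(same)   # True
```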
Outline
Introduction
4.1 The NumPy ndarray: A Multidimensional Array Object
  • Creating ndarrays
  • Data Types for ndarrays
  • Arithmetic with NumPy Arrays
  • Basic Indexing and Slicing
  • Boolean Indexing
  • Fancy Indexing
  • Transposing Arrays and Swapping Axes
4.2 Universal Functions: Fast Element-Wise Array Functions
4.3 Array-Oriented Programming with Arrays
4.4 File Input and Output with Arrays
4.5 Linear Algebra
4.6 Pseudorandom Number Generation
Creating ndarrays
• You can create NumPy arrays from lists.
• Arrays have .ndim and .shape attributes.
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
arr2.ndim
2
arr2.shape
(2, 4)
Array Creation Functions
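The table of creation functions did not survive extraction. As a hedged illustration, here are a few standard creation functions (np.zeros, np.ones, np.full, np.arange, np.eye, np.zeros_like, np.asarray); this is not necessarily the exact list from the original table:

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2x3 array of 0.0
ones = np.ones(4, dtype=int)        # [1 1 1 1]
sevens = np.full((2, 2), 7)         # 2x2 array filled with 7
evens = np.arange(0, 10, 2)         # like range(): [0 2 4 6 8]
identity = np.eye(3)                # 3x3 identity matrix
same_shape = np.zeros_like(sevens)  # zeros with the shape and dtype of sevens

# np.asarray does not copy an input that is already an ndarray,
# whereas np.array copies by default
existing = np.arange(5)
print(np.asarray(existing) is existing)   # True
print(np.array(existing) is existing)     # False
```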
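Data Types for ndarrays
• The data type, or dtype, is a special object containing the information needed to interpret a chunk of memory as a particular type of data.
• Arrays have a .dtype attribute.
a = np.full((2, 3, 2), 7, dtype=np.int32)   # or dtype='i4'
a
array([[[7, 7],
        [7, 7],
        [7, 7]],

       [[7, 7],
        [7, 7],
        [7, 7]]], dtype=int32)
a.dtype
dtype('int32')
• dtype=int selects the platform’s default integer type, which is int64 on most 64-bit Linux and macOS systems, so it is not always equivalent to np.int32.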
NumPy Data Types
[Table of NumPy data types, covering the int and float types]
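The int and float families differ mainly in how many bytes each element uses, which bounds the representable range and precision. A small sketch using np.dtype(...).itemsize, np.iinfo, and np.finfo:

```python
import numpy as np

# Bytes per element for common integer and floating-point types
for name in ["int8", "int16", "int32", "int64", "float16", "float32", "float64"]:
    print(name, np.dtype(name).itemsize, "bytes")

# Integer limits
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
print(np.iinfo(np.int32).max)                          # 2147483647

# Floating-point precision (machine epsilon): smaller means more precise
print(np.finfo(np.float32).eps)
print(np.finfo(np.float64).eps)
```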
Data Types for ndarrays
• You can explicitly convert, or cast, an array from one dtype to another with astype. Casting floating-point values to integers truncates the decimal part.
• NumPy can convert strings to numbers, but pandas offers more convenient tools for this.
af = np.array([3.7, -1.2, -2.6, 0.5])
ai = af.astype(np.int32)
ai
array([ 3, -1, -2,  0], dtype=int32)
num_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)   # byte strings (formerly np.string_)
num_strings.astype(float)
array([ 1.25, -9.6 , 42.  ])
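Two details of astype are worth knowing: it returns a new array (a copy) by default, even when the dtype is unchanged, and a string that cannot be parsed raises ValueError. A minimal sketch; the variable names are illustrative:

```python
import numpy as np

af = np.array([3.7, -1.2, -2.6, 0.5])
same = af.astype(np.float64)   # same dtype, but still a new array
same[0] = 100.0
print(af[0])                   # 3.7 (the original is unchanged)

bad = np.array(['1.25', 'oops'])
try:
    bad.astype(float)
    raised = False
except ValueError as exc:      # 'oops' is not a valid number
    raised = True
    print("conversion failed:", exc)
```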
Arithmetic with NumPy Arrays
• Any arithmetic operation between equal-size arrays is applied element-wise.
• Arithmetic operations with scalars propagate the scalar argument to each element in the array.
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr * arr
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])
1 / arr
array([[1.    , 0.5   , 0.3333],
       [0.25  , 0.2   , 0.1667]])
Arithmetic with NumPy Arrays
• Operations between arrays of different shapes are performed by broadcasting. Here, a2 is added to every row of a1.
a1 = np.array([[0., 0., 0.],
               [1., 1., 1.],
               [2., 2., 2.],
               [3., 3., 3.]])
a2 = np.array([1., 2., 3.])
a1 + a2
array([[1., 2., 3.],
       [2., 3., 4.],
       [3., 4., 5.],
       [4., 5., 6.]])
The Broadcasting Rule
• Two arrays are compatible for broadcasting if, for each trailing dimension (i.e., starting from the end), the axis lengths match or either of the lengths is 1. Broadcasting is then performed over the missing or length-1 dimensions.
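The rule can be written as a short function. broadcast_shape below is a hypothetical helper written only to illustrate the rule; NumPy’s own np.broadcast_shapes (available since NumPy 1.20) is used to cross-check it:

```python
import numpy as np


def broadcast_shape(shape1, shape2):
    """Hypothetical helper: broadcast shape of two shapes, or ValueError."""
    # Left-pad the shorter shape with 1s so the trailing dimensions line up
    n = max(len(shape1), len(shape2))
    s1 = (1,) * (n - len(shape1)) + tuple(shape1)
    s2 = (1,) * (n - len(shape2)) + tuple(shape2)
    result = []
    for d1, d2 in zip(s1, s2):
        if d1 == d2 or d2 == 1:      # lengths match, or the second is 1
            result.append(d1)
        elif d1 == 1:                # the first is 1: stretch it
            result.append(d2)
        else:
            raise ValueError(f"incompatible shapes {shape1} and {shape2}")
    return tuple(result)


print(broadcast_shape((4, 3), (3,)))          # (4, 3)
print(broadcast_shape((2, 1, 5), (3, 1)))     # (2, 3, 5)
print(np.broadcast_shapes((2, 1, 5), (3, 1))) # (2, 3, 5)

try:
    broadcast_shape((4, 3), (4,))             # trailing lengths 3 and 4 differ
except ValueError as exc:
    print(exc)
```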
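Basic Indexing and Slicing
• One-dimensional arrays are indexed and sliced much like Python lists. Assigning a scalar to a slice broadcasts the value to the whole selection.
• Array slices are views on the original array: changes to a slice are reflected in the original. Use arr[3:5].copy() to get an independent copy instead.
arr = np.arange(6)
arr[3:5] = 12
arr
array([ 0,  1,  2, 12, 12,  5])
arr_slice = arr[3:5]
arr_slice[1] = 1000
arr
array([   0,    1,    2,   12, 1000,    5])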
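The view-versus-copy distinction can be checked with np.shares_memory. A minimal sketch; the variable names are illustrative:

```python
import numpy as np

arr = np.arange(6)
view = arr[3:5]              # basic slice: a view on arr
arr_copy = arr[3:5].copy()   # explicit, independent copy

print(np.shares_memory(arr, view))       # True
print(np.shares_memory(arr, arr_copy))   # False

arr_copy[0] = -1             # does not affect arr
view[0] = 99                 # writes through to arr[3]
print(arr)                   # [ 0  1  2 99  4  5]
```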
Basic Indexing and Slicing
• In a two-dimensional array, individual elements can be accessed:
  • recursively, or
  • by passing a comma-separated list of indices.
• In multidimensional arrays, if you omit later indices, the returned object is a lower-dimensional array of all the data along the higher dimensions.
a = np.array([[1, 2, 3], [4, 5, 6]])
a[0][2]
3
a[1, 2]
6
a = np.zeros((2, 3, 4))
a[0].shape
(3, 4)
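Basic Indexing and Slicing
• ndarrays can be sliced with the familiar syntax:
  • multiple slices: arr2d[:2, 1:]
  • a slice within one row: arr2d[1, :2]
  • : alone selects an entire axis: arr2d[:, 0]
• Slices differ from integer indices: arr2d[:, :1] keeps the sliced axis (shape (3, 1)), while arr2d[:, 0] drops it (shape (3,)).
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
arr2d[:2, 1:]
array([[2, 3],
       [5, 6]])
arr2d[1, :2]
array([4, 5])
arr2d[:, 0]
array([1, 4, 7])
arr2d[:, :1]
array([[1],
       [4],
       [7]])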
Boolean Indexing
• Use Boolean arrays to select the items where the Boolean array is True.
• The Boolean array must be of the same length as the array axis it’s indexing.
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Joe'])
data = np.random.randn(5, 3)   # random values; your output will differ
data
array([[ 0.2817,  0.769 ,  1.2464],
       [-1.2962,  0.275 ,  0.2289],
       [ 0.8864, -2.0016, -0.3718],
       [-0.4386, -0.5397,  0.477 ],
       [-0.8312, -2.3702, -1.8608]])
data[names == 'Bob']
array([[ 0.2817,  0.769 ,  1.2464],
       [-0.4386, -0.5397,  0.477 ]])
data[names == 'Bob', 2]
array([1.2464, 0.477 ])
Boolean Indexing
• The operators ==, !=, <, <=, >, >=, ~ (not), & (and), and | (or) can be used to build Boolean arrays. The Python keywords and/or do not work with Boolean arrays; use & and | instead.
• Setting values with Boolean arrays also works.
data[data < 0] = 0
data
array([[0.2817, 0.769 , 1.2464],
       [0.    , 0.275 , 0.2289],
       [0.8864, 0.    , 0.    ],
       [0.    , 0.    , 0.477 ],
       [0.    , 0.    , 0.    ]])
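Comparisons combined with & and | each need parentheses, because these operators bind more tightly than comparisons. A small sketch reusing the names array from the slides, with deterministic data (np.arange) substituted for the random values:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Joe'])
data = np.arange(15).reshape((5, 3))

mask = (names == 'Bob') | (names == 'Will')   # parentheses are required
print(mask)                      # [ True False  True  True False]
print(data[mask])                # rows 0, 2, and 3

print(data[~(names == 'Bob')])   # ~ negates the mask: rows 1, 2, and 4

# Boolean indexing returns a copy, so modifying the result leaves data unchanged
subset = data[mask]
subset[:] = 0
print(data[0])                   # [0 1 2]
```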
Fancy Indexing
• Fancy indexing is indexing using integer arrays.
• Unlike slicing, it always creates a new array (a copy).
• When one 1-D integer array is passed per axis, the result is one-dimensional: it holds the element at each tuple of indices.
arr = np.arange(20).reshape((5, 4))
arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
arr[[4, 3, 0]]
array([[16, 17, 18, 19],
       [12, 13, 14, 15],
       [ 0,  1,  2,  3]])
arr[[1, 2], [0, 2]]   # elements (1, 0) and (2, 2)
array([ 4, 10])
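To select a rectangular region (all combinations of some rows and some columns) rather than individual elements, one option is to index twice; np.ix_ builds the equivalent index in one step. A sketch:

```python
import numpy as np

arr = np.arange(20).reshape((5, 4))

# Pairs of indices: elements (1, 0) and (2, 2)
print(arr[[1, 2], [0, 2]])            # [ 4 10]

# Rectangular region: rows 1 and 2, columns 0 and 2
region = arr[[1, 2]][:, [0, 2]]
print(region)                         # [[ 4  6]
                                      #  [ 8 10]]
print(arr[np.ix_([1, 2], [0, 2])])    # same result using np.ix_

# Fancy indexing copies: changing the result does not change arr
picked = arr[[4, 3, 0]]
picked[0, 0] = -1
print(arr[4, 0])                      # 16
```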
Transposing Arrays and Swapping Axes
• Transposing returns a view without copying anything, using either:
  1. the special T attribute, or
  2. the .transpose((1, 0)) method.
arr = np.arange(15).reshape((3, 5))
arr
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
arr.T
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
Transposing Arrays and Swapping Axes
• For arrays with more than two dimensions, transpose accepts a tuple of axis numbers to permute the axes.
• swapaxes takes a pair of axis numbers and switches the indicated axes to rearrange the data.
arr = np.arange(24).reshape((2, 3, 4))
arr.shape
(2, 3, 4)
arr.T.shape   # T reverses the order of all axes
(4, 3, 2)
arr.transpose((0, 2, 1)).shape
(2, 4, 3)
arr.swapaxes(1, 0).shape
(3, 2, 4)
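The following sketch checks that transpose and swapaxes return views and shows how element positions map; it ends with a common 2-D use of the transpose, computing the inner product X^T X (the array x is illustrative):

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

t = arr.transpose((0, 2, 1))   # new axis order: 0, 2, 1
s = arr.swapaxes(1, 0)         # exchange axes 0 and 1

# Both are views on the same data
print(np.shares_memory(arr, t), np.shares_memory(arr, s))   # True True

# Element mapping: t[i, k, j] == arr[i, j, k] and s[j, i, k] == arr[i, j, k]
print(t[1, 3, 2] == arr[1, 2, 3])   # True
print(s[2, 1, 3] == arr[1, 2, 3])   # True

# Inner product X^T X of a 3x2 matrix
x = np.arange(6).reshape((3, 2))
xtx = x.T @ x
print(xtx)                          # [[20 26]
                                    #  [26 35]]
```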
4.2 Universal Functions: Fast Element-Wise Array Functions
• NumPy provides a rich set of fast functions.
• A ufunc (universal function) performs element-wise operations on data in ndarrays.
• Ufuncs accept an optional out argument that allows them to operate in-place.
• There are unary and binary ufuncs.
arr = np.arange(4.0)   # float array, so the float results of sqrt can be stored in place
np.sqrt(arr)
array([0.    , 1.    , 1.4142, 1.7321])
arr
array([0., 1., 2., 3.])
np.sqrt(arr, out=arr)
array([0.    , 1.    , 1.4142, 1.7321])
arr
array([0.    , 1.    , 1.4142, 1.7321])
Unary Universal Functions
Unary Universal Functions – cont.
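The tables of unary ufuncs did not survive extraction. As a hedged sample (not necessarily the contents of the original tables), a few common unary ufuncs:

```python
import numpy as np

x = np.array([-2.5, -1.0, 0.0, 4.0])

print(np.abs(x))       # absolute value
print(np.square(x))    # element-wise square
print(np.exp(x))       # e ** x
print(np.floor(x))     # round down
print(np.ceil(x))      # round up
print(np.sign(x))      # -1, 0, or 1

# modf returns two arrays: the fractional and the integral parts
frac, whole = np.modf(x)
print(frac, whole)

# isnan flags NaN values
print(np.isnan(np.array([1.0, np.nan])))   # [False  True]
```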
Binary Universal Functions
Binary Universal Functions – cont.
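Likewise, a hedged sample of common binary ufuncs (again, not necessarily the original table’s list), including the out argument with a binary function:

```python
import numpy as np

x = np.array([1, 5, 3])
y = np.array([4, 2, 3])

print(np.add(x, y))        # same as x + y
print(np.maximum(x, y))    # element-wise maximum: [4 5 3]
print(np.power(x, 2))      # x ** 2
print(np.mod(x, 2))        # remainder of division by 2
print(np.greater(x, y))    # same as x > y

# Binary ufuncs also accept out= to write results into an existing array
out = np.empty(3, dtype=x.dtype)
np.multiply(x, y, out=out)
print(out)                 # [ 4 10  9]
```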
Outline
Introduction
4.1 The NumPy ndarray: A Multidimensional Array Object
4.2 Universal Functions: Fast Element-Wise Array Functions
4.3 Array-Oriented Programming with Arrays
  • Expressing Conditional Logic as Array Operations
  • Mathematical and Statistical Methods
  • Methods for Boolean Arrays
  • Sorting
  • Unique and Other Set Logic
4.4 File Input and Output with Arrays
4.5 Linear Algebra
4.6 Pseudorandom Number Generation
Expressing Conditional Logic as Array Operations
• Python ternary expression:
  value = true-expr if condition else false-expr
• NumPy has the np.where() function, which accepts:
  • a Boolean array (the condition),
  • a true expression, and
  • a false expression.
  The expressions can be arrays or scalars.
s = 'one'
1 if s == 'one' else 0
1
a = np.array([[1, -1], [-1, 1]])
b = np.array([[1, 2], [3, 4]])
np.where(a > 0, 5, b)
array([[5, 2],
       [3, 5]])
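np.where vectorizes the ternary expression. The sketch below compares it with an equivalent pure-Python list comprehension; the data are illustrative:

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure Python: apply the ternary expression element by element
result_list = [x if c else y for x, y, c in zip(xarr, yarr, cond)]

# NumPy: one vectorized call
result_where = np.where(cond, xarr, yarr)
print(result_where)   # [1.1 2.2 1.3 1.4 2.5]

# Scalars and arrays can be mixed: replace negatives with 0, keep the rest
arr = np.array([[1, -2], [-3, 4]])
clipped = np.where(arr < 0, 0, arr)
print(clipped)        # [[1 0]
                      #  [0 4]]
```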
Mathematical and Statistical Methods
• Mathematical functions compute statistics about an entire array.
• Call either the instance method or the top-level NumPy function.
• Statistics can also be computed along an axis.
• Not all functions are reductions; cumsum, for example, returns an array of partial sums.
arr = np.array([[1, 2, 3], [4, 5, 6]])
np.sum(arr)
21
arr.sum()
21
arr.sum(axis=0)
array([5, 7, 9])
arr.sum(axis=1)
array([ 6, 15])
arr.cumsum()
array([ 1,  3,  6, 10, 15, 21])
Basic Array Statistical Methods
[Table of basic array statistical methods]
• std = sqrt(var)
[Link](arr)
6
[Link]()
5
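A sketch of several basic statistical methods applied to the array from the previous slide, including a check of the relation std = sqrt(var):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.mean())                   # 3.5
print(arr.mean(axis=0))             # column means: [2.5 3.5 4.5]
print(arr.min(), arr.max())         # 1 6
print(arr.argmin(), arr.argmax())   # 0 5 (indices in the flattened array)

# Standard deviation is the square root of the variance
print(np.isclose(arr.std(), np.sqrt(arr.var())))   # True
```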
Methods for Boolean Arrays
• Boolean values are coerced to 1 (True) and 0 (False), so sum() counts the True values.
• Useful methods:
  • any() tests whether one or more values are True
  • all() tests whether every value is True
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr > 4
array([[False, False, False],
       [False,  True,  True]])
(arr > 4).sum()
2
(arr > 4).any()
True
(arr > 4).all()
False
Sorting
• The top-level NumPy np.sort returns a sorted copy and supports sorting:
  1. along the last axis (default),
  2. along any axis you select, and
  3. of the flattened array (axis=None).
• The instance method sorts in-place and supports sorting:
  1. along the last axis (default), and
  2. along any axis you select.
arr = np.array([[1, 4, 3], [5, 2, 6]])
np.sort(arr)
array([[1, 3, 4],
       [2, 5, 6]])
np.sort(arr, axis=None)
array([1, 2, 3, 4, 5, 6])
arr.sort(axis=0)
arr
array([[1, 2, 3],
       [5, 4, 6]])
Unique and Other Set Logic
• NumPy has some basic set operations for one-dimensional ndarrays.
[Table of array set operations]
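The table of set operations did not survive extraction. As a hedged sample of NumPy’s 1-D set routines (np.unique, np.isin, np.intersect1d, np.union1d, np.setdiff1d); all of these return sorted results except np.isin, which returns a Boolean array shaped like its first argument:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Joe'])
uniq = np.unique(names)
print(uniq)                         # ['Bob' 'Joe' 'Will']

x = np.array([6, 0, 0, 3, 2, 5, 6])
member = np.isin(x, [2, 3, 6])      # is each element of x in the list?
print(member)

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5])
print(np.intersect1d(a, b))         # [3 4]
print(np.union1d(a, b))             # [1 2 3 4 5]
print(np.setdiff1d(a, b))           # [1 2]
```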
4.4 File Input and Output with Arrays
• NumPy is able to save and load data to and from disk in either text or binary format.
• np.save and np.load are used for efficiently saving and loading arrays in binary format. np.save appends the .npy extension if the file name does not already have one.
• For multiple arrays, use np.savez. Loading the resulting .npz file gives a dictionary-like object.
np.save('file_1', arr)
…
loaded_arr = np.load('file_1.npy')
np.savez('file_2.npz', a=arr, b=arr2)
…
arch = np.load('file_2.npz')
arch['b']
array([0, 1, 2, 3, 4, 5])
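A self-contained round trip, written to a temporary directory so it leaves no files behind. It also uses np.savetxt/np.loadtxt for the text format mentioned above; the arrays and file names are illustrative:

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)
arr2 = np.arange(6)

with tempfile.TemporaryDirectory() as tmp:
    # Binary .npy: np.save appends the extension when it is missing
    np.save(os.path.join(tmp, 'file_1'), arr)
    loaded_arr = np.load(os.path.join(tmp, 'file_1.npy'))

    # Several arrays in one .npz archive, accessed like a dictionary
    np.savez(os.path.join(tmp, 'file_2.npz'), a=arr, b=arr2)
    with np.load(os.path.join(tmp, 'file_2.npz')) as arch:
        b_loaded = arch['b']

    # Plain text, one value per line
    np.savetxt(os.path.join(tmp, 'file_3.txt'), arr, fmt='%d')
    text_arr = np.loadtxt(os.path.join(tmp, 'file_3.txt'), dtype=int)

print(np.array_equal(loaded_arr, arr))   # True
print(b_loaded)                          # [0 1 2 3 4 5]
print(np.array_equal(text_arr, arr))     # True
```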
4.5 Linear Algebra
• NumPy supports linear algebra, like matrix multiplication, decompositions, determinants, and other square matrix math.
• * is an element-wise operator.
• Use np.dot, or the equivalent @ operator, for matrix multiplication.
a = np.array([[1, 2], [3, 4]])
b = np.array([[-1, -1], [1, 1]])
a * b
array([[-1, -2],
       [ 3,  4]])
np.dot(a, b)   # equivalent to a @ b
array([[1, 1],
       [1, 1]])
Commonly used numpy.linalg functions
[Table of commonly used numpy.linalg functions]
Commonly used numpy.linalg functions
from numpy.linalg import det, inv
a = np.array([[1, 2], [3, 4]])
np.diag(a)
array([1, 4])
np.trace(a)
5
det(a)
-2.0000000000000004
inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
a.dot(inv(a))   # the identity, up to floating-point rounding
array([[1.000e+00, 1.110e-16],
       [0.000e+00, 1.000e+00]])
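Beyond inv, a common task is solving a linear system a x = y. np.linalg.solve does this directly and is generally preferred over multiplying by the inverse; the right-hand side y here is illustrative:

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])
y = np.array([5., 6.])

x = np.linalg.solve(a, y)      # solves a @ x == y
print(x)                       # approximately [-4.   4.5]
print(np.allclose(a @ x, y))   # True

# Same answer via the explicit inverse
print(np.allclose(np.linalg.inv(a) @ y, x))   # True
```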
4.6 Pseudorandom Number Generation
• numpy.random provides functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.
• Example: the normal distribution.
np.random.randn(2)   # the argument is the array shape
array([-0.16455161,  0.58873714])
np.random.normal(loc=3., scale=.01, size=(3, 2))
array([[3.01793583, 3.0055783 ],
       [3.00251166, 3.00951863],
       [2.99502288, 2.99333826]])
4.6 Pseudorandom Number Generation
• The numbers are pseudorandom: they are generated by an algorithm with deterministic behavior based on the seed.
• You can change the global seed using np.random.seed().
• np.random.RandomState() creates a random number generator isolated from others.
np.random.seed(7)
rng = np.random.RandomState(7)
rng.randn(10)
…
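Seeding makes results reproducible: two generators created with the same seed produce identical streams. The sketch below shows this with RandomState from the slide and with np.random.default_rng, the newer Generator-based API available since NumPy 1.17:

```python
import numpy as np

# Two isolated generators with the same seed produce the same numbers
rng1 = np.random.RandomState(7)
rng2 = np.random.RandomState(7)
draws1 = rng1.randn(10)
draws2 = rng2.randn(10)
print(np.array_equal(draws1, draws2))   # True

# The newer Generator API behaves the same way
gen1 = np.random.default_rng(7)
gen2 = np.random.default_rng(7)
g1 = gen1.normal(size=5)
g2 = gen2.normal(size=5)
print(np.array_equal(g1, g2))           # True
```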
Important numpy.random functions
[Table of numpy.random functions]
Simulating 10 coin flips:
draws = np.random.randint(0, 2, size=10)   # 0 or 1 for each flip
steps = np.where(draws > 0, 1, -1)         # +1 or -1 for each flip
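Continuing the coin-flip simulation, the cumulative sum of the ±1 steps gives a simple random walk. A sketch using a seeded RandomState (the seed and the number of steps are arbitrary choices) so that the run is reproducible:

```python
import numpy as np

rng = np.random.RandomState(12345)
nsteps = 1000

draws = rng.randint(0, 2, size=nsteps)   # 0 or 1 (upper bound excluded)
steps = np.where(draws > 0, 1, -1)       # map to +1 / -1
walk = steps.cumsum()                    # position after each step

print(walk.min(), walk.max())

# First step at which the walk is at least 10 away from the origin;
# argmax returns the index of the first True, so check that one exists
crossed = np.abs(walk) >= 10
if crossed.any():
    print(crossed.argmax())
```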
Homework
• Solve the homework on NumPy
Summary
Introduction
4.1 The NumPy ndarray: A Multidimensional Array Object
4.2 Universal Functions: Fast Element-Wise Array Functions
4.3 Array-Oriented Programming with Arrays
4.4 File Input and Output with Arrays
4.5 Linear Algebra
4.6 Pseudorandom Number Generation