05 NumPy - Arrays and Vectorized Computation

The document provides an overview of NumPy, a foundational package for numerical computing in Python, covering its multidimensional array object, universal functions for fast element-wise operations, and various programming techniques. It includes details on array creation, data types, arithmetic operations, indexing, and advanced features like broadcasting and boolean indexing. Additionally, it discusses mathematical methods and sorting functionalities within NumPy.

NumPy: Arrays and Vectorized Computation

Prof. Gheith Abandah

Reference
• Wes McKinney, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O’Reilly Media, 2nd Edition, 2018.
• Material: https://github.com/wesm/pydata-book

Outline
Introduction
4.1 The NumPy ndarray: A Multidimensional Array Object
4.2 Universal Functions: Fast Element-Wise Array Functions
4.3 Array-Oriented Programming with Arrays
4.4 File Input and Output with Arrays
4.5 Linear Algebra
4.6 Pseudorandom Number Generation

NumPy: Numerical Python
• One of the most important foundational packages for fast
numerical computing in Python.
• Most computational packages providing scientific
functionality use NumPy’s array objects for data exchange.
• NumPy internally stores data in a contiguous block of
memory.
• NumPy’s library of algorithms written in the C language can
operate on this memory without any type checking or other
overhead.

NumPy is Fast
In [7]: import numpy as np
In [8]: my_arr = np.arange(1000000)
In [9]: my_list = list(range(1000000))

In [10]: %time for _ in range(10): my_arr2 = my_arr * 2
CPU times: user 20 ms, sys: 50 ms, total: 70 ms
Wall time: 72.4 ms

In [11]: %time for _ in range(10): my_list2 = [x * 2 for x in my_list]
CPU times: user 760 ms, sys: 290 ms, total: 1.05 s
Wall time: 1.05 s

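The %time magic only works inside IPython. As a rough sketch, the same comparison can be run in a plain Python script with the standard timeit module; the repetition count of 3 is an arbitrary choice, and the absolute timings depend on the machine.

```python
import timeit

import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Time three repetitions of each approach; only the ratio is meaningful
t_numpy = timeit.timeit(lambda: my_arr * 2, number=3)
t_list = timeit.timeit(lambda: [x * 2 for x in my_list], number=3)

print(f"NumPy: {t_numpy:.4f} s, list comprehension: {t_list:.4f} s")
```
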
Outline
Introduction
4.1 The NumPy ndarray: A Multidimensional Array Object
  • Creating ndarrays
  • Data Types for ndarrays
  • Arithmetic with NumPy Arrays
  • Basic Indexing and Slicing
  • Boolean Indexing
  • Fancy Indexing
  • Transposing Arrays and Swapping Axes
4.2 Universal Functions: Fast Element-Wise Array Functions
4.3 Array-Oriented Programming with Arrays
4.4 File Input and Output with Arrays
4.5 Linear Algebra
4.6 Pseudorandom Number Generation

Creating ndarrays
• You can create NumPy arrays from lists.
• Arrays have .ndim and .shape attributes.

data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
arr2.ndim
2
arr2.shape
(2, 4)

Array Creation Functions
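
A short sketch of a few commonly used array creation functions. The selection and the variable names are illustrative, not a complete list.

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2x3 array of 0.0
ones = np.ones(4, dtype=np.int32)   # four 32-bit integer ones
filled = np.full((2, 2), 7)         # every element set to 7
ramp = np.arange(0, 10, 2)          # 0, 2, 4, 6, 8
identity = np.eye(3)                # 3x3 identity matrix
like = np.zeros_like(ramp)          # zeros with ramp's shape and dtype
```
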

Data Types for ndarrays
• The data type, or dtype, is a special object containing the information needed to interpret a chunk of memory as a particular type of data.
• Arrays have a .dtype attribute.

a = np.full((2, 3, 2), 7, dtype=int)   # dtype can also be np.int32 or 'i4'
a
array([[[7, 7],
        [7, 7],
        [7, 7]],
       [[7, 7],
        [7, 7],
        [7, 7]]])
a.dtype
dtype('int32')   # dtype=int gives the platform default integer, which is often int64

NumPy Data Types
[Table: NumPy integer (int) and floating-point (float) data types]

Data Types for ndarrays
• You can explicitly convert, or cast, an array from one dtype to another.
• NumPy can convert strings to numbers, but pandas is better suited to this.

af = np.array([3.7, -1.2, -2.6, 0.5])
ai = af.astype(np.int32)   # the fractional part is truncated
ai
array([ 3, -1, -2,  0], dtype=int32)

a_str = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)   # np.bytes_ is the current name of np.string_
af = a_str.astype(float)
af
array([ 1.25, -9.6 , 42.  ])

Arithmetic with NumPy Arrays
• Any arithmetic operation between equal-size arrays is applied element-wise.
• Arithmetic operations with scalars propagate the scalar argument to each element in the array.

arr = np.array([[1., 2., 3.],
                [4., 5., 6.]])
arr * arr
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])
1 / arr
array([[1.    , 0.5   , 0.3333],
       [0.25  , 0.2   , 0.1667]])

Arithmetic with NumPy Arrays
• Operations between differently sized arrays are handled by broadcasting.

a1 = np.array([[0., 0., 0.],
               [1., 1., 1.],
               [2., 2., 2.],
               [3., 3., 3.]])
a2 = np.array([1., 2., 3.])
a1 + a2
array([[1., 2., 3.],
       [2., 3., 4.],
       [3., 4., 5.],
       [4., 5., 6.]])

The Broadcasting Rule
• Two arrays are compatible for broadcasting if, for each trailing dimension (i.e., starting from the end), the axis lengths match or either of the lengths is 1.
• Broadcasting is then performed over the missing or length-1 dimensions.

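The rule can be checked mechanically. The sketch below assumes a hypothetical helper, broadcast_compatible, written only for illustration (it is not a NumPy function), and applies it to a few shapes before letting NumPy do the actual broadcasting.

```python
import numpy as np

def broadcast_compatible(shape_a, shape_b):
    """Return True if two shapes satisfy the broadcasting rule (hypothetical helper)."""
    # Walk both shapes from the trailing dimension; missing leading
    # dimensions need no check because zip stops at the shorter shape.
    for len_a, len_b in zip(reversed(shape_a), reversed(shape_b)):
        if len_a != len_b and len_a != 1 and len_b != 1:
            return False
    return True

a1 = np.zeros((4, 3))
a2 = np.array([1., 2., 3.])         # shape (3,)
col = np.arange(4.).reshape(4, 1)   # shape (4, 1)

ok_row = broadcast_compatible(a1.shape, a2.shape)    # trailing lengths 3 and 3 match
ok_col = broadcast_compatible(a1.shape, col.shape)   # 3 vs 1, then 4 vs 4
bad = broadcast_compatible((4, 3), (4,))             # trailing lengths 3 and 4 differ

row_sum = a1 + a2    # a2 is repeated along axis 0
col_sum = a1 + col   # col is repeated along axis 1
```
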
Basic Indexing and Slicing
• On one-dimensional arrays, indexing and slicing are similar to Python lists.
• Array slices are views on the original array, so modifying a slice modifies the array. Contrast with arr[3:5].copy(), which returns an independent copy.

arr = np.arange(6)
arr[3:5] = 12
arr
array([ 0,  1,  2, 12, 12,  5])
arr_slice = arr[3:5]
arr_slice[1] = 1000
arr
array([   0,    1,    2,   12, 1000,    5])

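A minimal sketch contrasting a view with a copy. np.shares_memory is used here only to make the difference visible; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(6)

view = arr[3:5]          # a slice: shares memory with arr
copy = arr[3:5].copy()   # an explicit copy: owns its own data

view[0] = 100            # visible through arr
copy[1] = -1             # not visible through arr

shares_view = np.shares_memory(arr, view)
shares_copy = np.shares_memory(arr, copy)
```
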
Basic Indexing and Slicing
• In a two-dimensional array, individual elements can be accessed:
  • recursively, or
  • by passing a comma-separated list of indices.
• In multi-dimensional arrays, if you omit later indices, the returned object will be a lower-dimensional array of all the data along the higher dimensions.

a = np.array([[1, 2, 3], [4, 5, 6]])
a[0][2]
3
a[1, 2]
6
a = np.zeros((2, 3, 4))
a[0].shape
(3, 4)

Basic Indexing and Slicing
• ndarrays can be sliced with the familiar syntax.
• Multiple slices can be passed, one per axis.
• A slice can be taken within a single row.
• Use : by itself to take the entire axis.
• Slices differ from indices: a slice keeps the axis, while an integer index drops it.

arr2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
arr2d[:2, 1:]
array([[2, 3],
       [5, 6]])
arr2d[1, :2]
array([4, 5])
arr2d[:, 0]
array([1, 4, 7])
arr2d[:, :1]
array([[1],
       [4],
       [7]])

Boolean Indexing
• Use Boolean arrays to select items with True.
• The Boolean array must be of the same length as the array axis it’s indexing.

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Joe'])
data = np.random.randn(5, 3)
data
array([[ 0.2817,  0.769 ,  1.2464],
       [-1.2962,  0.275 ,  0.2289],
       [ 0.8864, -2.0016, -0.3718],
       [-0.4386, -0.5397,  0.477 ],
       [-0.8312, -2.3702, -1.8608]])
data[names == 'Bob']
array([[ 0.2817,  0.769 ,  1.2464],
       [-0.4386, -0.5397,  0.477 ]])
data[names == 'Bob', 2]
array([ 1.2464,  0.477 ])

Boolean Indexing
• The operators !=, <, <=, >, >=, ~, & (and), and | (or) can be used to build Boolean arrays.
• Setting values with Boolean arrays also works.

data[data < 0] = 0
data
array([[0.2817, 0.769 , 1.2464],
       [0.    , 0.275 , 0.2289],
       [0.8864, 0.    , 0.    ],
       [0.    , 0.    , 0.477 ],
       [0.    , 0.    , 0.    ]])

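A small sketch of combining Boolean conditions. It assumes stand-in integer data instead of the random values on the slide so that the results are predictable.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Joe'])
data = np.arange(15).reshape(5, 3)   # stand-in values instead of random data

# Each comparison needs its own parentheses because & and | bind
# more tightly than == and >.
bob_or_will = (names == 'Bob') | (names == 'Will')
not_joe = ~(names == 'Joe')
bob_rows = data[(names == 'Bob') & (data[:, 0] > 0)]
```

The Python keywords and and or do not work element-wise on arrays, which is why & and | are used.
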
Fancy Indexing
• Fancy indexing is indexing using integer arrays.
• It creates a new array (a copy, not a view).
• Passing one integer array per axis selects individual elements, so the result is always one-dimensional.

arr = np.arange(20).reshape((5, 4))
arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
arr[[4, 3, 0]]
array([[16, 17, 18, 19],
       [12, 13, 14, 15],
       [ 0,  1,  2,  3]])
arr[[1, 2], [0, 2]]
array([ 4, 10])

Transposing Arrays and Swapping Axes
• Transposing returns a view without copying anything, using either:
  1. the .T special attribute, or
  2. the .transpose((1, 0)) method.

arr = np.arange(15).reshape((3, 5))
arr
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
arr.T
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

Transposing Arrays and Swapping Axes
• For dimensions higher than 2, transpose accepts a tuple of axis numbers to permute the axes.
• swapaxes takes a pair of axis numbers and switches the indicated axes to rearrange the data.

arr = np.arange(24).reshape((2, 3, 4))
arr.shape
(2, 3, 4)
arr.T.shape
(4, 3, 2)
arr.transpose((0, 2, 1)).shape
(2, 4, 3)
arr.swapaxes(1, 0).shape
(3, 2, 4)

4.2 Universal Functions: Fast Element-Wise Array Functions
• NumPy provides a rich set of fast functions.
• A ufunc is a function that performs element-wise operations.
• ufuncs accept an optional out argument that allows them to operate in-place.
• There are unary and binary ufuncs.

arr = np.arange(4.)
np.sqrt(arr)
array([0.    , 1.    , 1.4142, 1.7321])
arr
array([0., 1., 2., 3.])
np.sqrt(arr, out=arr)   # arr must have a float dtype to hold the result
array([0.    , 1.    , 1.4142, 1.7321])
arr
array([0.    , 1.    , 1.4142, 1.7321])

Unary Universal Functions

Binary Universal Functions

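A brief sketch of a few unary and binary ufuncs. The functions shown are a small illustrative selection, and the input values are made up for the example.

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.25])
y = np.array([1.0, -3.0, 2.0])

# Unary ufuncs take one array
abs_x = np.abs(x)        # absolute value
floor_x = np.floor(x)    # largest integer <= each element
sign_x = np.sign(x)      # -1, 0 or 1

# Binary ufuncs take two arrays
sum_xy = np.add(x, y)        # same as x + y
max_xy = np.maximum(x, y)    # element-wise maximum
gt_xy = np.greater(x, y)     # same as x > y
```
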
Outline
Introduction
4.1 The NumPy ndarray: A Multidimensional Array Object
4.2 Universal Functions: Fast Element-Wise Array Functions
4.3 Array-Oriented Programming with Arrays
  • Expressing Conditional Logic as Array Operations
  • Mathematical and Statistical Methods
  • Methods for Boolean Arrays
  • Sorting
  • Unique and Other Set Logic
4.4 File Input and Output with Arrays
4.5 Linear Algebra
4.6 Pseudorandom Number Generation

Expressing Conditional Logic as Array Operations
• Python ternary expression: value = true-expr if condition else false-expr
• NumPy has the np.where() function, which accepts:
  • a Boolean array
  • a true expression
  • a false expression

s = 'one'
1 if s == 'one' else 0
1

a = np.array([[ 1, -1],
              [-1,  1]])
b = np.array([[1, 2],
              [3, 4]])
np.where(a > 0, 5, b)
array([[5, 2],
       [3, 5]])

Mathematical and Statistical Methods
• Mathematical functions compute statistics about an entire array.
• Call the instance method or the top-level NumPy function.
• Statistics can be computed along an axis.
• Not all functions are reductions; cumsum, for example, returns the intermediate results.

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
np.sum(arr)
21
arr.sum()
21
arr.sum(axis=0)
array([5, 7, 9])
arr.sum(axis=1)
array([ 6, 15])
arr.cumsum()
array([ 1,  3,  6, 10, 15, 21])

Basic Array Statistical Methods
• std = sqrt(var)

np.max(arr)
6
arr.argmax()
5

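A short sketch of several basic statistical methods on the same 2x3 array, including a check that std is the square root of var. The variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

mean_all = arr.mean()          # mean of all six elements
mean_cols = arr.mean(axis=0)   # one mean per column
var_all = arr.var()            # population variance (ddof=0 by default)
std_all = arr.std()            # equals the square root of var_all
lo, hi = arr.min(), arr.max()
pos_lo, pos_hi = arr.argmin(), arr.argmax()   # positions in the flattened array
```
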
Methods for Boolean Arrays
• Boolean values are coerced to 1 (True) and 0 (False).
• Useful methods:
  • any()
  • all()

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
arr > 4
array([[False, False, False],
       [False,  True,  True]])
(arr > 4).sum()
2
(arr > 4).any()
True
(arr > 4).all()
False

Sorting
• The top-level np.sort returns a sorted copy and supports sorting:
  1. along the last axis (default)
  2. along any axis you select
  3. of the flattened array (axis=None)
• The instance method sorts in-place and supports sorting:
  1. along the last axis (default)
  2. along any axis you select

arr = np.array([[1, 4, 3],
                [5, 2, 6]])
np.sort(arr)
array([[1, 3, 4],
       [2, 5, 6]])
np.sort(arr, axis=None)
array([1, 2, 3, 4, 5, 6])
arr.sort(axis=0)
arr
array([[1, 2, 3],
       [5, 4, 6]])

Unique and Other Set Logic
• NumPy has some basic set operations for one-dimensional
ndarrays.
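
A minimal sketch of some of these set operations. The arrays x and y are made-up inputs for illustration.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Joe'])
x = np.array([3, 1, 2, 3, 1])
y = np.array([2, 3, 5])

unique_names = np.unique(names)   # sorted unique values
common = np.intersect1d(x, y)     # sorted values present in both
either = np.union1d(x, y)         # sorted union
only_x = np.setdiff1d(x, y)       # values in x that are not in y
in_y = np.isin(x, y)              # Boolean mask, one entry per element of x
```
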

4.4 File Input and Output with Arrays
• NumPy is able to save and load data to and from disk in either text or binary format.
• np.save and np.load are used for efficiently saving and loading in binary format.
• For multiple arrays, use np.savez. The loaded archive is then accessed like a dictionary.

np.save('file_1', arr)   # the .npy extension is appended automatically
…
loaded_arr = np.load('file_1.npy')

np.savez('file_2.npz', a=arr, b=arr2)
…
arch = np.load('file_2.npz')
arch['b']
array([0, 1, 2, 3, 4, 5])

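A self-contained round-trip sketch, assuming a temporary directory so that no files are left behind. The text-format step uses np.savetxt and np.loadtxt, which the slide does not show; the file names and array contents are illustrative.

```python
import os
import tempfile

import numpy as np

arr = np.arange(6)
arr2 = np.linspace(0.0, 1.0, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Single array in binary .npy format
    npy_path = os.path.join(tmp, 'file_1.npy')
    np.save(npy_path, arr)
    loaded_arr = np.load(npy_path)

    # Several arrays in one .npz archive, retrieved by keyword name
    npz_path = os.path.join(tmp, 'file_2.npz')
    np.savez(npz_path, a=arr, b=arr2)
    with np.load(npz_path) as arch:
        loaded_b = arch['b']

    # Text format
    txt_path = os.path.join(tmp, 'file_3.txt')
    np.savetxt(txt_path, arr2)
    loaded_txt = np.loadtxt(txt_path)
```
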
4.5 Linear Algebra
• NumPy supports linear algebra, like matrix multiplication, decompositions, determinants, and other square matrix math.
• * is the element-wise operator.
• Use np.dot for matrix multiplication; for two-dimensional arrays, np.dot(a, b) is equivalent to a @ b.

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[-1, -1],
              [ 1,  1]])
a * b
array([[-1, -2],
       [ 3,  4]])
np.dot(a, b)
array([[1, 1],
       [1, 1]])

Commonly used numpy.linalg functions

from numpy.linalg import det, inv
a = np.array([[1, 2],
              [3, 4]])
np.diag(a)
array([1, 4])
np.trace(a)
5
det(a)
-2.0   # up to floating-point round-off
inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
a.dot(inv(a))   # the identity matrix, up to floating-point round-off
array([[1.000e+00, 1.110e-16],
       [0.000e+00, 1.000e+00]])

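As a further sketch, numpy.linalg.solve handles a linear system directly. The right-hand side rhs is a made-up vector for illustration.

```python
import numpy as np
from numpy.linalg import inv, solve

a = np.array([[1., 2.],
              [3., 4.]])
rhs = np.array([5., 6.])   # hypothetical right-hand side

x = solve(a, rhs)          # solves a @ x = rhs without forming the inverse
x_via_inv = inv(a) @ rhs   # same answer, usually less accurate and slower

identity_check = np.allclose(a @ inv(a), np.eye(2))
```

Comparing floating-point results with np.allclose rather than == avoids false mismatches caused by round-off.
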
4.6 Pseudorandom Number Generation
• numpy.random provides functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.
• Example: Normal distribution.
• The argument of randn and the size argument of normal give the array shape.

np.random.randn(2)
array([-0.16455161,  0.58873714])
np.random.normal(loc=3., scale=.01, size=(3, 2))
array([[3.01793583, 3.0055783 ],
       [3.00251166, 3.00951863],
       [2.99502288, 2.99333826]])

4.6 Pseudorandom Number Generation
• numpy.random generates pseudorandom numbers by an algorithm with deterministic behavior based on the seed.
• You can change the global seed using seed().
• RandomState() creates a random number generator isolated from others.

np.random.seed(7)

rng = np.random.RandomState(7)
rng.randn(10)

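A minimal sketch of seeded, isolated generators. The seed value 7 follows the slide; np.random.default_rng is the newer generator interface (available from NumPy 1.17) and is shown as an alternative that the slide does not cover.

```python
import numpy as np

rng_a = np.random.RandomState(7)
rng_b = np.random.RandomState(7)   # same seed, separate generator

draws_a = rng_a.randn(5)
draws_b = rng_b.randn(5)           # identical to draws_a

np.random.seed(7)                  # reseeds only the global generator
global_draws = np.random.randn(5)

# Newer generator interface, shown as an alternative
gen = np.random.default_rng(7)
gen_draws = gen.standard_normal(5)
```
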
Important numpy.random functions

Simulating 10 coin flips:

draws = np.random.randint(0, 2, size=10)
steps = np.where(draws > 0, 1, -1)

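The coin-flip lines can be made repeatable with a seeded generator. In this sketch the seed is arbitrary, and the cumulative sum that turns the steps into a random walk is an assumed follow-on step that the slide does not show.

```python
import numpy as np

rng = np.random.RandomState(7)       # seed chosen arbitrarily for repeatability
draws = rng.randint(0, 2, size=10)   # 0 or 1 with equal probability
steps = np.where(draws > 0, 1, -1)   # map 0 to -1 and 1 to +1
walk = steps.cumsum()                # position after each flip (assumed follow-on step)
heads = int(draws.sum())             # number of flips that came up 1
```
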
Homework
• Solve the homework on NumPy

Summary
Introduction
4.1 The NumPy ndarray: A Multidimensional Array Object
4.2 Universal Functions: Fast Element-Wise Array Functions
4.3 Array-Oriented Programming with Arrays
4.4 File Input and Output with Arrays
4.5 Linear Algebra
4.6 Pseudorandom Number Generation
