Introduction to NumPy
Lab Objective: NumPy is a powerful Python package for manipulating data with multi-dimensional
vectors. Its versatility and speed make Python an ideal language for applied and computational
mathematics. In this lab we introduce basic NumPy data structures and operations as a first step to
numerical computing in Python.
Arrays
In many algorithms, data can be represented mathematically as a vector or a matrix. Conceptually,
a vector is just a list of numbers and a matrix is a two-dimensional list of numbers (a list of lists).
However, even basic linear algebra operations like matrix multiplication are cumbersome to implement
and slow to execute when data is stored this way. The NumPy module offers a much better solution.
The basic object in NumPy is the array, which is conceptually similar to a matrix. The NumPy
array class is called ndarray (for “n-dimensional array”). The simplest way to explicitly create a 1-D
ndarray is to define a list, then cast that list as an ndarray with NumPy’s array() function.
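As a minimal sketch of this casting step (the names `row`, `x`, and `A` are illustrative and do not come from the lab):

```python
import numpy as np

# Define an ordinary Python list, then cast it as an ndarray.
row = [8, 4, 8, 7]
x = np.array(row)

# A list of lists becomes a 2-D array (a matrix).
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(type(x))           # <class 'numpy.ndarray'>
print(x.ndim, A.ndim)    # 1 2
```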
2 Lab 4. Introduction to NumPy
Problem 1. There are two main ways to perform matrix multiplication in NumPy: with
NumPy’s dot() function (np.dot(A, B)), or with the @ operator (A @ B). Write a function
that defines the following matrices as NumPy arrays.
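\[
A = \begin{bmatrix} 3 & -1 & 4 \\ 1 & 5 & -9 \end{bmatrix},
\qquad
B = \begin{bmatrix} 2 & 6 & -5 & 3 \\ 5 & -8 & 9 & 7 \\ 9 & -3 & -2 & -3 \end{bmatrix}
\]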
Achtung!
The @ operator was not introduced until Python 3.5. It triggers the __matmul__() magic
method, which for 1-D and 2-D ndarrays gives the same result as np.dot(). If you are using a
previous version of Python, always use np.dot() to perform basic matrix multiplication.
(See the lab on Object Oriented Programming for an overview of magic methods.)
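The two ways of multiplying matrices can be compared directly. The sketch below uses two small example matrices chosen for illustration (they are not the matrices of Problem 1):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# Both calls compute the matrix product AB.
product_dot = np.dot(A, B)
product_at = A @ B            # Requires Python 3.5 or later.

print(product_dot.tolist())                       # [[2, 1], [4, 3]]
print(np.array_equal(product_dot, product_at))    # True
```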
NumPy arrays act like mathematical vectors and matrices: + and * perform component-wise
addition or multiplication.
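A short illustration of component-wise arithmetic, contrasted with how the same operators behave on Python lists (the arrays `x` and `y` are made up for this example):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

# On arrays, + and * act entry by entry.
total = x + y
product = x * y
print(total.tolist())      # [11, 22, 33]
print(product.tolist())    # [10, 40, 90]

# On lists, + concatenates and * repeats.
print([1, 2, 3] + [10, 20, 30])    # [1, 2, 3, 10, 20, 30]
print([1, 2, 3] * 2)               # [1, 2, 3, 1, 2, 3]
```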
Problem 2. Write a function that defines the following matrix as a NumPy array.
\[
A = \begin{bmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ -5 & 3 & 1 \end{bmatrix}
\]
Array Attributes
An ndarray object has several attributes, some of which are listed below.
Attribute   Description
dtype       The type of the elements in the array.
ndim        The number of axes (dimensions) of the array.
shape       A tuple of integers indicating the size in each dimension.
size        The total number of elements in the array.
Note that ndim is the number of entries in shape, and that the size of the array is the product
of the entries of shape.
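The relationships among these attributes can be checked on a small example array (the array `A` here is illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(A.dtype)    # float64
print(A.ndim)     # 2
print(A.shape)    # (2, 3)
print(A.size)     # 6

# ndim is the length of shape; size is the product of its entries.
print(A.ndim == len(A.shape))        # True
print(A.size == np.prod(A.shape))    # True
```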
Function       Returns
arange()       Array of sequential integers (like list(range())).
eye()          2-D array with ones on the diagonal and zeros elsewhere.
ones()         Array of given shape and type, filled with ones.
ones_like()    Array of ones with the same shape and type as a given array.
zeros()        Array of given shape and type, filled with zeros.
zeros_like()   Array of zeros with the same shape and type as a given array.
full()         Array of given shape and type, filled with a specified value.
full_like()    Full array with the same shape and type as a given array.
Each of these functions accepts the keyword argument dtype to specify the data type. Common
types include np.bool_, np.int64, np.float64, and np.complex128.
Unlike native Python data structures, all elements of a NumPy array must be of the
same data type. To change an existing array’s data type, use the array’s astype() method.
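A brief sketch of a few creation routines together with the dtype keyword and astype(). The shapes and values are arbitrary choices for illustration:

```python
import numpy as np

Z = np.zeros((2, 3), dtype=np.int64)    # 2x3 array of integer zeros.
F = np.full((2, 2), 7.5)                # 2x2 array filled with 7.5.
I = np.eye(3)                           # 3x3 identity matrix of floats.
L = np.ones_like(Z)                     # Ones with the shape and dtype of Z.

# astype() returns a new array; the original keeps its data type.
x = np.array([1.7, -2.2, 3.9])
y = x.astype(np.int64)                  # Conversion truncates toward zero.
print(y.tolist())                       # [1, -2, 3]
print(x.dtype, y.dtype)                 # float64 int64
```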
The following functions are for dealing with the diagonal, upper, or lower portion of an array.
Function   Description
diag()     Extract a diagonal or construct a diagonal array.
tril()     Get the lower-triangular portion of an array by replacing entries above the diagonal with zeros.
triu()     Get the upper-triangular portion of an array by replacing entries below the diagonal with zeros.
# diag() can also be used to create a diagonal matrix from a 1-D array.
>>> np.diag([1, 11, 111])
array([[  1,   0,   0],
       [  0,  11,   0],
       [  0,   0, 111]])
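The triangular routines can be sketched on a small array as follows. The optional argument `k`, which shifts the diagonal that triu() and tril() start from, is part of NumPy but is not discussed in the text above; the array `A` is illustrative.

```python
import numpy as np

A = np.arange(1, 10).reshape((3, 3))    # The integers 1 through 9 in a 3x3 array.

upper = np.triu(A)                # Zero out the entries below the main diagonal.
lower = np.tril(A)                # Zero out the entries above the main diagonal.
strict_upper = np.triu(A, k=1)    # Start one diagonal above the main diagonal.
main_diag = np.diag(A)            # Given a 2-D array, diag() extracts its diagonal.

print(upper.tolist())           # [[1, 2, 3], [0, 5, 6], [0, 0, 9]]
print(lower.tolist())           # [[1, 0, 0], [4, 5, 0], [7, 8, 9]]
print(strict_upper.tolist())    # [[0, 2, 3], [0, 0, 6], [0, 0, 0]]
print(main_diag.tolist())       # [1, 5, 9]
```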
Problem 3. Write a function that defines the following matrices as NumPy arrays using the
functions presented in this section, not np.array(). Calculate the matrix product ABA.
Change the data type of the resulting matrix to np.int64, then return it.
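\[
A = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\qquad
B = \begin{bmatrix}
-1 & 5 & 5 & 5 & 5 & 5 & 5 \\
-1 & -1 & 5 & 5 & 5 & 5 & 5 \\
-1 & -1 & -1 & 5 & 5 & 5 & 5 \\
-1 & -1 & -1 & -1 & 5 & 5 & 5 \\
-1 & -1 & -1 & -1 & -1 & 5 & 5 \\
-1 & -1 & -1 & -1 & -1 & -1 & 5 \\
-1 & -1 & -1 & -1 & -1 & -1 & -1
\end{bmatrix}
\]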
Data Access
Array Slicing
Indexing for a 1-D NumPy array uses the slicing syntax x[start:stop:step]. If there is no colon,
a single entry of that dimension is accessed. With a colon, a range of values is accessed. For multi-
dimensional arrays, use a comma to separate slicing syntax for each axis.
>>> A = np.array([[0,1,2,3,4],[5,6,7,8,9]])
>>> A
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
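A few slices of the array defined above, written as a script rather than an interpreter session. The particular indices are chosen only for illustration:

```python
import numpy as np

A = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]])

entry = A[1, 2]          # No colon: a single entry (row 1, column 2).
block = A[:, 1:3]        # All rows, columns 1 and 2.
evens = A[0, ::2]        # Row 0, every other column.
tail = A[1, 3:]          # Row 1, from column 3 to the end.

print(entry)             # 7
print(block.tolist())    # [[1, 2], [6, 7]]
print(evens.tolist())    # [0, 2, 4]
print(tail.tolist())     # [8, 9]
```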
Note
Indexing and slicing operations return a view of the array. Changing a view of an array also
changes the original array. In other words, arrays are mutable. To create a copy of an array,
use np.copy() or the array’s copy() method. Changes to a copy of an array do not affect
the original array, but copying an array uses more time and memory than getting a view.
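The difference between a view and a copy can be seen in a small sketch. The names `view` and `duplicate` are illustrative, and np.shares_memory() is used here only as a convenient check:

```python
import numpy as np

x = np.arange(5)             # [0, 1, 2, 3, 4]

view = x[1:4]                # Slicing returns a view of x.
view[0] = 100                # This also changes x[1].
print(x.tolist())            # [0, 100, 2, 3, 4]

duplicate = x.copy()         # An independent copy of the data.
duplicate[0] = -1            # x is unaffected.
print(x.tolist())            # [0, 100, 2, 3, 4]

print(np.shares_memory(x, view))         # True
print(np.shares_memory(x, duplicate))    # False
```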
Fancy Indexing
So-called fancy indexing is a second way to access or change the elements of an array. Instead of
using slicing syntax, provide either an array of indices or an array of boolean values (called a mask)
to extract specific elements.
# A boolean array extracts the elements of 'x' at the same places as 'True'.
>>> x = np.arange(0, 50, 10)    # Example array with entries 0, 10, 20, 30, 40.
>>> mask = np.array([True, False, False, True, False])
>>> x[mask] # Get the 0th and 3rd entries.
array([ 0, 30])
Fancy indexing is especially useful for extracting or changing the values of an array that meet
some sort of criterion. Use comparison operators like < and == to create masks.
While indexing and slicing always return a view, fancy indexing always returns a copy.
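The following sketch shows index arrays, a mask built from a comparison, and the fact that fancy indexing yields a copy. The array and the threshold 15 are arbitrary illustrative choices:

```python
import numpy as np

x = np.arange(0, 50, 10)        # [0, 10, 20, 30, 40]

# An array (or list) of indices picks out entries in the given order.
picked = x[[3, 1, 4]]
print(picked.tolist())          # [30, 10, 40]

# A comparison produces a boolean mask of the same shape as x.
mask = x > 15
print(mask.tolist())            # [False, False, True, True, True]
print(x[mask].tolist())         # [20, 30, 40]

# A mask on the left side of an assignment changes entries in place.
y = x.copy()
y[y > 15] = 100
print(y.tolist())               # [0, 10, 100, 100, 100]

# 'picked' is a copy, so changing it leaves x alone.
picked[0] = -1
print(x.tolist())               # [0, 10, 20, 30, 40]
```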
Problem 4. Write a function that accepts a single array as input. Make a copy of the array,
then use fancy indexing to set all negative entries of the copy to 0. Return the copy.
Array Manipulation
Shaping
An array’s shape attribute describes its dimensions. Use np.reshape() or the array’s reshape()
method to give an array a new shape. The total number of entries in the old array and the new
array must be the same in order for the shaping to work correctly. Using a -1 in the new shape tuple
makes the specified dimension as long as necessary.
# Reshape 'A' into an array with 2 rows and the appropriate number of columns.
>>> A = np.arange(12)    # The integers from 0 to 11 in a 1-D array.
>>> A.reshape((2,-1))
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
Use np.ravel() to flatten a multi-dimensional array into a 1-D array and np.transpose() or
the T attribute to transpose a 2-D array in the matrix sense.
>>> A = np.arange(12).reshape((3,4))
>>> A
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
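Flattening and transposing the array above can be sketched as follows (the names `flat` and `At` are illustrative):

```python
import numpy as np

A = np.arange(12).reshape((3, 4))

flat = np.ravel(A)       # 1-D array holding the 12 entries in row order.
At = A.T                 # Same result as np.transpose(A).

print(flat.shape)        # (12,)
print(At.shape)          # (4, 3)
print(At[1, 2] == A[2, 1])                     # True
print(np.array_equal(At, np.transpose(A)))     # True
```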
Note
By default, any result that can be represented with a single dimension, including a column
slice of a 2-D array, is returned as a “flat” 1-D array. For example, a column with 10 entries
is a 1-D array with 10 elements, not an array of 10 rows with one element each. Though we
usually represent vectors vertically in mathematical notation, NumPy methods such as dot()
are implemented to purposefully work well with 1-D “row arrays”.
>>> A = np.arange(10).reshape((2,5))
>>> A
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
However, it is occasionally necessary to change a 1-D array into a “column array”. Use
np.reshape(), np.vstack(), or slice the array and put np.newaxis on the second axis. Note
that np.transpose() does not alter 1-D arrays.
>>> x = np.arange(3)
>>> x
array([0, 1, 2])
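The three approaches named above can be sketched side by side. Each should produce the same 3 × 1 column array (the names `col_reshape`, `col_vstack`, and `col_newaxis` are illustrative):

```python
import numpy as np

x = np.arange(3)

col_reshape = x.reshape((-1, 1))    # -1 lets NumPy infer the number of rows.
col_vstack = np.vstack(x)           # Stack the entries of x vertically.
col_newaxis = x[:, np.newaxis]      # Insert a new second axis of length 1.

print(col_reshape.shape, col_vstack.shape, col_newaxis.shape)
print(col_reshape.tolist())         # [[0], [1], [2]]

# Transposing a 1-D array does nothing.
print(x.T.shape)                    # (3,)
```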
Stacking
NumPy has functions for stacking two or more arrays with similar dimensions into a single block
matrix. Each of these functions takes in a single tuple of arrays to be stacked in sequence.
Function         Description
concatenate()    Join a sequence of arrays along an existing axis.
hstack()         Stack arrays in sequence horizontally (column wise).
vstack()         Stack arrays in sequence vertically (row wise).
column_stack()   Stack 1-D arrays as columns into a 2-D array.
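>>> A = np.arange(6).reshape((2,3))
>>> B = np.zeros((4,3))
>>> np.vstack((A,B,A)).shape    # Shapes (2,3), (4,3), (2,3) share 3 columns.
(8, 3)
>>> A = A.T
>>> B = np.ones((3,4))
>>> np.hstack((A,B,A)).shape    # Shapes (3,2), (3,4), (3,2) share 3 rows.
(3, 8)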
See http://docs.scipy.org/doc/numpy-1.10.1/reference/routines.array-manipulation.html
for more array manipulation routines and documentation.
Problem 5. Write a function that defines the following matrices as NumPy arrays.
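\[
A = \begin{bmatrix} 0 & 2 & 4 \\ 1 & 3 & 5 \end{bmatrix}
\qquad
B = \begin{bmatrix} 3 & 0 & 0 \\ 3 & 3 & 0 \\ 3 & 3 & 3 \end{bmatrix}
\qquad
C = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{bmatrix}
\]
Use NumPy’s stacking functions to create and return the block matrix:
\[
\begin{bmatrix} 0 & A^{\mathsf{T}} & I \\ A & 0 & 0 \\ B & 0 & C \end{bmatrix},
\]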
where I is the 3 × 3 identity matrix and each 0 is a matrix of all zeros of appropriate size.
A block matrix of this form is used in the Interior Point method for linear optimization.
Array Broadcasting
Many matrix operations make sense only when the two operands have the same shape, such as
element-wise addition. Array broadcasting extends such operations to accept some (but not all)
operands with different shapes, and occurs automatically whenever possible.
Suppose, for example, that we would like to add different values to the columns of an m × n
matrix A. Adding a 1-D array x with n entries to A will automatically do this correctly. To add
different values to the different rows of A, first reshape a 1-D array of m values into a column array.
Broadcasting then correctly takes care of the operation.
Broadcasting can also occur between two 1-D arrays, once they are reshaped appropriately.
>>> A = np.arange(12).reshape((4,3))
>>> x = np.arange(3)
>>> A
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
>>> x
array([0, 1, 2])
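Continuing with the same A and x, the three cases described above can be sketched as follows. The column array `col` and the result names are illustrative:

```python
import numpy as np

A = np.arange(12).reshape((4, 3))
x = np.arange(3)

# x has one entry per column of A, so it is added to every row of A.
by_column = A + x
print(by_column.tolist())    # [[0, 2, 4], [3, 5, 7], [6, 8, 10], [9, 11, 13]]

# A column array with one entry per row of A is added to every column of A.
col = np.arange(4).reshape((-1, 1))
by_row = A + col
print(by_row.tolist())       # [[0, 1, 2], [4, 5, 6], [8, 9, 10], [12, 13, 14]]

# Two 1-D arrays, one reshaped into a column, broadcast to a 2-D result.
table = col * x
print(table.shape)           # (4, 3)
```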
Function                    Description
abs() or absolute()         Calculate the absolute value element-wise.
exp() / log()               Exponential (e^x) / natural log element-wise.
maximum() / minimum()       Element-wise maximum / minimum of two arrays.
sqrt()                      The positive square-root, element-wise.
sin(), cos(), tan(), etc.   Element-wise trigonometric operations.
>>> x = np.arange(-2,3)
>>> print(x, np.abs(x)) # Like np.array([abs(i) for i in x]).
[-2 -1 0 1 2] [2 1 0 1 2]
Achtung!
The math module has many useful functions for numerical computations. However, most of
these functions can only act on single numbers, not on arrays. NumPy functions can act on
either scalars or entire arrays, but math functions tend to be a little faster for acting on scalars.
Always use universal NumPy functions, not the math module, when working with arrays.
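A small sketch of the contrast described above: np.sqrt() accepts both scalars and arrays, while math.sqrt() is expected to raise a TypeError when handed an array with more than one entry. The flag `math_failed` is an illustrative name.

```python
import math
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0])

roots = np.sqrt(x)           # Acts on every entry at once.
print(roots.tolist())        # [0.0, 1.0, 2.0, 3.0]
print(np.sqrt(16.0))         # NumPy functions also accept scalars.

# The math module only handles single numbers.
math_failed = False
try:
    math.sqrt(x)
except TypeError as error:
    math_failed = True
    print("math.sqrt() failed:", error)
```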
The np.ndarray class itself has many useful methods for numerical computations.
Method      Returns
all()       True if all elements evaluate to True.
any()       True if any elements evaluate to True.
argmax()    Index of the maximum value.
argmin()    Index of the minimum value.
argsort()   Indices that would sort the array.
clip()      Restrict values in an array to fit within a given range.
max()       The maximum element of the array.
mean()      The average value of the array.
min()       The minimum element of the array.
sort()      Return nothing; sort the array in-place.
std()       The standard deviation of the array.
sum()       The sum of the elements of the array.
var()       The variance of the array.
Each of these np.ndarray methods has an equivalent NumPy function. For example, A.max()
and np.max(A) operate the same way. The one exception is the sort() function: np.sort() returns
a sorted copy of the array, while A.sort() sorts the array in-place and returns nothing.
Most of the methods listed can operate along an axis via the keyword argument axis. For
reductions such as sum(), max(), or mean(), specifying axis on an n-D array returns an
(n − 1)-D array, the specified axis having been collapsed in the evaluation process. If axis is
not specified, the return value is usually a scalar. Refer to the NumPy Visual Guide in the
appendix for more visual examples.
>>> A = np.arange(9).reshape((3,3))
>>> A
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
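With the array A above, the effect of the axis argument can be sketched as follows (the result names are illustrative):

```python
import numpy as np

A = np.arange(9).reshape((3, 3))

total = A.sum()                # No axis: a single scalar.
column_sums = A.sum(axis=0)    # Collapse the rows, leaving one sum per column.
row_sums = A.sum(axis=1)       # Collapse the columns, leaving one sum per row.
row_maxes = A.max(axis=1)

print(total)                   # 36
print(column_sums.tolist())    # [9, 12, 15]
print(row_sums.tolist())       # [3, 12, 21]
print(row_maxes.tolist())      # [2, 5, 8]
print(row_sums.ndim)           # 1
```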
Problem 6. A matrix is called row-stochastic if its rows each sum to 1. (Similarly, a matrix is
called column-stochastic if its columns each sum to 1.) Stochastic matrices are fundamentally
important for finite discrete random processes and some machine learning algorithms.
Write a function that accepts a matrix (as a 2-D array). Divide each row of the matrix by
the row sum and return the new row-stochastic matrix. Use array broadcasting and the axis
argument instead of a loop.
08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48
One way to approach this problem is to iterate through the rows and columns of the array,
checking small slices of the array at each iteration and updating the current largest product.
Array slicing, however, provides a much more efficient solution.
The naïve method for computing the greatest product of four adjacent numbers in a
horizontal row might be as follows:
>>> winner = 0
>>> for i in range(20):
...     for j in range(17):
...         winner = max(np.prod(grid[i,j:j+4]), winner)
...
>>> winner
48477312
Instead, use array slicing to construct a single array where the (i, j)th entry is the product
of the four numbers to the right of the (i, j)th entry in the original grid. Then find the largest
element in the new array.
Use slicing to similarly find the greatest products of four vertical, right diagonal, and left
diagonal adjacent numbers.
(Hint: Consider drawing the portions of the grid that each slice in the above code covers, like
the examples in the visual guide. Then draw the slices that produce vertical, right diagonal, or
left diagonal sequences, and translate the pictures into slicing syntax.)
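To illustrate the slicing idea for the horizontal case only, here is a sketch on a small hypothetical 2 × 6 array named `small` (not the 20 × 20 grid above). Four shifted slices are multiplied entry-wise, and the result is compared with the loop approach.

```python
import numpy as np

small = np.arange(1, 13).reshape((2, 6))    # Rows 1..6 and 7..12.

# Loop version: check every horizontal window of four entries.
best_loop = 0
for i in range(small.shape[0]):
    for j in range(small.shape[1] - 3):
        best_loop = max(np.prod(small[i, j:j+4]), best_loop)

# Sliced version: entry (i, j) of 'products' is
# small[i, j] * small[i, j+1] * small[i, j+2] * small[i, j+3].
products = small[:, :-3] * small[:, 1:-2] * small[:, 2:-1] * small[:, 3:]
best_sliced = products.max()

print(products.shape)            # (2, 3)
print(best_loop, best_sliced)    # 11880 11880
```

Each of the four slices has the same shape, so the entry-wise product lines up each entry with its three right-hand neighbors without any explicit loop.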
Achtung!
All of the examples in this lab use NumPy arrays, objects of type np.ndarray. NumPy also
has a “matrix” data structure called np.matrix that was built specifically for MATLAB users
who are transitioning to Python and NumPy. It behaves slightly differently than the regular
array class, and can cause some unexpected and subtle problems.
For consistency (and your sanity), never use a NumPy matrix; always use NumPy arrays.
If necessary, cast a matrix object as an array with np.array().
Additional Material
Random Sampling
The submodule np.random holds many functions for creating arrays of random values chosen from
probability distributions such as the uniform, normal, and multinomial distributions. It also contains
some utility functions for getting non-distributional random samples, such as random integers or
random samples from a given array.
Function            Description
choice()            Take random samples from a 1-D array.
random()            Uniformly distributed floats over [0, 1).
randint()           Random integers over a half-open interval.
random_integers()   Random integers over a closed interval (deprecated in favor of randint()).
randn()             Sample from the standard normal distribution.
permutation()       Randomly permute a sequence / generate a random sequence.
Function                Distribution
beta()                  Beta distribution over [0, 1].
binomial()              Binomial distribution.
exponential()           Exponential distribution.
gamma()                 Gamma distribution.
geometric()             Geometric distribution.
multinomial()           Multivariate generalization of the binomial distribution.
multivariate_normal()   Multivariate generalization of the normal distribution.
normal()                Normal / Gaussian distribution.
poisson()               Poisson distribution.
uniform()               Uniform distribution.
Note that many of these functions have counterparts in the standard library’s random module.
These NumPy functions, however, are much better suited for working with large collections of random
samples.
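A brief sketch of a few of these functions. The shapes, bounds, and seed are arbitrary, and the values drawn will differ if the seed is changed or removed:

```python
import numpy as np

np.random.seed(0)    # Fix the seed so repeated runs draw the same samples.

u = np.random.random((3, 2))                             # Uniform floats in [0, 1).
k = np.random.randint(1, 7, size=10)                     # Integers from 1 to 6.
z = np.random.normal(loc=0.0, scale=1.0, size=1000)      # Standard normal draws.
pick = np.random.choice(np.array([10, 20, 30]), size=5)  # Samples from an array.
perm = np.random.permutation(5)                          # Shuffled 0, 1, 2, 3, 4.

print(u.shape, k.shape, z.shape)
print(k.min() >= 1, k.max() <= 6)    # True True
print(sorted(perm.tolist()))         # [0, 1, 2, 3, 4]
```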
Function    Description
save()      Save a single array to a .npy file.
savez()     Save multiple arrays to a .npz file.
savetxt()   Save a single array to a .txt file.
load()      Load and return an array or arrays from a .npy or .npz file.
loadtxt()   Load and return an array from a text file.
# Read the array from the file and check that it matches the original.
>>> y = np.load("uniform.npy") # Or np.loadtxt("uniform.txt").
>>> np.allclose(x, y) # Check that x and y are close entry-wise.
True
To save several arrays to a single file, specify a keyword argument for each array in np.savez().
Then np.load() will return a dictionary-like object with the keyword parameter names from the
save command as the keys.
# Read the arrays from the file and check that they match the original.
>>> arrays = np.load("normal.npz")
>>> np.allclose(x, arrays["first"])
True
>>> np.allclose(y, arrays["second"])
True
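The loading examples above assume that the files were written beforehand. A self-contained round trip might look like the sketch below, which writes to a temporary folder so that nothing is left behind; the array contents and the use of tempfile are illustrative assumptions, while the file names and the keywords first and second follow the examples above.

```python
import os
import tempfile
import numpy as np

x = np.random.random((4, 4))
y = np.random.normal(size=5)

with tempfile.TemporaryDirectory() as folder:
    # Save one array to a .npy file, then read it back.
    single = os.path.join(folder, "uniform.npy")
    np.save(single, x)
    x_loaded = np.load(single)

    # Save several arrays to one .npz file; the keywords become the keys.
    several = os.path.join(folder, "normal.npz")
    np.savez(several, first=x, second=y)
    with np.load(several) as arrays:
        keys = sorted(arrays.files)
        first_loaded = arrays["first"]
        second_loaded = arrays["second"]

print(keys)                            # ['first', 'second']
print(np.allclose(x, x_loaded))        # True
print(np.allclose(y, second_loaded))   # True
```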