0% found this document useful (0 votes)
37 views12 pages

NumPy: Efficient Numerical Computing

NumPy is an open-source Python library for numerical computing that provides efficient tools for working with arrays and mathematical functions. It features multidimensional arrays, vectorization, and interoperability with other libraries, making it faster and more memory-efficient than Python lists. Key operations include array creation, reshaping, indexing, and broadcasting, which enhance performance in scientific and machine learning applications.

Uploaded by

2002bhoopesh2
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
37 views12 pages

NumPy: Efficient Numerical Computing

NumPy is an open-source Python library for numerical computing that provides efficient tools for working with arrays and mathematical functions. It features multidimensional arrays, vectorization, and interoperability with other libraries, making it faster and more memory-efficient than Python lists. Key operations include array creation, reshaping, indexing, and broadcasting, which enhance performance in scientific and machine learning applications.

Uploaded by

2002bhoopesh2
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

NumPy

What is NumPy?

NumPy (short form of Numerical Python) is an open-source Python library used for
numerical computing. It provides tools for working efficiently with arrays, matrices, and
mathematical functions.
NumPy introduces a new data structure called the ndarray (n-dimensional array),
which is much faster and more memory-efficient than Python’s built-in lists for
numerical operations.

Important Features

• Multidimensional Arrays:
Efficient storage and manipulation of large datasets (e.g., 1D, 2D, 3D arrays).

• Mathematical Functions:
A wide range of mathematical operations — linear algebra, Fourier transforms,
random number generation, etc.

• Vectorization:
Operations on entire arrays without explicit loops, making code cleaner and faster.

• Interoperability:
Works well with other libraries like Pandas, Matplotlib, TensorFlow, PyTorch etc.

Usage

import numpy as np

# Create & Output an array


arr = [Link]([1, 2, 3, 4, 5])
print(arr)

NumPy Arrays vs Python Lists

1. NumPy arrays are faster than Python lists (implemented in C, not pure Python).
2. They store elements in contiguous memory (better cache performance).
3. They are also homogenous i.e. all elements have same type.
4. They use vectorized operations (no slow Python loops).

1
Performance Comparison

import numpy as np
import time

size = 10_000_000 # large data set of 10 million numbers

# Python Lists
python_list = list(range(size))
start = [Link]()

list_squared = [x**2 for x in python_list] # square of all nums

end = [Link]()
print("Python list time:", end - start, "seconds")

# NumPy Arrays
np_array = [Link](python_list)
start = [Link]()

array_squared = np_array ** 2 # vectorized operation


end = [Link]()

print("NumPy array time:", end - start, "seconds")

Memory Usage Comparison

# Memory Usage comparison


import sys

print("Python list size:", [Link](python_list) * len(python_list))


print("NumPy array size:", np_array.nbytes)

Creating NumPy Arrays

There are multiple ways of creating NumPy arrays, most common of which are:

1. From Python Lists

# Creating NumPy Arrays - from lists


arr = [Link]([1, 2, 3, 4])
print(arr, type(arr))

arr2 = [Link]([1, 2, 3, 4, "prime", 3.14])


print(arr2, type(arr2))

# 2D Arrays - Matrix
arr3 = [Link]([[1, 2, 3], [4, 5, 6]])
print(arr3, [Link])

2
Note - All elements in arr2 in above code will have same type
(homogenous) unlike lists.

2. Using built-in Functions

# Creating NumPy Arrays - from scratch


arr1 = [Link]((3, 4)) # 3x4 array of 0s
print(arr1, [Link])

arr2 = [Link]((3, 3)) # 3x3 array of 1s


print(arr2, [Link])

arr3 = [Link]((2, 3), 5) # 2x3 array of 5s


print(arr3, [Link])

arr4 = [Link](3) # Identity matrix of 3x3


print(arr4, [Link])

arr5 = [Link](1, 20, 2) # Elements in range(1, 20)


print(arr5, [Link])

arr6 = [Link](0, 10, 5) # Evenly spaced array


print(arr6, [Link])

NumPy Array Properties

Array properties helps you understand and manipulate data in arrays efficiently.

# Useful Attributes
arr = [Link]([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

print([Link]) # Dimensions - (4 x 3)
print([Link]) # Total elements - (12)
print([Link]) # Number of dimensions - 2
print([Link]) # Data type object - int64
print([Link]) # Size of each element in bytes - 8 for int64

3
We can also explicitly change the dtype for our arrays.

# Specify dtype at creation


str_arr = [Link]([1, 2, 3], dtype="U")
print(str_arr, str_arr.dtype)

float_arr = [Link]([1, 2, 3], dtype="float64")


print(str_arr, float_arr.dtype)

# Creating new array with a specific type from existing array


int_arr = float_arr.astype(np.int64)
print(int_arr, int_arr.dtype)

Operations on Arrays

There are a lot of useful operations that we can perform on our arrays.

1. Reshaping

arr = [Link]([1, 2, 3, 4, 5, 6])


print([Link])

reshaped = [Link]((2, 3)) # converts (1x6) => (2x3)


print(reshaped, [Link])

flattened = [Link]() # converts 2D => 1D


print(flattened, [Link])

2. Indexing

# Indexing for 1D array


arr = [Link]([1, 2, 3, 4, 5])
print(arr[0])

# Indexing for 2D array


arr = [Link]([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]) # 2D array
print(arr[0][1]) # 2
print(arr[1][2]) # 6

Apart from simple indexing, we can also use Fancy & Boolean indexing.
Fancy indexing means accessing array elements using integer arrays/lists of indices
rather than plain slices ( : ).

# Fancy Indexing
arr = [Link]([1, 2, 3, 4, 5])
idx = [0, 1, 4]
print(arr[idx]) # print nums at given indices

4
Boolean masking(or indexing) means using a Boolean array ( True/ False) to select
elements from another array.

# Boolean Indexing
print(arr[arr > 2]) # print nums greater than 2
print(arr[arr % 2== 0]) # print even num

3. Slicing

# Slicing 1D array
arr = [Link]([1, 2, 3, 4, 5, 6, 7])

print(arr[2:6]) # [3, 4, 5, 6]
print(arr[:6]) # [1, 2, 3, 4, 5, 6]
print(arr[3:]) # [4, 5, 6, 7]
print(arr[::2]) # [1, 3, 5, 7]

Copy v/s View


Slicing a list returns a copy but slicing a NumPy array returns a view - for efficiency.
View is like a shallow copy that shares the same data as the original array, so no
duplication happens here.
• Views are fast and memory-efficient (no data duplication).
• Copies are safe but slower and use more memory.

# Sliced List is a COPY


py_list = [1, 2, 3, 4, 5]
copy_list = py_list[1:4] # [2, 3, 4]
copy_list[1] = 333

print(copy_list)
print(py_list) # [1, 2, 3, 4, 5] - remains same

# Sliced Array is a VIEW


np_arr = [Link]([1, 2, 3, 4, 5])
view_arr = np_arr[1:4] # [2, 3, 4]
view_arr[1] = 333

print(view_arr)
print(np_arr) # [1, 2, 333, 4, 5] - changes

# Creating a COPY for Array


copy_arr = np_arr[1:4].copy() # [2, 3, 4]
copy_arr[2] = 444
print(copy_arr)
print(np_arr) # [1, 2, 3, 4, 5] - remains same

5
Multi-dimensional Arrays

Multi-dimensional arrays in NumPy are the foundation of most scientific and


machine-learning work.
A NumPy array can have any number of dimensions (1D, 2D, 3D & so on). Each
dimension is called an axis.
• 1D array has 1 axis (axis0).
• 2D array has 2 axes (axis0 = rows, axis1 = columns)
• 3D array has 3 axes (axis0 = depth/layer, axis1 = rows in each layer, axis2 =
columns in each layer)

# 1D array
arr1D = [Link]([1, 2, 3])
print([Link]) # 1

# 2D array (matrix)
arr2D = [Link]([[1, 2, 3],
[4, 5, 6]])
print([Link]) # 2

# 3D array (tensor)
arr3D = [Link]([[[1, 2, 3],
[4, 5, 6]],
[[7, 8, 9],
[10, 11, 12]]])
print([Link]) # 3

Usage of multi-dimensional arrays in Practice


We’ll look into a practical example of dealing with real-world images data. If we
have a ML/DL model working with images, then we can feed this data as 2D
or 3D arrays.
1. Grayscale Image as 2D array
A grayscale image has height × width pixels. Each pixel has a single intensity
value (0–255 for 8-bit images).
2. Color Images as 3D array
A color image(RGB) has height × width × channels. The 3 channels are →
RGB (Red, Green, Blue) & each channel is a 2D array of pixel intensities (0-
255).

We’ll cover practical implementation in later chapters.

6
Operations along Axes
We can also perform certain operations along a specific axis in an array.

arr2D = [Link]([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print([Link](arr2D)) # sum of entire array - 45

sum_of_columns = [Link](arr2D, axis = 0)


print(sum_of_columns) # [12 15 18]

sum_of_rows = [Link](arr2D, axis = 1)


print(sum_of_rows) # [6 15 24]

# Slicing
print(arr2D[0:3, 1:3]) # slice rows(0, 1, 2) x cols(1, 2)
# [[2 3]
# [5 6]
# [8 9]]

Operations on 3D arrays
Let’s look at how do we work with 3D arrays:

arr3D = [Link]([[[1, 2],[3, 4],[5, 6]], [[7, 8],[9, 10],[11, 12]]])

print(arr3D, [Link])

# Indexing
print(arr3D[0][1][1]) # 4
print(arr3D[1][2][1]) # 12

print(arr3D[:, :, 0]) # first col from both layers


print(arr3D[:, 0, :]) # first row from both layers

# Manipulating data
arr3D[:, 0, :] = 99 # change first row to store 99
print(arr3D)

Data Types in NumPy

We have already discussed how every NumPy array has a single data type
(homogeneous arrays) & it is stored in the .dtype attribute.
Now, let’s look into some of the most common data types in NumPy:
• Integer literals : int32 , int64
• Floating literals : float32 , float64
• Boolean : bool

7
• Complex numbers : complex64, complex128
• String : S (byte-str) & U (unicode-str)
• Object : generic python objects – object

# Common Data Types


arr = [Link]([1, 2, 3, 4, 5])
arr2 = [Link]([1.0, 2.0, 3.0])
arr3 = [Link](["hello", "world", "prime", "ai/ml"])

print([Link]) # int64
print([Link]) # float64
print([Link]) # U

# Complex Numbers
arr1 = [Link]([2 + 3j])
arr2 = [Link]([5 + 8j])

print(arr1, [Link])
print(arr1 + arr2)
print(arr2 - arr1)

# Objects
arr = [Link] (["hello", {1, 2, 3}, 3.14])
print(arr, [Link])

We can also change the data type, either by explicitly typecasting at array creation
using dtype attribute or by using the astype method while creating a new array from
existing one.

# Changing the data type


new_arr = [Link]("float64")
print(new_arr, new_arr.dtype)

new_arr = [Link]([1, 2, 3, 4, 5], dtype="float64")


print(new_arr, new_arr.dtype)

Why does dtype matter?


• Memory efficiency
◦ np.int8 uses 1 byte per element, np.int64 uses 8 bytes.
• Performance
◦ Smaller types = faster computations.
• Compatibility
◦ Images often use np.uint8
◦ ML libraries expect float32

8
Note - In some case it is useful to do Downcasting i.e. converting type to a
smaller data type. This is done to reduce memory usage & improve
performance.

Example - Suppose you have a dataset of 1 million people’s ages. Storing


them as int64 wastes memory because ages are small numbers (0–
120). So we can downcast these values to Int8 .

Vectorization & Broadcasting in NumPy

Vectorization and Broadcasting in NumPy are two of the most powerful features for
fast numerical computations.
Vectorization
Vectorization means performing operations on entire arrays at once without explicit
Python loops.
• NumPy uses C-level implementations internally → much faster than Python loops.
• Makes code shorter, cleaner, and faster.

arr = [Link]([1, 2, 3, 4, 5])

sq_arr = arr** 2 # Square of all nums


print(sq_arr)

arr2 = [Link]([6, 7, 8, 9, 10])


print(arr + arr2) # Sum of 2 arrays

Broadcasting
Broadcasting allows NumPy to automatically expand arrays of different shapes so
that arithmetic operations can be performed. It’s basically scaling arrays without using
extra memory.
• No need to manually reshape arrays.
• Useful for combining arrays of different dimensions.
Broadcasting Rules
Broadcasting can only take place when the arrays are of compatible shape. So NumPy
compares shapes of arrays from right to left. For the array to be compatible, all
dimensions must either be:

9
• Equal, or
• 1, or
• Missing (smaller array can be “stretched”).

# Broadcasting with a Scalar


arr_mul10 = arr * 10 # Multiply by 10 to all nums
print(arr_mul10)

# Broadcasting with a Vector


arr1D = [Link]([1, 2, 3])
arr2D = [Link]([[1, 2, 3], [4, 5, 6]])
print(arr1D + arr2D)

A quite common example of broadcasting in Vector Normalization. This is very


common in machine learning and data preprocessing.
Let’s take an example of Standard Vector Normalization i.e. transforming an array
such that it has:
• mean = 0
•standard deviation = 1
For each element x i in vector, xinormalized = (xi − μ )/σ
Where:
• μ = mean of the vector (or column)
• σ = standard deviation

# Standard Vector Normalization


arr = [Link]([[1, 2], [3, 4]])
mean = [Link](arr)
std_dev = [Link](arr)
normalized_arr = (arr - mean) / std_dev

print(normalized_arr)

# Column wise Normalization


arr = [Link]([[1, 2], [3, 4], [5, 6]])
mean = [Link](arr, axis = 0)
std_dev = [Link](arr, axis = 0)
print((arr - mean) / std_dev)

Extra references: understand Standard Deviation and Variance.

10
Useful Mathematical Functions

NumPy is massive & provides a wide range of built-in mathematical functions that
are highly optimized and can operate element-wise on arrays. Let’s have a look at
some of them:

Aggregation Functions
These are functions that take an array and reduce it to a single value (or smaller array)
by combining elements.
1. sum() - returns sum of all elements
2. prod() - returns product of all elements
3. min() - returns minimum value
4. max() - returns maximum value
5. argmin() - returns index of min value
6. argmax() - returns index of max value
7. mean() - returns mean (average)
8. median() - returns median
9. std() - returns standard deviation
10. var() - returns variance

arr = [Link]([1, 2, 3, 4, 5])

print([Link](arr)) # 15
print([Link](arr)) # 120
print([Link](arr)) # 1
print([Link](arr)) # 0
print([Link](arr)) # 5
print([Link](arr)) # 4
print([Link](arr)) # 3.0
print([Link](arr)) # 3.0
print([Link](arr)) # 1.41
print([Link](arr)) # 2.0

Power Functions
1. square() - returns square of all elements

2. sqrt() - returns square root of all elements


3. pow(a, b) - returns ab

11
arr = [Link]([1, 2, 3, 4, 5])

print([Link](arr)) # [1, 4, 9, 16, 25]


print([Link](arr)) # [1, 1.41, 1.73, 2, 2.23]
print([Link](arr, 3)) # [1, 8, 27, 64, 125]

Log & Exponential Functions


1. log() - returns natural log
2. log10() - returns log base 10
3. log2() - returns log base 2
4. exp(x) - returns ex

print([Link](arr))
print(np.log10(arr))
print(np.log2(arr))
print([Link](arr))

Rounding Functions
1. round() - rounds off to nearest value
2. ceil() - rounds up
3. floor() - rounds down
4. trunc(x) - truncates(removes) the fractional part

print([Link](2.678)) # 3.0
print([Link](2.678)) # 2.0
print([Link](2.678)) # 3.0
print([Link](2.678)) # 2.0

Miscellaneous Functions
1. abs() - returns absolute value
2. sort() - returns sorted(arranges in increasing order) values
3. unique() - returns unique values

arr = [Link]([1, 2, -5, 3, 8, -4, 2, 5])

print([Link](arr)) # [1 2 5 3 8 4 2 5]
print([Link](arr)) # [-5 -4 1 2 2 3 5 8]
print([Link](arr)) # [-5 -4 1 2 3 5 8]

There are many other built-in functions that we will use and learn about in later
chapters.

| Keep Learning & Keep Exploring!

12

You might also like