Unit3_ Arrays and Strings

The document provides an overview of NumPy, a Python library for numerical computing, detailing its installation, array creation, basic operations, and manipulation techniques. It also covers string handling in Python, including string types, slicing, formatting, and built-in methods for string manipulation. Additionally, it introduces regular expressions and the re module for pattern matching in strings.

UNIT -03 ARRAYS & STRINGS

● NumPy, short for Numerical Python, is an open-source Python library that provides
support for large, multi-dimensional arrays and matrices. It also has a collection of
high-level mathematical functions to operate on these arrays.
● It was created by Travis Oliphant in 2005.

Install Python NumPy


NumPy can be installed on Windows, macOS, and Linux with the following pip command:
pip install numpy

Note: pip ships with current Python installers, including the Windows one, so NumPy's pre-built
packages install the same way on every platform and no separate manual installer is needed.

Arrays in NumPy
NumPy’s main object is the homogeneous multidimensional array.
It is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.
In NumPy, dimensions are called axes. The number of axes is called the rank.
NumPy’s array class is called ndarray. It is also known by the alias array.

Example:
In this example, we create a two-dimensional array. It has a rank of 2 because it has 2 axes.
The first axis (dimension) is of length 2, i.e., the number of rows, and the second
axis (dimension) is of length 3, i.e., the number of columns.
The overall shape of the array can be represented as (2, 3).

import numpy as np

# Creating array object


arr = np.array( [[ 1, 2, 3],
[ 4, 2, 5]] )

# Printing type of arr object


print("Array is of type: ", type(arr))

# Printing array dimensions (axes)


print("No. of dimensions: ", arr.ndim)

# Printing shape of array


print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array


print("Array stores elements of type: ", arr.dtype)

output:
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64

NumPy Basic Operations


NumPy provides a plethora of built-in arithmetic functions.

Operations on a single NumPy array


We can use overloaded arithmetic operators to do element-wise operations on the array to
create a new array. In the case of +=, -=, *= operators, the existing array is modified.

# Python program to demonstrate


# basic operations on single array

import numpy as np
a = np.array([1, 2, 5, 3])

# add 1 to every element


print ("Adding 1 to every element:", a+1)

# subtract 3 from each element


print ("Subtracting 3 from each element:", a-3)

# multiply each element by 10


print ("Multiplying each element by 10:", a*10)

# square each element


print ("Squaring each element:", a**2)

# modify existing array


a *= 2
print ("Doubled each element of original array:", a)

# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print ("\nOriginal array:\n", a)
print ("Transpose of array:\n", a.T)

output
Adding 1 to every element: [2 3 6 4]
Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]
Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]
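
The difference between creating a new array and modifying the existing one can be checked directly. The following is a minimal sketch; the names alias and b are illustrative and do not come from the notes.

```python
import numpy as np

a = np.array([1, 2, 5, 3])
alias = a            # second name for the same array object

b = a + 1            # builds a new array; a is untouched
a *= 2               # modifies the existing array in place

print(b)             # [2 3 6 4]
print(a)             # [ 2  4 10  6]
print(alias is a)    # True: the in-place update is visible through alias
```

Because a *= 2 does not create a new object, every name that refers to the array sees the doubled values.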

NumPy – Unary Operators


Many unary operations are provided as methods of the ndarray class. These include sum, min, max,
etc. They can also be applied row-wise or column-wise by setting the axis parameter.

# Python program to demonstrate


# unary operators in numpy
import numpy as np
arr = np.array([[1, 5, 6],
[4, 7, 2],
[3, 1, 9]])

# maximum element of array


print ("Largest element is:", arr.max())
print ("Row-wise maximum elements:",
arr.max(axis = 1))

# minimum element of array


print ("Column-wise minimum elements:",
arr.min(axis = 0))
# sum of array elements
print ("Sum of all array elements:",
arr.sum())

# cumulative sum along each row


print ("Cumulative sum along each row:\n",
arr.cumsum(axis = 1))

Output:

Largest element is: 9


Row-wise maximum elements: [6 7 9]
Column-wise minimum elements: [1 1 2]
Sum of all array elements: 38
Cumulative sum along each row:
[[ 1 6 12]
[ 4 11 13]
[ 3 4 13]]

NumPy – Binary Operators


These operations apply to the arrays element-wise, and a new array is created.
You can use all basic arithmetic operators like +, -, /, etc. In the case of the +=, -=, *= operators,
the existing array is modified.

# Python program to demonstrate


# binary operators in Numpy
import numpy as np

a = np.array([[1, 2],
[3, 4]])
b = np.array([[4, 3],
[2, 1]])

# add arrays
print ("Array sum:\n", a + b)

# multiply arrays (elementwise multiplication)


print ("Array multiplication:\n", a*b)

# matrix multiplication
print ("Matrix multiplication:\n", a.dot(b))
Output:
Array sum:
[[5 5]
[5 5]]
Array multiplication:
[[4 6]
[6 4]]
Matrix multiplication:
[[ 8 5]
[20 13]]
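
As a side note, the matrix product can also be written with the @ operator. This short sketch, with illustrative variable names, contrasts it with element-wise multiplication and checks that it agrees with dot().

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[4, 3],
              [2, 1]])

elementwise = a * b      # multiplies matching positions
matrix = a @ b           # matrix product, same result as a.dot(b)

print(elementwise)
print(matrix)
print(np.array_equal(matrix, a.dot(b)))  # True
```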

Indexing arrays

Array indexing is the same as accessing an array element. You can access an array element by
referring to its index number. The indexes in NumPy arrays start with 0, meaning that the first
element has index 0, the second has index 1, and so on.

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
This will print "1" as output.

Access 2-D arrays

To access elements from 2-D arrays we can use comma separated integers representing the
dimension and the index of the element. Think of 2-D arrays like a table with rows and columns,
where the dimension represents the row and the index represents the column.

import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])
This will print "2" as output.

Slicing arrays

Slicing in Python means taking elements from one given index to another given index.
We pass a slice instead of an index like this: [start:end].
The element at start is included; the element at end is not.

We can also define the step, like this: [start:end:step].

If we don't pass start, it is considered 0.
If we don't pass end, it is considered the length of the array in that dimension.
If we don't pass step, it is considered 1.

Eg: Slice elements from index 1 up to, but not including, index 5 from the following array:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Output: [2 3 4 5]

Eg: Slice elements from index 4 to the end of the array:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])
Output: [5 6 7]
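
The step and the default values for start and end can be illustrated in the same way. This is a small sketch on the same array; the variable names are chosen only for this example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

every_second = arr[1:6:2]   # indexes 1, 3, 5
from_start = arr[:3]        # start omitted, so it begins at index 0
all_by_three = arr[::3]     # start and end omitted, step of 3

print(every_second)   # [2 4 6]
print(from_start)     # [1 2 3]
print(all_by_three)   # [1 4 7]
```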

Array Stacking and Manipulation

stack() is a NumPy function used for array stacking, that is, joining multiple NumPy arrays
along a new axis. It returns a NumPy array.
To be stacked, the arrays must all have the same shape (e.g. both (2, 3): 2 rows, 3 columns).
stack() creates a new array which has one more dimension than the input arrays. If we
stack two 1-D arrays, the resultant array will have 2 dimensions.

Eg:
import numpy as np
# input array
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stacking 2 1-d arrays


c = np.stack((a, b),axis=0)
print(c)
With axis=0 the input arrays become the rows of a 2-D array of shape (2, 3):
[[1 2 3]
 [4 5 6]]

Eg:
np.stack((a,b),axis=1)
With axis=1 the input arrays become the columns of a 2-D array of shape (3, 2):
array([[1, 4],
[2, 5],
[3, 6]])
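
The effect of the axis argument, and the requirement that all inputs share one shape, can be checked with a short sketch. The array named short is a hypothetical input added only to trigger the error.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

rows = np.stack((a, b), axis=0)
cols = np.stack((a, b), axis=1)
print(rows.shape)   # (2, 3)
print(cols.shape)   # (3, 2)

# Arrays of different shapes cannot be stacked
short = np.array([7, 8])
try:
    np.stack((a, short))
except ValueError as err:
    print("Cannot stack:", err)
```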

Multidimensional arrays and matrices


Multi-dimensional arrays store and manipulate data along more than one dimension, or axis.
A two-dimensional array is often called a matrix.

Creating Multi-Dimensional Arrays Using NumPy


To create a multi-dimensional array using NumPy, we can use the np.array() function and pass
in a nested list of values as an argument.
Each inner list is one row of the array, and the elements of an inner list are the column values of that row.

Eg:
import numpy as np

# Create a 2-dimensional array with 3 rows and 4 columns


arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
# Print the array
print(arr)

Output:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Accessing and Modifying Multi-dimensional Arrays Using NumPy

Once we have created a multi-dimensional array, we can access and modify its elements using
indexing and slicing.
We use the index notation [i, j] to access an element at row i and column j, where i and j are
zero-based indices.

Eg:
import numpy as np

# Create a 2-dimensional array with 3 rows and 4 columns


arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Access an element at row 1, column 2


print(arr[1, 2]) # Output: 7
# Modify an element at row 0, column 3
arr[0, 3] = 20

# Print the modified array


print(arr)
Output:
7
[[ 1  2  3 20]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Strings
● Strings are sequences of characters surrounded by either double quotes or single quotes.
● Eg: "hello" or 'hello'; both are the same.
● You can display a string literal with the print() function:
print("Hello")

Assign String to a Variable

a = "Hello"
print(a)

Output: Hello

Multiline Strings

You can assign a multiline string to a variable by using three quotes:


a = """ This is the class of Python. Current lecture is based on arrays and strings.
We have covered arrays so far, now we are studying strings."""
print(a)

Slicing
● You can return a range of characters by using the slice syntax.
● Specify the start index and the end index, separated by a colon, to return a part of the
string.
● Example: get the characters from position 2 to position 5 (not included):
b = "Hello, World!"
print(b[2:5])
output- llo

Slice From the Start


● By leaving out the start index, the range will start at the first character:
● Example
Get the characters from the start to position 5 (not included):
b = "Hello, World!"
print(b[:5])
output- Hello

Formatting strings with ‘+’ and format()


In Python, strings can be joined with the '+' operator (concatenation) and formatted with the format() method.
We use format() to insert values into a string at the positions marked by {}. It can insert,
among others, numerical values with two decimal places, numerical values with an exponent, and
strings.
In this example, we use format() to insert the value of the variable "number" into the
string:
number = 123.456
name = "John Doe"
date = "2023-04-27"
print("The value is: {}".format(number))

Output:
The value is: 123.456
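
The other cases mentioned in the text ('+' concatenation, two decimal places, an exponent, and string values) can be sketched as follows. The variable greeting and the message texts are illustrative only.

```python
number = 123.456
name = "John Doe"
date = "2023-04-27"

# '+' joins strings; non-string values must be converted with str() first
greeting = "Name: " + name + ", value: " + str(number)
print(greeting)   # Name: John Doe, value: 123.456

# Format specifications go after a colon inside the braces
print("Two decimals: {:.2f}".format(number))   # Two decimals: 123.46
print("Exponent: {:.2e}".format(number))       # Exponent: 1.23e+02
print("{} registered on {}".format(name, date))
```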

String manipulation with built-in methods


There are several built-in methods you can use to manipulate strings. These methods help
accomplish common tasks like adding, removing, or changing characters within a string. Some
of the more common methods include:

upper() and lower()


These methods convert the string to uppercase or lowercase, respectively.
string = "Olá, mundo!"
string_uppercase = string.upper()
string_lowercase = string.lower()
print(string_uppercase) # Output: OLÁ, MUNDO!
print(string_lowercase) # Output: olá, mundo!

strip()
This method removes any whitespace on either side of the string.
string = " Hello World! "
string_no_space = string.strip()
print(string_no_space)

The output will be:


Hello World!

replace()
This method replaces all occurrences of a specific substring with another one. The match is case-sensitive.
E.g.:
string = "Hello World!"
string_love = string.replace("Hello", "Love")
print(string_love)

The output will be:

Love World!

split()
This method splits a string into a list of substrings, separated by a specific substring.

string = "Hello, World!"


string_split = string.split(",")

print(string_split)
The output will be:

['Hello', ' World!']

Note that the space after the comma stays at the start of the second substring.

join()
This method joins a list of strings into a single string, separated by a specific substring.

list_of_words = ["hello", "world!"]


string_join = " ".join(list_of_words)

print(string_join)
The output will be:
hello world!

These are just a few examples of the built-in methods available for manipulating strings in
Python.
There are many other methods and attributes available that we can use to perform a variety of
tasks, such as formatting numbers, removing special characters, and much more.
With knowledge of these methods, you can manipulate strings more effectively and create more
powerful and flexible Python programs. Furthermore, these methods are useful not only for
string manipulation, but also for other general text processing tasks.

String types in Python (str, unicode, bytes)


Python 3 has two main string types: text strings (str) and byte strings (bytes). A third name,
unicode, appears only in Python 2. Below is an explanation of each:
Text string (str): This is the default string type in Python 3. It is an immutable sequence of
Unicode characters, so it can hold text in any language, not only ASCII.
Unicode string (unicode): In Python 2, text containing characters from any language was stored in a
separate unicode type, written with a u prefix (u"text"). In Python 3 this separate type no longer
exists; its role was taken over by str.
Byte string (bytes): This is an immutable sequence of raw bytes, written with a b prefix (b"data").
It holds binary data, such as the contents of an image or a file. In Python 2, the default str type
was a byte string; in Python 3, bytes is a separate type and str is the standard text string.
In general, we use str to represent text, including text with characters from different
languages, while bytes is used to hold binary data. Text is converted to bytes with encode() and
back to text with decode(). Choosing the appropriate type
depends on the type of data you are working with and what you want to do with that data.
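
The relationship between the two Python 3 types can be sketched with a round trip through UTF-8. The choice of UTF-8 and the variable names are assumptions made for this example.

```python
text = "Ol\u00e1, mundo!"       # str: "Olá, mundo!", with á written as an escape
data = text.encode("utf-8")     # bytes: the UTF-8 encoding of the text

print(type(text))    # <class 'str'>
print(type(data))    # <class 'bytes'>
print(len(text))     # 11 characters
print(len(data))     # 12 bytes, because "á" takes two bytes in UTF-8

restored = data.decode("utf-8")
print(restored == text)   # True
```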

Regular Expressions
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check if a string contains the specified search pattern.

RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions.

Import the re module:

import re
RegEx in Python
When you have imported the re module, you can start using regular expressions:

Example
Search the string to see if it starts with "The" and ends with "Spain":

import re
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
if x:
    print("YES! We have a match!")
output- YES! We have a match!

RegEx Functions
The re module offers a set of functions that allows us to search a string for a match:

Function    Description
findall     Returns a list containing all matches
search      Returns a Match object if there is a match anywhere in the string
split       Returns a list where the string has been split at each match
sub         Replaces one or many matches with a string
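
As a quick illustration of the last two functions in the table, here is a small sketch on the same sample sentence. The replacement character "9" is an arbitrary choice for this example.

```python
import re

txt = "The rain in Spain"

parts = re.split(r"\s", txt)        # split at each white-space character
replaced = re.sub(r"\s", "9", txt)  # replace each white-space character

print(parts)      # ['The', 'rain', 'in', 'Spain']
print(replaced)   # The9rain9in9Spain
```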

The findall() Function


The findall() function returns a list containing all matches.

Example
import re
#Return a list containing every occurrence of "ai":

txt = "The rain in Spain"


x = re.findall("ai", txt)
print(x)

output
['ai', 'ai']

The list contains the matches in the order they are found.
If no matches are found, an empty list is returned.
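
The empty-list case can be sketched like this. The search word "Portugal" is chosen only because it does not occur in the sample text.

```python
import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)         # []

# An empty list is falsy, so it can be tested directly
if not x:
    print("No match")
```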

The search() Function


The search() function searches the string for a match, and returns a Match object if there is a
match.
If there is more than one match, only the first occurrence of the match will be returned:

Example
Search for the first white-space character in the string:

import re
txt = "The rain in Spain"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())
output- The first white-space character is located in position: 3

Pattern Matching with Regular Expressions


A Regex object’s search() method searches the string it is passed for any matches to the regex.
Match objects have a group() method that will return the actual matched text from the searched
string.
Example: Import the regex module with import re. Create a Regex object with the re.compile()
function. (Remember to use a raw string.) Pass the string you want to search into the Regex
object’s search() method. This returns a Match object. Call the Match object’s group() method to
return a string of the actual matched text.

# Python program to illustrate


# Matching regex objects
import re

# regex object
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())

Output:
Phone number found: 415-555-4242

Python – Substituting patterns in text using regex

Regular expressions (regex) are meant for pulling the required information out of text
based on patterns. They are also widely used for manipulating pattern-based text,
which makes them helpful in text preprocessing for tasks such as
Natural Language Processing (NLP).

The re.sub() method searches the given string for a pattern and replaces every match. It is used for
substituting a specific pattern in the string. The function takes 5 arguments in total.

Syntax: re.sub(pattern, repl, string, count=0, flags=0)

Parameters:

pattern – the pattern which is to be searched and substituted

repl – the string with which each match is to be replaced

string – the string in which the pattern is searched

count – the maximum number of matches to replace; the default, 0, replaces all matches

flags – it is used to modify the meaning of the regex pattern

count and flags are optional arguments.


Example 1: Substitution of a specific text pattern
In this example, a given text pattern will be searched and substituted in a string.
The idea is to use the very normal form of the re.sub() method with only the first 3 arguments.

Below is the implementation:

# Python implementation of substituting a
# specific text pattern in a string using regex

# importing regex module
import re

# Function to perform operations on the strings
def substitutor():

    # a string variable
    sentence1 = "It is raining outside."

    # replace the text 'raining' in sentence1 with 'sunny':
    # the first argument is the pattern, the second the
    # replacement, the third the string to search
    print(re.sub(r"raining", "sunny", sentence1))

    # a string variable
    sentence2 = "Thank you very very much."

    # replace every 'very' in sentence2 with 'so'
    print(re.sub(r"very", "so", sentence2))

# Driver Code:
substitutor()
Output:
It is sunny outside.
Thank you so so much.
No matter how many times the required pattern is present in the string, the re.sub() function
replaces all of them with the given replacement, unless count is set. That's why both occurrences
of 'very' are replaced by 'so' in the above example.
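
The two optional arguments can be sketched on the same sentence. The uppercase pattern "VERY" is used here only to show the effect of a flag.

```python
import re

sentence = "Thank you very very much."

# count=1 limits the substitution to the first match
first_only = re.sub(r"very", "so", sentence, count=1)
print(first_only)   # Thank you so very much.

# flags changes how the pattern is matched, here ignoring letter case
any_case = re.sub(r"VERY", "so", sentence, flags=re.IGNORECASE)
print(any_case)     # Thank you so so much.
```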

Python Date and Time


Python's datetime module supplies classes to work with dates and times. These classes provide
several functions to deal with dates, times, and time intervals. date and datetime values are objects
in Python, so when you manipulate them, you are manipulating objects and not strings or
timestamps.
The datetime module has 6 main classes:
● date – An idealized naive date, assuming the current Gregorian calendar always was,
and always will be, in effect. Its attributes are year, month, and day.
● time – An idealized time, independent of any particular day, assuming that every day has
exactly 24*60*60 seconds. Its attributes are hour, minute, second, microsecond, and
tzinfo.
● datetime – A combination of a date and a time, with the attributes year, month,
day, hour, minute, second, microsecond, and tzinfo.
● timedelta – A duration expressing the difference between two date or datetime
instances to microsecond resolution.
● tzinfo – An abstract base class for time zone information objects.
● timezone – A class that implements the tzinfo abstract base class as a fixed offset from
UTC (new in version 3.2).

Python Date class Syntax

class datetime.date(year, month, day)


Example:
from datetime import date
my_date = date(1996, 12, 11)
print("Date passed as argument is", my_date)

Output:
Date passed as argument is 1996-12-11
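
The date attributes and the timedelta class described in the list can be sketched together. The second date and the time of day are arbitrary values picked for this example.

```python
from datetime import date, datetime, timedelta

my_date = date(1996, 12, 11)
print(my_date.year, my_date.month, my_date.day)   # 1996 12 11

# Subtracting two dates gives a timedelta
gap = date(1997, 1, 1) - my_date
print(gap.days)   # 21

# Adding a timedelta to a datetime gives a new datetime
start = datetime(1996, 12, 11, 9, 30)
later = start + timedelta(hours=2, minutes=45)
print(later)      # 1996-12-11 12:15:00
```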

Introduction to pandas and series data frames


Pandas Series:

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

Create a simple Pandas Series from a list:


import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

Output:
0 1
1 7
2 2
dtype: int64

Labels

If nothing else is specified, the values are labeled with their index number. First value has
index 0, second value has index 1 etc.

This label can be used to access a specified value.

Example
Return the first value of the Series:
print(myvar[0])

Output:
1

Create Labels

With the index argument, you can name your own labels.

Example
Create your own labels:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)

Output:
x 1
y 7
z 2
dtype: int64

Key/Value Objects as Series

You can also use a key/value object, like a dictionary, when creating a Series.

Example
Create a simple Pandas Series from a dictionary:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
Output:
day1 420
day2 380
day3 390
dtype: int64

DataFrames

Data sets in Pandas are usually multi-dimensional tables, called DataFrames.

Series is like a column, a DataFrame is the whole table.

Example
Create a DataFrame from a dictionary holding two columns of data:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]}
myvar = pd.DataFrame(data)
print(myvar)
Output:
calories duration
0 420 50
1 380 40
2 390 45

Pandas DataFrames

What is a DataFrame?

A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array, or a table
with rows and columns.

Example:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Output:
calories duration
0 420 50
1 380 40
2 390 45

Locate Row

As you can see from the result above, the DataFrame is like a table with rows and columns.

Pandas uses the loc attribute to return one or more specified row(s).

Example
Return row 0:
#refer to the row index:
print(df.loc[0])
output:
calories 420
duration 50
Name: 0, dtype: int64
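
Returning more than one row works by passing a list of labels to loc. The sketch below rebuilds the same DataFrame so that it runs on its own; the names one_row and two_rows are illustrative.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]}
df = pd.DataFrame(data)

one_row = df.loc[0]         # a single label returns a Series
two_rows = df.loc[[0, 1]]   # a list of labels returns a DataFrame

print(type(one_row).__name__)    # Series
print(type(two_rows).__name__)   # DataFrame
print(two_rows)
```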

Named Indexes

With the index argument, you can name your own indexes.

Example
Add a list of names to give each row a name:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
Output:
calories duration
day1 420 50
day2 380 40
day3 390 45

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

Example
Return "day2":
#refer to the named index:
print(df.loc["day2"])
Output:
calories 380
duration 40
Name: day2, dtype: int64

Load Files Into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame.

Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Output:

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]
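
Since data.csv itself is not included with these notes, the following sketch reads the same kind of data from an in-memory string instead. The csv_text content is a hypothetical stand-in built from the first rows of the output shown in the notes.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv: a header line plus three data rows
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
60,103,135,340.0
"""

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)               # (3, 4)
print(df["Calories"].mean())
```

With a real file on disk, the call would take the file path in place of the StringIO object.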

