Unit 3: Arrays and Strings
● NumPy (short for Numerical Python) is an open-source Python library that provides
support for large, multi-dimensional arrays and matrices. It also provides a collection of
high-level mathematical functions that operate on these arrays.
● It was created by Travis Oliphant in 2005.
Note: Windows does not include a system package manager like those available on Linux or macOS.
You can install NumPy with pip (pip install numpy), or download a pre-built Windows installer
that matches your system configuration and Python version and install the package manually.
Arrays in NumPy
NumPy’s main object is the homogeneous multidimensional array.
It is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.
In NumPy, dimensions are called axes. The number of axes is called the rank.
NumPy’s array class is called ndarray. It is also known by the alias array.
Example:
In this example, we create a two-dimensional array of rank 2, since it has 2 axes.
The first axis (dimension) has length 2, i.e., the number of rows, and the second
axis (dimension) has length 3, i.e., the number of columns.
The overall shape of the array can be represented as (2, 3).
import numpy as np
output:
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64
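The code that printed these five lines is not shown above. A minimal sketch that would produce the same output, assuming a hypothetical 2 × 3 integer array (the element values are not given in the notes, but any 2 × 3 array has the same shape and size). The reported dtype is platform-dependent: int64 on most Linux/macOS builds, possibly int32 on Windows with NumPy versions before 2.0.

```python
import numpy as np

# Hypothetical values: any 2 x 3 integer array gives the same shape and size
arr = np.array([[1, 2, 3],
                [4, 2, 5]])

print("Array is of type:", type(arr))               # <class 'numpy.ndarray'>
print("No. of dimensions:", arr.ndim)               # number of axes (rank)
print("Shape of array:", arr.shape)                 # length along each axis
print("Size of array:", arr.size)                   # total number of elements
print("Array stores elements of type:", arr.dtype)  # element type
```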
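import numpy as np
a = np.array([1, 2, 5, 3])
print("Adding 1 to every element:", a + 1)
print("Subtracting 3 from each element:", a - 3)
print("Multiplying each element by 10:", a * 10)
print("Squaring each element:", a ** 2)
a *= 2
print("Doubled each element of original array:", a)
# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print("\nOriginal array:\n", a)
print("Transpose of array:\n", a.T)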
output
Adding 1 to every element: [2 3 6 4]
Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]
Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]
import numpy as np
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[4, 3],
              [2, 1]])
# add arrays
print("Array sum:\n", a + b)
# element-wise multiplication
print("Array multiplication:\n", a * b)
# matrix multiplication
print("Matrix multiplication:\n", a.dot(b))
Output:
Array sum:
[[5 5]
[5 5]]
Array multiplication:
[[4 6]
[6 4]]
Matrix multiplication:
[[ 8 5]
[20 13]]
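Since Python 3.5, the @ operator (equivalently np.matmul) also performs matrix multiplication and gives the same result as a.dot(b) for 2-D arrays. A short sketch with the same two arrays, contrasting it with the element-wise * operator:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[4, 3],
              [2, 1]])

elementwise = a * b   # multiplies matching positions: [[4 6] [6 4]]
matrix = a @ b        # row-by-column products: [[8 5] [20 13]]

print(elementwise)
print(matrix)
print(np.array_equal(matrix, a.dot(b)))  # True: @ and dot agree for 2-D arrays
```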
Indexing arrays
Array indexing means accessing an array element. You can access an element by referring to
its index number. Indexes in NumPy arrays start at 0, meaning that the first element has
index 0, the second has index 1, and so on.
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
This will print "1" as output.
To access elements of 2-D arrays, we use two comma-separated integers: the first selects the
row and the second selects the column. Think of a 2-D array as a table with rows and columns.
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])
This will print "2" as output.
Slicing arrays
Slicing in Python means taking elements from one given index to another given index.
We pass a slice instead of an index, like this: [start:end]. The element at index end is not included.
Eg: Slice elements from index 1 to index 5 (not included) from the following array:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Output: [2 3 4 5]
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[4:])
Output: [5 6 7]
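The full slice syntax is [start:end:step]: start defaults to 0, end to the length of the array, step to 1, and the end index is always excluded. A short sketch of these defaults, of negative indices, and of slicing a 2-D array (the 2-D values are illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[:4])     # from the start up to index 4 (excluded): [1 2 3 4]
print(arr[-3:-1])  # negative indices count from the end: [5 6]
print(arr[1:5:2])  # every second element from index 1 to 4: [2 4]

# 2-D slicing: one index or slice per axis, separated by a comma
arr2d = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10]])
print(arr2d[1, 1:4])  # row 1, columns 1 to 3: [7 8 9]
print(arr2d[0:2, 2])  # column 2 of both rows: [3 8]
```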
stack() is a NumPy function used for array stacking, i.e., joining multiple NumPy arrays along a
new axis. It returns a NumPy array.
To stack arrays, they must all have the same shape (e.g. both (2, 3) -> 2 rows, 3 columns).
stack() creates a new array with one more dimension than the input arrays: if we stack two
1-D arrays, the resulting array has 2 dimensions.
Eg:
import numpy as np
# input arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.stack((a, b), axis=1)
With axis=1, the two 1-D inputs become the columns of the resulting 2-D array:
array([[1, 4],
[2, 5],
[3, 6]])
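To see how axis changes the result, a sketch comparing axis=0 and axis=1 on the same two inputs: both produce a 2-D array, but axis=0 places each input as a row and axis=1 places each input as a column.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

rows = np.stack((a, b), axis=0)  # inputs become rows -> shape (2, 3)
cols = np.stack((a, b), axis=1)  # inputs become columns -> shape (3, 2)

print(rows)
print(rows.shape)  # (2, 3)
print(cols)
print(cols.shape)  # (3, 2)
```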
Once we have created a multi-dimensional array, we can access and modify its elements using
indexing and slicing.
We use the index notation [i, j] to access an element at row i and column j, where i and j are
zero-based indices.
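A minimal sketch of reading and modifying elements of a 2-D array with [i, j] indexing and slicing; the array values are illustrative, not taken from the original example.

```python
import numpy as np

# Illustrative 3 x 3 array
m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(m[1, 2])   # row 1, column 2 -> 6

m[0, 0] = 10     # modify a single element
m[2, :] = 0      # modify a whole row through a slice
m[:, 1] += 100   # add 100 to every element of column 1

print(m)
```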
Strings
● Strings in Python are sequences of characters surrounded by either double quotes or single quotes.
● Eg: "hello" and 'hello' are the same string.
● You can display a string literal with the print() function:
print("Hello")
a = "Hello"
print(a)
Output (printed by each call): Hello
Multiline Strings
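In Python, a multiline string is written between three double quotes (or three single quotes); the line breaks inside the quotes become part of the string. A minimal sketch with illustrative text:

```python
a = """Arrays and strings
are covered
in this unit."""

print(a)
print(a.count("\n"))  # the two line breaks are stored in the string -> 2
```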
Slicing
● You can return a range of characters by using the slice syntax.
● Specify the start index and the end index, separated by a colon, to return a part of the
string.
● Example:
● Get the characters from position 2 to position 5 (not included):
b = "Hello, World!"
print(b[2:5])
output- llo
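As with NumPy arrays, the start or end index can be omitted, and negative indices count from the end of the string. A short sketch using the same string:

```python
b = "Hello, World!"

print(b[:5])     # from the start to position 5 (not included) -> Hello
print(b[7:])     # from position 7 to the end -> World!
print(b[-6:-1])  # negative indices count from the end -> World
```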
strip()
This method removes any whitespace on either side of the string.
string = " Hello World! "
string_no_space = string.strip()
print(string_no_space)
The output will be:
Hello World!
split()
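This method splits a string into a list of substrings, separated by a specific substring.
string = "Hello World!"
# "," does not occur in the string, so the result is a one-element list
string_split = string.split(",")
print(string_split)
The output will be:
['Hello World!']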
join()
This method joins a list of strings into a single string, placing the string it is called on
between the items.
string_list = ["hello", "world!"]
string_join = " ".join(string_list)
print(string_join)
The output will be:
hello world!
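Called with no argument, split() splits on runs of whitespace, and join() with the same separator undoes a split. A short sketch of splitting on a specific substring and rebuilding the string (the sample strings are illustrative):

```python
csv_line = "apple,banana,cherry"

fruits = csv_line.split(",")  # split at each comma
print(fruits)                 # ['apple', 'banana', 'cherry']

words = "  Hello   World!  ".split()  # no argument: split on any whitespace
print(words)                          # ['Hello', 'World!']

print(" - ".join(fruits))             # apple - banana - cherry
print(",".join(fruits) == csv_line)   # True: joining with "," undoes the split
```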
These are just a few examples of the built-in methods available for manipulating strings in
Python.
There are many other methods and attributes available that we can use to perform a variety of
tasks, such as formatting numbers, removing special characters, and much more.
With knowledge of these methods, you can manipulate strings more effectively and create more
powerful and flexible Python programs. Furthermore, these methods are useful not only for
string manipulation, but also for other general text processing tasks.
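Two of the tasks mentioned above, sketched with built-in features only: formatting numbers with f-string format specifiers, and removing special characters by keeping only letters, digits, and spaces. The variable names and sample values are illustrative.

```python
value = 123.4567

print(f"The value is: {value:.3f}")  # round to 3 decimals -> The value is: 123.457
print(f"{1234567.891:,.2f}")         # thousands separator -> 1,234,567.89

text = "Hello, World! #2024"
# Keep a character only if it is a letter, a digit, or a space
cleaned = "".join(ch for ch in text if ch.isalnum() or ch == " ")
print(cleaned)                       # Hello World 2024
```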
Regular Expressions
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check if a string contains the specified search pattern.
RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions.
import re
RegEx in Python
When you have imported the re module, you can start using regular expressions:
Example
Search the string to see if it starts with "The" and ends with "Spain":
import re
txt = "The rain in Spain"
x = re.search(r"^The.*Spain$", txt)
if x:
    print("YES! We have a match!")
output- YES! We have a match!
RegEx Functions
The re module offers a set of functions that allow us to search a string for a match:
Function   Description
findall    Returns a list containing all matches
search     Returns a Match object if there is a match anywhere in the string
split      Returns a list where the string has been split at each match
sub        Replaces one or many matches with a string
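A short sketch calling each of the four functions in the table on the same sample string, to show what each one returns:

```python
import re

txt = "The rain in Spain"

print(re.findall("ai", txt))    # list of all matches -> ['ai', 'ai']

m = re.search("Spain", txt)     # Match object for the first match, or None
print(m.start(), m.group())     # -> 12 Spain

print(re.split(r"\s", txt))     # split at each whitespace -> ['The', 'rain', 'in', 'Spain']

print(re.sub(r"\s", "_", txt))  # replace every match -> The_rain_in_Spain
```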
Example
import re
txt = "The rain in Spain"
#Return a list containing every occurrence of "ai":
print(re.findall("ai", txt))
output
['ai', 'ai']
The list contains the matches in the order they are found.
If no matches are found, an empty list is returned:
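For example, searching the same string for a pattern that does not occur (the pattern "Portugal" is illustrative):

```python
import re

txt = "The rain in Spain"

x = re.findall("Portugal", txt)
print(x)  # no matches -> []
```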
Example
Search for the first white-space character in the string:
import re
txt = "The rain in Spain"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())
output- The first white-space character is located in position: 3
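import re
# regex object: matches a phone number of the form ddd-ddd-dddd
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# search() returns a Match object for the first match
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())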
Output:
Phone number found: 415-555-4242
Python – Substituting patterns in text using regex
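Regular expressions (regex) are used to extract the required information from text based on
patterns. They are also widely used to manipulate pattern-based text, which makes them useful for
text preprocessing and for applications such as Natural Language Processing (NLP).
The re.sub() method performs a global search and replace on the given string: it substitutes every
match of a pattern with a replacement. The function takes 5 arguments in total.
Parameters:
pattern: the regular expression to search for
repl: the replacement string (or a function that returns one)
string: the string to search in
count (optional, default 0): the maximum number of replacements; 0 means replace all matches
flags (optional, default 0): flags that modify the matching, such as re.IGNORECASE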
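import re
def substitutor():
    # a string variable
    sentence1 = "It is raining outside."
    # a string variable
    sentence2 = "Thank you very very much."
    # replace the pattern "raining" with "sunny"
    print(re.sub(r"raining", "sunny", sentence1))
    # replace every occurrence of "very" with "so"
    print(re.sub(r"very", "so", sentence2))
# Driver Code:
substitutor()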
Output:
It is sunny outside.
Thank you so so much.
No matter how many times the pattern occurs in the string, re.sub() replaces all of the
occurrences with the given replacement string. That's why both occurrences of 'very' are replaced
by 'so' in the above example.
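When only some occurrences should change, the optional count argument limits the number of replacements, and re.subn() additionally returns how many substitutions were made. A short sketch on the second sentence from the example:

```python
import re

sentence2 = "Thank you very very much."

# count=1: only the first match is replaced
print(re.sub("very", "so", sentence2, count=1))  # Thank you so very much.

# subn() replaces all matches and also returns the number of replacements
new_text, n = re.subn("very", "so", sentence2)
print(new_text, n)                               # Thank you so so much. 2
```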
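The code that produced the Pandas output below is missing from these notes. A Pandas Series is a one-dimensional labeled array, like a single column of a table. A minimal sketch that creates one from the list [1, 7, 2] (the same list used in the Create Labels example) and prints it:

```python
import pandas as pd

a = [1, 7, 2]

# Without an index argument, the labels default to 0, 1, 2
myvar = pd.Series(a)

print(myvar)
```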
Output:
0 1
1 7
2 2
dtype: int64
Labels
If nothing else is specified, the values are labeled with their index number. First value has
index 0, second value has index 1 etc.
Example
Return the first value of the Series:
print(myvar[0])
Output:
1
Create Labels
With the index argument, you can name your own labels.
Example
Create your own labels:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
Output:
x 1
y 7
z 2
dtype: int64
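Once labels are defined, an item can be accessed by its label. A short sketch with the same labeled Series, also showing the explicit label-based (loc) and position-based (iloc) accessors:

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])

print(myvar["y"])      # access by label -> 7
print(myvar.loc["z"])  # explicit label-based access -> 2
print(myvar.iloc[0])   # position-based access -> 1
```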
You can also use a key/value object, like a dictionary, when creating a Series.
Example
Create a simple Pandas Series from a dictionary:
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
Output:
day1 420
day2 380
day3 390
dtype: int64
DataFrames
Example
Create a DataFrame from a dictionary of two lists (each list becomes a column):
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]}
myvar = pd.DataFrame(data)
print(myvar)
Output:
calories duration
0 420 50
1 380 40
2 390 45
Pandas DataFrames
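What is a DataFrame?
A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a table with
rows and columns.
Example: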
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Output:
calories duration
0 420 50
1 380 40
2 390 45
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns.
Pandas uses the loc attribute to return one or more specified row(s).
Example
Return row 0:
#refer to the row index:
print(df.loc[0])
output:
calories 420
duration 50
Name: 0, dtype: int64
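Passing a list of row labels to loc returns several rows at once, and the result is then a DataFrame rather than a Series. A short sketch with the same data:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]}
df = pd.DataFrame(data)

# A list of row labels returns a DataFrame containing those rows
rows = df.loc[[0, 1]]
print(rows)
print(type(rows))  # <class 'pandas.core.frame.DataFrame'>
```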
Named Indexes
With the index argument, you can name your own indexes.
Example
Add a list of names to give each row a name:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
Output:
calories duration
day1 420 50
day2 380 40
day3 390 45
Use the named index in the loc attribute to return the specified row(s).
Example
Return "day2":
#refer to the named index:
print(df.loc["day2"])
Output:
calories 380
duration 40
Name: day2, dtype: int64
If your data sets are stored in a file, Pandas can load them into a DataFrame.
Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
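Output: the printed DataFrame depends on the contents of data.csv; it is displayed as a table of
rows and columns, like the examples above.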
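To try read_csv without a separate file, a sketch that reads CSV-formatted text from an in-memory buffer with io.StringIO. The column names and values are assumed (borrowed from the earlier examples); they are not the actual contents of data.csv.

```python
import io

import pandas as pd

# Assumed CSV contents; read_csv accepts any file-like object, not just a path
csv_text = """calories,duration
420,50
380,40
390,45
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)  # (3, 2): three rows, two columns
```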