
Python_for_AIML1

The document outlines a lab on Python for AIML, covering topics such as the AIML environment, data types and functions in NumPy and Pandas, and reading CSV files. It includes instructions for completing lab exercises, explanations of iPython Notebooks, and details on data manipulation with NumPy. Additionally, it provides examples of loading and processing data from CSV files using NumPy functions.

Uploaded by

nagulxlugan
Copyright
© All Rights Reserved

1 Python for AIML 1

In this lab, the following topics will be covered:

1. Introduction to the AIML environment with iPython Notebooks
2. Datatypes and functions in NumPy
3. Reading CSV files using NumPy
4. Datatypes and functions in pandas
5. Reading CSV files using pandas

Instructions for completing the lab exercises:
1. Open the Python notebook file in the Lab folder.
2. Read the problem statement and the expected output of the exercise.
3. Uncomment the placeholder lines and fill in your answer.
4. Run your code to produce the expected output.
Note: Data files are stored in the dataset folder.

1.1 About iPython Notebooks


iPython Notebooks are interactive coding environments embedded in a webpage. You will be
using iPython notebooks in this class. After writing your code, you can run the cell either by
pressing “SHIFT”+“ENTER” or by clicking “Run Cell” (the play symbol) in the toolbar at the top
of the notebook.
Exercise: Set test to "Hello World" in the cell below, then run the two cells below to print
“Hello World”.
[1]: test = "Hello World"

[2]: print("test: " + test)

test: Hello World


Expected output: test: Hello World
What you need to remember: - Run your cells using SHIFT+ENTER (or “Run cell”)

1.2 Datatypes in Python (Dictionary)


Basic sequence datatypes such as lists/arrays and tuples store elements that can be of any Python
datatype, including other lists and tuples. The dictionary (dict) datatype instead holds data as
key/value pairs. The contents of a dict can be written as a series of key:value pairs within
braces { }, e.g. dict = {key1:value1, key2:value2, . . . }. The “empty dict” is just an empty pair of
curly braces {}.
Hashable values such as strings, numbers, and tuples (of hashable items) work as keys, and any type
can be a value. Looking up a key which is not in the dict raises a KeyError: use “in” to check whether
the key is in the dict, or use dict.get(key), which returns the value, or None if the key is not present
(get(key, not-found) lets you specify the value to return in the not-found case).
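The three lookup styles described above can be sketched as follows; the dictionary `colors` and its contents are illustrative only.

```python
# Illustrative dictionary (hypothetical data)
colors = {'r': 'red', 'g': 'green', 'b': 'blue'}

# 'in' checks whether a key exists before looking it up
print('r' in colors)               # True
print('x' in colors)               # False

# dict.get() returns None (or a supplied default) instead of raising KeyError
print(colors.get('g'))             # green
print(colors.get('x'))             # None
print(colors.get('x', 'unknown'))  # unknown

# Direct indexing with a missing key raises KeyError
try:
    colors['x']
except KeyError as err:
    print('KeyError:', err)        # KeyError: 'x'
```

`get()` with a default is handy when a missing key has a sensible fallback; `in` is clearer when the two cases need different code paths.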
Example:

• dict = {'a': 'alpha', 'o': 'omega', 'g': 'gamma'}
• print(dict)

[3]: ## dict[key] = value-for-that-key

dict = {'a': 'alpha', 'o': 'omega', 'g': 'gamma'}

print(dict)
print(dict['a'])

{'a': 'alpha', 'o': 'omega', 'g': 'gamma'}


alpha
Iterating over a dictionary yields its keys by default. Since Python 3.7, the keys appear in insertion
order. The methods dict.keys() and dict.values() return view objects of the keys or values
(dict_keys and dict_values, as the output below shows). There is also items(), which returns a view
of (key, value) tuples and is the most efficient way to examine all the key/value data in the
dictionary. All of these views can be passed to the sorted() function.
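A small sketch of the iteration behaviour described above, using an illustrative dictionary `d` (not from the lab files): iteration yields keys in insertion order, `items()` gives key/value pairs in one loop, and the views stay in sync with later changes.

```python
# Illustrative dictionary; since Python 3.7 dicts keep insertion order
d = {'o': 'omega', 'a': 'alpha', 'g': 'gamma'}

print(list(d))        # ['o', 'a', 'g']  (iteration yields keys)
print(d.keys())       # dict_keys(['o', 'a', 'g'])

# items() yields (key, value) pairs, so both are available in one loop
for key, value in sorted(d.items()):
    print(key, value)  # a alpha, then g gamma, then o omega

# Views are live: they reflect later changes to the dictionary
keys = d.keys()
d['b'] = 'beta'
print('b' in keys)    # True
```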

[4]: print(dict.values())

# accessing each key/value in sorted key order
for key in sorted(dict.keys()):
    print(key, dict[key])

dict_values(['alpha', 'omega', 'gamma'])


a alpha
g gamma
o omega

1.3 Datatypes and functions in NumPy


NumPy: NumPy is the fundamental package for scientific computing and data manipulation in
Python. It is based on one main object: the ndarray (which stands for N-dimensional array); a matrix
is simply a two-dimensional array. The data type of the elements is specified by another NumPy
object called dtype (data-type), and each ndarray has exactly one dtype. In NumPy, dimensions are
called axes. For example, a two-dimensional array has 2 axes: axis 0 runs along the rows and axis 1
along the columns. The number of dimensions, the length of each dimension and the total number of
items can be read from the ndarray properties ndarray.ndim, ndarray.shape and ndarray.size.
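To make "axis 0 = rows, axis 1 = columns" concrete, the sketch below reduces a small 2 x 3 array along each axis; the array values are arbitrary.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

print(M.ndim, M.shape, M.size)  # 2 (2, 3) 6
# axis=0 runs down the rows, so summing along it gives one total per column
print(M.sum(axis=0))            # [5 7 9]
# axis=1 runs across the columns, so summing along it gives one total per row
print(M.sum(axis=1))            # [ 6 15]
```

A useful rule of thumb: the axis you pass is the one that disappears from the result's shape.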

[2]: # to use numpy we need to import the library


import numpy as np

# two dimensions, shape (1, 3): one row with three columns
A = np.array([[1, 2, 3]])

# two dimensions with two axes, shape (3, 3)
B = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])

# three dimensions with three axes, shape (2, 2, 3)
A3 = np.array([
    [[256, 25, 155], [211, 12, 210]],
    [[0, 0, 12], [145, 12, 100]],
])

print("a and b information")


print(A.ndim, B.ndim, A3.ndim)     # ndim (number of dimensions)
print(A.shape, B.shape, A3.shape)  # shape (size of each dimension)
print(A.size, B.size, A3.size)     # size (total number of elements)
print(A)
print(B)
print(A3)

a and b information
2 2 3
(1, 3) (3, 3) (2, 2, 3)
3 9 12
[[1 2 3]]
[[0 1 2]
[3 4 5]
[6 7 8]]
[[[256 25 155]
[211 12 210]]

[[ 0 0 12]
[145 12 100]]]
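Because each ndarray has exactly one dtype, NumPy converts mixed inputs to a common type. The sketch below shows this and how `astype()` converts explicitly. The default integer width depends on platform and NumPy version (the int32 outputs in this lab come from such a setup), so the code inspects the dtype kind rather than an exact width.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.5])      # one float forces the whole array to float

print(ints.dtype.kind)             # i  (signed integer; int32 or int64 by platform)
print(mixed.dtype)                 # float64
print(mixed)                       # [1.  2.  3.5]

# astype() returns a new array with the requested dtype (float -> int truncates)
as_int = mixed.astype(np.int32)
print(as_int, as_int.dtype)        # [1 2 3] int32

# dtype can also be fixed at creation time with an array-protocol string
explicit = np.array([1, 2, 3], dtype='f8')
print(explicit.dtype)              # float64
```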

1.3.1 Slicing of n-d array


Slicing extracts data by row and column indices. Unlike slicing a standard Python list, which
produces a copy, basic slicing of a NumPy array returns a view of the original array's data. This
default behavior is useful: when we deal with huge datasets, we can access and process them
without repeatedly copying the data buffer. Call .copy() when an independent copy is needed.
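The view-versus-copy behaviour described above can be checked directly. This sketch uses a small arbitrary array and `np.shares_memory` to confirm which objects share a data buffer.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0 1 2] [3 4 5]]

view = arr[:, 1]        # basic slice: a view on arr's data
view[0] = 99            # writing through the view changes arr too
print(arr)              # [[ 0 99  2]
                        #  [ 3  4  5]]

copied = arr[:, 1].copy()   # an explicit copy owns its own data
copied[0] = -1
print(arr[0, 1])        # 99 (unchanged by writing to the copy)

print(np.shares_memory(arr, view), np.shares_memory(arr, copied))  # True False
```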

1.3.2 Inserting additional row or column


You can insert an additional row or column using the np.insert() function. It inserts values along
the given axis (0 for rows, 1 for columns) before the given index and returns a new array, leaving
the original unchanged. If axis is None, the array is flattened first.
[3]: # Let's work on multi-dimensional slicing
# initialize MA for use (here a 1x3 array)
#np.random.seed(0)  # seed for reproducibility
MA = np.array([[1, 2, 3]])
# uncomment the next line (and the seed above) to use a random 3x4 array instead
#MA = np.random.randint(12, size=(3, 4))
print(MA)

DD = MA[:2, :3]
print(DD)              # first two rows and three columns

print(MA[:, 1])        # slice column index 1
print(MA[:3, ::2])     # first three rows (all rows here), every other column
print(MA[::-1, ::-1])  # reverse rows and columns at once

# insert one row and one column of data (np.insert returns a new array)
rMA = np.insert(MA, 1, values=[1], axis=0)  # new row of 1s before row index 1
cMA = np.insert(MA, 3, values=[1], axis=1)  # new column of 1s before column index 3 (appended)

print(rMA)
print(cMA)

[[1 2 3]]
[[1 2 3]]
[2]
[[1 3]]
[[3 2 1]]
[[1 2 3]
[1 1 1]]
[[1 2 3 1]]
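A further sketch of np.insert on a 2 x 3 array (values arbitrary): the original array is left unchanged, a 1-D sequence with one value per row can be inserted as a new column, and axis=None flattens first.

```python
import numpy as np

MA = np.array([[1, 2, 3],
               [4, 5, 6]])

# np.insert returns a new array; MA itself is not modified
bigger = np.insert(MA, 1, values=0, axis=0)   # row of zeros before row 1
print(bigger.shape, MA.shape)                 # (3, 3) (2, 3)

# A 1-D sequence with one value per row can be inserted as a new column
totals = MA.sum(axis=1)                       # [ 6 15]
with_totals = np.insert(MA, MA.shape[1], values=totals, axis=1)
print(with_totals)                            # [[ 1  2  3  6]
                                              #  [ 4  5  6 15]]

# With axis=None the array is flattened before inserting
print(np.insert(MA, 2, values=9))             # [1 2 9 3 4 5 6]
```

Using `MA.shape[1]` as the index appends after the last column, the same effect as index 3 in the cell above.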

1.4 Loading the Data using NumPy


To start an ML project, you need to load data such as text, images or audio from sources such as
files, databases or online servers. In this subject, text and image files will be used as the main
data sources. Textual data for developing machine learning models is usually stored as CSV
(comma-separated values).
CSV is a simple file format that stores tabular data (numbers and text) as plain text. In Python, CSV
files can be opened and loaded with common libraries such as the built-in csv module, NumPy and
pandas.
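For comparison with the NumPy and pandas approaches, here is a minimal sketch with Python's built-in `csv` module. An in-memory `io.StringIO` object stands in for a real file so the example is self-contained; the column names and values are made up.

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical contents)
text = "name,area,price\nA,10,25.0\nB,20,40.5\n"

reader = csv.DictReader(io.StringIO(text))
rows = list(reader)

# csv returns every field as a string; converting types is up to you
print(rows[0])        # {'name': 'A', 'area': '10', 'price': '25.0'}  (a plain dict on Python 3.8+)
prices = [float(r['price']) for r in rows]
print(prices)         # [25.0, 40.5]
```

The manual type conversion is the main reason NumPy's genfromtxt and pandas' read_csv are preferred for numeric data.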

1.4.1 Load CSV with NumPy


One approach to loading a CSV data file is NumPy's numpy.genfromtxt() function. The following
example loads a CSV data file with it. See also:
• https://numpy.org/doc/stable/reference/arrays.dtypes.html
• https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html

Example: Loading the numbers.csv file using numpy.genfromtxt()

[7]: #1) Load the number data from numbers.csv


#2) Explore the datatype and dimension (shape)

import numpy as np

path = "./dataset/numbers.csv"
data = np.genfromtxt(path, dtype=None, names=None, delimiter=",", encoding=None)

print("shape of data : " , data.shape)


print("datatype of data : " , data.dtype)
print("sample of 3 rows of data : ", data[:3])

shape of data : (5, 5)


datatype of data : int32
sample of 3 rows of data : [[ 1 2 3 4 5]
[ 6 7 8 9 10]
[11 12 13 14 15]]
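The same call can be tried without the dataset folder by feeding genfromtxt an in-memory file. This sketch uses `io.StringIO` with made-up values, and also shows forcing floats with `dtype=float` instead of letting genfromtxt infer the type.

```python
import io
import numpy as np

# In-memory stand-in for numbers.csv (hypothetical contents)
text = "1,2,3\n4,5,6\n"

data = np.genfromtxt(io.StringIO(text), dtype=None, delimiter=",", encoding=None)
print(data.shape)        # (2, 3)
print(data.dtype.kind)   # i  (integers inferred; width depends on platform)

# Asking for floats explicitly instead of letting genfromtxt infer
fdata = np.genfromtxt(io.StringIO(text), dtype=float, delimiter=",", encoding=None)
print(fdata.dtype)       # float64
```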
The datatype of each column can be defined while reading a CSV file with the genfromtxt function.
In the following example, the iris_w_header.csv file has four columns of floating-point data and a
string in the last column. Array-protocol type strings are used to define the data type of each
column. The first character specifies the kind of data and the remaining characters specify the
number of bytes per item, except for Unicode, where it is interpreted as the number of characters,
e.g. i4 for a 32-bit signed integer, f8 for a 64-bit floating-point number, or U50 for a Unicode string
of up to 50 characters.
You can declare the column datatypes as a list of strings as follows and pass it as the dtype
parameter of genfromtxt:
types = ['f8', 'f8', 'f8', 'f8', 'U50']

Example: Loading the iris_w_header.csv file using numpy.genfromtxt()

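[8]: #3) Load the data from iris_w_header.csv

#4) Explore the datatype and dimension (shape)

import numpy as np

path = "./dataset/iris_w_header.csv"
types = ['f8', 'f8', 'f8', 'f8', 'U50']  # pass dtype=types to enforce these column types
# dtype=None (used here) lets genfromtxt infer each column's type;
# names=True takes the field names from the header row
data = np.genfromtxt(path, dtype=None, names=True, delimiter=",", encoding=None)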

print("shape of data : " , data.shape)


print("datatype of data : " , data.dtype)
print("names of data : ", data.dtype.names)
print("sample of 3 rows of data : ", data[:3])

shape of data : (150,)


datatype of data : [('sepal_length', '<f8'), ('sepal_width', '<f8'),
('petal_length', '<f8'), ('petal_width', '<f8'), ('class', '<U15')]
names of data : ('sepal_length', 'sepal_width', 'petal_length', 'petal_width',
'class')
sample of 3 rows of data : [(5.1, 3.5, 1.4, 0.2, 'Iris-setosa') (4.9, 3. , 1.4,
0.2, 'Iris-setosa')
(4.7, 3.2, 1.3, 0.2, 'Iris-setosa')]
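The cell above lets genfromtxt infer the column types. The sketch below instead passes a list of array-protocol type strings as dtype together with names=True, assuming a two-row, three-column stand-in for iris_w_header.csv held in `io.StringIO`. The result is a one-dimensional structured array whose columns are accessed by field name.

```python
import io
import numpy as np

# In-memory stand-in for iris_w_header.csv (two hypothetical rows, three columns)
text = ("sepal_length,sepal_width,class\n"
        "5.1,3.5,Iris-setosa\n"
        "4.9,3.0,Iris-setosa\n")

types = ['f8', 'f8', 'U50']   # one array-protocol type string per column
data = np.genfromtxt(io.StringIO(text), dtype=types, names=True,
                     delimiter=",", encoding=None)

print(data.shape)             # (2,)  one structured record per row
print(data.dtype.names)       # ('sepal_length', 'sepal_width', 'class')
print(data.dtype['class'])    # <U50  (the declared type, not an inferred one)
print(data['sepal_length'])   # [5.1 4.9]  a whole column by field name
```

This explains the (150,) shape above: each row becomes one record, not a row of a 2-D array.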

1.4.2 Loading specific columns of source file


To retrieve data from specific columns of a file with the genfromtxt() function, pass the usecols
parameter a sequence of column indices, e.g. usecols=(0, 1, 2, 3).

[9]: data = np.genfromtxt(path, dtype=None, skip_header=1, delimiter=",",
                     usecols=(0, 1, 2, 3), encoding=None)

print("shape of data : " , data.shape)
print("datatype of data : " , data.dtype)
print("names of data : ", data.dtype.names)
print("sample of 3 rows of data : ", data[:3])

shape of data : (150, 4)


datatype of data : float64
names of data : None
sample of 3 rows of data : [[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]]
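A self-contained sketch of usecols on made-up data: the header row is skipped, only columns 0 and 2 are kept, and the result is an ordinary 2-D float array that can be reduced along axis 0 to get column means.

```python
import io
import numpy as np

# In-memory stand-in for a CSV file with a header and a text column (hypothetical data)
text = ("a,b,c,label\n"
        "1.0,2.0,3.0,x\n"
        "4.0,5.0,6.0,y\n")

# Skip the header row and keep only the numeric columns 0 and 2
sub = np.genfromtxt(io.StringIO(text), skip_header=1, delimiter=",",
                    usecols=(0, 2), encoding=None)
print(sub)               # [[1. 3.]
                         #  [4. 6.]]
print(sub.mean(axis=0))  # [2.5 4.5]  column means
```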

1.5 Exercise 1:
Load the data from the numbers_ex.csv file using NumPy's numpy.genfromtxt() function.
1. Display the shape and the datatype of the data.
Expected output:
- shape of data : (4, 4)
- datatype of data : float64
2. Display the area (column 2, index 1) and price (column 4, index 3) data by slicing the data.
Expected output:
- [ 70. 60. 50. 120.]
- [ 910.5 1000.25 890.5 800. ]
3. Calculate the cost (cost = price / area) and insert/append it as a new column of the existing data.
Expected output:
[[1.00000000e+00 7.00000000e+01 8.00000000e+00 9.10500000e+02 1.30071429e+01]
 [2.00000000e+00 6.00000000e+01 1.30000000e+01 1.00025000e+03 1.66708333e+01]
 [3.00000000e+00 5.00000000e+01 1.80000000e+01 8.90500000e+02 1.78100000e+01]
 [4.00000000e+00 1.20000000e+02 2.30000000e+01 8.00000000e+02 6.66666667e+00]]
[2]: import numpy as np

#1) Load the number data from numbers_ex.csv

#_______________________________________________________________________________________________

#_______________________________________________________________________________________________

#2) Explore the datatype and dimension (shape)

#_______________________________________________________________________________________________

#_______________________________________________________________________________________________

print()

# 3) Display area (column 2) and price (column 4) data by slicing the data.

#_______________________________________________________________________________________________

#_______________________________________________________________________________________________

#4) Calculate the cost data (cost = price / area)

# cost = _________________________________________________

print()

#5) Insert/append cost to the data as a new column
# (np.insert returns a new array, so assign the result back to data)

#data = np.insert(data, 4, values=cost, axis=1)

#print(data)

shape of data : (4, 4)


datatype of data : float64
names of data : None
sample of 3 rows of data : [[1.00000e+00 7.00000e+01 8.00000e+00 9.10500e+02]
[2.00000e+00 6.00000e+01 1.30000e+01 1.00025e+03]
[3.00000e+00 5.00000e+01 1.80000e+01 8.90500e+02]]

[ 70. 60. 50. 120.]


[ 910.5 1000.25 890.5 800. ]

[[1.00000e+00 7.00000e+01 8.00000e+00 9.10500e+02]


[2.00000e+00 6.00000e+01 1.30000e+01 1.00025e+03]
[3.00000e+00 5.00000e+01 1.80000e+01 8.90500e+02]
[4.00000e+00 1.20000e+02 2.30000e+01 8.00000e+02]]

1.6 Datatypes and functions in Pandas


pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data
structures and data analysis tools for the Python programming language.
https://pandas.pydata.org/
The two most important data structures in pandas are Series and DataFrame.

1.6.1 Pandas series


A pandas Series is a one-dimensional object that wraps both a sequence of values and a sequence
of indices, which can be accessed through the values and index attributes.

1.6.2 Pandas DataFrame
DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible
column names. Just as you might think of a two-dimensional array as an ordered sequence of
aligned one-dimensional columns, you can think of a DataFrame as a sequence of aligned Series
objects. Here, by “aligned” we mean that they share the same index.
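The "aligned Series" idea can be seen by building a DataFrame from a dictionary of two Series whose index labels only partly overlap (illustrative values). pandas takes the union of the labels and fills the gaps with NaN, which also turns the integer column into floats.

```python
import pandas as pd

# Two Series with partly overlapping index labels (illustrative data)
area = pd.Series([30, 40], index=['a', 'b'])
price = pd.Series([500.0, 650.0], index=['b', 'c'])

df = pd.DataFrame({'area': area, 'price': price})

print(df.index.tolist())            # ['a', 'b', 'c']  union of both indexes
print(df.loc['b'].tolist())         # [40.0, 500.0]   values matched by label
print(df['price'].isna().tolist())  # [True, False, False]
```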

[3]: # 1) Importing the pandas package


# 2) Create series data using pandas

import numpy as np
import pandas as pd

data=pd.Series([0.25,0.5,0.75,1.0])
print(data)
print(data.values) # access the values of a pandas Series
print(data.index)  # access the index of a pandas Series
print(data[2])     # access an individual value by its index label
print(data[1:3])   # access a subset of a Series (positions 1 and 2)

0 0.25
1 0.50
2 0.75
3 1.00
dtype: float64
[0.25 0.5 0.75 1. ]
RangeIndex(start=0, stop=4, step=1)
0.75
1 0.50
2 0.75
dtype: float64
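With the default RangeIndex, labels and positions coincide, so `data[2]` above works either way. With a custom index they differ, and the `.loc` (label) and `.iloc` (position) indexers make the intent explicit. A sketch using the same values with hypothetical letter labels:

```python
import pandas as pd

s = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

print(s.index.tolist())         # ['a', 'b', 'c', 'd']
print(s.loc['c'])               # 0.75  lookup by label
print(s.iloc[2])                # 0.75  lookup by integer position
print(s.loc['b':'c'].tolist())  # [0.5, 0.75]  label slices include the end
print(s.iloc[1:3].tolist())     # [0.5, 0.75]  position slices exclude the end
```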

1.6.3 Pandas DataFrames


A DataFrame can be created from data that is:
• another pandas DataFrame
• a pandas Series: a one-dimensional labeled array capable of holding any data type, with axis
labels or index. An example of a Series object is one column of a DataFrame.
• a NumPy ndarray, which can be a record or structured array
• a two-dimensional ndarray
• a dictionary of one-dimensional ndarrays, lists, dictionaries or Series

[3]: # 1) Creating a pandas DataFrame from a two-dimensional ndarray

my_2darray = np.array([[1, 2, 3], [4, 5, 6]])

print(pd.DataFrame(my_2darray))

0 1 2
0 1 2 3
1 4 5 6

[4]: # 2) Creating a pandas DataFrame from a two-dimensional ndarray with column names

my_2darray1 = pd.DataFrame(data=my_2darray, index=range(0, 2), columns=['A', 'B', 'C'])

print(my_2darray1)

A B C
0 1 2 3
1 4 5 6

1.6.4 Traversing and accessing data in DataFrame


Column-wise data: Columns in a DataFrame can be accessed by their label (name), e.g. df['A'], and
the label at a given position can be read from the columns property, e.g. df.columns[1]. New
column data can be appended to an existing DataFrame by assigning to a new label.
Row-wise data using slicing and iloc: Rows in a DataFrame can be accessed with slicing or the iloc
indexer, e.g. df[1:3] returns rows 1 and 2, and df.iloc[:] returns all rows. Individual cells can be
read by position with df.iloc[row, col] or by label with df.at[row, col].
Iterating over a pandas DataFrame: Like other data structures, a DataFrame has an iterrows()
method for accessing cell values. You can iterate over the rows of your DataFrame with a for loop
in combination with an iterrows() call on the DataFrame.

[4]: #Declare n-d array of 3x3


npdata = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

df = pd.DataFrame(data=npdata, columns=['A', 'B', 'C'])

print(df['A'])
print(df.columns[1])

df['D'] = [14, 15, 16]


df['E'] = df['A'] * df['D']

print(df)

##Accessing the entire row data


#print(df[1:2])
#print(df.iloc[:])
print(df.iloc[[1,2]])

##Accessing the specific cell data


print(df.iloc[0,4])

print(df.at[2,'E'])

0 1
1 4
2 7
Name: A, dtype: int32
B
A B C D E
0 1 2 3 14 14
1 4 5 6 15 60
2 7 8 9 16 112
A B C D E
1 4 5 6 15 60
2 7 8 9 16 112
14
112
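The cell above does not show the iterrows() loop mentioned in the text, so here is a minimal sketch on a small illustrative DataFrame. Each iteration yields the row's index label and the row as a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]), columns=['A', 'B', 'C'])

# iterrows() yields (index label, row as a Series) pairs
for idx, row in df.iterrows():
    print(idx, row['A'], row['C'])   # 0 1 3, then 1 4 6

# For whole-column arithmetic, vectorised operations are usually simpler
row_sums = df.sum(axis=1)
print(row_sums.tolist())             # [6, 15]
```

Row-by-row loops are fine for inspection; for calculations like `df['E'] = df['A'] * df['D']` above, column operations are shorter and faster.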

1.6.5 Dropping unwanted columns


To remove unwanted columns from a DataFrame, use the drop() function. Pass the column labels
through the columns parameter (or pass the labels with axis=1). With inplace=True, pandas changes
the DataFrame object directly instead of returning a modified copy.
Example:
• columns_to_drop = ["C"]
• df.drop(columns = columns_to_drop, inplace = True)

[7]: columns_to_drop = ["C"]

df.drop(columns=columns_to_drop, inplace=True)  # axis is implied by columns=

print(df)

A B D E
0 1 2 14 14
1 4 5 15 60
2 7 8 16 112
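Without inplace=True, drop() returns a new DataFrame and leaves the original untouched. The sketch below, on an illustrative DataFrame, also shows the equivalent labels-plus-axis form and dropping a row by its index label.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4], 'B': [2, 5], 'C': [3, 6]})

# Without inplace, drop() returns a new DataFrame and leaves df unchanged
trimmed = df.drop(columns=['C'])
print(trimmed.columns.tolist())   # ['A', 'B']
print(df.columns.tolist())        # ['A', 'B', 'C']

# Equivalent form: pass labels with axis=1 (axis=0 would drop rows instead)
also_trimmed = df.drop(['C'], axis=1)
print(also_trimmed.equals(trimmed))       # True

# Dropping a row by its index label
print(df.drop(index=[0]).index.tolist())  # [1]
```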

1.7 Exercise 2:
Create the following data table using a pandas DataFrame.
Expected Output:

Index Red Green Blue
0 123 112 0
1 152 115 80
2 132 168 0

[15]: #1) Declare a 3x3 ndarray with the data

#_______________________________________________________________________________________________

#2) Generate a pandas DataFrame called color_data and display it

#_______________________________________________________________________________________________

Red Green Blue


0 123 112 0
1 152 115 80
2 132 168 0

1.7.1 Load CSV with Pandas


Another approach to loading a CSV data file is the pandas.read_csv() function, which returns a
pandas.DataFrame. The read_csv function can read CSV files that contain columns of different
datatypes. Statistics of each column can be displayed with the describe() function of the
DataFrame, e.g.
• path = "./dataset/iris_w_header.csv"
• df = pd.read_csv(path)

[16]: # 1) Load data from iris_w_header.csv file with pandas.read_csv()

import pandas as pd

path = "./dataset/iris_w_header.csv"
df = pd.read_csv(path)
print("shape of data : ", df.shape)
print("\n datatype of data : " , df.dtypes)
print("\n sample of 3 rows of data : ", df[:3])
print("\n sample of 3 rows of data using head(): ", df.head(3))  # or use the head() function

# statistical summary of the numeric columns, commonly used in data science applications
print("\n", df.describe(), "\n")

shape of data : (150, 5)

datatype of data : sepal_length float64


sepal_width float64
petal_length float64
petal_width float64
class object
dtype: object

sample of 3 rows of data : sepal_length sepal_width petal_length petal_width class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa

sample of 3 rows of data using head(): sepal_length sepal_width petal_length petal_width class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa

sepal_length sepal_width petal_length petal_width


count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000
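The describe() output above only lists the four numeric columns; the text column is left out by default. A self-contained sketch of the same calls on a tiny made-up CSV held in `io.StringIO`:

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file with mixed column types (hypothetical data)
text = "x,y,label\n1,2.0,a\n3,4.0,b\n5,6.0,c\n"

df = pd.read_csv(io.StringIO(text))
print(df.shape)                # (3, 3)
print(df['x'].dtype)           # int64
print(df['y'].dtype)           # float64

# describe() summarises numeric columns only, by default
stats = df.describe()
print(stats.columns.tolist())  # ['x', 'y']
print(stats.loc['mean', 'x'])  # 3.0
```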

1.8 Exercise 3:
Load the data from the Income3.csv file using pandas' read_csv function, which can read CSV files
containing columns of different datatypes. Statistics of each column can be displayed with the
describe() function of the DataFrame.
1. Load and display top 3 rows of the data.
Expected output:

Observation Years of Higher Education (x) Income (y)


0 1 6 89617
1 2 0 39826
2 3 6 79894
shape of data : (20, 3)

2. Display the statistical summary of each column using the describe function.


Expected output:

Observation Years of Higher Education (x) Income (y)
count 20.00000 20.000000 20.000000
mean 10.50000 3.650000 65344.000000
std 5.91608 2.230766 17568.409127
min 1.00000 0.000000 31007.000000
25% 5.75000 2.000000 53608.000000
50% 10.50000 4.000000 68876.500000
75% 15.25000 6.000000 79491.250000
max 20.00000 6.000000 89617.000000

3. Append a new column named “Predicted” filled with the default value 0.
Expected output:

Observation Years of Higher Education (x) Income (y) Predicted


0 1 6 89617 0
1 2 0 39826 0
2 3 6 79894 0
(20, 4)

[17]: import pandas as pd

# 1) Load data from Income3.csv file with pandas.read_csv() and display the top 3 rows

#_______________________________________________________________________________________________

#_______________________________________________________________________________________________

#_______________________________________________________________________________________________

# 2) Display the statistical summary of each column using the describe function.

#_______________________________________________________________________________________________

# 3) Append a new column named "Predicted" filled with the default value 0.

#_______________________________________________________________________________________________

#4) Display the data

#_______________________________________________________________________________________________

Observation Years of Higher Education (x) Income (y)


0 1 6 89617
1 2 0 39826
2 3 6 79894

Observation Years of Higher Education (x) Income (y)


count 20.00000 20.000000 20.000000
mean 10.50000 3.650000 65344.000000
std 5.91608 2.230766 17568.409127
min 1.00000 0.000000 31007.000000
25% 5.75000 2.000000 53608.000000
50% 10.50000 4.000000 68876.500000
75% 15.25000 6.000000 79491.250000
max 20.00000 6.000000 89617.000000

Observation Years of Higher Education (x) Income (y) Predicted


0 1 6 89617 0
1 2 0 39826 0
2 3 6 79894 0
(20, 4)
