Unit 1 Pandas - Series and DataFrame
Basic features of a Series: homogeneous data, size immutable, values mutable.
Basic features of a DataFrame: heterogeneous data, size mutable, values mutable.
Pandas Series: creation using various Python concepts
A pandas Series is a one-dimensional, array-like structure with homogeneous data. In these notes its size is treated as fixed after creation, while
its values can be changed through the index. It also supports forward/backward index access and slicing, in the same way as a list, an array or a
string. A Series stores a single row (one set of values), and all of its values share one data type (dtype).
Syntax- pandas.Series(data, index, dtype, copy)
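As a small illustration of the syntax line above, the sketch below passes data, index and dtype explicitly and then changes one value through its index label. The labels 'a', 'b', 'c' and the values are chosen only for this example.

```python
import pandas as pd

# data: the values; index: one label per value; dtype: one type shared by all values
s = pd.Series([11, 12, 13], index=["a", "b", "c"], dtype="float64")

# Values are mutable: assign through the index label
s["b"] = 20.5

print(s)
print(s.dtype)   # float64
```

All three values are stored as float64 because a Series has a single dtype.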
A Series can be created from a list, a scalar value, a NumPy 1-D array or a dictionary. (pandas also prints a final dtype line, which is omitted from the outputs below.)

Using a list: [ ]
Here a single list is converted into a pandas Series.
import pandas as pd
values = [11, 12, 13, 14, 15]
s = pd.Series(values)
print(s)
output-
0 11
1 12
2 13
3 14
4 15
The same code works without the list variable, as follows.
s = pd.Series([11, 12, 13, 14, 15])
print(s)

Using a scalar value (a single constant value is repeated)
Here we pass index=[0, 1, 2, 3, 4] inside the Series function so that the value is repeated once per index label.
import pandas as pd
data = 15
s = pd.Series(data, index=[0, 1, 2, 3, 4])
print(s)
output-
0 15
1 15
2 15
3 15
4 15
The same code works without the data variable, here with a string constant, as follows.
s = pd.Series('ram', index=[0, 1, 2, 3, 4])
print(s)

Using a NumPy 1-D array: np.array( [ ] )
Here the list is first converted into a 1-D array, and the 1-D array is then converted into a pandas Series.
import pandas as pd
import numpy as np
arr = np.array([11, 12, 13, 14, 15])
s = pd.Series(arr)
print(s)
output-
0 11
1 12
2 13
3 14
4 15
The same code works without the arr variable, as follows.
s = pd.Series(np.array([11, 12, 13, 14, 15]))
print(s)

Using a dictionary: {key : value}
Here a dictionary is converted into a pandas Series; the keys become the index.
import pandas as pd
mapping = {0 : 11, 1 : 12, 2 : 13, 3 : 14, 4 : 15}
s = pd.Series(mapping)
print(s)
output-
0 11
1 12
2 13
3 14
4 15
The same code works without the dictionary variable, as follows.
s = pd.Series({0 : 11, 1 : 12, 2 : 13, 3 : 14, 4 : 15})
print(s)
In the examples below s holds the values 11, 12, 13, either with the default index 0, 1, 2 or with the labels 'a', 'b', 'c'.

# to display selected values using iloc from starting position 0; here the stop position is excluded (same result for both indexes)
print(s.iloc[0:1])          11
print(s.iloc[0:2])          11 12
print(s.iloc[0:3])          11 12 13

# to display selected values using loc from starting index 0; here the stop index is included
print(s.loc[0:1])           11 12
print(s.loc[0:2])           11 12 13
print(s.loc[0:3])           11 12 13

# to display selected values using loc from starting index 'a'; here the stop index is included
print(s.loc['a' : 'a'])     11
print(s.loc['a' : 'b'])     11 12
print(s.loc['a' : 'c'])     11 12 13
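The difference between the two slicing styles above can be checked directly. This is a minimal sketch on a three-value Series with the labels 'a', 'b', 'c', matching the outputs shown.

```python
import pandas as pd

s = pd.Series([11, 12, 13], index=["a", "b", "c"])

# iloc slices by position: stop position 2 is excluded
by_position = s.iloc[0:2]

# loc slices by label: stop label 'b' is included
by_label = s.loc["a":"b"]

print(by_position.tolist())      # [11, 12]
print(by_label.tolist())         # [11, 12]
print(s.loc["a":"c"].tolist())   # [11, 12, 13]
```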
Pandas head() and tail() function
head() -> to access a number of rows from the top.
tail() -> to access a number of rows from the bottom.
Note- the head and tail functions are used with both Series and DataFrame to access rows from the top and from the bottom.

Statement             Default index 0, 1, 2       Labelled index 'a', 'b', 'c'
print(s.head(1))      0 11                        a 11
print(s.tail(1))      2 13                        c 13
print(s.head(2))      0 11 / 1 12                 a 11 / b 12
print(s.tail(2))      1 12 / 2 13                 b 12 / c 13
print(s.head(3))      0 11 / 1 12 / 2 13          a 11 / b 12 / c 13
print(s.tail(3))      0 11 / 1 12 / 2 13          a 11 / b 12 / c 13
Pandas DataFrame: creation using various Python concepts
A pandas DataFrame is a two-dimensional, table-like structure with heterogeneous data. Its size can be changed after creation, and its values
can also be changed through the index. It supports forward/backward index access and slicing, in the same way as a list, a 1-D or 2-D array or
a string. One DataFrame can store multiple rows and columns, and each column can hold values of a different data type.
Using a list of sub-lists        Using a dictionary of lists         Using a dictionary of pandas Series
[                                {                                   {
  [ , ],                           key1 : [a set of values],           key1 : pd.Series([a set of values]),
  [ , ]                            key2 : [a set of values]            key2 : pd.Series([a set of values])
]                                }                                   }

Each of these three forms gives the same DataFrame:
  rollno name
0 101 ram
1 102 mohan
2 103 sohan

Using a list of dictionaries
import pandas as pd
data = [
{ 'rollno' : 101, 'name' : 'ram' },
{ 'rollno' : 102, 'name' : 'mohan' },
{ 'rollno' : 103, 'name' : 'sohan' }
]
df = pd.DataFrame(data, columns=['rollno' , 'name'])
print(df)
output-
  rollno name
0 101 ram
1 102 mohan
2 103 sohan

Using a CSV file
import pandas as pd
df = pd.read_csv(r"d:\data.csv")
print(df)
output-
  rollno name
0 101 ram
1 102 mohan
2 103 sohan
Note- pd.read_csv( ) is a function used to read a CSV file from the given location. Write r directly before the opening quote of a Windows path
(a raw string, with no space after the r) so that the backslashes are not treated as escape characters.
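The notes give only the general shape of the first three creation methods, so the sketch below fills them in with the rollno/name data from the outputs above. The variable names df1, df2 and df3 are chosen for this example.

```python
import pandas as pd

# 1. A list of sub-lists: each inner list is one row
df1 = pd.DataFrame(
    [[101, "ram"], [102, "mohan"], [103, "sohan"]],
    columns=["rollno", "name"],
)

# 2. A dictionary of lists: each key is a column name
df2 = pd.DataFrame({
    "rollno": [101, 102, 103],
    "name": ["ram", "mohan", "sohan"],
})

# 3. A dictionary of pandas Series: each key is a column name
df3 = pd.DataFrame({
    "rollno": pd.Series([101, 102, 103]),
    "name": pd.Series(["ram", "mohan", "sohan"]),
})

print(df1)
print(df1.equals(df2) and df2.equals(df3))
```

The last line compares the three results to confirm that they hold the same table.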
DataFrame Basics
import pandas as pd
x=pd.DataFrame({
'month':['jan','feb','mar'],
'sales1':[5,7,6],
'sales2':[3,5,8]})

  month sales1 sales2
0 jan 5 3
1 feb 7 5
2 mar 6 8
------------------------------------------
#describe():- shows summary statistics of the numeric columns.
print(x.describe())
output:-
       sales1  sales2
count  3.0     3.000000
mean   6.0     5.333333
std    1.0     2.516611
min    5.0     3.000000
25%    5.5     4.000000
50%    6.0     5.000000
75%    6.5     6.500000
max    7.0     8.000000
------------------------------------------
#T:- transposes the dataframe (rows become columns and columns become rows).
print(x.T)
output:-
        0    1    2
month   jan  feb  mar
sales1  5    7    6
sales2  3    5    8
------------------------------------------
#dtypes:- shows the data type of each dataframe column.
print(x.dtypes)
output:-
month object
sales1 int64
sales2 int64
------------------------------------------
#ndim:- shows the dimension (the total number of axes) of the dataframe.
print(x.ndim)
output:-
2
note:- The total number of axes is also called the rank.
------------------------------------------
#values:- shows all values of the dataframe.
print(x.values)
output:-
[['jan' 5 3]
['feb' 7 5]
['mar' 6 8]]
------------------------------------------
#size:- shows the total number of elements.
print(x.size)
output:-
9
------------------------------------------
#shape:- shows the total number of rows and columns.
print(x.shape)
output:-
(3, 3)
------------------------------------------
#axes:- shows the row index and the column index of the dataframe.
print(x.axes)
output:-
[RangeIndex(start=0, stop=3, step=1),
Index(['month', 'sales1', 'sales2'],
dtype='object')]
------------------------------------------
#count():- counts the values in each column.
print(x.count())
output:-
month 3
sales1 3
sales2 3
------------------------------------------
#sum():- sums each numeric column and concatenates each string column.
print(x.sum())
output:-
month janfebmar
sales1 18
sales2 16
------------------------------------------
#max():- shows the maximum of each numeric column and also of each string column (alphabetical order).
print(x.max())
output:-
month mar
sales1 7
sales2 8
------------------------------------------
#min():- shows the minimum of each numeric column and also of each string column (alphabetical order).
print(x.min())
output:-
month feb
sales1 5
sales2 3
------------------------------------------
#var():- shows the variance of each numeric column.
#(pandas 2.0 and later need numeric_only=True here because of the string column month.)
print(x.var(numeric_only=True))
output:-
sales1 1.000000
sales2 6.333333
------------------------------------------
#std():- shows the standard deviation of each numeric column
#(the square root of the variance gives the standard deviation).
print(x.std(numeric_only=True))
output:-
sales1 1.000000
sales2 2.516611
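The variance and standard deviation above apply only to numeric columns. As a hedged sketch of two ways to keep the string column month out of the calculation, the code below uses numeric_only=True and select_dtypes(); the variable name numeric is chosen for this example.

```python
import pandas as pd

x = pd.DataFrame({
    "month": ["jan", "feb", "mar"],
    "sales1": [5, 7, 6],
    "sales2": [3, 5, 8],
})

# Option 1: ask the reduction to skip non-numeric columns
variance = x.var(numeric_only=True)

# Option 2: keep only the numeric columns first, then reduce
numeric = x.select_dtypes(include="number")
deviation = numeric.std()

print(variance)
print(deviation)
```

Both options give one value per numeric column, matching the var() and std() outputs listed above.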
DataFrame: Accessing Columns, Rows and Individual Values
import pandas as pd
x=pd.DataFrame({'rollno':[101,102,103,104,105],
'name':['arpit','himmat','rohit','vinod','nitin'],
'city':['kota','jamnagar','kota','jamnagar','ajmer'],
'state':['rajasthan','gujarat','rajasthan','gujarat','rajasthan'],
'marks':[50,70,60,80,90]})

# to display the entire data frame
print (x)
  rollno name city state marks
0 101 arpit kota rajasthan 50
1 102 himmat jamnagar gujarat 70
2 103 rohit kota rajasthan 60
3 104 vinod jamnagar gujarat 80
4 105 nitin ajmer rajasthan 90

#to display each column separately
print(x['rollno'])      print(x['name'])      print(x['city'])
or
print(x.rollno)
rollno                  name                  city
0 101                   0 arpit               0 kota
1 102                   1 himmat              1 jamnagar
2 103                   2 rohit               2 kota
3 104                   3 vinod               3 jamnagar
4 105                   4 nitin               4 ajmer

#to display more than one column
print(x[ [ 'rollno' , 'name' ] ])      print(x[ [ 'marks' , 'city' ] ])
  rollno name                            marks city
0 101 arpit                            0 50 kota
1 102 himmat                           1 70 jamnagar
2 103 rohit                            2 60 kota
3 104 vinod                            3 80 jamnagar
4 105 nitin                            4 90 ajmer

#to display each row separately using -> loc[row index number]
#(pandas prints a single row as a vertical Series; the values are shown here in one line)
print(x.loc[0])
rollno 101, name arpit, city kota, state rajasthan, marks 50
print(x.loc[3])
rollno 104, name vinod, city jamnagar, state gujarat, marks 80

#to display multiple rows with all columns:-
using -> loc[start row index number : stop row index number]
print(x.loc[0:1])
  rollno name city state marks
0 101 arpit kota rajasthan 50
1 102 himmat jamnagar gujarat 70
print(x.loc[3:4])
  rollno name city state marks
3 104 vinod jamnagar gujarat 80
4 105 nitin ajmer rajasthan 90

#to display multiple rows with selected columns:-
using -> loc[start row : stop row , start column name : stop column name ]
print(x.loc[1:2, "city": "marks"])
  city state marks
1 jamnagar gujarat 70
2 kota rajasthan 60
print(x.loc[2:4, "name": "city"])
  name city
2 rohit kota
3 vinod jamnagar
4 nitin ajmer

#to display each value from the dataframe:- using iloc[row index, column index]
print(x.iloc[0,0])   101        print(x.iloc[1,0])   102        print(x.iloc[2,0])   103
print(x.iloc[0,1])   arpit      print(x.iloc[1,1])   himmat     print(x.iloc[2,1])   rohit
DataFrame- Boolean indexing
A pandas DataFrame also supports Boolean indexing, so we can search our data directly by a True or False index. We can use
loc[ ] for this purpose.
import pandas as pd
data1={
'rollno' : [101,102,103,104],
'name' : ['ram','mohan','sohan','rohan']
}
student1 = pd.DataFrame(data1,
index = [True, False, True, False],
columns=['rollno' , 'name']
)
print(student1)
output-
      rollno name
True  101 ram
False 102 mohan
True  103 sohan
False 104 rohan
----------------------
print(student1.loc[True] )
output-
      rollno name
True  101 ram
True  103 sohan
-----------------------
print(student1.loc[False] )
output-
      rollno name
False 102 mohan
False 104 rohan

Importing/Exporting Data between CSV files and DataFrames
For export- DataFrame.to_csv( )
For import- pd.read_csv( r"path" )

#export this student1 DataFrame to the d drive, as a new file std1.csv.
import pandas as pd
data1={
'rollno' : [101,102],
'name' : ['ram','mohan']
}
student1 = pd.DataFrame(data1, columns=['rollno' , 'name'])
student1.to_csv(
r'd:\std1.csv',
index = False,
header=True
)
Note- In the std1.csv file the index values 0, 1 will not appear, but the rollno and name headings will appear.
-------------------------------------------------------------
#Now import this std1.csv file from the d drive into a DataFrame student1 again.
import pandas as pd
student1= pd.read_csv( r"d:\std1.csv" )
print(student1)
output-
  rollno name
0 101 ram
1 102 mohan
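The export and import steps above depend on a d: drive being available. As a portable sketch of the same round trip, the code below writes the CSV text into an in-memory buffer (io.StringIO from the standard library) instead of a file; the names buffer and restored are chosen for this example.

```python
import io

import pandas as pd

student1 = pd.DataFrame(
    {"rollno": [101, 102], "name": ["ram", "mohan"]},
    columns=["rollno", "name"],
)

# Export: index=False leaves out the 0, 1 index; header=True keeps the headings
buffer = io.StringIO()
student1.to_csv(buffer, index=False, header=True)
print(buffer.getvalue())

# Import: rewind the buffer and read the CSV text back into a DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

The first line of the exported text is the heading row rollno,name, and the restored DataFrame gets a fresh default index.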
Adding, Deleting and Renaming in an existing DataFrame
Adding- a new row using pd.concat( ), and a new column by assignment.
Deleting- an existing row using the drop(index label) method, and an existing column using the pop(column name) method or del.
Renaming- an existing index label or an existing column using the rename( ) method.

Adding
student1 = pd.DataFrame(data1, columns=['rollno' , 'name'] )
print(student1)
output-
  rollno name
0 101 ram
1 102 mohan
#to add a new row to an existing DataFrame
#(DataFrame.append() was removed in pandas 2.0; older versions also accept
# student1 = student1.append({ 'rollno' : 103, 'name': 'sohan' }, ignore_index=True) )
new_row = pd.DataFrame([{ 'rollno' : 103, 'name': 'sohan' }])
student1 = pd.concat([student1, new_row], ignore_index=True)
print(student1)
output-
  rollno name
0 101 ram
1 102 mohan
2 103 sohan
#to add a new column to an existing DataFrame
student1['marks'] = [50,60,70]
print(student1)
output-
  rollno name marks
0 101 ram 50
1 102 mohan 60
2 103 sohan 70

Deleting
student1 = pd.DataFrame(data1, columns=['rollno' , 'name'] )
print(student1)
output-
  rollno name
0 101 ram
1 102 mohan
2 103 sohan
#to delete an existing row in a DataFrame
student1= student1.drop(0)
print(student1)
output-
  rollno name
1 102 mohan
2 103 sohan
#to delete an existing column in a DataFrame
student1.pop('rollno')
or
del student1['rollno']
print(student1)
output-
  name
1 mohan
2 sohan

Renaming
student1 = pd.DataFrame(data1, columns=['rollno' , 'name'] )
print(student1)
output-
  rollno name
0 101 ram
1 102 mohan
#to rename an existing index
student1=student1.rename(index= {0 : 'a' , 1 : 'b'} )
print(student1)
output-
  rollno name
a 101 ram
b 102 mohan
#to rename an existing column
student1=student1.rename(columns={'rollno' : 'sid', 'name':'fullname'})
print(student1)
output-
  sid fullname
a 101 ram
b 102 mohan
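The add, delete and rename steps above can be chained in one runnable sketch. It assumes a recent pandas version, so the new row is added with pd.concat() rather than the older append() method; the names new_row and removed are chosen for this example.

```python
import pandas as pd

student1 = pd.DataFrame({"rollno": [101, 102], "name": ["ram", "mohan"]})

# Add a row: wrap the new record in a one-row DataFrame and concatenate
new_row = pd.DataFrame([{"rollno": 103, "name": "sohan"}])
student1 = pd.concat([student1, new_row], ignore_index=True)

# Add a column: one value per existing row
student1["marks"] = [50, 60, 70]

# Delete the row with index label 0, then remove the rollno column
student1 = student1.drop(0)
removed = student1.pop("rollno")   # pop returns the removed column

# Rename the remaining index labels and one column
student1 = student1.rename(index={1: "a", 2: "b"}, columns={"name": "fullname"})

print(student1)
print(removed.tolist())   # [102, 103]
```

Unlike pop(), which changes the DataFrame directly, drop() and rename() return a new DataFrame, so their result is assigned back to student1.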
Iteration
Value by value- iloc[row, col ], loc[row, col]
Row by row- iterrows(), itertuples()
Column by column- items() (named iteritems() before pandas 2.0)
import pandas as pd
data1={
'rollno' : [101,102],
'name' : ['ram','mohan']
}
student1 = pd.DataFrame(data1, columns=['rollno' , 'name'] )
print(student1)
output-
  rollno name
0 101 ram
1 102 mohan
#to iterate/access row by row
for index, row in student1.iterrows():
    print (row["rollno"], row["name"])
-------------------or----------------------------------
for i in range(len(student1)) :
    print(student1.iloc[i, 0], student1.iloc[i, 1])
-------------------or----------------------------------
for i in range(len(student1)) :
    print(student1.loc[i, "rollno"], student1.loc[i, "name"])
output- (the same for each of the three loops above)
101 ram
102 mohan
-------------------or----------------------------------
#itertuples() gives each row as a named tuple
for row in student1.itertuples():
    print(row)
----------------------------------
#to iterate/access column by column
for key, value in student1.items():
    print(key, value)

Show rows from top and bottom
head(no. of rows from top)
tail(no. of rows from bottom)
by default- head() shows the top 5 rows and tail() shows the bottom 5 rows
import pandas as pd
data1={
'rollno' : [101,102, 103, 104, 105, 106],
'name' : ['ram','mohan','sohan','arun','rohan','shyam']
}
student1 = pd.DataFrame(data1, columns=['rollno' , 'name'] )
print(student1)
output-
  rollno name
0 101 ram
1 102 mohan
2 103 sohan
3 104 arun
4 105 rohan
5 106 shyam
# show top 5 rows                 # show bottom 5 rows
print(student1.head())            print(student1.tail())
output-                           output-
  rollno name                       rollno name
0 101 ram                         1 102 mohan
1 102 mohan                       2 103 sohan
2 103 sohan                       3 104 arun
3 104 arun                        4 105 rohan
4 105 rohan                       5 106 shyam
----------------------------------
# show top 2 rows                 # show bottom 2 rows
print(student1.head(2))           print(student1.tail(2))
output-                           output-
  rollno name                       rollno name
0 101 ram                         4 105 rohan
1 102 mohan                       5 106 shyam
----------------------------------
# show top 3 rows                 # show bottom 3 rows
print(student1.head(3))           print(student1.tail(3))
output-                           output-
  rollno name                       rollno name
0 101 ram                         3 104 arun
1 102 mohan                       4 105 rohan
2 103 sohan                       5 106 shyam

Arithmetic operations on DataFrames
NumPy functions- np.add(a,b), np.subtract(a,b), np.multiply(a,b), np.divide(a,b) and np.mod(a,b)
import pandas as pd
import numpy as np
a=pd.DataFrame([
[4., 6.],
[10.,12.]
])
b=pd.DataFrame([
[3., 5.],
[6., 7.]
])
print(a+b)
or
print(np.add(a,b))
or
print(a.add(b))
Output-
  0 1
0 7.0 11.0
1 16.0 19.0
print(a+3)
or
print(np.add(a,3))
Output-
  0 1
0 7.0 9.0
1 13.0 15.0
Vertical sum                      Horizontal sum
print( a.sum (axis=0))            print( a.sum (axis=1))
Output-                           Output-
0 14.0                            0 10.0
1 18.0                            1 22.0
Note- we can also perform sum, max, min, count, mean, std and var on a single row or column using axis.
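The iteration and arithmetic patterns above can be sketched in a form that runs on current pandas, where column-wise iteration uses items(). The result names total, col_sum and row_sum are chosen for this example.

```python
import pandas as pd

student1 = pd.DataFrame({"rollno": [101, 102], "name": ["ram", "mohan"]})

# Row by row: iterrows() yields (index label, row as a Series)
for index, row in student1.iterrows():
    print(index, row["rollno"], row["name"])

# Row by row: itertuples() yields one named tuple per row
for row in student1.itertuples():
    print(row.Index, row.rollno, row.name)

# Column by column: items() yields (column name, column as a Series)
for column, values in student1.items():
    print(column, values.tolist())

a = pd.DataFrame([[4.0, 6.0], [10.0, 12.0]])
b = pd.DataFrame([[3.0, 5.0], [6.0, 7.0]])

total = a.add(b)         # element-wise, same as a + b
col_sum = a.sum(axis=0)  # vertical sum: one value per column
row_sum = a.sum(axis=1)  # horizontal sum: one value per row

print(total)
print(col_sum.tolist())  # [14.0, 18.0]
print(row_sum.tolist())  # [10.0, 22.0]
```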
1-D Array, 2-D Array, Series & DataFrame:- Accessing Each Element Value Using a Loop

1-D NumPy array: accessing each element using a loop
import numpy as np
n1=np.array([10, 20, 13])
for i in range(0, len(n1)):
    print(n1[i])

Series (numeric): accessing each element using a loop
import pandas as pd
s1=pd.Series([10, 20, 13])
for i in range(0, len(s1)):
    print(s1[i])

2-D NumPy array: accessing each element using a loop (arr holds the values below)
  0 1 2
0 5 3 4
1 7 5 8
2 6 8 3
3 9 10 6
import numpy as np
for row in range(0, len(arr)):
    print()
    for col in range(0, len(arr[row])):
        print(arr[row , col ], end=' ')

DataFrame: accessing each element using a loop (df holds either the numeric values above or the table below; the loop is the same)
  month sales1 sales2
0 jan 5 3
1 feb 7 5
2 mar 6 8
3 apr 9 10
import pandas as pd
rc=df.shape
totalrow=rc[0]
totalcol=rc[1]
for row in range(0, totalrow):
    print()
    for col in range(0, totalcol):
        print(df.iloc[row , col ], end=' ')
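The loops above rely on arr and df being defined beforehand. A self-contained sketch, assuming the 4 x 3 numeric values shown in the tables, builds both and collects the elements into lists so that the two traversals can be compared; array_values and frame_values are names chosen for this example.

```python
import numpy as np
import pandas as pd

arr = np.array([[5, 3, 4], [7, 5, 8], [6, 8, 3], [9, 10, 6]])
df = pd.DataFrame(arr)

# 2-D array: the outer loop walks the rows, the inner loop walks the columns
array_values = []
for row in range(arr.shape[0]):
    for col in range(arr.shape[1]):
        array_values.append(int(arr[row, col]))

# DataFrame: shape gives (rows, columns); iloc reads one value by position
total_rows, total_cols = df.shape
frame_values = []
for row in range(total_rows):
    for col in range(total_cols):
        frame_values.append(int(df.iloc[row, col]))

print(array_values)
print(array_values == frame_values)
```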
The examples below use one 4 x 4 DataFrame df in two forms: with row labels r0-r3 and column labels c0-c3 (for loc), and with the default labels 0-3 (for iloc).
   c0 c1 c2 c3            0 1 2 3
r0 3 2 3 4             0  3 2 3 4
r1 7 4 6 8             1  7 4 6 8
r2 5 6 9 2             2  5 6 9 2
r3 8 8 4 6             3  8 8 4 6

To sum a selected row
Using loc                                  Using iloc                               Result
print( df.loc['r0'].sum( ))                print( df.iloc[0].sum( ))                12
print( df.loc['r1'].sum( ))                print( df.iloc[1].sum( ))                25
print( df.loc['r2'].sum( ))                print( df.iloc[2].sum( ))                22
print( df.loc['r3'].sum( ))                print( df.iloc[3].sum( ))                26

To sum a selected column
Using loc                                  Using iloc                               Result
print( df.loc[ : , 'c0'].sum( ))           print( df.iloc[ : , 0].sum( ))           23
print( df.loc[ : , 'c1'].sum( ))           print( df.iloc[ : , 1].sum( ))           20
print( df.loc[ : , 'c2'].sum( ))           print( df.iloc[ : , 2].sum( ))           22
print( df.loc[ : , 'c3'].sum( ))           print( df.iloc[ : , 3].sum( ))           20

To sum row wise directly                   To sum column wise directly
print( df.sum(axis=1 ) )                   print( df.sum(axis=0 ) )
12                                         23
25                                         20
22                                         22
26                                         20

To show various statistics of the first row
Using loc                                  Using iloc                               Result
print( df.loc['r0'].mean() )               print( df.iloc[0].mean() )               3.0
print( df.loc['r0'].median() )             print( df.iloc[0].median() )             3.0
print( df.loc['r0'].mode() )               print( df.iloc[0].mode() )               3
print( df.loc['r0'].var() )                print( df.iloc[0].var() )                0.66
print( df.loc['r0'].std() )                print( df.iloc[0].std() )                0.81
print( df.loc['r0'].quantile(.25) )        print( df.iloc[0].quantile(.25) )        2.75
print( df.loc['r0'].quantile(.50) )        print( df.iloc[0].quantile(.50) )        3.0
print( df.loc['r0'].quantile(.75) )        print( df.iloc[0].quantile(.75) )        3.25
print( df.loc['r0'].quantile(1) )          print( df.iloc[0].quantile(1) )          4.0

To show various statistics of the first column
Using loc                                  Using iloc                               Result
print( df.loc[:,'c0'].mean() )             print( df.iloc[:,0].mean() )             5.75
print( df.loc[:,'c0'].median() )           print( df.iloc[:,0].median() )           6.0
print( df.loc[:,'c0'].mode() )             print( df.iloc[:,0].mode() )             3, 5, 7, 8 (every value occurs once)
print( df.loc[:,'c0'].var() )              print( df.iloc[:,0].var() )              4.91
print( df.loc[:,'c0'].std() )              print( df.iloc[:,0].std() )              2.21
print( df.loc[:,'c0'].quantile(.25) )      print( df.iloc[:,0].quantile(.25) )      4.5
print( df.loc[:,'c0'].quantile(.50) )      print( df.iloc[:,0].quantile(.50) )      6.0
print( df.loc[:,'c0'].quantile(.75) )      print( df.iloc[:,0].quantile(.75) )      7.25
print( df.loc[:,'c0'].quantile(1) )        print( df.iloc[:,0].quantile(1) )        8.0
Slicing a pandas DataFrame
loc includes the stop index; iloc does not include the stop index. The same 4 x 4 DataFrame df is used, with labels r0-r3 / c0-c3 for loc and positions 0-3 for iloc.

Slice for rows
Using loc                                  Using iloc                               Selects
print( df.loc['r0' : 'r0' , : ] )          print( df.iloc[ 0 : 1 , : ] )            first row
print( df.loc['r0' : 'r1' , : ] )          print( df.iloc[ 0 : 2 , : ] )            first two rows
print( df.loc['r0' : 'r2' , : ] )          print( df.iloc[ 0 : 3 , : ] )            first three rows
print( df.loc['r0' : 'r3' , : ] )          print( df.iloc[ 0 : 4 , : ] )            all four rows

Slice for columns
Using loc                                  Using iloc                               Selects
print( df.loc[ : , 'c0' : 'c0' ] )         print( df.iloc[ : , 0 : 1 ] )            first column
print( df.loc[ : , 'c0' : 'c1' ] )         print( df.iloc[ : , 0 : 2 ] )            first two columns
print( df.loc[ : , 'c0' : 'c2' ] )         print( df.iloc[ : , 0 : 3 ] )            first three columns
print( df.loc[ : , 'c0' : 'c3' ] )         print( df.iloc[ : , 0 : 4 ] )            all four columns

Slicing- using step for pandas dataframe
Slice for rows using loc (stop label included)
print( df.loc['r0' : 'r1' : 2 , : ] )      selects row r0
print( df.loc['r0' : 'r2' : 2 , : ] )      selects rows r0, r2
print( df.loc['r0' : 'r3' : 2 , : ] )      selects rows r0, r2
print( df.loc['r0' : 'r4' : 2 , : ] )      selects rows r0, r2 ('r4' is not an existing label; it only acts as an upper bound because the labels are sorted)

Slice for columns using loc (stop label included)
print( df.loc[ : , 'c0' : 'c1' : 2 ] )     selects column c0
print( df.loc[ : , 'c0' : 'c2' : 2 ] )     selects columns c0, c2
print( df.loc[ : , 'c0' : 'c3' : 2 ] )     selects columns c0, c2
print( df.loc[ : , 'c0' : 'c4' : 2 ] )     selects columns c0, c2 ('c4' is not an existing label)

Slice for rows using iloc (stop position excluded)
print( df.iloc[ 0 : 1 : 2 , : ] )          selects row 0
print( df.iloc[ 0 : 2 : 2 , : ] )          selects row 0
print( df.iloc[ 0 : 3 : 2 , : ] )          selects rows 0, 2
print( df.iloc[ 0 : 4 : 2 , : ] )          selects rows 0, 2

Slice for columns using iloc (stop position excluded)
print( df.iloc[ : , 0 : 1 : 2 ] )          selects column 0
print( df.iloc[ : , 0 : 2 : 2 ] )          selects column 0
print( df.iloc[ : , 0 : 3 : 2 ] )          selects columns 0, 2
print( df.iloc[ : , 0 : 4 : 2 ] )          selects columns 0, 2

Selected rows with selected columns
A row slice and a column slice can be combined in one loc[ ] or iloc[ ] call, for example df.loc['r0' : 'r1' , 'c0' : 'c1'] or df.iloc[0 : 2 , 0 : 2].
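To make the slicing rules above concrete, the sketch below builds the 4 x 4 DataFrame with the labels r0-r3 and c0-c3 and applies one slice of each kind. The result names (rows_loc, rows_iloc, stepped, block) are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame(
    [[3, 2, 3, 4], [7, 4, 6, 8], [5, 6, 9, 2], [8, 8, 4, 6]],
    index=["r0", "r1", "r2", "r3"],
    columns=["c0", "c1", "c2", "c3"],
)

rows_loc = df.loc["r0":"r2", :]     # label slice: stop label r2 is included
rows_iloc = df.iloc[0:2, :]         # position slice: stop position 2 is excluded
stepped = df.loc["r0":"r3":2, :]    # every second row from r0 up to r3
block = df.iloc[0:2, 0:2]           # selected rows with selected columns

print(list(rows_loc.index))    # ['r0', 'r1', 'r2']
print(list(rows_iloc.index))   # ['r0', 'r1']
print(list(stepped.index))     # ['r0', 'r2']
print(block)
```

iloc works by position even when the rows and columns carry labels, which is why the same df can be used for both styles.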
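Python programs with solutions
1. Create a pandas series from a dictionary of values and an ndarray.
# Create a pandas series from a dictionary of values.
import pandas as pd
values = {0 : 11, 1 : 12, 2 : 13, 3 : 14, 4 : 15}
s = pd.Series(values)
print(s)
# Create a pandas series from an ndarray.
import numpy as np
arr=np.array( [11,12,13,14,15] )
s =pd.Series(arr)
print(s)
output- (the same for both)
0 11
1 12
2 13
3 14
4 15
2. Given a Series, print all the elements that are above the 75th percentile.
import pandas as pd
x=pd.Series([10,20,30,40,50,50,60,70,70,70,80,90,100])
print(x.loc[x>= x.quantile(.75)])
output-
7 70
8 70
9 70
10 80
11 90
12 100
Note- the 75th percentile here is 70, so >= also prints the values equal to it; use > to print only the values strictly above it.
3. Count the even and odd values in a DataFrame.
import numpy as np
import pandas as pd
df=pd.DataFrame([
[3,2,3,4],
[7,4,6,8],
[5,6,9,2],
[8,8,4,6]
])
evencount=np.sum(np.array(df)%2== 0)
print( evencount)
output-
11
------------
oddcount=np.sum(np.array(df)%2== 1)
print( oddcount )
output-
5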
Important - Board Sample question and answers
Question: What is a Series? Explain with the help of an example.
Answer: A pandas Series is a one-dimensional labeled array capable of holding data of any type
(integer, string, float, Python objects, etc.). The axis labels are collectively called the index.
Example
import pandas as pd
# simple array
data = pd.Series([1,2,3,4,5])
print(data)
Question: Hitesh wants to display the last four rows of the dataframe df and has written the following code :
df.tail()
But the last 5 rows are being displayed. Identify the error and rewrite the correct code so that the last 4 rows get displayed.
Answer: tail() shows 5 rows by default, so the number of rows must be passed:
df.tail(4)
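Question: Write the command using the insert() function to add a new column in the last place (3rd place) named "Salary" from the list
Sal=[10000,15000,20000] in an existing dataframe named EMP already having 2 columns.
Answer: EMP.insert(loc=2, column="Salary", value=Sal)
Note- loc is counted from 0, so the 3rd place is loc=2. With only 2 existing columns, loc=3 is out of range and raises an error.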
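The insert() question above can be tried on a small stand-in for EMP. The two existing columns, Name and Dept, and their values are hypothetical, since the question does not give them; using len(EMP.columns) as the position always targets the last place.

```python
import pandas as pd

# Hypothetical EMP with 2 columns and 3 rows (the question does not list them)
EMP = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Dept": ["X", "Y", "Z"],
})
Sal = [10000, 15000, 20000]

# Positions are counted from 0, so the last place is len(EMP.columns), here 2
EMP.insert(loc=len(EMP.columns), column="Salary", value=Sal)

print(EMP)
print(list(EMP.columns))   # ['Name', 'Dept', 'Salary']
```

insert() changes EMP in place and returns None, so its result is not assigned back.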
Question: Consider the following python code and write the output for statement S1
import pandas as pd
K=pd.Series([2,4,6,8,10,12,14])
K.quantile([0.50, 0.75]) ---------------------- S1
Answer:
0.50 8.0
0.75 11.0

Question: CSV stands for _____________
Answer: Comma separated values
Question: Write a python code to create a dataframe with appropriate headings from the list given below :
['S101', 'Amy', 70],
['S102', 'Bandhi', 69],
['S104', 'Cathy', 75],
['S105', 'Gundaho', 82]
Answer:
import pandas as pd
# initialize list of lists
data = [['S101', 'Amy', 70],
['S102', 'Bandhi', 69],
['S104', 'Cathy', 75],
['S105', 'Gundaho', 82]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['ID', 'Name', 'Marks'])
# print dataframe.
print(df )

Question: Write a small python code to create a dataframe with headings (a and b) from the list given below :
[ [1,2], [3,4], [5,6], [7,8] ]
Answer:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
# df.append(df2) was removed in pandas 2.0; pd.concat joins the two dataframes
df = pd.concat([df, df2])
Question: Find the output of the following code:
import pandas as pd
data = [{'a': 10, 'b': 20}, {'a': 6, 'b': 32, 'c': 22}]
#with two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
Answer:
        a   b
first   10  20
second  6   32
        a   b1
first   10  NaN
second  6   NaN
Write the code in pandas to create the following dataframes :
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'mark1':[30,40,15,40],
'mark2':[20,45,30,70]});
df2 = pd.DataFrame({'mark1':[10,20,20,50],
'mark2':[15,25,30,30]});
print(df1)
print(df2)
Write the commands to do the following operations on the dataframes given above :
(i) To rename column mark1 as marks1 in both the dataframes df1 and df2.
note: inplace =True -> directly permanent change into original dataframe.
df1.rename(columns={'mark1':'marks1'}, inplace=True)
print(df1)
df2.rename(columns={'mark1':'marks1'}, inplace=True)
print(df2)
Question: Given a dataframe namely data as shown below (fruit names are row labels), write a code statement for each part.
       color  count  price
apple  red    3      120
apple  green  9      110
pear   red    25     125
pear   green  26     150
line   green  99     70
a. Find all rows with the label 'apple'. Extract all columns.
data.loc[ 'apple', : ]
b. List only the columns count and price using loc.
data.loc[ : , ['count', 'price'] ]
c. List only rows with labels 'apple' and 'pear' using loc.
data.loc[ ['apple', 'pear'] ]
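The three answers to the fruit question can be checked on a DataFrame rebuilt from the table in the question. The row label 'line' is copied as printed there; the result names are chosen for this example.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "color": ["red", "green", "red", "green", "green"],
        "count": [3, 9, 25, 26, 99],
        "price": [120, 110, 125, 150, 70],
    },
    index=["apple", "apple", "pear", "pear", "line"],
)

apple_rows = data.loc["apple", :]               # a. both rows labelled 'apple'
count_price = data.loc[:, ["count", "price"]]   # b. only count and price
apple_pear = data.loc[["apple", "pear"]]        # c. rows labelled 'apple' or 'pear'

print(apple_rows)
print(count_price)
print(apple_pear)
```

Because the label 'apple' appears twice, data.loc['apple', :] returns a two-row DataFrame rather than a single row.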
Syntax and examples of various pandas data structure operations
Operation             Example
To rename columns     y=x.rename(columns={'month' : 'monthly', 'sales1':'total sales1'})
To rename columns     student1=student1.rename(columns= {'rollno' : 'sid', 'name':'fullname'})
To add a new row      student1= pd.concat([student1, pd.DataFrame([{ 'rollno' : 103, 'name': 'sohan' }])], ignore_index=True)
                      (before pandas 2.0: student1= student1.append({ 'rollno' : 103, 'name': 'sohan' }, ignore_index=True) )
To delete a row       student1= student1.drop(0)
To delete a column    student1.pop('rollno')