Data Handling Using Pandas-I
Pandas:
• It is a package useful for data analysis and manipulation.
• Pandas provides an easy way to create, manipulate and wrangle
data.
• Pandas provides powerful and easy-to-use data structures, as well
as the means to quickly perform operations on these structures.
Pandas offers the following data structures:
1. Series
2. Data Frame
3. Panel (removed from pandas in version 0.25)
e.g.-
Index Data
0 10
1 15
2 18
3 22
Program-
import pandas as pd
import numpy as np
arr=np.array([10,15,18,22])  # Here we create an array of 4 values.
s = pd.Series(arr)
print(s)
Output-
Default Index  Data
0  10
1  15
2  18
3  22
How to create Series with Mutable index
Program-
print(s)
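A minimal sketch of a Series with a user-defined index, assuming hypothetical labels 'a' to 'd' (they are not taken from the original slide). It also shows that the whole index can be replaced after creation, provided the new list has the same length.

```python
import pandas as pd

# Hypothetical labels chosen for illustration.
s = pd.Series([10, 15, 18, 22], index=['a', 'b', 'c', 'd'])
print(s)

# Replace the whole index; the new list must have the same length.
s.index = ['w', 'x', 'y', 'z']
print(s)
```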
Creating a series from Scalar value
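A minimal sketch, assuming a scalar value of 5 and an index of four labels (both chosen for illustration). When a scalar is given together with an index, the value is repeated for every index label.

```python
import pandas as pd

# The scalar 5 is repeated once for each label in the index.
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
```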
Print all the values of the Series that are greater than 2.
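One possible way to do this, assuming a small Series of the values 1 to 5 (illustrative data). The condition s > 2 produces a Series of True/False values, which is then used to select the matching elements.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# s > 2 gives a boolean Series; using it as a subscript keeps only the True rows.
greater = s[s > 2]
print(greater)
```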
Example-2
Result of s.head()
Result of s.head(3)
tail(): It is used to access the last 5 rows of a series.
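A short sketch of head() and tail() on a Series, assuming illustrative values 1 to 10. Both methods accept an optional number of rows; without it they return 5 rows.

```python
import pandas as pd

s = pd.Series(range(1, 11))  # values 1 to 10, default index 0 to 9

first5 = s.head()    # first 5 rows
first3 = s.head(3)   # first 3 rows
last5 = s.tail()     # last 5 rows
last2 = s.tail(2)    # last 2 rows

print(first5)
print(first3)
print(last5)
print(last2)
```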
A Series provides loc (access by index label), iloc (access by integer
position) and [] to access its elements.
Syntax:-series_name.loc[StartRange: StopRange]
Example-
Syntax:-series_name.iloc[StartRange : StopRange]
Example-
Syntax:-series_name[StartRange : StopRange] or
series_name[index]
Example-
Example-
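The three access styles above can be sketched as follows, assuming a Series with hypothetical string labels 'a' to 'e'. Note that a label slice with loc includes the stop label, while a position slice with iloc excludes the stop position.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22, 30], index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['b':'d']    # labels b, c, d (stop label is included)
by_position = s.iloc[1:3]    # positions 1 and 2 (stop position is excluded)
single = s['c']              # one element, by its label
label_slice = s['b':'d']     # [] with labels behaves like loc here

print(by_label)
print(by_position)
print(single)
print(label_slice)
```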
Slicing in Series
A slice has three parts: start, the position of the first item; end, the
position at which the slice stops (the item at end is not included); and
step, the increment between each item that you would like.
Example :-
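A minimal sketch of start, end and step, assuming illustrative values. iloc is used here so that the slice is clearly by position.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22, 30, 35])

middle = s.iloc[1:4]        # positions 1, 2, 3
every_second = s.iloc[::2]  # positions 0, 2, 4
stepped = s.iloc[1:5:2]     # positions 1 and 3
reverse = s.iloc[::-1]      # all items in reverse order

print(middle)
print(every_second)
print(stepped)
print(reverse)
```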
CREATED BY: SACHIN BHARDWAJ PGT(CS) KV NO1 TEZPUR, VINOD VERMA PGT (CS) KV OEF KANPUR
DATAFRAME
DATAFRAME STRUCTURE
INDEX  DATA
0  ROHIT MI 13
1  VIRAT RCB 17
2  HARDIK MI 14
PROPERTIES OF DATAFRAME
A DataFrame can be created from:
1. Series
2. Lists
3. Dictionary
4. A numpy 2D array
Program-
import pandas as pd
s = pd.Series(['a','b','c','d'])
df=pd.DataFrame(s)
print(df)
Output-
   0      (Default Column Name As 0)
0  a
1  b
2  c
3  d
DataFrame from Dictionary of Series
Example-
Example-
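A minimal sketch, assuming hypothetical subject marks for three students (the names and values are illustrative only). Each dictionary key becomes a column name, and the row index is the union of the Series indexes; a missing value is filled with NaN.

```python
import pandas as pd

# Hypothetical data: 'Science' has no entry for Chetan.
marks = {
    'Maths': pd.Series([90, 85, 70], index=['Amit', 'Bina', 'Chetan']),
    'Science': pd.Series([80, 95], index=['Amit', 'Bina']),
}

df = pd.DataFrame(marks)
print(df)
```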
Iteration on Rows and Columns
1. iterrows ()
2. iteritems () (use items() in newer pandas; iteritems() was removed in pandas 2.0)
iterrows()
Example-
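A short sketch of both kinds of iteration, assuming a small data frame with the empid and ename columns. iterrows() yields (index, row) pairs and items() yields (column name, column) pairs; items() is used instead of iteritems() so that the code also runs on pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({'empid': [101, 102], 'ename': ['Sachin', 'Vinod']})

# Row-wise: each row arrives as a Series indexed by the column names.
rows = []
for index, row in df.iterrows():
    print(index, row['empid'], row['ename'])
    rows.append((index, row['ename']))

# Column-wise: each column arrives as a Series indexed by the row labels.
cols = []
for name, column in df.items():
    print(name, column.tolist())
    cols.append(name)
```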
Select operation in data frame
To access the data in a column, we can use the column name as a
subscript.
e.g. - df['empid']. This can also be done by using df.empid.
To access multiple columns, we can write df[['col1', 'col2', ...]]
Example -
>>df.empid or df['empid']
0 101
1 102
2 103
3 104
4 105
5 106
Name: empid, dtype: int64
>>df[['empid','ename']]
empid ename
0 101 Sachin
1 102 Vinod
2 103 Lakhbir
3 104 Anil
4 105 Devinder
5 106 UmaSelvi
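To Add & Rename a column in data frame
import pandas as pd
s = pd.Series([10,15,18,22])
df=pd.DataFrame(s)
df.columns=['List1']  # rename the default column 0 to List1
print(df)
df['List2']=20  # add a new column List2 with the value 20 in every row
print(df)
df['List3']=df['List1']+df['List2']  # add a column computed from two others
Output-
List1
0 10
1 15
2 18
3 22
List1 List2
0 10 20
1 15 20
2 18 20
3 22 20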
To Delete a Column Using drop()
import pandas as pd
s= pd.Series([10,20,30,40])
df=pd.DataFrame(s)
df.columns=['List1']
df['List2']=40
df1=df.drop('List2',axis=1)  # axis=1 means to delete data column wise
df2=df.drop(index=[2,3],axis=0)  # axis=0 means to delete data row wise with given index
print(df)
print(" After deletion::")
print(df1)
print(" After row deletion::")
print(df2)
Output-
List1 List2
0 10 40
1 20 40
2 30 40
3 40 40
After deletion::
List1
0 10
1 20
2 30
3 40
After row deletion::
List1 List2
0 10 40
1 20 40
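Accessing the data frame through loc and iloc,
or indexing using Labels
Pandas provides the loc and iloc indexers to access a subset of a
data frame by row and column. loc selects by label and iloc selects by
integer position; both are written with square brackets, e.g.
df.loc[...] and df.iloc[...].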
Syntax-
Syntax-
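A minimal sketch of loc and iloc on a data frame, assuming a small frame with empid and ename columns and the default integer index. With loc the stop label is included; with iloc the stop position is excluded.

```python
import pandas as pd

df = pd.DataFrame({
    'empid': [101, 102, 103, 104],
    'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil'],
})

# loc: rows labelled 1 to 2 (both included), columns chosen by name.
by_label = df.loc[1:2, ['empid', 'ename']]

# iloc: rows at positions 1 and 2, first column only (stop excluded).
by_position = df.iloc[1:3, 0:1]

print(by_label)
print(by_position)
```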
The method head() gives the first 5 rows and the method
tail() returns the last 5 rows.
import pandas as pd
empdata={ 'Doj':['12-01-2012','15-01-2012','05-09-2007',
'17-01-2012','05-09-2007','16-01-2012'],
'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi']
}
df=pd.DataFrame(empdata)
print(df)
print(df.head())
print(df.tail())
Output-
Data Frame
Doj empid ename
0 12-01-2012 101 Sachin
1 15-01-2012 102 Vinod
2 05-09-2007 103 Lakhbir
3 17-01-2012 104 Anil
4 05-09-2007 105 Devinder
5 16-01-2012 106 UmaSelvi
head() displays first 5 rows
Doj empid ename
0 12-01-2012 101 Sachin
1 15-01-2012 102 Vinod
2 05-09-2007 103 Lakhbir
3 17-01-2012 104 Anil
4 05-09-2007 105 Devinder
tail() displays last 5 rows
Doj empid ename
1 15-01-2012 102 Vinod
2 05-09-2007 103 Lakhbir
3 17-01-2012 104 Anil
4 05-09-2007 105 Devinder
5 16-01-2012 106 UmaSelvi
To display the first 2 rows we can use head(2), to return the last 2
rows we can use tail(2), and to return the 3rd to 5th rows we can write
df[2:5].
import pandas as pd
empdata={ 'Doj':['12-01-2012','15-01-2012','05-09-2007',
'17-01-2012','05-09-2007','16-01-2012'],
'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi']
}
df=pd.DataFrame(empdata)
print(df)
print(df.head(2))
print(df.tail(2))
print(df[2:5])
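Output-
Doj empid ename
0 12-01-2012 101 Sachin
1 15-01-2012 102 Vinod
2 05-09-2007 103 Lakhbir
3 17-01-2012 104 Anil
4 05-09-2007 105 Devinder
5 16-01-2012 106 UmaSelvi
Doj empid ename
0 12-01-2012 101 Sachin
1 15-01-2012 102 Vinod
Doj empid ename
4 05-09-2007 105 Devinder
5 16-01-2012 106 UmaSelvi
Doj empid ename
2 05-09-2007 103 Lakhbir
3 17-01-2012 104 Anil
4 05-09-2007 105 Devinder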
Example-1
1. Full Outer Join:- The full outer join combines the results of
both the left and the right outer joins. The joined data frame will
contain all records from both the data frames and fill in NaNs for
missing matches on either side. You can perform a full outer join by
passing how='outer' to the merge() function.
Example-
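A minimal sketch of a full outer join, assuming two hypothetical data frames that share an empid column (the dept values are illustrative). Every empid from either frame appears in the result, with NaN where the other frame has no match.

```python
import pandas as pd

# Hypothetical frames: empid 101 exists only on the left, 104 only on the right.
left = pd.DataFrame({'empid': [101, 102, 103],
                     'ename': ['Sachin', 'Vinod', 'Lakhbir']})
right = pd.DataFrame({'empid': [102, 103, 104],
                      'dept': ['CS', 'Maths', 'Physics']})

outer = pd.merge(left, right, on='empid', how='outer')
print(outer)
```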
Example-
3. Right Join :- The right join produces a complete set of records
from data frame B (right side data frame) with the matching records
(where available) in data frame A (left side data frame). If there is no
match, the columns from the left side will contain NaN. You have to pass
how='right' to the merge() function.
Example-
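A minimal sketch of a right join, assuming two hypothetical data frames that share an empid column. All rows of the right frame are kept; empid 104 has no match on the left, so its ename is NaN.

```python
import pandas as pd

left = pd.DataFrame({'empid': [101, 102, 103],
                     'ename': ['Sachin', 'Vinod', 'Lakhbir']})
right = pd.DataFrame({'empid': [102, 103, 104],
                      'dept': ['CS', 'Maths', 'Physics']})

right_join = pd.merge(left, right, on='empid', how='right')
print(right_join)
```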
4. Left Join :- The left join produces a complete set of records
from data frame A (left side data frame) with the matching records
(where available) in data frame B (right side data frame). If there is
no match, the columns from the right side will contain NaN. You have to
pass how='left' to the merge() function.
Example-
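A minimal sketch of a left join, assuming two hypothetical data frames that share an empid column. All rows of the left frame are kept; empid 101 has no match on the right, so its dept is NaN.

```python
import pandas as pd

left = pd.DataFrame({'empid': [101, 102, 103],
                     'ename': ['Sachin', 'Vinod', 'Lakhbir']})
right = pd.DataFrame({'empid': [102, 103, 104],
                      'dept': ['CS', 'Maths', 'Physics']})

left_join = pd.merge(left, right, on='empid', how='left')
print(left_join)
```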
5. Joining on Index :- Sometimes you have to perform the join on
the indexes or the row labels. For that you have to specify
right_index (for the indexes of the right data frame) and left_index
(for the indexes of the left data frame) as True.
Example-
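A minimal sketch of joining on the index, assuming two hypothetical data frames whose row labels are employee ids. With no how argument, merge() performs an inner join, so only the labels present in both frames remain.

```python
import pandas as pd

# Hypothetical frames indexed by employee id.
left = pd.DataFrame({'ename': ['Sachin', 'Vinod', 'Lakhbir']},
                    index=[101, 102, 103])
right = pd.DataFrame({'dept': ['CS', 'Maths', 'Physics']},
                     index=[102, 103, 104])

joined = pd.merge(left, right, left_index=True, right_index=True)
print(joined)
```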
CSV File