vertopal.com_12_Pandas
With Pandas we can filter data, create charts, build pivot tables and more, much as in Microsoft Excel.
Pandas is used for data analysis.
Pandas is built on top of NumPy.
import pandas as pd
series1 = pd.Series(data=[10,20,30,40,50])  # default index 0..4
print(series1)
0 10
1 20
2 30
3 40
4 50
dtype: int64
import pandas as pd
series1 = pd.Series(data=[10,20,30,40,50],index=[1,2,3,4,5])
print(series1)
1 10
2 20
3 30
4 40
5 50
dtype: int64
import pandas as pd
series1 = pd.Series(data=[10,20,30,40,50],index=["a","b","c","d","e"],
dtype=float)
print(series1)
a 10.0
b 20.0
c 30.0
d 40.0
e 50.0
dtype: float64
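Once a Series has custom labels, a value can be fetched either by its label or by its position. A minimal sketch reusing the same data as above:

```python
import pandas as pd

series1 = pd.Series(data=[10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"], dtype=float)

by_label = series1.loc["c"]      # look up by index label
by_position = series1.iloc[2]    # look up by integer position (third element)
subset = series1.loc["b":"d"]    # label slices include both endpoints

print(by_label, by_position)     # 30.0 30.0
print(subset.tolist())           # [20.0, 30.0, 40.0]
```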
dataset1 = pd.DataFrame(data={"hindi":[10,20,30,40,50],"english":
[60,70,80,90,100]})
print(dataset1)
hindi english
0 10 60
1 20 70
2 30 80
3 40 90
4 50 100
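A DataFrame column can also be derived from existing columns, computed for all rows at once. A small sketch on the same marks; the column name `total` is just an illustrative choice:

```python
import pandas as pd

dataset1 = pd.DataFrame(data={"hindi": [10, 20, 30, 40, 50], "english": [60, 70, 80, 90, 100]})

# Hypothetical new column: element-wise sum of the two subject columns
dataset1["total"] = dataset1["hindi"] + dataset1["english"]
print(dataset1)
print(dataset1["total"].tolist())  # [70, 90, 110, 130, 150]
```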
series1 = pd.Series([23,54,12,47,98],
index=["eng","hindi","sci","maths","ss"])
print(series1)
print("\n")
series2 = pd.Series([76,81,33,51,66],
index=["eng","hindi","sci","maths","ss"])
print(series2)
print("\n")
dataset = pd.DataFrame(data={"column1":series1,"column2":series2})
print(dataset)
eng 23
hindi 54
sci 12
maths 47
ss 98
dtype: int64
eng 76
hindi 81
sci 33
maths 51
ss 66
dtype: int64
column1 column2
eng 23 76
hindi 54 81
sci 12 33
maths 47 51
ss 98 66
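When Series are combined into a DataFrame, pandas aligns them on their index labels, not on their position. A sketch with two hypothetical Series whose labels only partly overlap shows that labels missing from one Series become NaN (and the affected integer column turns into float):

```python
import pandas as pd

s1 = pd.Series([23, 54, 12], index=["eng", "hindi", "sci"])
s2 = pd.Series([76, 51], index=["eng", "maths"])

# Rows are the union of both indexes; 'maths' has no column1 value,
# and 'hindi'/'sci' have no column2 value, so those cells are NaN
dataset = pd.DataFrame(data={"column1": s1, "column2": s2})
print(dataset)
```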
print(jeel)  # jeel: a DataFrame with columns No, Name, Address (the cell that created it is not included)
No Name Address
0 101 Jeel Raigadh
1 102 Deep HMT
2 103 Aryan Talod
3 104 Vipul Modasa
4 105 Sachin Gambhoi
5 106 Rohit Vijapur
6 107 Jeel Raigadh
7 108 Deep HMT
8 109 Aryan Talod
9 110 Vipul Modasa
10 111 Sachin Gambhoi
11 112 Rohit Vijapur
12 113 Jeel Raigadh
13 114 Deep HMT
14 115 Aryan Talod
15 116 Vipul Modasa
16 117 Sachin Gambhoi
17 118 Rohit Vijapur
18 119 Vipul Modasa
19 120 Sachin Gambhoi
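The notebook does not show how `jeel` was created; it was most likely read from a file with `pd.read_csv`. As an assumption, the sketch below rebuilds the same 20 rows from the printed table, holding the CSV text in memory with `io.StringIO` instead of a real file, so the following examples can be run:

```python
import io
import pandas as pd

# Name/Address pairs as printed above; the pattern repeats every 6 rows
people = [("Jeel", "Raigadh"), ("Deep", "HMT"), ("Aryan", "Talod"),
          ("Vipul", "Modasa"), ("Sachin", "Gambhoi"), ("Rohit", "Vijapur")]
rows = [people[i % 6] for i in range(18)] + [people[3], people[4]]

# Build CSV text in memory (a stand-in for the original, unknown file)
lines = ["No,Name,Address"]
lines += [f"{101 + i},{name},{addr}" for i, (name, addr) in enumerate(rows)]
csv_text = "\n".join(lines)

jeel = pd.read_csv(io.StringIO(csv_text))
print(jeel.shape)  # (20, 3)
```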
print(jeel.head())
No Name Address
0 101 Jeel Raigadh
1 102 Deep HMT
2 103 Aryan Talod
3 104 Vipul Modasa
4 105 Sachin Gambhoi
import pandas as pd
print(jeel.head(12))
No Name Address
0 101 Jeel Raigadh
1 102 Deep HMT
2 103 Aryan Talod
3 104 Vipul Modasa
4 105 Sachin Gambhoi
5 106 Rohit Vijapur
6 107 Jeel Raigadh
7 108 Deep HMT
8 109 Aryan Talod
9 110 Vipul Modasa
10 111 Sachin Gambhoi
11 112 Rohit Vijapur
import pandas as pd
print(jeel.tail())
No Name Address
15 116 Vipul Modasa
16 117 Sachin Gambhoi
17 118 Rohit Vijapur
18 119 Vipul Modasa
19 120 Sachin Gambhoi
import pandas as pd
print(jeel.tail(7))
No Name Address
13 114 Deep HMT
14 115 Aryan Talod
15 116 Vipul Modasa
16 117 Sachin Gambhoi
17 118 Rohit Vijapur
18 119 Vipul Modasa
19 120 Sachin Gambhoi
import pandas as pd
print(jeel.dtypes)
No int64
Name object
Address object
dtype: object
import pandas as pd
print(jeel.info())  # info() prints its summary directly and returns None, which print() then shows
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 No 20 non-null int64
1 Name 20 non-null object
2 Address 20 non-null object
dtypes: int64(1), object(2)
memory usage: 612.0+ bytes
None
import pandas as pd
print(jeel.describe())
No
count 20.00000
mean 110.50000
std 5.91608
min 101.00000
25% 105.75000
50% 110.50000
75% 115.25000
max 120.00000
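By default `describe()` summarises only numeric columns, which is why only `No` appears above. Passing `include="all"` also reports count, unique, top and freq for text columns. A sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"No": [101, 102, 103, 104],
                   "Address": ["Raigadh", "HMT", "Raigadh", "Talod"]})

# Numeric stats for No, plus unique/top/freq for the text column Address
summary = df.describe(include="all")
print(summary)
```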
Pandas Methods for Low-Level Understanding
Fetch Data from a Column
import pandas as pd
jeel["Address"]
0 Raigadh
1 HMT
2 Talod
3 Modasa
4 Gambhoi
5 Vijapur
6 Raigadh
7 HMT
8 Talod
9 Modasa
10 Gambhoi
11 Vijapur
12 Raigadh
13 HMT
14 Talod
15 Modasa
16 Gambhoi
17 Vijapur
18 Modasa
19 Gambhoi
Name: Address, dtype: object
import pandas as pd
jeel["Address"].describe()
count 20
unique 6
top Gambhoi
freq 4
Name: Address, dtype: object
import pandas as pd
jeel.Address
0 Raigadh
1 HMT
2 Talod
3 Modasa
4 Gambhoi
5 Vijapur
6 Raigadh
7 HMT
8 Talod
9 Modasa
10 Gambhoi
11 Vijapur
12 Raigadh
13 HMT
14 Talod
15 Modasa
16 Gambhoi
17 Vijapur
18 Modasa
19 Gambhoi
Name: Address, dtype: object
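Besides single columns, a list of names selects several columns at once, and a boolean condition keeps only matching rows, the pandas counterpart of Excel's filter. A sketch on a few of the rows shown above:

```python
import pandas as pd

jeel_small = pd.DataFrame({"No": [101, 102, 103, 104, 109],
                           "Name": ["Jeel", "Deep", "Aryan", "Vipul", "Aryan"],
                           "Address": ["Raigadh", "HMT", "Talod", "Modasa", "Talod"]})

two_cols = jeel_small[["Name", "Address"]]              # list of column names -> DataFrame
talod = jeel_small[jeel_small["Address"] == "Talod"]    # keep rows where the condition is True
print(talod)
print(talod["No"].tolist())  # [103, 109]
```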
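import pandas as pd
# arr reconstructed from the printed output below
arr = pd.DataFrame({"A":[4,5,2,6],"B":[11,2,5,8],"C":[1,8,66,4]})
print(arr)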
print(arr.idxmin(axis=0))
print(arr.idxmax(axis=0))
A B C
0 4 11 1
1 5 2 8
2 2 5 66
3 6 8 4
A 2
B 1
C 0
dtype: int64
A 3
B 0
C 2
dtype: int64
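With `axis=0` the methods above return, per column, the row label of the minimum or maximum. With `axis=1` they work across each row and return the column label instead. A sketch on the same `arr` data:

```python
import pandas as pd

arr = pd.DataFrame({"A": [4, 5, 2, 6], "B": [11, 2, 5, 8], "C": [1, 8, 66, 4]})

# For each row, the name of the column holding the smallest / largest value
print(arr.idxmin(axis=1).tolist())  # ['C', 'B', 'A', 'C']
print(arr.idxmax(axis=1).tolist())  # ['B', 'C', 'C', 'B']
```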
import pandas as pd
name = pd.Series(["Sanjeev","Keshav","Rahul"])
age = pd.Series([37,42,38])
designation = pd.Series(["Manager","Clerk","Accountant"])
d1 = pd.DataFrame(data={"Name":name,"Age":age,"Designation":designation})
print(d1)
print("\n")
asc1 = d1.sort_values(by='Age')  # ascending order is the default
print(asc1)
print("\n")
desc1 = d1.sort_values(by="Age",ascending=False)  # oldest first
print(desc1)
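The sorting cell above has no printed output. A sketch on the same three employees checks the resulting order and shows that `sort_values` keeps the original row labels, which `reset_index(drop=True)` can renumber:

```python
import pandas as pd

d1 = pd.DataFrame({"Name": ["Sanjeev", "Keshav", "Rahul"],
                   "Age": [37, 42, 38],
                   "Designation": ["Manager", "Clerk", "Accountant"]})

asc1 = d1.sort_values(by="Age")                    # youngest first
desc1 = d1.sort_values(by="Age", ascending=False)  # oldest first
print(asc1["Name"].tolist())   # ['Sanjeev', 'Rahul', 'Keshav']
print(desc1["Name"].tolist())  # ['Keshav', 'Rahul', 'Sanjeev']

print(asc1.index.tolist())                          # [0, 2, 1] (original labels kept)
print(asc1.reset_index(drop=True).index.tolist())   # [0, 1, 2]
```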
Example - 1
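import pandas as pd
# column lists reconstructed from the printed table below
player = ["hardik Pandya","KL Rahul","Andre Russel","Jasprit Bumrah","Virat Kohli","Rohit Sharma"]
team = ["Mumbai Indian","Kings Eleven","Kolkatta Night Rider","Mumbai Indian","RCB","Mumbai Indian"]
category = ["Batsman","Batsman","Batsman","Bowler","Batsman","Batsman"]
bidprice = [13,12,7,10,17,15]
runs = [1000,2400,900,200,3600,3700]
dataframe = pd.DataFrame({"Player":player,"Team":team,"Category":category,"BidPrice":bidprice,"Runs":runs})
dataframe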
Player Team Category BidPrice Runs
0 hardik Pandya Mumbai Indian Batsman 13 1000
1 KL Rahul Kings Eleven Batsman 12 2400
2 Andre Russel Kolkatta Night Rider Batsman 7 900
3 Jasprit Bumrah Mumbai Indian Bowler 10 200
4 Virat Kohli RCB Batsman 17 3600
5 Rohit Sharma Mumbai Indian Batsman 15 3700
dataframe.iloc[0:2,0:2]  # first two rows, first two columns (Player, Team)
Player Team
0 hardik Pandya Mumbai Indian
1 KL Rahul Kings Eleven
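`iloc` selects by integer position and excludes the end of a slice, whereas `loc` selects by label and includes it. A sketch with a cut-down version of the player table shows both returning the same block:

```python
import pandas as pd

dataframe = pd.DataFrame({"Player": ["hardik Pandya", "KL Rahul", "Andre Russel"],
                          "Team": ["Mumbai Indian", "Kings Eleven", "Kolkatta Night Rider"],
                          "Runs": [1000, 2400, 900]})

by_pos = dataframe.iloc[0:2, 0:2]                  # positions 0-1, first two columns (end excluded)
by_label = dataframe.loc[0:1, ["Player", "Team"]]  # labels 0 and 1 (end included)
print(by_pos.equals(by_label))  # True
```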
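dataframe.iloc[-3:,:]  # last three rows, all columns (output not shown)
dataframe.groupby('Team')['Player'].count()  # number of players per team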
Team
Kings Eleven 1
Kolkatta Night Rider 1
Mumbai Indian 3
RCB 1
Name: Player, dtype: int64
Question: find the player with the highest BidPrice in each Team.
val = dataframe.groupby('Team')
# val['Player','BidPrice'] raises "ValueError: Cannot subset columns with a tuple
# with more than one element. Use a list instead." Even with a list,
# val[['Player','BidPrice']].max() takes each column's maximum separately, so the
# Player shown would not be the one with the top bid. idxmax() returns the row
# label of each team's highest BidPrice; .loc then fetches those rows.
print(dataframe.loc[val['BidPrice'].idxmax(), ['Team','Player','BidPrice']])
dataframe.groupby('Team')['Runs'].mean()  # average runs per team
Team
Kings Eleven 2400.000000
Kolkatta Night Rider 900.000000
Mumbai Indian 1633.333333
RCB 3600.000000
Name: Runs, dtype: float64
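Several statistics per team can be computed in one pass with `agg` and named aggregations. In this sketch the output column names `players`, `total_runs` and `max_bid` are arbitrary choices:

```python
import pandas as pd

dataframe = pd.DataFrame({
    "Player": ["hardik Pandya", "KL Rahul", "Andre Russel", "Jasprit Bumrah", "Virat Kohli", "Rohit Sharma"],
    "Team": ["Mumbai Indian", "Kings Eleven", "Kolkatta Night Rider", "Mumbai Indian", "RCB", "Mumbai Indian"],
    "BidPrice": [13, 12, 7, 10, 17, 15],
    "Runs": [1000, 2400, 900, 200, 3600, 3700],
})

# new_column=(source_column, aggregation) for each statistic
summary = dataframe.groupby("Team").agg(
    players=("Player", "count"),
    total_runs=("Runs", "sum"),
    max_bid=("BidPrice", "max"),
)
print(summary)
```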
first_value = dataframe.iloc[0, 1]  # row 0, column 1 by position; chained iloc[0][1] triggers a FutureWarning
first_value
'Mumbai Indian'
import pandas as pd
df1 = pd.DataFrame({'key':['a','b','c','d'],'value':[1,2,3,4]})
print(df1)
df2 = pd.DataFrame({'key':['a','b','e','b'],'value':[5,6,7,8]})
print(df2)
df3 = df1.merge(df2,on='key',how='inner')
print(df3)
key value
0 a 1
1 b 2
2 c 3
3 d 4
key value
0 a 5
1 b 6
2 e 7
3 b 8
key value_x value_y
0 a 1 5
1 b 2 6
2 b 2 8
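An inner merge keeps only keys present in both frames, so `c`, `d` and `e` disappear above. A sketch with the same frames shows `how='left'` (all rows of `df1` kept) and `how='outer'` with `indicator=True`, which adds a `_merge` column recording where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c', 'd'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['a', 'b', 'e', 'b'], 'value': [5, 6, 7, 8]})

# left: every df1 row survives; keys absent from df2 get NaN in value_y
left = df1.merge(df2, on='key', how='left')
# outer: union of keys; _merge is 'both', 'left_only' or 'right_only'
outer = df1.merge(df2, on='key', how='outer', indicator=True)

print(left)
print(outer)
```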
Group By in Pandas
import pandas as pd
# Sample data
data = {
'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles',
'Chicago'],
'Category': ['A', 'A', 'B', 'B', 'A'],
'Sales': [200, 150, 300, 120, 250]
}
df = pd.DataFrame(data)
print(df)
group1 = df.groupby('City').sum()  # also "sums" the text column Category by concatenating its strings
print(group1)
Category Sales
City
Chicago A 250
Los Angeles AB 270
New York AB 500
group1 = df.groupby('City')["Sales"].sum()
print(group1)
City
Chicago 250
Los Angeles 270
New York 500
Name: Sales, dtype: int64
group2 = df.groupby(['City','Category'])['Sales'].sum()
print(group2)
City Category
Chicago A 250
Los Angeles A 150
B 120
New York A 200
B 300
Name: Sales, dtype: int64
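The introduction compares pandas with Excel pivot tables. `pivot_table` produces the same City and Category totals as the grouped result above, but laid out as a grid; in this sketch `fill_value=0` is an assumption that fills combinations with no sales (such as Chicago/B) with 0 instead of NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'Chicago'],
    'Category': ['A', 'A', 'B', 'B', 'A'],
    'Sales': [200, 150, 300, 120, 250],
})

# Rows = City, columns = Category, cells = total Sales
pivot = df.pivot_table(index='City', columns='Category', values='Sales',
                       aggfunc='sum', fill_value=0)
print(pivot)
```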
Original DataFrame:
A B C
0 3 6 5
1 7 2 3
2 2 8 7
3 9 1 4