Pandas Dataframe2
DATAFRAME-2
Applying Function with DataFrame
Aggregate Function/Multi Row Function
max()
It returns the maximum value of each column of a dataframe, or of a single
column.
df.max()
df['colname'].max()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.max()
name vika
age 26
score 85
dtype: object
>>> df['age'].max()
26
>>> df.max(axis=1, numeric_only=True)
0 85
1 63
2 55
3 74
4 31
5 77
dtype: int64
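The calls above can also be written as a small script. This is a minimal sketch using the same sample data; it assumes a recent pandas release, where a row-wise maximum across the text column 'name' and the numeric columns is not allowed, so the numeric columns are selected first. The variable names col_max and row_max are only illustrative.

```python
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Column-wise maximum, restricted to the numeric columns.
col_max = df.max(numeric_only=True)

# Row-wise maximum: pick the numeric columns explicitly, then use axis=1.
row_max = df[['age', 'score']].max(axis=1)

print(col_max)
print(row_max)
```

Selecting the numeric columns by name makes it explicit which values are compared in each row.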
min()
It returns the minimum value of each column of a dataframe, or of a single
column.
TATA DAV SCHOOL, SIJUA
df.min()
df['colname'].min()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.min()
name ika
age 22
score 31
dtype: object
>>> df['score'].min()
31
sum()
It adds all the values in each column of a dataframe, or in a single column.
For a text column, the values are joined together (concatenated).
df.sum()
df['colname'].sum()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sum()
name inaminatinaikavikatika
age 142
score 385
dtype: object
>>> df['score'].sum()
385
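In the output above, the 'name' column is concatenated into one long string. If only numeric totals are wanted, the numeric_only parameter can be used. The sketch below uses the same sample data; the variable name totals is only illustrative.

```python
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Sum only the numeric columns; the text column 'name' is left out.
totals = df.sum(numeric_only=True)
print(totals)

# The column total is the same as adding the original list with Python's sum().
print(totals['score'] == sum(dic['score']))
```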
count()
It counts the non-missing (non-NA) values in each column of a dataframe, or in
a single column.
df.count()
df['colname'].count()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.count()
name 6
age 6
score 6
dtype: int64
>>> df['score'].count()
6
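Every column in the example above has 6 values, so count() and the number of rows agree. The sketch below uses a small hypothetical dataframe with one missing score (this data is an assumption, not part of the example above) to show that count() skips missing values while len() counts rows.

```python
import pandas as pd

# Hypothetical data: the second score is missing (None becomes NaN).
df = pd.DataFrame({'name': ['ina', 'mina', 'tina'],
                   'score': [85, None, 55]})

# count() counts only the non-missing values in each column.
counts = df.count()

# len() gives the number of rows, whether or not values are missing.
n_rows = len(df)

print(counts)
print(n_rows)
```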
mode()
It returns the mode, i.e. the most frequently occurring value, of a given set of
values. If several values are tied for the highest frequency, all of them are
returned.
df.mode()
df['colname'].mode()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df['age'].mode()
0 23
1 24
dtype: int64
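The result above has two rows because 23 and 24 both appear twice. A short sketch with the same sample data shows how ties are handled; the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# mode() returns a Series, because more than one value can be tied.
age_modes = df['age'].mode()
print(age_modes.tolist())

# Every score occurs exactly once, so every score is returned as a mode.
score_modes = df['score'].mode()
print(len(score_modes))

# If a single value is needed, one simple choice is to take the first mode.
first_mode = age_modes.iloc[0]
print(first_mode)
```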
mean()
It returns the arithmetic mean (average) of a given set of values, i.e. the sum
of the values divided by the number of values.
df.mean()
df['colname'].mean()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.mean(numeric_only=True)
age 23.666667
score 64.166667
dtype: float64
>>> df['age'].mean()
23.666666666666668
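The mean can be checked by hand as sum divided by count. A minimal sketch with the same sample data; manual_mean is only an illustrative name.

```python
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Mean = total of the values / number of values.
manual_mean = df['age'].sum() / df['age'].count()

print(manual_mean)
print(df['age'].mean())
```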
median()
It returns the median, i.e. the middle value, of a given set of values after
they are arranged in order. When the number of values is even, the median is the
average of the two middle values.
df.median()
df['colname'].median()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.median(numeric_only=True)
age 23.5
score 68.5
dtype: float64
>>> df['age'].median()
23.5
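The median of 'age' can be worked out by hand: sort the values and take the middle one, or the average of the two middle ones. A minimal sketch with the same sample data; the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Arrange the ages in ascending order: [22, 23, 23, 24, 24, 26]
ages = sorted(df['age'].tolist())
n = len(ages)
mid = n // 2

if n % 2 == 0:
    # Even number of values: average the two middle values.
    manual_median = (ages[mid - 1] + ages[mid]) / 2
else:
    # Odd number of values: take the single middle value.
    manual_median = ages[mid]

print(manual_median)
print(df['age'].median())
```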
quantile()
It returns the value at the given quantile over the requested axis (0/1).
The word quantile is derived from the word quantity. Quantiles are the cut
points that divide a sample into equal-sized sub-groups.
Common Quantiles:
1. The only 2-quantile is called the median
2. The 3 quantiles are called the terciles
3. The 4 quantiles are called the quartiles
4. The 5 quantiles are called the quintiles
5. The 6 quantiles are called the sextiles
6. The 7 quantiles are called the septiles
7. The 8 quantiles are called the octiles
8. The 10 quantiles are called the deciles
9. The 12 quantiles are called the duodeciles
10. The 20 quantiles are called the vigintiles
11. The 100 quantiles are called the percentiles
12. The 1000 quantiles are called the permilles
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.quantile(0.5, numeric_only=True)
age 23.5
score 68.5
Name: 0.5, dtype: float64
>>> df.quantile([.1,.25,.5,.75], numeric_only=True)
age score
0.10 22.5 43.00
0.25 23.0 57.00
0.50 23.5 68.50
0.75 24.0 76.25
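Values such as 76.25 in the table above do not appear in the data; they come from linear interpolation between two neighbouring sorted values, which is the default method of quantile(). The sketch below reproduces that calculation by hand. The helper linear_quantile is a hypothetical function written only for this illustration, not part of pandas.

```python
import math
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

def linear_quantile(values, q):
    """Quantile q (0 to 1) using linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional position in the sorted list
    lower = math.floor(pos)               # index of the value just below
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower                    # how far to move towards the upper value
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

# Sorted scores: [31, 55, 63, 74, 77, 85]; position 0.75 * 5 = 3.75
manual_q75 = linear_quantile(df['score'].tolist(), 0.75)

print(manual_q75)
print(df['score'].quantile(0.75))
```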
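var() and std()
var() returns the variance of a given set of numbers, and std() returns the
standard deviation, which is the square root of the variance. The variance is
the average of the squared deviations from the mean. By default pandas divides
by n-1 instead of n (sample variance).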
How to Calculate Variance
1. Find the mean of the data set. Add all data values and divide by the sample
size n.
2. Find the squared difference from the mean for each data value. Subtract the
mean from each data value and square the result.
3. Find the sum of all the squared differences. ...
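4. Calculate the variance. Divide the sum of the squared differences by n-1
(the pandas default).
5. The standard deviation is the square root of the variance.
std = sqrt(sum((x - x.mean())**2) / (n - 1))
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}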
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.std(numeric_only=True)
age 1.366260
score 19.395017
dtype: float64
>>> df['age'].std()
1.3662601021279466
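The steps for calculating the variance can be followed by hand for the 'age' column and compared with var() and std(). This is a minimal sketch with the same sample data; the variable names are only illustrative. It also shows ddof=0, the pandas parameter for dividing by n instead of n-1.

```python
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

ages = df['age'].tolist()
n = len(ages)

# Step 1: mean of the data set.
mean = sum(ages) / n

# Step 2: squared difference from the mean for each value.
squared_diffs = [(x - mean) ** 2 for x in ages]

# Step 3: sum of all the squared differences.
total = sum(squared_diffs)

# Step 4: variance. pandas divides by n-1 by default (sample variance).
sample_var = total / (n - 1)
population_var = total / n          # same as var(ddof=0)

# Standard deviation is the square root of the variance.
sample_std = sample_var ** 0.5

print(sample_var, df['age'].var())
print(sample_std, df['age'].std())
print(population_var, df['age'].var(ddof=0))
```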
cumsum()
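It returns the cumulative sum (running total) of a given series of values. Each
entry of the result is the sum of all the values up to and including that
position.
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}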
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df['age'].cumsum()
0 26
1 50
2 73
3 95
4 118
5 142
Name: age, dtype: int64
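The running total above can be reproduced with an ordinary loop, which shows what cumsum() does at each step. A minimal sketch with the same sample data; the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Build the running total by hand.
running = []
total = 0
for age in df['age']:
    total += age            # add the current value to the total so far
    running.append(total)

print(running)
print(df['age'].cumsum().tolist())
```

The last entry of the cumulative sum is the same as the value returned by sum() for that column.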
sort_values()
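It sorts the rows of a dataframe by the values of a given column, in either
ascending or descending order. By default a new sorted dataframe is returned and
the original is not changed; with inplace=True the original dataframe itself is
sorted.
df.sort_values(by=column, axis=0/1, ascending=True/False, inplace=True/False)
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}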
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sort_values(by='age')
name age score
3 ika 22 74
2 tina 23 55
4 vika 23 31
1 mina 24 63
5 tika 24 77
0 ina 26 85
>>> df.sort_values(by='age',ascending=False)
name age score
0 ina 26 85
1 mina 24 63
5 tika 24 77
2 tina 23 55
4 vika 23 31
3 ika 22 74
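In the results above, several students have the same age. One way to decide the order within such ties is to pass a list of columns to by, with one ascending flag per column. The sketch below is an illustration with the same sample data; sorting by age first and then by score is an assumed choice, not something required by the example above.

```python
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Sort by age (ascending); within the same age, higher score comes first.
sorted_df = df.sort_values(by=['age', 'score'], ascending=[True, False])
print(sorted_df)

# Without inplace=True the original dataframe keeps its order.
print(df.index.tolist())
```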
sort_index()
It sorts (arranges) the rows of a dataframe based on the index.
df.sort_index(axis=0/1, ascending=True/False, inplace=True/False)
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sort_index()
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sort_index(ascending=False)
name age score
5 tika 24 77
4 vika 23 31
3 ika 22 74
2 tina 23 55
1 mina 24 63
0 ina 26 85
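Since the dataframe above is already in index order, sort_index() appears to do nothing in the first call. Its effect is easier to see after the rows have been rearranged. A minimal sketch with the same sample data; the names shuffled and restored are only illustrative.

```python
import pandas as pd

# Same sample data as in the example above.
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Rearrange the rows by score; the index labels travel with their rows.
shuffled = df.sort_values(by='score')
print(shuffled.index.tolist())

# sort_index() puts the rows back in index order 0, 1, 2, ...
restored = shuffled.sort_index()
print(restored)
```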
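pivot()
This method creates a new dataframe by reshaping the data based on column
values. The values of one column become the new index, the values of a second
column become the new column headings, and a third column supplies the data.
Syntax
df.pivot(index='column1', columns='column2', values='column3')
Example 1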
import pandas as pd
dic={'tutor':['tahira','gurjot','anusha','jacob','venkat'],\
'classes':[28,36,41,32,40],\
'country':['usa','uk','japan','usa','brazil']}
df=pd.DataFrame(dic)
print(df)
pt=df.pivot(index='country',columns='tutor',values='classes')
print("\n ==================================\n\n")
print(pt)
output
Example 2
import pandas as pd
import numpy as np
dic={'tutor':['tahira','gurjyot','anusha','jacob','venkat','tahira','gurjyot','anusha','jacob','venkat',
'tahira','gurjyot','anusha','jacob','venkat','tahira','gurjyot','anusha','jacob','venkat'],
'classes':[28,36,41,32,40,26,37,44,33,41,27,38,45,39,43,228,336,441,832,540],
'country':['usa','uk','japan','usa','brazil','usa','usa','japan','uk','japan',
'uk','usa','japan','uk','japan','usa','uk','brazil','usa','brazil'],
'quarter':[1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4]}
df=pd.DataFrame(dic)
print(df)
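# pivot() cannot be used here: some (tutor, country) pairs are repeated,
# so it raises ValueError (duplicate entries)
#p=df.pivot(index='tutor',columns='country',values='classes')
#print(p)
# pivot_table() aggregates the repeated entries; here it counts the rows for each tutor
pt=df.pivot_table(index='tutor',values='classes',aggfunc="count")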
print(pt)
OUTPUT
classes
tutor
anusha 4
gurjyot 4
jacob 4
tahira 4
venkat 4
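The difference between pivot() and pivot_table() shows up when an index/column pair occurs more than once. The sketch below uses a small hypothetical dataframe (a shortened version of the data above, chosen only for illustration) in which tahira appears twice for 'usa'. The choice of aggfunc='sum' is also an assumption for this example.

```python
import pandas as pd

# Hypothetical, shortened data: ('tahira', 'usa') occurs twice.
df = pd.DataFrame({'tutor': ['tahira', 'tahira', 'jacob', 'jacob'],
                   'country': ['usa', 'usa', 'usa', 'uk'],
                   'classes': [28, 26, 32, 33]})

# pivot() needs each (index, column) pair to be unique, otherwise it fails.
try:
    df.pivot(index='tutor', columns='country', values='classes')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() combines the repeated entries using the aggregate function.
pt = df.pivot_table(index='tutor', columns='country',
                    values='classes', aggfunc='sum')
print(pt)
```

Cells with no matching data, such as tahira in 'uk', are shown as NaN.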