Pandas Dataframe2

The document provides an overview of various functions and operations that can be applied to a pandas DataFrame, including aggregate functions like max, min, sum, count, mode, mean, median, quantile, variance, standard deviation, cumulative sum, and sorting methods. It also covers pivoting techniques to rearrange data for better analysis. Examples are provided to illustrate the use of these functions with a sample DataFrame.

Uploaded by manishmcamba2013
Copyright © All Rights Reserved

TATA DAV SCHOOL, SIJUA

DATAFRAME-2
Applying Functions to a DataFrame
Aggregate Functions / Multi-Row Functions
max()
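It returns the maximum value from a given set of values or from each column of a
DataFrame. For a text column such as name, the maximum is the alphabetically last
string; with axis=1, the maximum is taken across each row instead.
df.max()
df['colname'].max()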
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.max()
name vika
age 26
score 85
dtype: object
>>> df['age'].max()
26
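>>> df.max(axis=1, numeric_only=True)   # compare only age and score; newer pandas will not skip the text column on its own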
0 85
1 63
2 55
3 74
4 31
5 77
dtype: int64
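The same max() calls can be sketched as a standalone script. This is a minimal example assuming a recent pandas release; numeric_only=True keeps the text column name out of the numeric comparisons, and the variable names (col_max, row_max, name_max) are illustrative only.

```python
import pandas as pd

# Same sample data as in the session above
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

col_max = df.max(numeric_only=True)          # one maximum per numeric column (axis=0)
row_max = df.max(axis=1, numeric_only=True)  # one maximum per row, over age and score
name_max = df['name'].max()                  # strings are compared alphabetically

print(col_max)
print(row_max.tolist())
print(name_max)
```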

min()
It returns the minimum value from a given set of values or from each column of a
DataFrame.
df.min()
df['colname'].min()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.min()
name ika
age 22
score 31
dtype: object
>>> df['score'].min()
31
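A similar sketch for min(), under the same assumptions (recent pandas, illustrative variable names): axis=1 returns the smallest value in each row, and a boolean filter picks out the name on the row holding the lowest score.

```python
import pandas as pd

# Same sample data as in the session above
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

row_min = df.min(axis=1, numeric_only=True)   # smaller of age and score in each row
lowest = df['score'].min()                    # lowest score in the column
lowest_names = df.loc[df['score'] == lowest, 'name'].tolist()   # who scored it

print(row_min.tolist())
print(lowest, lowest_names)
```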
sum()
It adds all the values from a given set of values or from each column of a
DataFrame. For a text column such as name, the strings are joined end to end.
df.sum()
df['colname'].sum()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sum()
name inaminatinaikavikatika
age 142
score 385
dtype: object
>>> df['score'].sum()
385
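As a sketch (recent pandas assumed, variable names illustrative), sum() can also total each row with axis=1, and a column total matches Python's built-in sum() over the original list.

```python
import pandas as pd

# Same sample data as in the session above
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

row_total = df.sum(axis=1, numeric_only=True)   # age + score for each row
score_total = df['score'].sum()                 # total of the score column

print(row_total.tolist())
print(score_total, sum(dic['score']))
```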
count()
It counts the non-missing (non-NA) values in a given set of values or in each
column of a DataFrame.
df.count()
df['colname'].count()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.count()
name 6
age 6
score 6
dtype: int64
>>> df['score'].count()
6
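count() skips missing values, which the sample data does not contain. The sketch below assumes a hypothetical variant of the data in which vika's score is missing (None, which pandas stores as NaN), to show how count() differs from the number of rows.

```python
import pandas as pd

# Hypothetical variant of the sample data: vika's score is missing (None -> NaN)
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, None, 77]}
df = pd.DataFrame(dic)

counts = df.count()   # non-missing values in each column
print(counts)
print(len(df))        # number of rows, whether or not values are missing
```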
mode()
It returns the mode, the most frequently occurring value, of a given set of
values. If several values tie, all of them are returned: in the age column below,
23 and 24 each occur twice.
df.mode()
df['colname'].mode()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77

>>> df['age'].mode()
0 23
1 24
dtype: int64
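To see why two values are returned, the mode can be cross-checked with a plain-Python frequency count. This is a sketch; the Counter-based calculation is an illustration, not how pandas computes mode() internally.

```python
from collections import Counter
import pandas as pd

ages = [26, 24, 23, 22, 23, 24]

freq = Counter(ages)                 # how often each age occurs
top = max(freq.values())             # the highest frequency
by_hand = sorted(v for v, c in freq.items() if c == top)   # every value reaching it

pandas_mode = pd.Series(ages).mode().tolist()   # pandas returns all tied modes, sorted
print(by_hand, pandas_mode)
```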
mean()
It returns the arithmetic mean (average) of a given set of values: their sum
divided by their count.
df.mean()
df['colname'].mean()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.mean(numeric_only=True)   # skip the text column name
age 23.666667
score 64.166667
dtype: float64
>>> df['age'].mean()
23.666666666666668
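A quick check, as a sketch assuming a recent pandas release: the mean is the sum divided by the count, and numeric_only=True limits df.mean() to the numeric columns. The name age_by_hand is illustrative.

```python
import pandas as pd

# Same sample data as in the session above
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

means = df.mean(numeric_only=True)                    # mean of age and of score
age_by_hand = df['age'].sum() / df['age'].count()     # sum divided by count

print(means)
print(age_by_hand)
```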
median()
It returns the median, the middle value of a given set of values once they are
sorted. For an even number of values it is the average of the two middle values.
df.median()
df['colname'].median()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.median(numeric_only=True)
age 23.5
score 68.5
dtype: float64
>>> df['age'].median()
23.5
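The median rule described above can be sketched by hand. The helper median_by_hand is a hypothetical name used only for illustration; it sorts the data and either takes the middle value or averages the two middle values.

```python
import pandas as pd

def median_by_hand(values):
    """Hypothetical helper: middle value of the sorted data."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd count: the single middle value
    return (s[mid - 1] + s[mid]) / 2    # even count: average of the two middle values

ages = [26, 24, 23, 22, 23, 24]
scores = [85, 63, 55, 74, 31, 77]
print(median_by_hand(ages), pd.Series(ages).median())
print(median_by_hand(scores), pd.Series(scores).median())
```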
quantile()
It returns the value at the given quantile q (a number between 0 and 1) along the
requested axis (0 or 1). By default, pandas interpolates linearly between the two
nearest data values.
The word quantile is derived from the word quantity. Quantiles are the cut points
that divide a sorted sample into equal-sized sub-groups.
Common Quantiles:
1. The 2-quantile is called the median
2. The 3-quantiles are called the terciles
3. The 4-quantiles are called the quartiles
4. The 5-quantiles are called the quintiles
5. The 6-quantiles are called the sextiles
6. The 7-quantiles are called the septiles
7. The 8-quantiles are called the octiles
8. The 10-quantiles are called the deciles
9. The 12-quantiles are called the duodeciles
10. The 20-quantiles are called the vigintiles
11. The 100-quantiles are called the percentiles
12. The 1000-quantiles are called the permilles
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.quantile(0.5, numeric_only=True)
age 23.5
score 68.5
Name: 0.5, dtype: float64
>>> df.quantile([.1,.25,.5,.75], numeric_only=True)
age score
0.10 22.5 43.00
0.25 23.0 57.00
0.50 23.5 68.50
0.75 24.0 76.25
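The interpolated values in the table can be reproduced by hand. The helper quantile_linear is a hypothetical name used only for illustration; it assumes pandas' default interpolation='linear', which places quantile q at position q*(n-1) in the sorted data and interpolates between the neighbouring values.

```python
import pandas as pd

def quantile_linear(values, q):
    """Hypothetical helper: linear-interpolation quantile of a list of numbers."""
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional position in the sorted data
    lo = int(pos)                   # index of the value at or below that position
    hi = min(lo + 1, len(s) - 1)    # index of the next value (clamped at the end)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

scores = [85, 63, 55, 74, 31, 77]
for q in [0.1, 0.25, 0.5, 0.75]:
    print(q, quantile_linear(scores, q), pd.Series(scores).quantile(q))
```

For q = 0.75 the position is 0.75 * 5 = 3.75, so the result lies three quarters of the way from 74 to 77.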
var()
It returns the variance of a given set of numbers: a measure of how far the values
are spread out from their mean, based on the squared deviations from the mean.
How to Calculate Variance
1. Find the mean of the data set: add all data values and divide by the sample
size n.
2. Find the squared difference from the mean for each data value: subtract the
mean from each data value and square the result.
3. Find the sum of all the squared differences.
4. Divide that sum by n - 1 to get the sample variance, which is what pandas'
var() returns by default (ddof=1); divide by n instead for the population
variance (var(ddof=0)).

How is the squared difference calculated?
Subtract the mean from a data value and square the result. For example, the mean
age is 142/6 ≈ 23.667, so for age 26 the squared difference is
(26 - 23.667)² ≈ 5.44.

>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.var(numeric_only=True)
age 1.866667
score 376.166667
dtype: float64
>>> df['age'].var()
1.8666666666666671
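The four variance steps can be sketched in plain Python and compared with pandas. This is an illustrative sketch (variable names are not from the original); it shows that var() divides by n - 1 by default, while var(ddof=0) divides by n.

```python
import pandas as pd

ages = [26, 24, 23, 22, 23, 24]
n = len(ages)

mean = sum(ages) / n                          # step 1: the mean
sq_diffs = [(x - mean) ** 2 for x in ages]    # step 2: squared differences
total = sum(sq_diffs)                         # step 3: sum of squared differences
sample_var = total / (n - 1)                  # step 4: divide by n - 1 (pandas default, ddof=1)
population_var = total / n                    # alternative: divide by n (ddof=0)

s = pd.Series(ages)
print(sample_var, s.var())
print(population_var, s.var(ddof=0))
```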
std()
It returns the standard deviation, a statistic that measures the dispersion of a
dataset relative to its mean. It is the square root of the variance.
To calculate the standard deviation of a set of numbers:
1. Work out the mean (the simple average of the numbers).
2. For each number, subtract the mean and square the result.
3. Add up those squared differences and divide by n - 1 (pandas' default, ddof=1;
divide by n instead for the population standard deviation).
4. Take the square root of the result.

$\mathrm{std} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.std(numeric_only=True)
age 1.366260
score 19.395017
dtype: float64
>>> df['age'].std()
1.3662601021279466
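As a sketch of the relationship described above, the standard deviation is the square root of the variance, and both use ddof=1 by default in pandas. The variable names are illustrative.

```python
import math
import pandas as pd

scores = pd.Series([85, 63, 55, 74, 31, 77])

std_from_var = math.sqrt(scores.var())   # square root of the sample variance
print(scores.std(), std_from_var)        # both divide by n - 1 (ddof=1)
print(scores.std(ddof=0))                # population standard deviation (divide by n)
```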
cumsum()
It returns the cumulative (running) sum of a series of values: each entry is the
sum of all the values up to and including that position.
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df['age'].cumsum()
0 26
1 50
2 73
3 95
4 118
5 142
Name: age, dtype: int64
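The running total can be cross-checked with itertools.accumulate from the Python standard library. This is an illustrative sketch; the last cumulative value always equals the plain sum.

```python
from itertools import accumulate
import pandas as pd

ages = [26, 24, 23, 22, 23, 24]

running_by_hand = list(accumulate(ages))            # each entry adds the next age to the total so far
running_pandas = pd.Series(ages).cumsum().tolist()  # the same running total from pandas
print(running_by_hand)
print(running_pandas)
```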
sort_values()
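It sorts the rows of a DataFrame by the values of a given column, in ascending or
descending order.
df.sort_values(by='colname', axis=0, ascending=True, inplace=False)
by: the column (or list of columns) to sort by; axis: 0 (default) sorts the rows,
1 sorts the columns; ascending=False sorts in descending order; inplace=True
modifies df itself instead of returning a new DataFrame.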
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sort_values(by='age')
name age score
3 ika 22 74
2 tina 23 55
4 vika 23 31
1 mina 24 63
5 tika 24 77
0 ina 26 85
>>> df.sort_values(by='age',ascending=False)
name age score
0 ina 26 85
1 mina 24 63
5 tika 24 77
2 tina 23 55
4 vika 23 31
3 ika 22 74
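sort_values() also accepts a list of columns, with a matching list for ascending. The sketch below (illustrative variable names, recent pandas assumed) sorts by age and breaks ties by score in descending order; because inplace defaults to False, df itself is left unchanged.

```python
import pandas as pd

# Same sample data as in the session above
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Youngest first; within the same age, highest score first
by_age_then_score = df.sort_values(by=['age', 'score'], ascending=[True, False])
print(by_age_then_score)
```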
sort_index()
It sorts (arranges) the rows based on their index labels; with axis=1 it sorts the
columns by their labels instead.
df.sort_index(axis=0, ascending=True, inplace=False)
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],
...      'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sort_index()
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sort_index(ascending=False)
name age score
5 tika 24 77
4 vika 23 31
3 ika 22 74
2 tina 23 55
1 mina 24 63
0 ina 26 85
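A short sketch of a common use of sort_index(): restoring the original row order after sorting by values, and, with axis=1, ordering the columns by their labels. Variable names are illustrative.

```python
import pandas as pd

# Same sample data as in the session above
dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

by_score = df.sort_values(by='score')   # rows now in score order
restored = by_score.sort_index()        # rows back in index order 0..5
cols_sorted = df.sort_index(axis=1)     # columns ordered by label

print(by_score.index.tolist())
print(restored)
print(cols_sorted.columns.tolist())
```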

PANDAS ADVANCED OPERATIONS ON DATAFRAMES


PIVOTING
Pivoting rearranges the data from rows and columns, possibly aggregating it, so
that the data can be viewed from a different perspective.
It summarizes extensive data.
It rotates the data by turning the unique values of one column into new columns.

1. pivot( )
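This method returns a new DataFrame reshaped according to the values of the given
columns: the unique values of index become the rows, the unique values of columns
become the new column headings, and values fills the cells. Each (index, columns)
pair must occur at most once; otherwise pivot() raises a ValueError.

Syntax
df.pivot(index='column1', columns='column2', values='column3')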
Ex.
PIVOT1
import pandas as pd
dic={'tutor':['tahira','gurjot','anusha','jacob','venkat'],\
'classes':[28,36,41,32,40],\
'country':['usa','uk','japan','usa','brazil']}
df=pd.DataFrame(dic)
print(df)
pt=df.pivot(index='country',columns='tutor',values='classes')
print("\n ==================================\n\n")
print(pt)

output
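The printed table is not reproduced in this copy. As a sketch of what pivot() builds here (assuming a recent pandas release), the script below recreates it and checks a few cells: each (country, tutor) pair occurs only once, and every combination that does not occur is filled with NaN.

```python
import pandas as pd

# Same data as the PIVOT1 example
dic = {'tutor': ['tahira', 'gurjot', 'anusha', 'jacob', 'venkat'],
       'classes': [28, 36, 41, 32, 40],
       'country': ['usa', 'uk', 'japan', 'usa', 'brazil']}
df = pd.DataFrame(dic)

# One row per country, one column per tutor, classes in the cells
pt = df.pivot(index='country', columns='tutor', values='classes')
print(pt)
```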

Program Analysis, Problem Analysis & Solution


Example1
import pandas as pd
dic={'invg':['rajesh','naveen','anil','naveen','rajesh'],\
'amt':[550,550,550,550,550]}
df=pd.DataFrame(dic)
print(df)
print(df.pivot(index='invg',columns='amt'))
ERROR
File "C:\Users\mukund\AppData\Roaming\Python\Python36\site-packages\pandas\core\reshape\reshape.py", line 179, in _make_selectors
raise ValueError("Index contains duplicate entries, cannot reshape")
ValueError: Index contains duplicate entries, cannot reshape
SOLUTION
import pandas as pd
dic={'invg':['rajesh','naveen','anil','naveen','rajesh'],\
'amt':[550,550,550,550,550]}
df=pd.DataFrame(dic)
print(df)
# pivot_table() aggregates the duplicate entries instead of failing
print(df.pivot_table(index='invg',values='amt',aggfunc=["sum","max","min","count"]))
OUTPUT
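As a sketch of why pivot_table() succeeds where pivot() failed: it groups the rows that share the same invg value and applies each aggregation function to amt. The script assumes a recent pandas release; with a list of functions, the result has two-level column labels such as ('sum', 'amt').

```python
import pandas as pd

# Same data as Example1: rajesh and naveen each appear twice
dic = {'invg': ['rajesh', 'naveen', 'anil', 'naveen', 'rajesh'],
       'amt': [550, 550, 550, 550, 550]}
df = pd.DataFrame(dic)

# One row per invigilator, one column per aggregation of amt
pt = df.pivot_table(index='invg', values='amt', aggfunc=['sum', 'max', 'min', 'count'])
print(pt)
```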

Example2
import pandas as pd
dic={'tutor':['tahira','gurjyot','anusha','jacob','venkat',
              'tahira','gurjyot','anusha','jacob','venkat',
              'tahira','gurjyot','anusha','jacob','venkat',
              'tahira','gurjyot','anusha','jacob','venkat'],
     'classes':[28,36,41,32,40,26,37,44,33,41,27,38,45,39,43,228,336,441,832,540],
     'country':['usa','uk','japan','usa','brazil','usa','usa','japan','uk','japan',
                'uk','usa','japan','uk','japan','usa','uk','brazil','usa','brazil'],
     'quarter':[1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4]}
df=pd.DataFrame(dic)
print(df)
# df.pivot(index='tutor', columns='country', values='classes') would raise a
# ValueError here, because pairs such as (tahira, usa) occur more than once
pt=df.pivot_table(index='tutor',values='classes',aggfunc="count")
print(pt)
OUTPUT

classes
tutor
anusha 4
gurjyot 4
jacob 4
tahira 4
venkat 4
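The printed output above has one row per tutor. As a sketch (recent pandas assumed, variable name pt2 illustrative), grouping by both tutor and country with index=['tutor', 'country'] gives one count per tutor-country pair instead; the pair counts for each tutor add up to the same total of 4.

```python
import pandas as pd

# Same data as Example2
dic = {'tutor': ['tahira', 'gurjyot', 'anusha', 'jacob', 'venkat',
                 'tahira', 'gurjyot', 'anusha', 'jacob', 'venkat',
                 'tahira', 'gurjyot', 'anusha', 'jacob', 'venkat',
                 'tahira', 'gurjyot', 'anusha', 'jacob', 'venkat'],
       'classes': [28, 36, 41, 32, 40, 26, 37, 44, 33, 41,
                   27, 38, 45, 39, 43, 228, 336, 441, 832, 540],
       'country': ['usa', 'uk', 'japan', 'usa', 'brazil', 'usa', 'usa', 'japan', 'uk', 'japan',
                   'uk', 'usa', 'japan', 'uk', 'japan', 'usa', 'uk', 'brazil', 'usa', 'brazil'],
       'quarter': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]}
df = pd.DataFrame(dic)

# Two-level row index: one count of classes per (tutor, country) pair
pt2 = df.pivot_table(index=['tutor', 'country'], values='classes', aggfunc='count')
print(pt2)
```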
