Pandas Dataframe2

The document provides an overview of various functions and operations that can be applied to a pandas DataFrame, including aggregate functions like max, min, sum, count, mode, mean, median, quantile, variance, standard deviation, cumulative sum, and sorting methods. It also covers pivoting techniques to rearrange data for better analysis. Examples are provided to illustrate the use of these functions with a sample DataFrame.

Uploaded by

manishmcamba2013

TATA DAV SCHOOL, SIJUA

DATAFRAME-2
Applying Function with DataFrame
Aggregate Function/Multi Row Function
max()
It returns the largest value in each column of a DataFrame, or in a single
column.
df.max( )
df['colname'].max()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.max()
name vika
age 26
score 85
dtype: object
>>> df['age'].max()
26
>>> df.max(axis=1, numeric_only=True)
0 85
1 63
2 55
3 74
4 31
5 77
dtype: int64

min()
It returns the smallest value in each column of a DataFrame, or in a single
column.
df.min( )
df['colname'].min()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.min()
name ika
age 22
score 31
dtype: object
>>> df['score'].min()
31
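max() and min() report the extreme values but not the rows they come from. As a minimal sketch, assuming the same sample data, idxmax() and idxmin() return the index label of those rows, which can then be passed to loc. The variable names below are illustrative only.

```python
import pandas as pd

dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Index labels of the rows holding the highest and lowest score.
top_label = df['score'].idxmax()
low_label = df['score'].idxmin()

# Use the labels to look up the matching names.
top_name = df.loc[top_label, 'name']
low_name = df.loc[low_label, 'name']

print(top_name, df['score'].max())
print(low_name, df['score'].min())
```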
sum()
It adds up all the values in each column of a DataFrame, or in a single
column. For a column of strings, the values are concatenated.
df.sum( )
df['colname'].sum()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sum()
name inaminatinaikavikatika
age 142
score 385
dtype: object
>>> df['score'].sum()
385
count()
It counts the non-missing (non-NaN) values in each column of a DataFrame, or
in a single column.
df.count( )
df['colname'].count()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.count()
name 6
age 6
score 6
dtype: int64
>>> df['score'].count()
6
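Because count() skips missing values, it can differ from the number of rows. The sketch below assumes a hypothetical copy of the score column in which one value is missing (the None entry is not part of the sample data above).

```python
import pandas as pd

# Hypothetical scores: the third value is deliberately missing.
scores = pd.Series([85, 63, None, 74, 31, 77])

total_rows = len(scores)       # counts every row, missing or not
non_missing = scores.count()   # counts only the non-NaN values

print(total_rows, non_missing)
```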
mode()
It returns the mode, that is, the most frequently occurring value of a given
set of numbers. When several values tie for the highest frequency, all of them
are returned.
df.mode( )
df['colname'].mode()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77

>>> df['age'].mode()
0 23
1 24
dtype: int64
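The age column has two modes because 23 and 24 each occur twice. One way to see this, sketched here with the same ages, is to compare mode() with value_counts(), which lists how often each value occurs.

```python
import pandas as pd

ages = pd.Series([26, 24, 23, 22, 23, 24], name='age')

# Frequency of every distinct age.
counts = ages.value_counts()

# mode() returns every value that reaches the highest frequency, in sorted order.
modes = ages.mode()

print(counts)
print(modes.tolist())
```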
mean()
It calculates the arithmetic mean (average) of a given set of numbers. In
pandas 2.0 and later, pass numeric_only=True when the DataFrame also holds
non-numeric columns such as name.
df.mean( )
df['colname'].mean()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.mean(numeric_only=True)
age 23.666667
score 64.166667
dtype: float64
>>> df['age'].mean()
23.666666666666668
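The mean is simply the sum divided by the number of values. A small sketch, assuming the same sample data, checks this against mean() and also shows numeric_only=True keeping only the numeric columns.

```python
import pandas as pd

dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Mean computed by hand: total divided by the number of values.
manual_mean = df['age'].sum() / df['age'].count()

# Column means for the numeric columns only; 'name' is left out.
numeric_means = df.mean(numeric_only=True)

print(manual_mean, df['age'].mean())
print(numeric_means)
```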
median()
It calculates the median, that is, the middle value of a sorted set of
numbers. When the count is even, the median is the average of the two middle
values.
df.median( )
df['colname'].median()
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.median(numeric_only=True)
age 23.5
score 68.5
dtype: float64
>>> df['age'].median()
23.5
quantile()
It returns the value at the given quantile over the requested axis (0 or 1).
The word quantile is derived from the word quantity. Quantiles are cut points
that divide a sorted sample into equal-sized sub-groups.
Common Quantiles:
1. The 2-quantile is called the median
2. The 3-quantiles are called terciles
3. The 4-quantiles are called quartiles
4. The 5-quantiles are called quintiles
5. The 6-quantiles are called sextiles
6. The 7-quantiles are called septiles
7. The 8-quantiles are called octiles
8. The 10-quantiles are called deciles
9. The 12-quantiles are called duodeciles
10. The 20-quantiles are called vigintiles
11. The 100-quantiles are called percentiles
12. The 1000-quantiles are called permilles
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.quantile(0.5, numeric_only=True)
age 23.5
score 68.5
Name: 0.5, dtype: float64
>>> df.quantile([.1,.25,.5,.75], numeric_only=True)
age score
0.10 22.5 43.00
0.25 23.0 57.00
0.50 23.5 68.50
0.75 24.0 76.25
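The fractional results in the table come from linear interpolation between neighbouring sorted values, which is the default behaviour of quantile(). The sketch below reproduces that rule by hand for the age column; linear_quantile is a hypothetical helper written for illustration, not a pandas function.

```python
import pandas as pd

ages = pd.Series([26, 24, 23, 22, 23, 24])

def linear_quantile(values, q):
    """Quantile by linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)               # fractional position in the sorted list
    lower = int(pos)                           # index of the value just below
    upper = min(lower + 1, len(ordered) - 1)   # index of the value just above
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

for q in (0.1, 0.25, 0.5, 0.75):
    print(q, linear_quantile(ages, q), ages.quantile(q))
```

The 0.5 quantile is the median, so ages.quantile(0.5) and ages.median() agree.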
var()
It returns the variance of a given set of numbers, a measure of how far the
values spread out from their mean. It is based on the squared deviations from
the mean.
How to Calculate Variance
1. Find the mean of the data set. Add all data values and divide by the sample
size n.
2. Find the squared difference from the mean for each data value. Subtract the
mean from each data value and square the result.
3. Find the sum of all the squared differences.
4. Calculate the variance. Divide that sum by n - 1 to get the sample variance,
which is what var() returns by default (ddof=1). Dividing by n instead gives
the population variance, available as var(ddof=0).

How is squared difference calculated?

Work out the mean (the simple average of the numbers). Then for each number,
subtract the mean and square the result (the squared difference). Averaging
those squared differences over n gives the population variance.

>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.var(numeric_only=True)
age 1.866667
score 376.166667
dtype: float64
>>> df['age'].var()
1.8666666666666671
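The steps above can be followed by hand for the age column. This sketch computes both the sample variance (divide by n - 1) and the population variance (divide by n) and compares them with var(); the intermediate variable names are illustrative.

```python
import pandas as pd

ages = pd.Series([26, 24, 23, 22, 23, 24])
n = len(ages)

# Step 1: mean of the data set.
mean = ages.sum() / n

# Step 2: squared difference from the mean for each value.
squared_diffs = (ages - mean) ** 2

# Step 3: sum of the squared differences.
total = squared_diffs.sum()

# Step 4: divide by n - 1 (sample) or by n (population).
sample_var = total / (n - 1)
population_var = total / n

print(sample_var, ages.var())            # pandas default, ddof=1
print(population_var, ages.var(ddof=0))  # population variance
```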
std()
A standard deviation is a statistic that measures the dispersion of a dataset
relative to its mean. It is the square root of the variance.
To calculate the standard deviation of a set of numbers:
1. Work out the mean (the simple average of the numbers).
2. Then for each number: subtract the mean and square the result.
3. Then work out the mean of those squared differences.
4. Take the square root of that and we are done!

std = sqrt(mean(abs(x - x.mean())**2))
The formula above divides by n (population standard deviation). By default,
std() divides by n - 1 instead (ddof=1); use std(ddof=0) to match the formula.
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.std(numeric_only=True)
age 1.366260
score 19.395017
dtype: float64
>>> df['age'].std()
1.3662601021279466
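As a quick check, the standard deviation is the square root of the variance, and the divisor matters: pandas divides by n - 1 by default, while NumPy's np.std on a plain array divides by n. The sketch below assumes NumPy is installed alongside pandas.

```python
import math

import numpy as np
import pandas as pd

ages = pd.Series([26, 24, 23, 22, 23, 24])

# pandas default: sample standard deviation (ddof=1).
sample_std = ages.std()

# Square root of the sample variance gives the same number.
root_of_var = math.sqrt(ages.var())

# NumPy on a plain array defaults to the population formula (ddof=0).
population_std = np.std(ages.to_numpy())

print(sample_std, root_of_var)
print(population_std, ages.std(ddof=0))
```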
cumsum()
It returns the cumulative (running) sum of the values in a Series or column.
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df['age'].cumsum()
0 26
1 50
2 73
3 95
4 118
5 142
Name: age, dtype: int64
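Each entry of the cumulative sum is the total of all values up to and including that row, so the last entry equals sum(). A plain loop, shown here only as an illustration, produces the same running totals.

```python
import pandas as pd

ages = pd.Series([26, 24, 23, 22, 23, 24])

# Build the running total by hand.
running = []
total = 0
for value in ages:
    total += int(value)
    running.append(total)

print(running)
print(ages.cumsum().tolist())
```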
sort_values()
It sorts the rows by the values of the given column(s), in either ascending or
descending order.
df.sort_values(by='colname',axis=0/1,ascending=True/False,inplace=True/False)
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sort_values(by='age')
name age score
3 ika 22 74
2 tina 23 55
4 vika 23 31
1 mina 24 63
5 tika 24 77
0 ina 26 85
>>> df.sort_values(by='age',ascending=False)
name age score
0 ina 26 85
1 mina 24 63
5 tika 24 77
2 tina 23 55
4 vika 23 31
3 ika 22 74
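When several rows share the same age, a second column can break the tie. The sketch below, assuming the same sample data, sorts by age ascending and then by score descending, and also shows that inplace=True modifies the DataFrame and returns None.

```python
import pandas as pd

dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Sort by age (ascending), then by score (descending) within each age.
by_age_score = df.sort_values(by=['age', 'score'], ascending=[True, False])
print(by_age_score)

# inplace=True sorts the object itself and returns None.
df_copy = df.copy()
result = df_copy.sort_values(by='age', inplace=True)
print(result)
```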
sort_index()
It sorts the rows (or, with axis=1, the columns) by their index labels.
df.sort_index(axis=0/1,ascending=True/False,inplace=True/False)
>>> import pandas as pd
>>> dic={'name':['ina','mina','tina','ika','vika','tika'],'age':[26,24,23,22,23,24],
...      'score':[85,63,55,74,31,77]}
>>> df=pd.DataFrame(dic)
>>> df
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sort_index()
name age score
0 ina 26 85
1 mina 24 63
2 tina 23 55
3 ika 22 74
4 vika 23 31
5 tika 24 77
>>> df.sort_index(ascending=False)
name age score
5 tika 24 77
4 vika 23 31
3 ika 22 74
2 tina 23 55
1 mina 24 63
0 ina 26 85
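sort_index() has no visible effect on a freshly created DataFrame because the index is already in order. A more telling use, sketched here with the same sample data, is restoring the original row order after sort_values(), or sorting the column labels with axis=1.

```python
import pandas as pd

dic = {'name': ['ina', 'mina', 'tina', 'ika', 'vika', 'tika'],
       'age': [26, 24, 23, 22, 23, 24],
       'score': [85, 63, 55, 74, 31, 77]}
df = pd.DataFrame(dic)

# Sorting by score shuffles the index labels.
by_score = df.sort_values(by='score')
print(by_score.index.tolist())

# sort_index() puts the rows back in index order.
restored = by_score.sort_index()
print(restored.equals(df))

# axis=1 sorts the column labels alphabetically instead of the rows.
columns_sorted = df.sort_index(axis=1)
print(columns_sorted.columns.tolist())
```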

PANDAS ADVANCE OPERATION ON DATAFRAMES

PIVOTING
Pivoting techniques rearrange data between rows and columns, possibly aggregating it, so that
the data can be viewed from a different perspective.
It summarises extensive data.
It rotates the data by transforming row values into columns.

1. pivot( )
This method creates a new DataFrame by reshaping the data based on column values.

Syntax
df.pivot(index='column1', columns='column2', values='column3')
Ex.
PIVOT1
import pandas as pd
dic={'tutor':['tahira','gurjot','anusha','jacob','venkat'],\
'classes':[28,36,41,32,40],\
'country':['usa','uk','japan','usa','brazil']}
df=pd.DataFrame(dic)
print(df)
pt=df.pivot(index='country',columns='tutor',values='classes')
print("\n ==================================\n\n")
print(pt)

output
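As a rough check on PIVOT1, the sketch below rebuilds the same DataFrame and inspects the pivoted result: each country becomes a row label, each tutor becomes a column, and combinations that never occur are filled with NaN.

```python
import pandas as pd

dic = {'tutor': ['tahira', 'gurjot', 'anusha', 'jacob', 'venkat'],
       'classes': [28, 36, 41, 32, 40],
       'country': ['usa', 'uk', 'japan', 'usa', 'brazil']}
df = pd.DataFrame(dic)

pt = df.pivot(index='country', columns='tutor', values='classes')

# Four distinct countries (rows) by five tutors (columns).
print(pt.shape)

# A combination present in the data keeps its value.
print(pt.loc['usa', 'tahira'])

# A combination absent from the data is NaN.
print(pd.isna(pt.loc['uk', 'tahira']))
```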

Program Analysis, Problem Analysis & Solution

Example1
import pandas as pd
dic={'invg':['rajesh','naveen','anil','naveen','rajesh'],\
'amt':[550,550,550,550,550]}
df=pd.DataFrame(dic)
print(df)
print(df.pivot(index='invg',columns='amt'))
ERROR
File "C:\Users\mukund\AppData\Roaming\Python\Python36\site-packages\pandas\core\
reshape\reshape.py", line 179, in _make_selectors
raise ValueError("Index contains duplicate entries, cannot reshape")
ValueError: Index contains duplicate entries, cannot reshape
SOLUTION
import pandas as pd
dic={'invg':['rajesh','naveen','anil','naveen','rajesh'],\
'amt':[550,550,550,550,550]}
df=pd.DataFrame(dic)
print(df)
print(df.pivot_table(index='invg',values='amt',aggfunc=["sum","max","min","count"]))
OUTPUT
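pivot() fails here because the same investigator and amount pair appears more than once. One way to anticipate this, sketched below under the same data, is to test for repeated pairs with duplicated() before choosing between pivot() and pivot_table(); the variable names are illustrative.

```python
import pandas as pd

dic = {'invg': ['rajesh', 'naveen', 'anil', 'naveen', 'rajesh'],
       'amt': [550, 550, 550, 550, 550]}
df = pd.DataFrame(dic)

# True when an (invg, amt) pair occurs more than once, which pivot() cannot reshape.
has_duplicates = df.duplicated(subset=['invg', 'amt']).any()
print(has_duplicates)

# pivot_table() aggregates the repeated entries instead of raising an error.
summary = df.pivot_table(index='invg', values='amt',
                         aggfunc=['sum', 'max', 'min', 'count'])
print(summary)
```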

Example2
import pandas as pd
import numpy as np
dic={'tutor':['tahira','gurjyot','anusha','jacob','venkat',
              'tahira','gurjyot','anusha','jacob','venkat',
              'tahira','gurjyot','anusha','jacob','venkat',
              'tahira','gurjyot','anusha','jacob','venkat'],
     'classes':[28,36,41,32,40,26,37,44,33,41,27,38,45,39,43,228,336,441,832,540],
     'country':['usa','uk','japan','usa','brazil',
                'usa','usa','japan','uk','japan',
                'uk','usa','japan','uk','japan',
                'usa','uk','brazil','usa','brazil'],
     'quarter':[1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4]}
df=pd.DataFrame(dic)
print(df)
#p=df.pivot(index='tutor',columns='country',values='classes')
#print(p)
pt=df.pivot_table(index='tutor',values='classes',aggfunc="count")
print(pt)
OUTPUT

classes
tutor
anusha 4
gurjyot 4
jacob 4
tahira 4
venkat 4
