Python Pandas New Syllabus

The document provides information about Pandas, a popular Python library for data analysis. It discusses Pandas Series and DataFrames, which are the basic data structures. It provides examples of creating Series and DataFrames from various data types like lists, dictionaries, and NumPy arrays. It also describes selecting subsets of data from DataFrames using row and column names or indices.

Uploaded by

Rohan sushil


Python Pandas Notes

By: Archana (DPSV)

Pandas is one of the most widely used data science libraries.
The DataFrame is one of its core data structures.
The chapter will cover pivoting, sorting, aggregation, descriptive statistics, histograms and quantiles.
Series and DataFrames are the basic data structures.
• import numpy as np
• import pandas as pd
• A Series is a pandas data structure that represents a one-dimensional, array-like object containing an array of data (of any NumPy data type) and an associated array of data labels called the index.
• A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
• Features of DataFrame
• Columns can be of different types
• Size-mutable
• Labeled axes (rows and columns)
• Can perform arithmetic operations on rows and columns
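A minimal sketch of both structures, using made-up student data (the names `marks` and `df` are illustrative only): a Series with a custom index, and a DataFrame whose columns hold different types, with arithmetic applied to a whole column.

```python
import pandas as pd

# Series: one-dimensional data plus an index of labels
marks = pd.Series([78, 85, 92], index=['Asha', 'Ravi', 'Meena'])

# DataFrame: rows and columns; each column may have its own type
df = pd.DataFrame({'Name': ['Asha', 'Ravi', 'Meena'],
                   'Marks': [78, 85, 92],
                   'Passed': [True, True, True]})

# Arithmetic on a column is applied element by element
df['Marks_out_of_50'] = df['Marks'] / 2

print(marks)
print(df)
print(df.dtypes)
```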
• A DataFrame is a pandas data structure that represents a two-dimensional labelled array, which is an ordered collection of columns where columns may store different types of data, e.g. integer, float, string or boolean.
Major characteristics of the DataFrame data structure:
• It has two indices (axes): the row index (axis=0) and the column index (axis=1).
• Conceptually it is like a spreadsheet, where each value is identifiable by the combination of its row index and column index.
• Indices can be numbers, strings or letters.
• It is value-mutable, i.e. you can change its values.
• You can add or delete rows/columns in a DataFrame.
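The sketch below, again with made-up marks, illustrates these characteristics: summing along axis=0 and axis=1, changing one value, and adding and deleting a row and a column. The `drop()` method used for deletion here returns a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [90, 75], 'Science': [80, 65]},
                  index=['Asha', 'Ravi'])

# axis=0 aggregates down the rows (one result per column),
# axis=1 aggregates across the columns (one result per row)
col_totals = df.sum(axis=0)
row_totals = df.sum(axis=1)

# Value-mutable: change one value using its row and column labels
df.loc['Ravi', 'Science'] = 70

# Size-mutable: add a column and a row, then delete the column again
df['English'] = [85, 60]
df.loc['Meena'] = [88, 92, 79]
df = df.drop(columns='English')
print(df)
```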
pandas.DataFrame
A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
The parameters of the constructor are as follows:

Sr.No  Parameter & Description

1  data
Takes various forms like ndarray, Series, map, lists, dict, constants and also another DataFrame.
2  index
Optional row labels to use for the resulting frame. Default np.arange(n) if no index is passed.
3  columns
Optional column labels to use for the resulting frame. Default np.arange(n) if no column labels are passed and none can be taken from the data (such as dictionary keys).
4  dtype
Data type to force; only a single dtype is allowed. If None, the type of each column is inferred.
5  copy
Whether to copy data from the inputs. Older pandas versions document the default as False; in newer versions the default depends on the type of input.
Create DataFrame
A pandas DataFrame can be created using various inputs like:
– Lists
– dict
– Series
– NumPy ndarrays
– Another DataFrame
In the subsequent sections of this chapter, we will see how to create a DataFrame
using these inputs.
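The list and dict inputs are demonstrated in the sections that follow; for the NumPy ndarray and "another DataFrame" inputs, here is a small sketch with made-up values. Passing `columns` together with an existing DataFrame selects those columns from it.

```python
import numpy as np
import pandas as pd

# From a 2-D ndarray: each inner array becomes one row
arr = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(arr, columns=['x', 'y', 'z'], index=['r1', 'r2'])

# From another DataFrame: here only columns 'x' and 'z' are kept
df_from_df = pd.DataFrame(df_from_array, columns=['x', 'z'])

print(df_from_array)
print(df_from_df)
```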
• Create an Empty DataFrame
• The most basic DataFrame that can be created is an empty DataFrame.
• Example
import pandas as pd
df = pd.DataFrame()
print(df)
Its output is as follows:
Empty DataFrame
Columns: []
Index: []
Create a DataFrame from Lists
• The DataFrame can be created using a single list or a list of lists.
• Example 1
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print (df)
Its output is as follows:
   0
0  1
1  2
2  3
3  4
4  5
• Example 2
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Its output is as follows:
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
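• Example 3
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
# Convert only 'Age' to float: passing dtype=float to the constructor would also try to
# convert the 'Name' strings, which fails in recent pandas versions
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print(df)
Its output is as follows:
     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0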
• Example 4
• Let us now create an indexed DataFrame using a dictionary of lists.
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'], 'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
• Its output is as follows:
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
• Note: the index parameter assigns an index (label) to each row.
Create a DataFrame from List of Dicts
• List of Dictionaries can be passed as input data to create a DataFrame. The
dictionary keys are by default taken as column names.
Example 1
• The following example shows how to create a DataFrame by passing a list
of dictionaries.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
Its output is as follows:
a b c
0 1 2 NaN
1 5 10 20.0

Note: NaN (Not a Number) is filled in wherever a dictionary lacks a key.

• Example 2
• The following example shows how to create a DataFrame by
passing a list of dictionaries and the row indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
Its output is as follows:
        a   b     c
first   1   2   NaN
second  5  10  20.0
Example 3
• The following example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
# With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# With two column indices, one of which ('b1') is not a dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

Its output is as follows:
        a   b
first   1   2
second  5  10

        a  b1
first   1 NaN
second  5 NaN
Data Series
• Series is a one-dimensional labeled array capable of
holding data of any type (integer, string, float, python
objects, etc.). The axis labels are collectively called index.
• pandas.Series
• A pandas Series can be created using the following constructor:
pandas.Series(data, index, dtype, copy)
A Series can be created using various inputs like:
• Array
• Dict
• Scalar value or constant
The parameters of the constructor are as follows:
Sr.No  Parameter & Description
1  data
Takes various forms like ndarray, list, dict, constants.
2  index
Index values must be hashable and have the same length as data; they do not have to be unique. Default np.arange(n) if no index is passed.
3  dtype
Data type of the Series. If None, the data type will be inferred.
4  copy
Copy data. Default False.
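The scalar input listed above can be sketched as follows: when data is a scalar, an index should be given, and the value is repeated once for every index label.

```python
import pandas as pd

# The scalar 5 is repeated for each of the four labels
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
```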

Create an Empty Series

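The most basic Series that can be created is an empty Series.
Example
#import the pandas library and aliasing as pd
import pandas as pd
# dtype is given explicitly: without it, pandas 2.0 and later create an empty Series of dtype object
s = pd.Series(dtype=float)
print(s)
Its output is as follows:
Series([], dtype: float64)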
Create a Series from ndarray

If data is an ndarray, then the index passed must be of the same length. If no index is passed, then by default the index will be range(n), where n is the array length, i.e. 0, 1, 2, ..., len(array)-1.
Example 1

#import the pandas library and aliasing as pd

import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
Its output is as follows:
0 a
1 b
2 c
3 d
dtype: object

We did not pass any index, so by default, it assigned the indexes ranging
from 0 to len(data)-1, i.e., 0 to 3.
Example 2
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)

Its output is as follows −
100 a
101 b
102 c
103 d
dtype: object
We passed the index values here. Now we can see the customized indexed
values in the output.
Create a Series from dict
A dict can be passed as input. If no index is specified, the dictionary keys are used, in insertion order, to construct the index. If an index is passed, the values in data corresponding to the labels in the index will be pulled out.
Example 1
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)
Its output is as follows:
a 0.0
b 1.0
c 2.0
dtype: float64
Observe: dictionary keys are used to construct the index.
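When an index is passed together with a dict, the values are pulled out by label. A short sketch reusing the same dictionary: labels are taken in the order of the index, and a label with no matching key ('d') gets NaN.

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
# Values are looked up by label, in the order given by index
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s)
```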
Create a DataFrame from Dict of Series
Dictionary of Series can be passed to form a
DataFrame. The resultant index is the union of
all the series indexes passed.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

print(df)
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
Selecting/Accessing a Subset from a DataFrame using Row/Column Names

• You can use the following syntax to select/access a subset from a DataFrame object:
<DataFrameObject>.loc[<startrow>:<endrow>, <startcolumn>:<endcolumn>]
Unlike ordinary Python slicing, .loc includes both the start and the end labels.
• To access a row, just give the row name/label as this: <DF object>.loc[<row label>,:]. Make sure not to miss the COLON AFTER COMMA.
• To access multiple rows, use: <DF object>.loc[<start row>:<end row>,:]. Make sure not to miss the COLON AFTER COMMA.
Selecting subset from DataFrame using Rows/Columns
import pandas as pd
data = {'Population':[10927986, 12691836, 4631392,4328063],
'Average_income' :[72167810876544, 85007812691836,
422678431392,5261782328063]}

df = pd.DataFrame(data,index=['Delhi','Mumbai','Kolkata','Chennai'])
print(df)
Population Average_income
Delhi 10927986 72167810876544
Mumbai 12691836 85007812691836
Kolkata 4631392 422678431392
Chennai 4328063 5261782328063
To access multiple rows make sure not to
miss the COLON after COMMA
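Applying the .loc syntax to this DataFrame, a minimal sketch (the variable names are illustrative): label slices such as 'Mumbai':'Chennai' include the end label.

```python
import pandas as pd

data = {'Population': [10927986, 12691836, 4631392, 4328063],
        'Average_income': [72167810876544, 85007812691836,
                           422678431392, 5261782328063]}
df = pd.DataFrame(data, index=['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])

one_row = df.loc['Mumbai', :]               # one row, returned as a Series
some_rows = df.loc['Mumbai':'Chennai', :]   # Mumbai, Kolkata and Chennai (end included)
one_col = df.loc[:, 'Population']           # one column, returned as a Series
block = df.loc['Delhi':'Kolkata', 'Population':'Average_income']

print(some_rows)
print(block)
```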
• To access selective columns, use:
<DF object>.loc[:, <start column>:<end column>]
Make sure not to miss the COLON BEFORE COMMA. As with rows, all columns falling between the start and end columns (both included) will be listed.
• To access a range of columns from a range of rows, use:
<DF object>.loc[<startrow>:<endrow>, <startcolumn>:<endcolumn>]
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
Its output is as follows:
one    2.0
two    2.0
Name: b, dtype: float64
Obtaining a Subset from a DataFrame using Rows/Columns Numeric Index Position
<DF object>.iloc[<startrow index>:<endrow index>, <startcolumn index>:<endcolumn index>]
Unlike .loc, .iloc follows ordinary Python slicing: the end position is excluded.
• Selection by integer location
• Rows can be selected by passing an integer location to iloc.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])
Its output is as follows:
one    3.0
two    3.0
Name: c, dtype: float64
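.iloc slices follow ordinary Python slicing and exclude the end position, whereas .loc label slices include the end label. This sketch selects the same cells both ways.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Row positions 1 and 2 (position 3 is excluded), column position 0
by_position = df.iloc[1:3, 0:1]
# The same cells by label: both 'c' and 'one' are included
by_label = df.loc['b':'c', 'one':'one']

print(by_position)
```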
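Selecting/Accessing Individual Values

<DF object>.<column>[<row label>]
Using a numeric row position in this form is deprecated in recent pandas versions when the row labels are not integers; use <DF object>.iat[<row position>, <column position>] for positions and <DF object>.at[<row label>, <column label>] for labels.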
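A small sketch of reading one value in several ways, on the 'one'/'two' DataFrame used above. `.at[]` (labels) and `.iat[]` (integer positions) are the dedicated accessors for a single value.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

v1 = df.two['b']         # column as an attribute, then row label
v2 = df['two']['b']      # same, using bracket notation for the column
v3 = df.at['b', 'two']   # single value by row label and column label
v4 = df.iat[1, 1]        # single value by row position 1 and column position 1
print(v1, v2, v3, v4)
```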

Deleting Columns in DataFrames
del <DF object>[<column name>]
Using the previous DataFrame with an extra column 'three', we delete one column using del and another using the pop() function:
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three' : pd.Series([10,20,30], index=['a','b','c'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)
# using the del statement
print("Deleting the first column using DEL function:")
del df['one']
print(df)
# using the pop function
print("Deleting another column using POP function:")
df.pop('two')
print(df)
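`pop()` also returns the column it removes, and the `drop()` method (not covered in these notes) can remove rows as well as columns, returning a new DataFrame and leaving the original unchanged. A minimal sketch:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

removed = df.pop('three')             # the deleted column is returned as a Series
no_rows = df.drop(index=['a', 'd'])   # new DataFrame without rows 'a' and 'd'
no_cols = df.drop(columns='one')      # new DataFrame without column 'one'

print(removed)
print(no_rows)
print(no_cols)
```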
Iteration over a DataFrame
<df>.items()   <df>.iterrows()
items()
Iterates over each column as a (key, value) pair, with the column label as key and the column values as a Series object. Older pandas versions call this method iteritems(); it was removed in pandas 2.0.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,3),columns=['col1','col2','col3'])
for key,value in df.items():
    print(key,value)
iterrows()
iterrows() returns an iterator yielding each index value along with a Series containing the data in each row.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])
for row_index,row in df.iterrows():
    print(row_index,row)
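Because random numbers change on every run, here is a sketch with fixed, made-up values so the effect of each loop is easy to check: items() gives one Series per column, iterrows() one Series per row.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [10, 20]}, index=['r1', 'r2'])

col_sums = {}
for label, column in df.items():        # label = column name, column = Series
    col_sums[label] = column.sum()

row_sums = {}
for row_index, row in df.iterrows():     # row_index = row label, row = Series
    row_sums[row_index] = row['col1'] + row['col2']

print(col_sums)
print(row_sums)
```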
describe() Function
• The describe() function computes a summary of statistics pertaining to the DataFrame columns. By default only the numeric columns (here Age and Rating) are summarized: count, mean, std, min, 25%, 50%, 75% and max.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
#Create a DataFrame
df = pd.DataFrame(d)
print(df.describe())
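The chapter outline also mentions quantiles. As a sketch, the Age values from the example above can be summarized on their own; `quantile()` uses linear interpolation by default, which is also how describe() computes its 25%, 50% and 75% rows.

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

summary = ages.describe()                      # count, mean, std, min, 25%, 50%, 75%, max
quartiles = ages.quantile([0.25, 0.5, 0.75])   # the same three quartiles

print(summary)
print(quartiles)
```

For these 12 ages the median (50%) is 29.5, the average of the two middle sorted values 29 and 30.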
Aggregation/Descriptive statistics - Dataframe

Data aggregation:
Aggregation is the process of turning the values of a dataset (or a subset of it) into one single value; in other words, an aggregation is a function that takes multiple values and returns a single value as a result. Many aggregations are possible, like count, sum, min, max, median, quartile etc. These (count, sum etc.) are descriptive statistics and related operations on a DataFrame.

To make this clear, take the DataFrame of five players and their scores from the program below. A simple aggregation is to calculate the sum of the Score column, which is 87+67+89+55+47 = 345. A different aggregation would be to count the number of Names, which is 5.
#e.g. program for data aggregation/descriptive statistics
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Sachin','Dhoni','Virat','Rohit','Shikhar']),
     'Age':pd.Series([26,25,25,24,31]),
     'Score':pd.Series([87,67,89,55,47])}
#Create a DataFrame
df = pd.DataFrame(d)
print("Dataframe contents")
print(df)
print(df.count())
print("count age",df[['Age']].count())
print("sum of score",df[['Score']].sum())
print("minimum age",df[['Age']].min())
print("maximum score",df[['Score']].max())
print("mean age",df[['Age']].mean())
print("mode of age",df[['Age']].mode())
print("median of score",df[['Score']].median())
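Several of these statistics can also be computed in a single call with `agg()`, which accepts a list of function names. A minimal sketch on the Score values from the program above:

```python
import pandas as pd

score = pd.Series([87, 67, 89, 55, 47], name='Score')

# One result per aggregation name, returned as a Series
stats = score.agg(['count', 'sum', 'mean', 'min', 'max'])
print(stats)
```

The sum is 345, matching the hand calculation above, and the mean is 345 / 5 = 69.0.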
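Function info(): prints a concise summary of a DataFrame (index, columns, non-null counts, dtypes and memory usage).
Functions head() and tail(): retrieve the top 5 or bottom 5 rows by default.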
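A sketch of these three functions on a made-up ten-row DataFrame. info() prints its summary rather than returning it, so here the text is captured through its `buf` parameter.

```python
import io
import pandas as pd

df = pd.DataFrame({'n': range(1, 11)})   # ten rows holding 1 to 10

first5 = df.head()     # default: first 5 rows
last3 = df.tail(3)     # last 3 rows

buf = io.StringIO()
df.info(buf=buf)       # write the summary into buf instead of the screen
print(buf.getvalue())
```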
Cumulative Calculation Functions
cumsum() function: calculates the cumulative sum, i.e. in the output of this function, the value of each row is replaced by the sum of all prior rows including this row. For rows holding string values, this "sum" is a concatenation.
cumsum(), cumprod()
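A minimal sketch of cumsum() and cumprod() on a small Series, plus cumsum() on strings to show concatenation (dtype=object is set explicitly so the strings are stored as plain Python objects).

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
running_total = s.cumsum()      # 1, 1+2, 1+2+3, 1+2+3+4
running_product = s.cumprod()   # 1, 1*2, 1*2*3, 1*2*3*4

words = pd.Series(['a', 'b', 'c'], dtype=object)
joined = words.cumsum()         # strings are concatenated

print(running_total.tolist(), running_product.tolist(), joined.tolist())
```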
Sorting by Values and index
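A minimal sketch of sorting, using a few made-up rows based on the players example: `sort_values()` orders rows by the values of a column, `sort_index()` by the row labels; both return a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Virat', 'Dhoni', 'Sachin'],
                   'Score': [89, 67, 87]},
                  index=[3, 1, 2])

by_score = df.sort_values(by='Score', ascending=False)   # highest score first
by_index = df.sort_index()                                 # index labels 1, 2, 3

print(by_score)
print(by_index)
```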
Common attributes of Series
• <series name>.index: axis labels of the Series.
• <series name>.values: returns the data as an ndarray.
• <series name>.dtype: returns the dtype object of the data.
• <series name>.shape: returns a tuple of the shape of the data.
• <series name>.nbytes: returns the number of bytes in the data.
• <series name>.ndim: returns the number of dimensions.
• <series name>.size: returns the number of elements.
• <series name>.itemsize: returns the size of each item's dtype (removed in pandas 1.0; use <series name>.dtype.itemsize instead).
• <series name>.hasnans: returns True if there are any NaN values.
• <series name>.empty: returns True if the Series is empty.
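A sketch that reads these attributes from a small made-up float Series containing one NaN; the item size is read through `dtype.itemsize`, since the Series attribute is gone in current pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 30.0], index=['x', 'y', 'z'])

print(s.index)                  # the axis labels
print(s.values)                 # the data as an ndarray
print(s.dtype)                  # float64
print(s.shape, s.ndim, s.size)
print(s.nbytes, s.dtype.itemsize)
print(s.hasnans, s.empty)
```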
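• head(): returns the first 5 rows; head(n) returns the first n rows.
• tail(): returns the last 5 rows; tail(n) returns the last n rows.
• df.at[<row label>, <column label>]: accesses a single value by labels.
• df.iat[<row position>, <column position>]: accesses a single value by integer positions.
• df.T: transposes the DataFrame (rows become columns and vice versa).
• df.count(): counts the non-NA values in each column.
• len(df): returns the number of rows.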
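A short sketch of T, count() and len() on a made-up DataFrame with one missing value, which shows why count() and len() can differ: count() skips NaN, len() does not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, 2.0, np.nan], 'two': [4, 5, 6]},
                  index=['a', 'b', 'c'])

flipped = df.T            # rows become columns and columns become rows
non_missing = df.count()  # non-NaN values in each column
n_rows = len(df)          # number of rows, missing values or not

print(flipped)
print(non_missing)
print(n_rows)
```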
