Eda Unit 2

The document discusses various data manipulation techniques using the Pandas library in Python, such as data indexing and selection, handling missing data, hierarchical indexing, combining datasets, and aggregation and grouping. It covers Pandas objects such as Series and DataFrame and introduces the Pandas indexing techniques [], loc[], iloc[] and ix[], with examples.

Uploaded by

60 Vibha Shree.S

UNIT II

EDA USING PYTHON


Data Manipulation using Pandas – Pandas Objects – Data Indexing and Selection – Operating on Data – Handling Missing Data – Hierarchical Indexing – Combining datasets – Concat, Append, Merge and Join – Aggregation and grouping – Pivot Tables – Vectorized String Operations
Installing and Using Pandas
• Once Pandas is installed, you can import it and check the version:
In[1]: import pandas
pandas.__version__
Out[1]: '0.18.1'
• Just as we generally import NumPy under the alias np, we will import Pandas under the alias pd:
In[2]: import pandas as pd
• For example, to display all the contents of the pandas namespace in IPython, you can type this:
In [3]: pd.<TAB>
• And to display the built-in Pandas documentation, you can use this:
In [4]: pd?
Introducing Pandas Objects
• Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices.
• Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but nearly everything that follows will require an understanding of what these structures are.
• Thus, before we go any further, let’s introduce these three fundamental Pandas data structures: the Series, DataFrame, and Index.
• We will start our code sessions with the standard NumPy and Pandas imports:
In[1]: import numpy as np
import pandas as pd
Series as generalized NumPy array
The essential difference is the presence of the index: while the NumPy array has
an implicitly defined integer index used to access the values, the Pandas Series
has an explicitly defined index associated with the values.
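A minimal sketch of that difference, using made-up values: the same four numbers are stored once in a NumPy array (reached by implicit position) and once in a Series whose explicit index is the letters 'a' to 'd'.

```python
import numpy as np
import pandas as pd

# NumPy array: values are reached through the implicit 0..n-1 integer index
arr = np.array([0.25, 0.5, 0.75, 1.0])

# Series with the same values but an explicitly defined index of labels
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

print(arr[1])        # positional access on the array
print(data['b'])     # label-based access on the Series
print(data.index)    # the explicit Index object
print(data.values)   # the underlying NumPy array of values
```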
Series as specialized dictionary
A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Series is a structure that maps typed keys to a set of typed values.
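A short sketch of the dictionary analogy (the state populations are illustrative figures): a Series built from a dict takes the keys as its index, supports dict-style lookup, and also supports label slicing, which a plain dict cannot do.

```python
import pandas as pd

# Plain dictionary: arbitrary keys -> arbitrary values
population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127}

# Series from the dict: keys become the index, values share one dtype
population = pd.Series(population_dict)

print(population['California'])           # dictionary-style lookup by key
print(population['California':'Texas'])   # label slice, end label included
```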
Constructing Series objects
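A few common ways of calling pd.Series(data, index=index) are sketched below with toy values; in each case the index either defaults to 0, 1, 2, ... or is taken from what you pass.

```python
import pandas as pd

# 1. From a list: the index defaults to 0, 1, 2, ...
s1 = pd.Series([2, 4, 6])

# 2. From a scalar: the value is repeated to fill the given index
s2 = pd.Series(5, index=[100, 200, 300])

# 3. From a dict: the keys become the index
s3 = pd.Series({2: 'a', 1: 'b', 3: 'c'})

# 4. From a dict plus an explicit index: only the listed keys are kept, in that order
s4 = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2])
```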
The Pandas DataFrame Object
DataFrame as specialized dictionary
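As a sketch of the same dictionary analogy one level up (illustrative figures): a DataFrame can be built from a dict of Series that share index labels, and a column name then maps to a Series, just as a dict key maps to a value.

```python
import pandas as pd

# Two Series that share the same index labels (states)
area = pd.Series({'California': 423967, 'Texas': 695662, 'New York': 141297})
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127})

# Dict of Series -> DataFrame: each key becomes a column name
states = pd.DataFrame({'area': area, 'population': population})

# Dictionary-style access: a column name maps to a Series
print(states['area'])
print(states.columns)
print(states.index)
```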
Indexing and Selecting Data with Pandas
Indexing in Pandas:
Indexing in pandas means selecting particular rows and columns of data from a DataFrame. It can mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of the rows and some of the columns. Indexing is also known as subset selection.
• Pandas indexing using [ ], .loc[ ], .iloc[ ] and .ix[ ]
• There are many ways to pull elements, rows, and columns out of a DataFrame. Pandas provides several indexing methods for this; they look very similar but behave quite differently. Pandas supports four types of multi-axis indexing:
• DataFrame[ ]: also known as the indexing operator.
• DataFrame.loc[ ]: used for label-based selection.
• DataFrame.iloc[ ]: used for position-based (integer) selection.
• DataFrame.ix[ ]: used for both label- and integer-based selection (deprecated in pandas 0.20 and removed in pandas 1.0).
• Collectively, they are called the indexers. These are by far the most common ways to index data: four ways of getting elements, rows, and columns from a DataFrame.
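The difference between labels and positions is easiest to see when the index labels are themselves integers. This small sketch (made-up data) contrasts .loc and .iloc on such a Series; note that a .loc slice includes its end label, while an .iloc slice excludes its end position.

```python
import pandas as pd

# Index labels are integers, but not the positions 0..n-1
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# .loc always uses the explicit labels
print(data.loc[1])      # value labelled 1
print(data.loc[1:3])    # labels 1 through 3, end label included

# .iloc always uses the implicit integer positions
print(data.iloc[1])     # value at position 1
print(data.iloc[1:3])   # positions 1 and 2, end position excluded
```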
Selecting a single column
To select a single column, we simply put the name of the column between the brackets:
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving columns by indexing operator
first = data["Age"]
print(first)
Selecting multiple columns
To select multiple columns, we pass a list of column names to the indexing operator.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving multiple columns by indexing operator
first = data[["Age", "College", "Salary"]]
print(first)
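Since nba.csv is not included here, the sketch below uses a hypothetical three-row stand-in (invented players, teams and numbers) to show a detail that often trips people up: a single column name returns a Series, while a list of names, even a one-element list, returns a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: made-up players and values, indexed by Name
data = pd.DataFrame(
    {'Team': ['Team X', 'Team Y', 'Team Z'],
     'Age': [25, 31, 22],
     'College': ['College 1', 'College 2', 'College 3'],
     'Salary': [1000000, 2500000, 750000]},
    index=pd.Index(['Player A', 'Player B', 'Player C'], name='Name'))

ages = data['Age']                             # one name -> Series
subset = data[['Age', 'College', 'Salary']]    # list of names -> DataFrame
one_col_frame = data[['Age']]                  # one-element list -> still a DataFrame

print(type(ages).__name__)     # Series
print(type(subset).__name__)   # DataFrame
```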
• Indexing a DataFrame using .loc[ ]:
This function selects data by the labels of the rows and columns. The df.loc indexer selects data differently from the plain indexing operator: it can select subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.
• Selecting a single row
• To select a single row using .loc[], we pass a single row label to .loc.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving rows by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)
Selecting multiple rows
To select multiple rows, we put all the row labels in a list and pass that to .loc.
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving multiple rows by loc method
first = data.loc[["Avery Bradley", "R.J. Hunter"]]
print(first)
Selecting two rows and three columns
To select two rows and three columns, we put the two row labels in one list and the three column labels in a separate list, like this:
Dataframe.loc[["row1", "row2"], ["column1", "column2", "column3"]]
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving two rows and three columns by loc method
first = data.loc[["Avery Bradley", "R.J. Hunter"],
                 ["Team", "Number", "Position"]]
print(first)
Selecting all of the rows and some columns
To select all of the rows and some columns, we use a single colon [:] to select all rows, followed by a list of the columns we want, like this:
Dataframe.loc[:, ["column1", "column2", "column3"]]
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving all rows and some columns by loc method
first = data.loc[:, ["Team", "Number", "Position"]]
print(first)
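The .loc patterns above can be tried without nba.csv on a hypothetical stand-in frame (invented players and values). The last line also shows label slicing, where, unlike Python slicing, the end label is included.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: made-up players and values
data = pd.DataFrame(
    {'Team': ['Team X', 'Team Y', 'Team Z'],
     'Number': [0, 7, 23],
     'Position': ['PG', 'SF', 'C']},
    index=pd.Index(['Player A', 'Player B', 'Player C'], name='Name'))

row = data.loc['Player A']                         # one label -> Series
rows = data.loc[['Player A', 'Player C']]          # list of labels -> DataFrame
block = data.loc[['Player A', 'Player C'],
                 ['Team', 'Position']]             # chosen rows and columns
cols = data.loc[:, ['Team', 'Number']]             # all rows, some columns
span = data.loc['Player A':'Player B']             # label slice, end included
```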
• Indexing a DataFrame using .iloc[ ]:
This function allows us to retrieve rows and columns by position. To do that, we specify the positions of the rows we want and the positions of the columns we want. The df.iloc indexer is very similar to df.loc but uses only integer positions to make its selections.
• Selecting a single row
• To select a single row using .iloc[], we pass a single integer to .iloc.
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving the row at position 3 (the fourth row, since positions start at 0)
row4 = data.iloc[3]
print(row4)
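A sketch of a few more .iloc forms on a hypothetical four-row stand-in (invented values): slices, lists of row and column positions, and negative positions counted from the end.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv: made-up players and values
data = pd.DataFrame(
    {'Team': ['Team X', 'Team Y', 'Team Z', 'Team W'],
     'Age': [25, 31, 22, 28]},
    index=pd.Index(['Player A', 'Player B', 'Player C', 'Player D'],
                   name='Name'))

fourth = data.iloc[3]             # position 3 is the fourth row -> Series
first_two = data.iloc[:2]         # positions 0 and 1: end position excluded
picked = data.iloc[[0, 2], [1]]   # rows 0 and 2, column 1 ('Age')
last_row = data.iloc[-1]          # negative positions count from the end
```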
• Indexing using DataFrame.ix[ ]:
Early in the development of pandas, there existed another indexer, ix. This indexer was capable of selecting both by label and by integer location. While it was versatile, it caused lots of confusion because it is not explicit: integers can also be labels for rows or columns, so there were cases where it was ambiguous. Generally, ix is label based and acts just like the .loc indexer. However, .ix also supports integer-based selection (as in .iloc) when passed an integer, but this works only when the index of the DataFrame is not integer based. .ix accepts any of the inputs of .loc and .iloc. Because of this ambiguity, .ix was deprecated in pandas 0.20 and removed in pandas 1.0; use .loc or .iloc instead.
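Since .ix no longer exists in current pandas, here is a sketch (toy data) of how its mixed label/position lookups can be written explicitly. Every line below selects the same cell.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30], 'b': [40, 50, 60]},
                  index=['x', 'y', 'z'])

# Old style, pandas < 1.0 only: df.ix['y', 'b'] or df.ix[1, 'b']
by_label = df.loc['y', 'b']                    # label row, label column
by_position = df.iloc[1, 1]                    # position row, position column
mixed = df.loc[df.index[1], 'b']               # position row -> label, then .loc
mixed2 = df.iloc[1, df.columns.get_loc('b')]   # label column -> position, then .iloc
```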
Hierarchical Indexing
• The index is like an address: it is how any data point across the DataFrame or Series can be accessed. Rows and columns both have indexes; the row labels are called the index, and the column labels are generally called the column names.
• Hierarchical Indexes
• Hierarchical indexing, also known as multi-indexing, means setting more than one column as the index. In this section, we use the homelessness.csv file.
# importing pandas library as alias pd
import pandas as pd
# calling the pandas read_csv() function
# and storing the result in DataFrame df
df = pd.read_csv('homelessness.csv')
print(df.head())
Columns in the DataFrame:
# using the pandas columns attribute
col = df.columns
print(col)
Output:
Index(['Unnamed: 0', 'region', 'state', 'individuals', 'family_members',
       'state_pop'],
      dtype='object')
• To make a column the index, we use the set_index() function of pandas. To make one column the index, we simply pass the name of the column as a string to set_index(). For multi-indexing (hierarchical indexing), we pass a list of column names to set_index().
• The code below demonstrates hierarchical indexing in pandas:
# using the pandas set_index() function
df_ind3 = df.set_index(['region', 'state', 'individuals'])
# sort_index() returns a sorted copy, so assign the result back
df_ind3 = df_ind3.sort_index()
print(df_ind3.head(10))
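A self-contained sketch of the same steps, using a few made-up rows shaped like homelessness.csv (the numbers are placeholders, not real figures) and two index levels instead of three for brevity.

```python
import pandas as pd

# Made-up rows in the shape of homelessness.csv (placeholder numbers)
df = pd.DataFrame({
    'region': ['Pacific', 'Mountain', 'Pacific', 'Mountain'],
    'state': ['Hawaii', 'Idaho', 'Alaska', 'Arizona'],
    'individuals': [100, 200, 300, 400],
    'family_members': [10, 20, 30, 40]})

# A list of columns passed to set_index() builds a MultiIndex
df_ind = df.set_index(['region', 'state'])

# sort_index() returns a new frame; assign it back to keep the sorted order
df_ind = df_ind.sort_index()

print(df_ind.index.nlevels)   # 2
print(df_ind.index.names)
```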
• Now the DataFrame is using hierarchical indexing (multi-indexing).
• Note that here we have made 3 columns the index ('region', 'state', 'individuals'). The first index, 'region', is called the level(0) index and is at the top of the hierarchy of indexes; the next index, 'state', is the level(1) index, which is below the main or level(0) index, and so on. Because the indexes form a hierarchy, this is called hierarchical indexing.
• Sometimes we need to do the opposite and convert index columns back into normal columns. For this, pandas provides the reset_index() function; reset_index(inplace=True) modifies the DataFrame in place, turning the index columns back into normal columns.
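A short sketch of reset_index() on toy data (placeholder values): without arguments every index level becomes an ordinary column again, while level= moves only the named level and leaves the rest in the index.

```python
import pandas as pd

df = pd.DataFrame({'region': ['Pacific', 'Mountain'],
                   'state': ['Alaska', 'Idaho'],
                   'individuals': [100, 200]})   # placeholder values
df_ind = df.set_index(['region', 'state'])

# All index levels back into ordinary columns
flat = df_ind.reset_index()

# Only the 'state' level becomes a column; 'region' stays as the index
partial = df_ind.reset_index(level='state')
```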
Selecting data using a hierarchical index: to select data from the DataFrame with the .loc[] indexer, we pass the index labels in a list.
# selecting the 'Pacific' and 'Mountain'
# regions from the DataFrame,
# using the level(0) (main) index
df_ind3_region = df_ind3.loc[['Pacific', 'Mountain']]
print(df_ind3_region.head(10))
• We cannot use only the level(1) index to get data from the DataFrame; if we do, it raises an error. The level(1) (inner) index can only be used together with the level(0) (main) index, with the help of a list of tuples.
# using only the inner index 'state' fails:
# .loc looks these labels up in level(0) ('region'), so this raises a KeyError
df_ind3_state = df_ind3.loc[['Alaska', 'California', 'Idaho']]
print(df_ind3_state.head(10))
• Using inner-level indexes with the help of a list of tuples:
• Syntax:
df.loc[[(level(0), level(1), level(2))]]
# selecting data by passing labels for all levels of the index
df_ind3_region_state = df_ind3.loc[[("Pacific", "Alaska", 1434),
                                    ("Pacific", "Hawaii", 4131),
                                    ("Mountain", "Arizona", 7259),
                                    ("Mountain", "Idaho", 1297)]]
print(df_ind3_region_state)
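The tuple form works, but it requires a label for every outer level. As a sketch on placeholder data, pandas also offers xs() with level= and pd.IndexSlice, which select on an inner level directly; the frame is sorted first because slicing a MultiIndex expects a sorted index.

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['Mountain', 'Mountain', 'Pacific', 'Pacific'],
    'state': ['Arizona', 'Idaho', 'Alaska', 'Hawaii'],
    'individuals': [100, 200, 300, 400]})          # placeholder values
df_ind = df.set_index(['region', 'state']).sort_index()

# Outer level: a plain list of labels works with .loc
pacific = df_ind.loc[['Pacific']]

# Full keys as tuples, as in the syntax above
pairs = df_ind.loc[[('Pacific', 'Alaska'), ('Mountain', 'Idaho')]]

# Inner level on its own: xs() with level=
idaho = df_ind.xs('Idaho', level='state')

# Several inner labels: IndexSlice with a full slice on the outer level
idx = pd.IndexSlice
some_states = df_ind.loc[idx[:, ['Alaska', 'Idaho']], :]
```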
Combine datasets
• Combining datasets using Pandas merge(), join(), concat() and append(): for horizontal combination we have merge() and join(), whereas for vertical combination we can use concat() and append(). merge() and join() perform similar tasks but differ internally, and the same is true of concat() and append().
1. merge() is used for combining data on common columns or indices.
import pandas as pd
d1 = {'Id': ['A1', 'A2', 'A3', 'A4', 'A5'],
      'Name': ['Vivek', 'Rahul', 'Gaurav', 'Ankit', 'Vishakha'],
      'Age': [27, 24, 22, 32, 28]}
d2 = {'Id': ['A1', 'A2', 'A3', 'A4'],
      'Address': ['Delhi', 'Gurgaon', 'Noida', 'Pune'],
      'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
Case 1. Merging data on the common column 'Id'
# inner join (the default): only Ids present in both data frames
pd.merge(df1, df2)
pd.merge(df1, df2, how='inner')
# left join: matching and non-matching records from the left DF (df1)
pd.merge(df1, df2, how='left')
# right join: matching and non-matching records from the right DF (df2)
pd.merge(df1, df2, how='right')
# outer join: all matching and non-matching records from both data frames
pd.merge(df1, df2, how='outer')
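A sketch that makes the join types visible on the same df1/df2 data: on= names the key explicitly (merge would otherwise use all shared columns), and indicator=True adds a '_merge' column that records whether each row came from the left frame only, the right frame only, or both.

```python
import pandas as pd

df1 = pd.DataFrame({'Id': ['A1', 'A2', 'A3', 'A4', 'A5'],
                    'Name': ['Vivek', 'Rahul', 'Gaurav', 'Ankit', 'Vishakha'],
                    'Age': [27, 24, 22, 32, 28]})
df2 = pd.DataFrame({'Id': ['A1', 'A2', 'A3', 'A4'],
                    'Address': ['Delhi', 'Gurgaon', 'Noida', 'Pune'],
                    'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']})

inner = pd.merge(df1, df2, on='Id', how='inner')   # A1..A4 only
left = pd.merge(df1, df2, on='Id', how='left')     # all of df1; A5 gets NaN

# '_merge' is 'both', 'left_only' or 'right_only' for each row
outer = pd.merge(df1, df2, on='Id', how='outer', indicator=True)
print(outer[['Id', '_merge']])
```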
2. join() is used for combining data on a key column or an index.
import pandas as pd
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K5', 'K3', 'K4', 'K2'],
                    'A': ['A0', 'A1', 'A5', 'A3', 'A4', 'A2']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                    'B': ['B0', 'B1', 'B2']})
Case 1. Join on indexes
By default, the pandas join operation is performed on indexes. Both data frames have default index values, so there is no need to specify a join key; the join is implicitly performed on the indexes.
Case 1. Default nature of join on indexes
The default pandas join is a left outer join. Both frames have a 'key' column, so lsuffix and rsuffix are needed to tell the overlapping columns apart:
df1.join(df2, lsuffix='_l', rsuffix='_r')

When the index values of the two data frames are all different, an inner (equi) join returns an empty result, whereas the default left join still returns all the rows of the left DF (df1), with NaN in the columns that come from df2.
Create two data frames with different index values:
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K5', 'K3', 'K4', 'K2'],
                    'A': ['A0', 'A1', 'A5', 'A3', 'A4', 'A2']},
                   index=[0, 1, 2, 3, 4, 5])
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                    'B': ['B0', 'B1', 'B2']},
                   index=[6, 7, 8])
# df1 is the left DF and df2 is the right DF (default: left join)
df1.join(df2, lsuffix='_l', rsuffix='_r')
# inner join: no common index values, so the result is empty
df1.join(df2, lsuffix='_l', rsuffix='_r', how='inner')
# outer join
df1.join(df2, lsuffix='_l', rsuffix='_r', how='outer')
Case 2. Join on columns
Data frames can be joined on columns as well, but since join works on indexes, we need to convert the join key into the index and then perform the join; everything else is the same.
df1.set_index('key').join(df2.set_index('key'))
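A sketch of two equivalent ways to join on the 'key' column of the df1/df2 frames defined earlier: move both keys into the index, or use join()'s on= parameter, which matches a column of the calling frame against the other frame's index. Both are left joins, so keys missing from df2 get NaN in column B.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K5', 'K3', 'K4', 'K2'],
                    'A': ['A0', 'A1', 'A5', 'A3', 'A4', 'A2']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                    'B': ['B0', 'B1', 'B2']})

# Both keys moved into the index, then joined index-to-index
via_index = df1.set_index('key').join(df2.set_index('key'))

# 'key' stays a column of df1 and is matched against df2's index
via_on = df1.join(df2.set_index('key'), on='key')
```

The on= form keeps df1's original index and its 'key' column, which is convenient when later steps still need them.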
3. concat() is used for combining DataFrames across rows or columns.
Case 1. concat data frames on axis=0 (the default operation)
import pandas as pd
m1 = pd.DataFrame({'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
                   'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
                   'Marks_scored': [98, 90, 87, 69, 78]},
                  index=[1, 2, 3, 4, 5])
m2 = pd.DataFrame({'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
                   'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
                   'Marks_scored': [89, 80, 79, 97, 88]},
                  index=[4, 5, 6, 7, 8])
pd.concat([m1, m2])
# ignore_index=True discards the original labels and numbers the rows 0..n-1
pd.concat([m1, m2], ignore_index=True)
Case 2. concat operation on axis=1 (horizontal operation)
pd.concat([m1, m2], axis=1)
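A sketch of how concat() treats overlapping index labels, using two tiny frames trimmed from the m1/m2 example: by default duplicates are kept, ignore_index=True renumbers, keys= adds an outer index level naming the source, and verify_integrity=True raises a ValueError instead.

```python
import pandas as pd

m1 = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]},
                  index=[1, 2])
m2 = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks_scored': [89, 80]},
                  index=[2, 3])

stacked = pd.concat([m1, m2])                        # index 1, 2, 2, 3
renumbered = pd.concat([m1, m2], ignore_index=True)  # index 0, 1, 2, 3
labelled = pd.concat([m1, m2], keys=['m1', 'm2'])    # outer level marks the source

# Refuse duplicate labels instead of silently keeping them
try:
    pd.concat([m1, m2], verify_integrity=True)
except ValueError as err:
    print('duplicate index:', err)
```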
4. append() combines data frames vertically.
Case 1. Appending data frames: the duplicate index issue
m1 = pd.DataFrame({'Name': ['Vivek', 'Vishakha', 'Ash', 'Natalie', 'Ayoung'],
                   'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
                   'Marks_scored': [98, 90, 87, 69, 78],
                   'Rank': [1, 3, 6, 20, 13]},
                  index=[1, 2, 3, 4, 5])
m2 = pd.DataFrame({'Name': ['Barak', 'Wayne', 'Saurav', 'Yuvraj', 'Suresh'],
                   'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
                   'Marks_scored': [89, 80, 79, 97, 88]},
                  index=[1, 2, 3, 4, 5])
# both frames use the labels 1-5, so the result contains every label twice
# pandas < 2.0: m1.append(m2)
# DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0; use concat:
pd.concat([m1, m2])
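A trimmed-down sketch of the append example: m1 has a Rank column and m2 does not, so stacking them unions the columns and leaves NaN in Rank for m2's rows. join='inner' keeps only the shared columns, and ignore_index=True removes the duplicated labels.

```python
import pandas as pd

m1 = pd.DataFrame({'Name': ['Vivek', 'Vishakha'], 'Marks_scored': [98, 90],
                   'Rank': [1, 3]}, index=[1, 2])
m2 = pd.DataFrame({'Name': ['Barak', 'Wayne'], 'Marks_scored': [89, 80]},
                  index=[1, 2])

# Same result as the old m1.append(m2): columns unioned, gaps become NaN
both = pd.concat([m1, m2])

# Only the columns both frames share
shared = pd.concat([m1, m2], join='inner')

# Fresh 0..n-1 index instead of the duplicated 1, 2, 1, 2
clean = pd.concat([m1, m2], ignore_index=True)
```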
Aggregation and grouping
• Grouping and aggregating make data analysis easier through a variety of functions. These methods help us group and summarize our data and make complex analysis comparatively easy.
• Aggregation in Pandas
Aggregation in pandas provides various functions that perform a mathematical or logical operation on our dataset and return a summary of the result. Aggregation can be used to get a summary of the columns in our dataset, such as the sum, minimum or maximum of a particular column. The function used for aggregation is agg(); its parameter is the function (or list of functions) we want to apply.
• Some functions used in aggregation are:
Function      Description
sum()         Compute sum of column values
min()         Compute min of column values
max()         Compute max of column values
mean()        Compute mean of column
size()        Compute group sizes
describe()    Generate descriptive statistics
first()       Compute first of group values
last()        Compute last of group values
count()       Compute count of non-null column values
std()         Standard deviation of column
var()         Compute variance of column
sem()         Standard error of the mean of column

df.sum()

df.agg(['sum', 'min', 'max'])
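The two calls above assume an existing df. Here is a self-contained sketch with a small made-up marks table; a dict passed to agg() applies different functions to different columns, and cells for combinations that were not requested come out as NaN.

```python
import pandas as pd

# Made-up marks table (placeholder values)
df = pd.DataFrame({'Maths': [80, 90, 80, 70],
                   'Science': [60, 75, 85, 75]})

totals = df.sum()                          # one total per column
summary = df.agg(['sum', 'min', 'max'])    # rows: sum/min/max; columns: Maths/Science

# Different aggregations per column
per_column = df.agg({'Maths': ['mean'], 'Science': ['min', 'max']})
print(per_column)
```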

Grouping in Pandas
Grouping is used to group the data in our dataset according to some criteria. It follows the split-apply-combine strategy:
• Splitting the data into groups based on some criteria.
• Applying a function to each group independently.
• Combining the results into a data structure.
Here we apply the groupby() function to group the data on the "Maths" value. To view the first (non-null) values of each group, use the first() function.
a = df.groupby('Maths')
a.first()
b = df.groupby(['Maths', 'Science'])
b.first()
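A sketch of split-apply-combine with an aggregation other than first(), on made-up data (the names and marks are invented): the rows are split by Maths score, each group's Science marks are summarized, and the results are combined into one object indexed by the group key.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Asha', 'Ravi', 'Meena', 'John'],
                   'Maths': [80, 90, 80, 70],
                   'Science': [60, 75, 85, 75]})   # made-up data

# Split on Maths, apply mean() to Science, combine into a Series
mean_science = df.groupby('Maths')['Science'].mean()

# Several aggregations per group at once
stats = df.groupby('Maths')['Science'].agg(['count', 'min', 'max'])

# size() counts the rows in each group
sizes = df.groupby('Maths').size()
```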
Vectorized String Operations
Introducing Pandas String Operations
• We saw in previous sections how tools like NumPy and Pandas generalize arithmetic operations so that we can easily and quickly perform the same operation on many array elements. For example:
import numpy as np
x = np.array([2, 3, 5, 7, 11, 13])
x * 2
Output:
array([ 4,  6, 10, 14, 22, 26])
• This vectorization of operations simplifies the syntax of operating on arrays of data: we no longer have to worry about the size or shape of the array, but just about what operation we want done.
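Eg1: for a plain list of strings there is no such vectorized syntax, so we fall back on a loop:
data = ['peter', 'Paul', 'MARY', 'gUIDO']
[s.capitalize() for s in data]
Output:
['Peter', 'Paul', 'Mary', 'Guido']
Eg2: if the data contains a missing value (None), that loop fails, so we store the data in a Pandas Series instead:
import pandas as pd
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']
names = pd.Series(data)
names
Output:
0    peter
1     Paul
2     None
3     MARY
4    gUIDO
dtype: object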
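A sketch of why the Series helps: the list comprehension raises AttributeError on None, while the .str accessor applies capitalize() element-wise and simply passes the missing value through.

```python
import pandas as pd

data = ['peter', 'Paul', None, 'MARY', 'gUIDO']

# The plain loop fails: None has no capitalize() method
try:
    [s.capitalize() for s in data]
except AttributeError as err:
    print('loop failed:', err)

# The .str accessor is vectorized and skips missing values
names = pd.Series(data)
capitalized = names.str.capitalize()
print(capitalized)
```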
Tables of Pandas String Methods
If you have a good understanding of string manipulation in
Python, most of Pandas string syntax is intuitive enough
that it's probably sufficient to just list a table of available
methods; we will start with that here, before diving deeper
into a few of the subtleties. The examples in this section
use the following series of names:
monte = pd.Series(['Graham Chapman', 'John Cleese',
'Terry Gilliam', 'Eric Idle', 'Terry Jones', 'Michael Palin'])
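A few of those methods applied to monte, as a sketch: most .str methods mirror Python's built-in str methods and return a Series, and .str[...] indexes into each element, which makes it easy to pick a piece out of the lists that split() produces.

```python
import pandas as pd

monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])

lower = monte.str.lower()                    # like str.lower(), element-wise
lengths = monte.str.len()                    # characters per name
starts_t = monte.str.startswith('T')         # boolean mask
first_names = monte.str.split().str[0]       # split, then element 0 of each list
surnames = monte.str.extract(r'(\w+)$', expand=False)   # regex: the last word
```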
