Pandas in Python
Pandas is an open-source library built mainly for working with relational or labeled
data easily and intuitively.
It provides data structures and operations for manipulating numerical data and time
series, and it is built on top of the NumPy library.
Pandas is fast and is designed for high performance and user productivity.
Install and import
Pandas is easy to install. Open a terminal (macOS or Linux) or a command prompt
(Windows) and install it using either of the following commands:
conda install pandas
OR
pip install pandas
Alternatively, if you're viewing this article in a Jupyter notebook, you can run this
cell:
!pip install pandas
The ! at the beginning tells Jupyter to run the line as a shell command, as if it were
typed in a terminal.
Because pandas is used so often, it is conventionally imported under the shorter alias pd:
import pandas as pd
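A quick way to confirm that the installation worked is to import the library and print its version. This is only a minimal sketch; the version string depends on your environment, and the variable name installed_version is chosen here for illustration.

```python
import pandas as pd

# The version string varies by environment, so no specific value is assumed.
installed_version = pd.__version__
print("pandas version:", installed_version)
```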
Core components of pandas: Series and DataFrames
The two primary components of pandas are the Series and the DataFrame.
A Series is essentially a single column, and a DataFrame is a two-dimensional table made
up of a collection of Series.
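The relationship between the two can be checked directly: selecting one column of a DataFrame gives back a Series. A minimal sketch, using made-up sample data:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Tom", "Nick"], "Age": [20, 21]})

# Selecting a single column by name returns a Series.
age = df["Age"]
print(type(age).__name__)  # Series
print(df.ndim, age.ndim)   # 2 1
```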
Creating a Pandas Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type
(integer, string, float, Python objects, etc.). The axis labels are collectively called the index.
A Pandas Series can be thought of as a single column in an Excel sheet.
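To make the idea of a labeled array concrete, the sketch below builds a small Series with string labels and reads a value back by label. The values and labels are hypothetical.

```python
import pandas as pd

# Hypothetical values and labels chosen for illustration.
ser = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])

print(list(ser.index))  # ['x', 'y', 'z']
print(ser["y"])         # 2.5
print(ser.dtype)        # float64
```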
Creating a series from an array:
To create a series from an array, import the numpy module and build the array with its
array() function.
In [1]: # to use pandas
import pandas as pd
# to use numpy
import numpy as np
# simple array
data = np.array(['n','i','e','l','i','t'])
ser = pd.Series(data)
print(ser)
0 n
1 i
2 e
3 l
4 i
5 t
dtype: object
Creating a series from a list:
To create a series from a list, first create the list, then pass it to the Series
constructor.
In [2]: import pandas as pd
# a simple list (named letters so the built-in list is not shadowed)
letters = ['n','i','e','l','i','t']
# create a series from the list
ser = pd.Series(letters)
print(ser)
0 n
1 i
2 e
3 l
4 i
5 t
dtype: object
Creating a series from a dictionary:
To create a series from a dictionary, first create the dictionary, then pass it to the
Series constructor. The dictionary keys are used to construct the index.
In [3]: import pandas as pd
# a simple dictionary (named letter_map so the built-in dict is not shadowed)
letter_map = {"a)" : 'n', "b)" : 'i', "c)" : 'e', 'd)' : 'l', 'e)' : 'i', 'f)' : 't'}
# create series from dictionary
ser = pd.Series(letter_map)
print(ser)
a) n
b) i
c) e
d) l
e) i
f) t
dtype: object
Creating a series from an array with an index:
To create a series from an array with a custom index, provide an index with the same
number of elements as the array.
In [4]: import pandas as pd
import numpy as np
data1 = np.array(['n', 'i', 'e', 'l', 'i', 't']) # simple array
# providing an index
ser = pd.Series(data1, index=[10, 11, 12, 13, 14, 15])
print(ser)
print("The data at 13th index is", ser[13]) # accessing a value by its index label
10 n
11 i
12 e
13 l
14 i
15 t
dtype: object
The data at 13th index is l
In [5]: # Combining two series
import pandas as pd
import numpy as np
data1 = np.array(['n', 'i', 'e', 'l', 'i', 't']) # simple array
data2 = ["j", "b", "c"]
# providing an index
ser1 = pd.Series(data1, index=[10, 11, 12, 13, 14, 15])
ser2 = pd.Series(data2, index=[16, 17, 18])
# Series.append is deprecated and was removed in pandas 2.0; use pd.concat instead
ser = pd.concat([ser1, ser2])
print(ser)
10 n
11 i
12 e
13 l
14 i
15 t
16 j
17 b
18 c
dtype: object
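pandas.concat is the function pandas recommends in place of the deprecated Series.append. The sketch below shows two ways to use it on shortened versions of the series above: keeping the original index labels, or renumbering them with ignore_index=True. The variable names kept and renumbered are illustrative.

```python
import pandas as pd

ser1 = pd.Series(["n", "i"], index=[10, 11])
ser2 = pd.Series(["j", "b"], index=[16, 17])

# Keep the original labels (10, 11, 16, 17).
kept = pd.concat([ser1, ser2])

# Discard the original labels and renumber from 0.
renumbered = pd.concat([ser1, ser2], ignore_index=True)

print(list(kept.index))        # [10, 11, 16, 17]
print(list(renumbered.index))  # [0, 1, 2, 3]
```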
In [6]: # dropping elements from a series
import pandas as pd
import numpy as np
data = np.array(['n', 'i', 'e', 'l', 'i', 't']) # simple array
ser = pd.Series(data)
print(ser)
ser1 = ser.drop(index=[3, 4]) # remove the entries labelled 3 and 4
print(ser1)
ser2 = ser.truncate(1, 4) # truncate(before=1, after=4)
print(ser2)
0 n
1 i
2 e
3 l
4 i
5 t
dtype: object
0 n
1 i
2 e
5 t
dtype: object
1 i
2 e
3 l
4 i
dtype: object
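Both operations return a new Series and leave the original unchanged, which is why the first printout above still shows all six entries. A minimal sketch, assuming Series.drop and Series.truncate are the methods in use:

```python
import pandas as pd

ser = pd.Series(["n", "i", "e", "l", "i", "t"])

dropped = ser.drop(index=[3, 4])             # remove labels 3 and 4
truncated = ser.truncate(before=1, after=4)  # keep labels 1 through 4

print(len(ser), len(dropped), len(truncated))  # 6 4 4
```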
DataFrame
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in
rows and columns.
A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
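Each of the three components can be inspected through an attribute or method of the DataFrame. The sketch below uses hypothetical data and row labels to show where each one lives.

```python
import pandas as pd

# Hypothetical data and row labels, for illustration only.
df = pd.DataFrame({"Name": ["Tom", "Nick"], "Age": [20, 21]}, index=["r1", "r2"])

print(list(df.index))       # row labels: ['r1', 'r2']
print(list(df.columns))     # column labels: ['Name', 'Age']
print(df.to_numpy().shape)  # shape of the underlying data: (2, 2)
```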
Create Pandas DataFrame
1. Creating a DataFrame from a dict of ndarrays/lists.
2. Creating a DataFrame from a list of lists.
3. Creating an indexed DataFrame using arrays.
4. Creating a DataFrame from a list of dicts.
5. Creating a DataFrame using the zip() function.
6. Creating a DataFrame from a dict of Series.
In [7]: # DataFrame from dict of ndarrays / lists
# By default the row index is 0, 1, 2, ...
import pandas as pd
# initialise data as a dict of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
# Display the DataFrame.
df
Out[7]:
Name Age
0 Tom 20
1 nick 21
2 krish 19
3 jack 18
In [8]: # Import pandas library
import pandas as pd
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# print dataframe.
df
Out[8]:
Name Age
0 tom 10
1 nick 15
2 juli 14
In [9]: # pandas DataFrame with a custom index,
# built from arrays (lists).
import pandas as pd
# initialise data as a dict of lists.
data = {'Name':['Tom', 'Jack', 'nick', 'juli'], 'marks':[99, 98, 95, 90]}
# Create the pandas DataFrame with row labels.
df = pd.DataFrame(data, index =['rank1', 'rank2', 'rank3', 'rank4'])
# print the data
df
Out[9]:
Name marks
rank1 Tom 99
rank2 Jack 98
rank3 nick 95
rank4 juli 90
In [10]: # Pandas DataFrame by lists of dicts.
import pandas as pd
# Initialise data to lists.
data = [{'a': 1, 'b': 2, 'c':3}, {'a':10, 'b': 20, 'c': 30}]
# Creates DataFrame.
df = pd.DataFrame(data)
# Print the data
df
Out[10]:
a b c
0 1 2 3
1 10 20 30
In [ ]: # pandas DataFrame from lists using zip.
import pandas as pd
# List1
Name = ['tom', 'krish', 'nick', 'juli']
# List2
Age = [25, 30, 26, 22]
# pair the two lists element by element with zip()
# and collect the pairs into a list of tuples.
list_of_tuples = list(zip(Name, Age))
# Convert the list of tuples into
# a pandas DataFrame.
df = pd.DataFrame(list_of_tuples, columns = ['Name', 'Age'])
# Print data.
df
In [13]: # Pandas DataFrame from a dict of Series.
import pandas as pd
# Initialise data as a dict of Series.
d = {'one' : pd.Series([10, 20, 30, 40], index =['a', 'b', 'c', 'd']),
'two' : pd.Series([10, 20, 30, 40], index =['a', 'b', 'c', 'd'])}
# create the DataFrame.
df = pd.DataFrame(d)
# print the data.
df
Out[13]:
one two
a 10 10
b 20 20
c 30 30
d 40 40
Column Selection: To select a column in a Pandas DataFrame, access it by its column name,
for example df["Name"] for one column or df[["Name", "Age"]] for several.
Row Selection: Pandas provides a dedicated method to retrieve rows from a DataFrame.
The DataFrame.loc[] method retrieves rows by label.
In [14]: # Import pandas package
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print(df[["Name","Age"]]) # two columns selected by name
print(df[df.columns[0:2]]) # first two columns selected by position
[Link]()
print(df[["Age"]])
Name Age
0 Jai 27
1 Princi 24
2 Gaurav 22
3 Anuj 32
Name Age
0 Jai 27
1 Princi 24
2 Gaurav 22
3 Anuj 32
Age
0 27
1 24
2 22
3 32
In [15]: # select two rows
first = df.loc[1] # loc selects by label
second = df.loc[3]
print(first, "\n \n", second, "\n")
tb = df.loc[1:3] # label slice; both endpoints are included
print(tb, "\n")
Name Princi
Age 24
Address Kanpur
Qualification MA
Name: 1, dtype: object
Name Anuj
Age 32
Address Kannauj
Qualification Phd
Name: 3, dtype: object
Name Age Address Qualification
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannauj Phd
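The slice 1:3 above returned three rows because label-based slicing with loc includes the end label. Position-based slicing with iloc follows the usual Python convention and excludes the end position. A minimal sketch of the difference, using a shortened copy of the employee data:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jai", "Princi", "Gaurav", "Anuj"], "Age": [27, 24, 22, 32]}
)

by_label = df.loc[1:3]      # labels 1, 2 and 3 (end label included)
by_position = df.iloc[1:3]  # positions 1 and 2 (end position excluded)

print(len(by_label), len(by_position))  # 3 2
```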
Pandas: reading and writing CSV files
Reading:
The read_csv() function returns a Pandas DataFrame that contains the data of the CSV file.
Writing:
Create a Pandas DataFrame first, then use to_csv() to write the DataFrame to a CSV file.
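The two calls can be tried together without touching the disk. The sketch below writes a small DataFrame to an in-memory text buffer and reads it back; the use of io.StringIO in place of a file name is an assumption made so the example is self-contained.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False omits the row labels

# Rewind the buffer and read the CSV text back into a DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(list(restored.columns))  # ['Name', 'Age']
```

With a real file, pass a path such as a .csv file name to both to_csv() and read_csv() in place of the buffer.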
Pandas: reading and writing Excel files
Install these modules:
pip install xlwt openpyxl xlsxwriter xlrd
Write an Excel File: Once you have those packages installed, you can save your
DataFrame in an Excel file with .to_excel().
Read an Excel File: You can load data from Excel files with read_excel().
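A round trip through both functions might look like the sketch below. It assumes the openpyxl package is available for the .xlsx format and uses an in-memory buffer in place of a file path; if openpyxl is missing, the sketch falls back to a plain copy so that it still runs.

```python
import importlib.util
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})

# .xlsx support relies on the optional openpyxl package, so check for it first.
if importlib.util.find_spec("openpyxl") is not None:
    buffer = io.BytesIO()  # in-memory stand-in for a file path
    df.to_excel(buffer, index=False, engine="openpyxl")
    buffer.seek(0)
    restored = pd.read_excel(buffer, engine="openpyxl")
else:
    restored = df.copy()   # fallback so the sketch still runs

print(restored)
```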
In [ ]: # importing pandas as pd
import pandas as pd
# Creating the dataframe
df = pd.read_csv("[Link]")
df
[Link](["Name", "City"])
In [ ]: # Import pandas package
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
df.to_csv('write_siru.csv', index=False)
#df.to_csv('write_demo2.csv')
# saving the dataframe
df.to_csv(r'D:\[Link]', index=False)
Accessing data from DataFrames
In [ ]: # importing pandas as pd
import pandas as pd
# Creating the dataframe
df = pd.read_csv("[Link]")
print(df)
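In [ ]: print(df.loc[2]) # the row labelled 2
# or
print(df.loc[2, "City"]) # one value, by row label and column name
# or
print(df.iloc[2, 2]) # one value, by row and column position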
In [ ]: # importing pandas as pd
import pandas as pd
# Creating the dataframe
df = pd.read_csv("[Link]")
print(df,"\n")
df.drop([0, 4], inplace=True) # delete the rows labelled 0 and 4
# display
print(df)
In [ ]: # importing pandas as pd
import pandas as pd
# Creating the dataframe
df = pd.read_csv("[Link]")
print(df,"\n")
df.drop(["Age"], axis=1, inplace=True) # delete the Age column
# display
print(df)
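Passing inplace=True changes the DataFrame itself. Without it, drop returns a new DataFrame and leaves the original untouched, which is often easier to reason about. A minimal sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav"], "Age": [27, 24, 22]})

without_row = df.drop([0])              # drop the row labelled 0
without_col = df.drop(columns=["Age"])  # same effect as drop(["Age"], axis=1)

print(df.shape, without_row.shape, without_col.shape)  # (3, 2) (2, 2) (3, 1)
```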
Joining two DataFrames
In [ ]: # importing pandas as pd
import pandas as pd
df1 = [Link]({"Int_Rate":[2,1,2,3], "IND_GDP":[50,45,45,67]}, index=[2
df2 = [Link]({"Low_Tier_HPI":[50,45,67,34],"Unemployment":[1,3,5,6]},
print(df1,"\n")
print(df2,"\n")
joined_df = df1.join(df2)
print(joined_df)
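DataFrame.join aligns the two tables on their index. The self-contained sketch below reuses the column names from the cell above, but the index values and the shortened data are hypothetical, chosen so that some labels match and some do not.

```python
import pandas as pd

# The index values here are hypothetical, chosen for illustration.
df1 = pd.DataFrame({"Int_Rate": [2, 1, 2], "IND_GDP": [50, 45, 45]},
                   index=[2001, 2002, 2003])
df2 = pd.DataFrame({"Low_Tier_HPI": [50, 45, 67], "Unemployment": [1, 3, 5]},
                   index=[2001, 2003, 2004])

# join() aligns rows on the index. The default is a left join, so every row
# of df1 is kept and labels missing from df2 produce NaN.
joined = df1.join(df2)
print(joined)
```

Passing how="inner" or how="outer" to join() keeps only the shared labels or all labels, respectively.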
In [ ]: import pandas as pd
import numpy as np
a = pd.Series(['Java', 'C', 'C++', np.nan])
b=[Link]({[Link]:"Python"})
print(b)
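One common way to substitute a value for missing entries is Series.fillna. The sketch below is offered as an alternative approach, not as a reconstruction of the cell above.

```python
import numpy as np
import pandas as pd

a = pd.Series(["Java", "C", "C++", np.nan])

# fillna returns a new Series with every missing entry replaced.
b = a.fillna("Python")
print(b.tolist())  # ['Java', 'C', 'C++', 'Python']
```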