Introduction to Pandas Library

Pandas is an open-source Python library designed for easy manipulation of relational or labeled data, built on top of NumPy. It provides data structures like Series and DataFrames for handling numerical data and time series, and includes functionalities for data creation, selection, and file I/O operations. Users can install Pandas via pip or conda and utilize its features for efficient data analysis.

Pandas in Python

Pandas is an open-source library built mainly for working with relational or labeled
data easily and intuitively.

It provides data structures and operations for manipulating numerical data and time
series, and it is built on top of the NumPy library.

Pandas is fast, and it is designed for both high performance and user productivity.

Install and import


Pandas is an easy package to install. Open up your terminal program (for Mac users) or
command line (for PC users) and install it using either of the following commands:

conda install pandas

OR

pip install pandas

Alternatively, if you're currently viewing this article in a Jupyter notebook you can run this
cell:

!pip install pandas

The ! at the beginning tells Jupyter to run the rest of the line as a shell command, as if it were typed in a terminal.

To import pandas we usually import it with a shorter name since it's used so much:

import pandas as pd

Core components of pandas: Series and DataFrames


The two primary components of pandas are the Series and the DataFrame.

A Series is essentially a column, and a DataFrame is a two-dimensional table made up of
a collection of Series.

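
To make the relationship between the two components concrete, here is a minimal sketch. The column names and values are made up for illustration; the point is only that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A small table with two columns (hypothetical data for illustration).
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})

# Selecting one column returns a Series that shares the DataFrame's index.
col = df["score"]
print(type(col).__name__)   # Series
print(df.shape, col.shape)  # (3, 2) (3,)
```
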
Creating a Pandas Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type
(integer, string, float, Python objects, etc.). The axis labels are collectively called the index.
A Series can be thought of as a single column in an Excel sheet.

Creating a series from array:


To create a Series from an array, import the numpy module and build the array with its
array() function.

In [1]: # to use pandas
import pandas as pd

# to use numpy
import numpy as np

# simple array
data = np.array(['n','i','e','l','i','t'])
ser = pd.Series(data)
print(ser)

0 n
1 i
2 e
3 l
4 i
5 t
dtype: object

Creating a series from Lists:


To create a Series from a list, first create the list and then pass it to the Series
constructor.

In [2]: import pandas as pd

# a simple list (named letters to avoid shadowing the built-in list)
letters = ['n','i','e','l','i','t']

# create a series from the list
ser = pd.Series(letters)
print(ser)

0 n
1 i
2 e
3 l
4 i
5 t
dtype: object
Creating a series from Dictionary:
To create a Series from a dictionary, first create the dictionary and then pass it to the
Series constructor. The dictionary keys are used to construct the index.

In [3]: import pandas as pd

# a simple dictionary (named data to avoid shadowing the built-in dict)
data = {"a)" : 'n', "b)" : 'i', "c)" : 'e', 'd)' : 'l', 'e)' : 'i', 'f)' : 't'}

# create series from dictionary
ser = pd.Series(data)

print(ser)

a) n
b) i
c) e
d) l
e) i
f) t
dtype: object
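
As a small extension of the dictionary case, the sketch below passes an explicit index together with a dictionary. The keys and the label "z" are hypothetical; the behavior shown is that only the listed keys are taken, in the listed order, and that a label missing from the dictionary becomes NaN.

```python
import pandas as pd

letters = {"a": "n", "b": "i", "c": "e"}

# index= selects keys in the given order; "z" is not a key, so its value is NaN.
ser = pd.Series(letters, index=["c", "a", "z"])
print(ser)
```
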

Creating a series from array with index :


To create a Series from an array with a custom index, provide an index with the same
number of elements as the array.

In [4]: import pandas as pd
import numpy as np

data1 = np.array(['n', 'i', 'e', 'l', 'i', 't']) # simple array

# providing an index
ser = pd.Series(data1, index =[10, 11, 12, 13, 14, 15])
print(ser)

print("The data at 13th index is", ser[13]) # accessing a value by its index label

10 n
11 i
12 e
13 l
14 i
15 t
dtype: object
The data at 13th index is l
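
With an integer index like the one above, it helps to separate lookup by label from lookup by position. The following is a minimal sketch using the same data: .loc looks up the index label, while .iloc counts positions from zero.

```python
import pandas as pd

ser = pd.Series(["n", "i", "e", "l", "i", "t"], index=[10, 11, 12, 13, 14, 15])

by_label = ser.loc[13]      # looks up the index label 13
by_position = ser.iloc[3]   # looks up the fourth element (position 3)
print(by_label, by_position)  # l l
```

Here the label 3 does not exist in the index, so position 3 can only be reached through .iloc.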
In [5]: # Combining two series
import pandas as pd
import numpy as np

data1 = np.array(['n', 'i', 'e', 'l', 'i', 't']) # simple array
data2 = ["j", "b", "c"]

# providing an index
ser1 = pd.Series(data1, index =[10, 11, 12, 13, 14, 15])
ser2 = pd.Series(data2, index =[16, 17, 18])

# Series.append was deprecated and then removed in pandas 2.0; use pd.concat instead
ser = pd.concat([ser1, ser2])
print(ser)

10 n
11 i
12 e
13 l
14 i
15 t
16 j
17 b
18 c
dtype: object
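
When Series are combined with pd.concat, the original index labels are kept by default, which can leave duplicate labels. The sketch below uses deliberately overlapping, made-up labels to show the difference that ignore_index=True makes.

```python
import pandas as pd

ser1 = pd.Series(["n", "i"], index=[10, 11])
ser2 = pd.Series(["j", "b"], index=[10, 11])  # deliberately overlapping labels

kept = pd.concat([ser1, ser2])                            # labels 10, 11, 10, 11
renumbered = pd.concat([ser1, ser2], ignore_index=True)   # labels 0, 1, 2, 3
print(kept.index.tolist(), renumbered.index.tolist())
```
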

In [6]: # dropping elements from a series
import pandas as pd
import numpy as np

data = np.array(['n', 'i', 'e', 'l', 'i', 't']) # simple array
ser = pd.Series(data)
print(ser)

# drop the elements with index labels 3 and 4
ser1 = ser.drop(index=[3, 4])
print(ser1)

# keep only the elements with labels from 1 to 4
ser2 = ser.truncate(before=1, after=4)
print(ser2)

0 n
1 i
2 e
3 l
4 i
5 t
dtype: object
0 n
1 i
2 e
5 t
dtype: object
1 i
2 e
3 l
4 i
dtype: object
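
One detail worth spelling out is that drop() returns a new Series and leaves the original untouched. A minimal sketch with the same letters:

```python
import pandas as pd

ser = pd.Series(["n", "i", "e", "l", "i", "t"])

shorter = ser.drop(index=[3, 4])   # returns a new Series
print(len(ser), len(shorter))      # 6 4
print(shorter.index.tolist())      # [0, 1, 2, 5]
```
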
DataFrame
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in
rows and columns.

A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
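
The three components can be inspected directly. This is a small sketch on made-up data: index holds the row labels, columns holds the column labels, and to_numpy() exposes the underlying data.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20, 21]})

print(df.index.tolist())    # row labels: [0, 1]
print(df.columns.tolist())  # column labels: ['Name', 'Age']
print(df.to_numpy())        # the underlying data as a NumPy array
```
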

Create Pandas DataFrame

1. Creating a DataFrame from a dict of ndarrays/lists.
2. Creating a DataFrame from a list of lists.
3. Creating an indexed DataFrame using arrays.
4. Creating a DataFrame from a list of dicts.
5. Creating a DataFrame using the zip() function.
6. Creating a DataFrame from a dict of Series.

In [7]: # DataFrame from dict of ndarrays / lists
# By default the row index is 0, 1, 2, ...

import pandas as pd

# initialise data of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
df

Out[7]:
Name Age

0 Tom 20

1 nick 21

2 krish 19

3 jack 18

In [8]: # Import pandas library
import pandas as pd

# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])

# print dataframe.
df

Out[8]:
Name Age

0 tom 10

1 nick 15

2 juli 14
In [9]: # pandas DataFrame with a custom index,
# built from a dict of lists.
import pandas as pd

# initialise data of lists.
data = {'Name':['Tom', 'Jack', 'nick', 'juli'], 'marks':[99, 98, 95, 90]}

# Creates pandas DataFrame.
df = pd.DataFrame(data, index =['rank1', 'rank2', 'rank3', 'rank4'])

# print the data
df

Out[9]:
Name marks

rank1 Tom 99

rank2 Jack 98

rank3 nick 95

rank4 juli 90

In [10]: # Pandas DataFrame from a list of dicts.
import pandas as pd

# Initialise data as a list of dicts.
data = [{'a': 1, 'b': 2, 'c':3}, {'a':10, 'b': 20, 'c': 30}]

# Creates DataFrame.
df = pd.DataFrame(data)

# Print the data
df

Out[10]:
a b c

0 1 2 3

1 10 20 30
In [ ]: # pandas DataFrame from lists using zip.
import pandas as pd

# List1
Name = ['tom', 'krish', 'nick', 'juli']

# List2
Age = [25, 30, 26, 22]

# get the list of tuples from the two lists
# by merging them with zip().
list_of_tuples = list(zip(Name, Age))

# Convert the list of tuples into a
# pandas DataFrame.
df = pd.DataFrame(list_of_tuples, columns = ['Name', 'Age'])

# Print data.
df
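
Since the cell above has no recorded output, here is a minimal sketch of what zip() produces for this kind of data. The variable names are chosen for illustration. Note that zip() must receive the lists as separate arguments; wrapping them in a single list gives one-element tuples rather than (name, age) pairs.

```python
import pandas as pd

names = ["tom", "krish", "nick", "juli"]
ages = [25, 30, 26, 22]

pairs = list(zip(names, ages))      # one (name, age) tuple per row
wrapped = list(zip([names, ages]))  # two 1-tuples, each holding a whole list

df = pd.DataFrame(pairs, columns=["Name", "Age"])
print(pairs[0])                       # ('tom', 25)
print(len(wrapped), len(wrapped[0]))  # 2 1
print(df.shape)                       # (4, 2)
```
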

In [13]: # Pandas DataFrame from a dict of Series.
import pandas as pd

# Initialise data as a dict of Series.
d = {'one' : pd.Series([10, 20, 30, 40], index =['a', 'b', 'c', 'd']),
     'two' : pd.Series([10, 20, 30, 40], index =['a', 'b', 'c', 'd'])}

# creates DataFrame.
df = pd.DataFrame(d)

# print the data.
df

Out[13]:
one two

a 10 10

b 20 20

c 30 30

d 40 40

Column Selection: To select a column in a Pandas DataFrame, access it by its column name;
pass a list of names to select several columns at once.

Row Selection: Pandas provides the DataFrame.loc[] indexer to retrieve rows from a
DataFrame by their index labels.
In [14]: # Import pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print(df[["Name","Age"]])    # two columns selected by name
print(df[df.columns[0:2]])   # the same two columns selected by position
print(df[["Age"]])           # a single column, returned as a DataFrame

Name Age
0 Jai 27
1 Princi 24
2 Gaurav 22
3 Anuj 32
Name Age
0 Jai 27
1 Princi 24
2 Gaurav 22
3 Anuj 32
Age
0 27
1 24
2 22
3 32

In [15]: # select two rows
first = df.loc[1]   # loc selects rows by index label
second = df.loc[3]
print(first, "\n \n", second, "\n")

# a label slice with loc includes both endpoints
tb = df.loc[1:3]
print(tb, "\n")

Name Princi
Age 24
Address Kanpur
Qualification MA
Name: 1, dtype: object

Name Anuj
Age 32
Address Kannauj
Qualification Phd
Name: 3, dtype: object

Name Age Address Qualification


1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannauj Phd
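
The slice 1:3 above returns three rows because .loc slices by label and includes the end label. As a hedged comparison on a reduced version of the same data, the sketch below shows that .iloc slices by position and excludes the end position.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                   "Age": [27, 24, 22, 32]})

by_label = df.loc[1:3]      # labels 1, 2 and 3 (end label included)
by_position = df.iloc[1:3]  # positions 1 and 2 (end position excluded)
print(len(by_label), len(by_position))  # 3 2
```
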
Pandas read and write from CSV file
Reading:

The read_csv() method returns a Pandas DataFrame that contains the data of the CSV file.

Writing:

Create a Pandas DataFrame first, then use to_csv() to write the DataFrame to a CSV file.
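
The round trip can be tried without touching the disk. This is a minimal sketch that assumes an in-memory io.StringIO buffer as a stand-in for a real file; with a file you would pass a path string to both calls instead.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})

buffer = io.StringIO()          # in-memory stand-in for a file on disk
df.to_csv(buffer, index=False)  # index=False leaves the row labels out
buffer.seek(0)                  # rewind so the buffer can be read back

restored = pd.read_csv(buffer)
print(restored)
```

Leaving out index=False would write the row labels as an extra unnamed column, which then shows up as a column when the file is read back.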

Pandas read and write from Excel file

Install these modules:

pip install xlwt openpyxl xlsxwriter xlrd

Write an Excel File: Once you have those packages installed, you can save your
DataFrame in an Excel file with .to_excel().

Read an Excel File: You can load data from Excel files with read_excel().

In [ ]: # importing pandas as pd
import pandas as pd

# Creating the dataframe


df = pd.read_csv("[Link]")

df
[Link](["Name", "City"])

In [ ]: # Import pandas package


import pandas as pd

# Define a dictionary containing employee data


data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame


df = pd.DataFrame(data)
df.to_csv('write_siru.csv', index=False)

#df.to_csv('write_demo2.csv')
# saving the dataframe
df.to_csv(r'D:\[Link]', index=False)
Accessing data from DataFrames

In [ ]: # importing pandas as pd
import pandas as pd

# Creating the dataframe


df = pd.read_csv("[Link]")
print(df)

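In [ ]: print(df.loc[2])          # the whole row with index label 2
#or
print(df.loc[2,"City"])   # one value, by row label and column name
#or
print(df.iloc[2,2])       # one value, by row and column position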

In [ ]: # importing pandas as pd
import pandas as pd

# Creating the dataframe


df = pd.read_csv("[Link]")
print(df,"\n")
df.drop([0,4], inplace = True) # deleting the rows with index labels 0 and 4

# display
print(df)

In [ ]: # importing pandas as pd
import pandas as pd

# Creating the dataframe


df = pd.read_csv("[Link]")
print(df,"\n")

df.drop(["Age"], axis=1, inplace = True) # deleting column data

# display
print(df)
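
Rows and columns can also be removed with DataFrame.drop() without inplace, in which case a new DataFrame is returned. The sketch below uses a small made-up table rather than the CSV file from the cells above.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav"],
                   "Age": [27, 24, 22],
                   "City": ["Delhi", "Kanpur", "Allahabad"]})

without_rows = df.drop([0, 2])          # drop rows by index label
without_age = df.drop(columns=["Age"])  # same effect as drop(["Age"], axis=1)

print(without_rows.index.tolist())   # [1]
print(without_age.columns.tolist())  # ['Name', 'City']
print(df.shape)                      # (3, 3), the original is unchanged
```
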

Joining two DataFrames

In [ ]: # importing pandas as pd
import pandas as pd

df1 = pd.DataFrame({"Int_Rate":[2,1,2,3], "IND_GDP":[50,45,45,67]}, index=[2

df2 = pd.DataFrame({"Low_Tier_HPI":[50,45,67,34],"Unemployment":[1,3,5,6]},

print(df1,"\n")
print(df2,"\n")
joined_df = df1.join(df2)
print(joined_df)
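
The index values in the cell above are cut off in the source, so the following self-contained sketch uses hypothetical year labels and shorter columns. It illustrates how join() aligns two DataFrames on their index: the default keeps every row of the left frame, while how="inner" keeps only labels present in both.

```python
import pandas as pd

# The index values are hypothetical, chosen only for illustration.
df1 = pd.DataFrame({"Int_Rate": [2, 1, 2], "IND_GDP": [50, 45, 45]},
                   index=[2001, 2002, 2003])
df2 = pd.DataFrame({"Low_Tier_HPI": [50, 45, 67], "Unemployment": [1, 3, 5]},
                   index=[2001, 2003, 2004])

left = df1.join(df2)                 # default how="left": keeps df1's rows
inner = df1.join(df2, how="inner")   # only labels found in both frames

print(left)
print(inner)
```

In the left join, the row labeled 2002 has no match in df2, so its df2 columns are filled with NaN.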
In [ ]: import pandas as pd
import numpy as np
a = pd.Series(['Java', 'C', 'C++', np.nan])
b = a.replace({np.nan:"Python"})
print(b)
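
Missing values such as the NaN in this Series can also be filled with fillna(). A minimal sketch with the same data, leaving the original Series unchanged:

```python
import numpy as np
import pandas as pd

a = pd.Series(["Java", "C", "C++", np.nan])

filled = a.fillna("Python")   # replaces only the missing values
print(filled.tolist())        # ['Java', 'C', 'C++', 'Python']
print(a.isna().sum(), filled.isna().sum())  # 1 0
```
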
