Ln. 1 - Data handling using Pandas - Series & Dataframe
Pandas-
• Pandas is a high-performance open-source library for data analysis in Python developed by
Wes McKinney in 2008.
• The name ‘Pandas’ is derived from ‘panel data’, a term used for multidimensional,
structured data sets.
• Pandas is built on top of NumPy (Numerical Python), which it uses for mathematical
operations, and it works together with matplotlib for data visualization.
• It is one of the most popular Python packages for data science, offering powerful and
flexible data structures that make data analysis and manipulation easy.
NumPy vs Pandas-
[Table: Pandas | NumPy]
Pandas Datatypes
Series
• The Series is the primary building block of Pandas.
• It is a one-dimensional labelled array capable of holding data of any type (integer, string,
float, etc.); all the values in a Series share a single data type.
• The data values are mutable (can be changed) but the size of Series data is immutable.
• It contains a sequence of values and an associated array of data labels called its index.
• If the values have different data types (for example, numbers and strings), all of the data is
upcast to the common dtype object.
• We can imagine a Pandas Series as a column in a spreadsheet.
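The properties listed above can be checked with a short sketch. The Series names and values here are hypothetical sample data, not taken from the notes.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
marks = pd.Series([72, 85, 90], index=["a", "b", "c"])
print(marks.dtype)       # an integer dtype, because every value is an int

# Values are mutable: an existing element can be overwritten in place.
marks["b"] = 88
print(marks)

# Mixing numbers and text upcasts the whole Series to dtype object.
mixed = pd.Series([1, 2.5, "three"])
print(mixed.dtype)
```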
Creation of Series
• A Series in Pandas is created using the pd.Series() constructor.
• It can be created from various kinds of input data: array, dict, scalar value or constant, list.
• Syntax-
import pandas as pd
pd.Series(data, index, dtype, copy)
• The import statement loads the Pandas module into memory so that it can be used in the program.
• pd is an alias (alternate name) given to the Pandas module, so we can type ‘pd’
instead of ‘pandas’ every time we need to use it.
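As an illustration of the input types listed above, the following sketch builds one Series from each of them. All variable names and values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# All values below are hypothetical sample data.
from_list = pd.Series([10, 20, 30])                  # default index 0, 1, 2
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))    # NumPy array as data
from_dict = pd.Series({"x": 1, "y": 2, "z": 3})      # dict keys become the index
from_scalar = pd.Series(5, index=["a", "b", "c"])    # scalar repeated for each label

for s in (from_list, from_array, from_dict, from_scalar):
    print(s, end="\n\n")
```

With a scalar, the index argument decides how many times the value is repeated.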
Note –
• pd.Series() called without data creates an empty Series; printing it shows the empty Series
along with its default data type.
• Here ‘s’ is the name of the Series object.
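A minimal sketch of an empty Series named s. The dtype is passed explicitly here, which is an assumption of this example, because the default dtype of an empty Series has differed between Pandas versions.

```python
import pandas as pd

# dtype is given explicitly because the default dtype of an empty Series
# has differed between Pandas versions.
s = pd.Series(dtype="float64")
print(s)          # an empty Series together with its dtype
print(s.empty)    # True when the Series holds no elements
```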
Note-
• type(s) returns the class of the object (pandas.core.series.Series); s.dtype gives the data
type of the values stored in the Series.
• s.tolist() converts the Series back to a Python list.
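The two calls in the note can be tried on a small Series. The values are hypothetical.

```python
import pandas as pd

s = pd.Series([10, 20, 30])     # hypothetical sample values
print(type(s))                  # class of the object
print(s.dtype)                  # data type of the values
back = s.tolist()               # plain Python list again
print(back, type(back))
```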
Program
Write a program to convert a dictionary to a Pandas Series. The dictionary named Students must contain-
Key : Name, RollNo, Class, Marks, Grade
Value : Your name, roll number, class, marks and grade
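One possible way to approach this program is sketched below. The values in the dictionary are placeholders invented for the example and should be replaced with your own details.

```python
import pandas as pd

# Placeholder values; replace them with your own details.
Students = {
    "Name": "Asha",
    "RollNo": 12,
    "Class": "XII",
    "Marks": 91.5,
    "Grade": "A",
}

# The dictionary keys become the index labels of the Series.
s = pd.Series(Students)
print(s)
```

Because the values mix text and numbers, the resulting Series has dtype object.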
Arrays-
An array is a data structure that contains a group of elements.
Arrays are commonly used in computer programs to organize data so that a related set of
values can be easily sorted or searched.
Each element can be uniquely identified by its index in the array.
Array Series
import pandas as pd
import numpy as np
# Build the Series from a NumPy array and give each value a month label
a = np.array(['J', 'F', 'M', 'A'])
s = pd.Series(a, index=["Jan", "Feb", "Mar", "Apr"])
print(s)
NaN
Any entry that has no value is marked as NaN, or “Not a Number”, which is how Pandas
marks missing data.
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series([1,2,3,4,np.nan,5,np.nan])
>>> s
import pandas as pd
import numpy as np
s = pd.Series([2,3,np.nan,7,"The Hobbit"])
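Missing values can also be located and counted. The sketch below uses isna() and count(), which are standard Series methods but are not mentioned in the notes themselves.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, np.nan, 5, np.nan])
print(s.dtype)           # float64: NaN is a float, so the integers are upcast
print(s.isna())          # True where a value is missing
print(s.isna().sum())    # number of missing values
print(s.count())         # number of non-missing values
```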
Note-
The index values associated with a Series can be altered by assigning a new list of index
values that has the same length as the Series.
Eg:- s.index = ['May', 'June', 'July']
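A small sketch of changing the index, assuming a hypothetical three-element Series. It also shows that a list of the wrong length is rejected.

```python
import pandas as pd

s = pd.Series([10, 20, 30])            # hypothetical values
s.index = ["May", "June", "July"]      # one label per element
print(s)

try:
    s.index = ["May", "June"]          # wrong length: only 2 labels for 3 values
except ValueError as err:
    print("ValueError:", err)
```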
To extract part of a series, slicing is done.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
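The Series above can be sliced by position or by label. This sketch uses the explicit .iloc and .loc accessors, which is a choice made for the example so that the kind of slicing is unambiguous.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

by_position = s.iloc[1:3]      # positions 1 and 2; end position 3 is excluded
by_label = s.loc["b":"d"]      # labels b, c and d; end label is included

print(by_position)
print(by_label)
```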
Observe that updating the values in a Series using positional slicing excludes the value at
the end index position.
However, when slicing is done using labels, the value at the end index label is changed as well.
>>> s['c':'e'] = 500
>>> s
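Both update behaviours can be seen side by side in a self-contained sketch. The Series and its values are hypothetical, and .iloc and .loc are used here to make the positional and label-based cases explicit.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

s.iloc[1:3] = 50         # positions 1 and 2 change; position 3 is untouched
print(s)

s.loc["c":"e"] = 500     # labels c, d and e all change
print(s)
```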
Program
Write a Pandas program to compare the elements of the two Pandas Series.
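One possible approach is to use the comparison operators, which work element by element on two Series of the same length. The two Series below are hypothetical sample data.

```python
import pandas as pd

s1 = pd.Series([2, 4, 6, 8, 10])
s2 = pd.Series([1, 3, 5, 7, 10])

print("Equal:")
print(s1 == s2)
print("Greater than:")
print(s1 > s2)
print("Less than:")
print(s1 < s2)
```

Each comparison returns a new Series of True and False values.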
Attributes in Series
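A few commonly used Series attributes are sketched below. The selection of attributes and the sample Series are assumptions made for this example.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="marks")

print(s.index)     # the index labels
print(s.values)    # the data as a NumPy array
print(s.dtype)     # data type of the values
print(s.size)      # number of elements
print(s.shape)     # shape as a tuple
print(s.ndim)      # number of dimensions (1 for a Series)
print(s.empty)     # True only if the Series has no elements
print(s.name)      # name given to the Series
```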
Note- A Series created with np.arange() and one created with range() over the same bounds
give the same output. Either function can be used to generate a set of numbers automatically.
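The equivalence can be checked with a short sketch. The bounds 1 to 5 are chosen only for illustration.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(1, 6))   # values 1 to 5 from NumPy
s2 = pd.Series(range(1, 6))       # the same values from Python's range
print(s1)
print(s2)
print(s1.tolist() == s2.tolist())
```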
Accessing rows using head() and tail() functions
✓ Series.head() displays the first 5 rows of the Series by default; Series.head(n) displays the first n rows.
✓ Series.tail() displays the last 5 rows of the Series by default; Series.tail(n) displays the last n rows.
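A short sketch of both functions on a hypothetical ten-element Series:

```python
import pandas as pd

s = pd.Series(range(1, 11))   # hypothetical values 1 to 10
print(s.head())               # first 5 rows by default
print(s.tail())               # last 5 rows by default
print(s.head(3))              # first 3 rows
print(s.tail(2))              # last 2 rows
```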
Series vs Dataframe
• A Series is essentially a single column, and a DataFrame is a two-dimensional table made
up of a collection of Series that share the same index.
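The relationship can be seen by selecting one column of a DataFrame. The column names and values below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Marks": [91, 78]})
col = df["Marks"]             # a single column of the DataFrame

print(type(df))               # DataFrame
print(type(col))              # Series
print(df.ndim, col.ndim)      # 2 dimensions versus 1
```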
Create DataFrame
It can be created using- lists, dicts, Series, NumPy arrays, another DataFrame
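✓ When a DataFrame is created from a list of dictionaries, the dictionary keys are taken as
column labels, and the values of each dictionary form one row.
✓ There will be as many rows as the number of dictionaries present in the list.
✓ If a dictionary has no entry for a column, that cell is filled with NaN.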
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b', 'c'])
df1
import pandas as pd
ab = [{'Name': 'Shaun', 'Age': 35, 'Marks': 91},
      {'Name': 'Ritika', 'Age': 31, 'Marks': 87},
      {'Name': 'Smriti', 'Age': 33, 'Marks': 78},
      {'Name': 'Jacob', 'Age': 23, 'Marks': 93}]
ab1 = pd.DataFrame(ab, index=['a', 'b', 'c', 'd'])
ab1
import pandas as pd
a=["Jitender","Purnima","Arpit","Jyoti"]
b=[210,211,114,178]
s = pd.Series(a)
s1= pd.Series(b)
df=pd.DataFrame({"Author":s,"Article":s1})
df
>>> p = {'one': pd.Series([1,2,3], index=['a','b','c']),
...      'two': pd.Series([11,22,33,44], index=['a','b','c','d'])}
>>> q = pd.DataFrame(p)
>>> q
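The examples above cover lists of dictionaries and dictionaries of Series. A DataFrame can also be built from a NumPy array or from another DataFrame, as in this sketch; the array values and the row and column labels are hypothetical.

```python
import numpy as np
import pandas as pd

# From a 2-D NumPy array: each inner row of the array becomes a DataFrame row.
arr = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(arr, index=["r1", "r2"], columns=["a", "b", "c"])
print(df_from_array)

# From another DataFrame: copy=True gives an independent copy of the data.
df_from_df = pd.DataFrame(df_from_array, copy=True)
df_from_df.loc["r1", "a"] = 100
print(df_from_df)
print(df_from_array)    # the original keeps its value
```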