Class Notes: Class: XII Date: 7-Apr-2020 Subject: Informatics Practices Topic: 2. Python Pandas
Introduction
Pandas, or Python Pandas, is Python's library for data analysis. Pandas derives its name from
"panel data", an econometrics term for multi-dimensional, structured data sets.
Today, Pandas is a popular choice for data analysis. Data analysis refers to the process of
evaluating large data sets using analytical and statistical tools in order to discover useful
information and conclusions that support business decision-making.
Pandas provides various tools for data analysis and makes the process simpler than it is with
many other available tools. The main author of Pandas is Wes McKinney.
Installing pandas
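Go to the search bar on your desktop and search for cmd. An application called Command Prompt
should show up. Click it to start it. At the prompt, pandas is typically installed by typing
pip install pandas and pressing Enter (this assumes Python and pip are already installed).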
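After installation, a short check like the one below can confirm that Python is able to find the library. This is only a minimal sketch: it assumes pandas has been installed into the same Python that runs the script, and the version number printed will differ from one computer to another.

```python
import pandas as pd

# If this import succeeds, pandas is installed for this Python interpreter.
# The version string printed depends on the installed release.
print(pd.__version__)
```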
1. Series
A Series is a Pandas data structure that represents a one-dimensional array-like object containing an
array of data (of any NumPy data type) and an associated array of data labels, called its index.
A Series type object has two main components :
> an array of actual data
> an associated array of indexes or data labels.
Both components are one-dimensional arrays with the same length. The index is used to access
individual data values.
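For example, the two components of a Series and access through the index can be sketched as follows. The variable name marks, the values and the labels 'a' to 'd' are made up for illustration and do not come from the notes.

```python
import pandas as pd

# Hypothetical data: four marks labelled 'a' to 'd' (illustration only)
marks = pd.Series([85, 92, 78, 60], index=['a', 'b', 'c', 'd'])

print(marks.values)   # the array of actual data
print(marks.index)    # the associated array of data labels (the index)
print(marks['b'])     # use a label from the index to access one value
```

Here marks['b'] picks out the value stored against the label 'b', which is 92 in this made-up data.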
2. DataFrames
A DataFrame is a two-dimensional, labelled, array-like Pandas data structure that stores an ordered
collection of columns, where different columns can hold data of different types.
If we print the dictionary name_dict used below with the print(name_dict) command, the output looks like this:
{'Name': ['Anita', 'Sajal', 'Ayaan', 'Abhey'], 'Age': [14, 32, 4, 6]}
import pandas as pd
# Each dictionary key becomes a column name; each list becomes that column's values
name_dict = {'Name': ["Anita", "Sajal", "Ayaan", "Abhey"], 'Age': [14, 32, 4, 6]}
df = pd.DataFrame(name_dict)
print(df)
Output
    Name  Age
0  Anita   14
1  Sajal   32
2  Ayaan    4
3  Abhey    6
As you can see, the output generated for the DataFrame object looks similar to an Excel sheet.
The only difference is that the default index value for the first row is 0 in a DataFrame,
whereas in an Excel sheet it is 1. We can also customize the index values as needed.
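The notes do not show the statement used to change the index, so the following is only one possible way to do it: the index parameter of pd.DataFrame accepts a list of labels, one for each row. The labels 1 to 4 are chosen here so that the numbering starts from 1.

```python
import pandas as pd

name_dict = {'Name': ["Anita", "Sajal", "Ayaan", "Abhey"], 'Age': [14, 32, 4, 6]}

# Supply one label per row so the index runs from 1 to 4 instead of 0 to 3
df = pd.DataFrame(name_dict, index=[1, 2, 3, 4])
print(df)
```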
Output
    Name  Age
1  Anita   14
2  Sajal   32
3  Ayaan    4
4  Abhey    6
In the preceding output, the index values start from 1 instead of 0.