Pandas

Uploaded by

jay4pelican

Introduction to Pandas

A POWERFUL DATA MANIPULATION LIBRARY

BY: JAYANTILAL BHANUSHALI


Introduction to Pandas

• Pandas is a Python Library for Data Analysis and Manipulation


• Pandas is a powerful open-source data manipulation and analysis
library for Python.
• Developed by Wes McKinney and first released in 2008.
• Built on top of NumPy, providing easy-to-use data structures and
functions needed for data manipulation and analysis.
• Simplifies data manipulation tasks that would be complex in raw Python
or NumPy.
• Offers high-level data structures such as DataFrame and Series, making
data analysis more intuitive.
Importance in Data Analysis and Manipulation

• Introduces two key data structures: DataFrame and Series.
• A DataFrame represents tabular data in rows and columns, much like a
spreadsheet.
• Supports various data formats: CSV, Excel, SQL, JSON, and more.
• Seamless integration with different data storage systems.
• Integration with NumPy and other libraries makes it a powerful tool in
the data science ecosystem.
• Reduces the amount of code needed for complex data tasks.
Installation of Pandas

• In many Jupyter Notebook setups (for example, the Anaconda
distribution), pandas is already installed, so no separate installation
is needed. Simply write “import pandas as pd”, where pd is an alias
through which all pandas functionality is accessed.
• How to install pandas if it is missing:
  • In a terminal, run “pip install pandas”.
  • Then, in Python, write “import pandas as pd”.
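
A quick way to confirm the installation is to import the library and print its version. This is a minimal check, assuming pandas is installed in the active environment:

```python
import pandas as pd

# If the import succeeds, pandas is available; the version string shows
# which release is installed in this environment.
print(pd.__version__)
```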
Modules of Pandas

• DataFrame
• Series
• Reading and Writing Data
• Data Exploration and Manipulation
• Data Aggregation and Grouping
DataFrame

In pandas, a DataFrame is a two-dimensional, tabular data structure with
labeled axes (rows and columns). It is one of the primary data structures
used for data manipulation and analysis in Python, and it is similar to a
spreadsheet or an SQL table.
A DataFrame can grow or shrink in size: you can add or remove rows and
columns as needed.
pandas provides a wide range of functions and methods for data
manipulation, cleaning, and analysis, including operations for merging,
grouping, aggregating, and filtering.
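
The points above about adding and removing rows and columns can be sketched as follows. The column names and values are made up purely for illustration:

```python
import pandas as pd

# Build a small DataFrame from a dict of columns (illustrative values only).
df = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "age": [28, 35, 42]})

# Add a column derived from an existing one.
df["age_in_months"] = df["age"] * 12

# Add a row by concatenating a one-row DataFrame.
new_row = pd.DataFrame({"name": ["Dina"], "age": [31], "age_in_months": [372]})
df = pd.concat([df, new_row], ignore_index=True)

# Remove a column, then remove the first row; drop() returns a new DataFrame.
df = df.drop(columns=["age_in_months"])
df = df.drop(index=0).reset_index(drop=True)

print(df)
```

Note that these operations return new objects rather than resizing in place, which is why the result is assigned back to df each time.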
Series

• A Series is a one-dimensional labeled array capable of holding data of
any type. It consists of two main components:
1. Data: The actual values contained in the Series. These can be of
any valid data type, including integers, floats, strings, or arbitrary
Python objects.
2. Index: The set of labels assigned to the elements of the Series,
allowing easy and efficient access to the data.
• You can think of a pandas Series as a single column in an Excel
spreadsheet or in a database table.
• Some examples of working with a Series:
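
The original slide examples are not reproduced in this text, so the following is a minimal sketch of the two components described above (data and index). The labels and values are hypothetical:

```python
import pandas as pd

# A Series pairs data values with an index of labels.
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "mango", "banana"])

# Access an element by label or by position.
mango_price = prices["mango"]   # label-based access
first_price = prices.iloc[0]    # position-based access

# Arithmetic and boolean filtering are vectorised and keep the labels.
discounted = prices * 0.9
expensive = prices[prices > 1.0]

print(prices.index.tolist())
print(expensive)
```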
Reading and Writing Data

• In Python, several libraries facilitate reading and writing data. Two
popular ones are pandas and openpyxl (which pandas can use to read and
write .xlsx files).
• pandas can read and write specific file types, including:
  • CSV files
  • Excel files
  • JSON files
Reading and writing example using pandas:
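
The slide's original example is not reproduced here. As a hedged sketch, the round trip below writes a small DataFrame to CSV and JSON in a temporary directory and reads it back. The file names and data are hypothetical. Excel is shown only in a comment because it needs the optional openpyxl package:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"item": ["pen", "notebook"], "quantity": [10, 4]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "items.csv"    # hypothetical file name
    json_path = Path(tmp) / "items.json"  # hypothetical file name

    # CSV: index=False avoids writing the row labels as an extra column.
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # JSON: orient="records" stores one JSON object per row.
    df.to_json(json_path, orient="records")
    from_json = pd.read_json(json_path, orient="records")

    # Excel works the same way but requires openpyxl to be installed:
    #   df.to_excel(Path(tmp) / "items.xlsx", index=False)
    #   pd.read_excel(Path(tmp) / "items.xlsx")

print(from_csv)
print(from_json)
```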
Data Exploration and Manipulation
Data exploration and manipulation are critical steps in the data analysis process. In Python, the
pandas library is widely used for these tasks.
• Data Exploration:
  • Loading Data
  • Understanding the Data
  • Handling Missing Data
  • Exploratory Data Analysis (EDA)
• Data Manipulation:
  • Filtering and Subsetting
  • Grouping and Aggregation
  • Merging and Joining DataFrames
  • Pivoting and Melting

• Here are some examples of data exploration and manipulation:
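
The slide's original examples are not reproduced in this text. The sketch below walks through several of the listed steps on a small made-up table. All column names and values are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "East"],
        "product": ["A", "A", "B", "B"],
        "units": [10, np.nan, 7, 12],
    }
)

# Understanding the data: first rows, column types, summary statistics.
print(sales.head())
print(sales.dtypes)
print(sales.describe())

# Handling missing data: count missing values, then fill them with 0.
missing_per_column = sales.isna().sum()
sales["units"] = sales["units"].fillna(0)

# Filtering and subsetting with boolean masks and .loc.
north = sales[sales["region"] == "North"]
big_orders = sales.loc[sales["units"] >= 10, ["region", "units"]]

# Merging: attach a manager to each row via a left join on "region".
managers = pd.DataFrame(
    {"region": ["North", "South", "East"], "manager": ["M1", "M2", "M3"]}
)
merged = sales.merge(managers, on="region", how="left")

# Pivoting: one row per region, one column per product.
wide = sales.pivot_table(
    index="region", columns="product", values="units", aggfunc="sum"
)

# Melting: back to a long layout with one row per (region, product) pair.
long_form = wide.reset_index().melt(
    id_vars="region", var_name="product", value_name="units"
)
print(long_form)
```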


Data Aggregation and Grouping

• Data Aggregation
Data aggregation combines values from multiple rows into a single value.
Common aggregation functions include sum, mean, median, count, min, and
max.
• Grouping
Grouping splits the data into groups based on some criterion, applies a
function to each group independently, and then combines the results. The
‘groupby’ method in pandas is central to this process.
• Here are some examples of data aggregation and grouping:
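
The slide's original examples are not reproduced here. A minimal sketch of the split-apply-combine pattern described above, using made-up data and column names:

```python
import pandas as pd

# Illustrative data only.
orders = pd.DataFrame(
    {
        "category": ["fruit", "fruit", "dairy", "dairy", "dairy"],
        "amount": [4.0, 6.0, 3.0, 5.0, 10.0],
    }
)

# Aggregation over the whole column: many rows reduce to a single value.
total = orders["amount"].sum()
average = orders["amount"].mean()

# Grouping: split by category, apply several aggregations to each group,
# and combine the results into one table indexed by category.
per_category = orders.groupby("category")["amount"].agg(
    ["sum", "mean", "count", "min", "max"]
)

print(total, average)
print(per_category)
```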
Case Study: Analyzing the Automobile Dataset

Scenario:
You work for a data analytics firm, and you have been given access to the
Automobile dataset.
Your task is to analyze the data using pandas.
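
The dataset itself is not included in this deck, so the following is only a sketch of how such an analysis might start. The rows are placeholders and the column names (make, body_style, horsepower, price) are assumptions, as is the file name in the comment:

```python
import numpy as np
import pandas as pd

# Placeholder rows standing in for the real Automobile dataset.
# In practice, load the actual file instead, for example:
#   autos = pd.read_csv("automobile.csv")  # hypothetical path
autos = pd.DataFrame(
    {
        "make": ["make_a", "make_a", "make_b", "make_b", "make_c"],
        "body_style": ["sedan", "hatchback", "sedan", "wagon", "sedan"],
        "horsepower": [100, 90, 150, np.nan, 120],
        "price": [15000, 12000, 30000, 28000, 20000],
    }
)

# Explore: structure, types, and missing values.
autos.info()
missing = autos.isna().sum()

# Clean: fill missing horsepower with the column median.
autos["horsepower"] = autos["horsepower"].fillna(autos["horsepower"].median())

# Manipulate: keep only sedans.
sedans = autos[autos["body_style"] == "sedan"]

# Aggregate: average price per make, highest first.
avg_price_by_make = (
    autos.groupby("make")["price"].mean().sort_values(ascending=False)
)
print(avg_price_by_make)
```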
