Pandas Data Structures: Sections

The Pandas library is a powerful tool for data analysis and manipulation in Python. It provides two main data structures: Series for one-dimensional data and DataFrame for two-dimensional tabular data. Pandas lets users load data from, and export data to, CSV files, Excel files, and SQL databases. It also offers a variety of functions for data wrangling tasks such as selecting, sorting, ranking, summarizing, and aligning data.

Uploaded by Vinothkumar Radhakrishnan

The Pandas library is one of the most powerful libraries in Python. It is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

Check out the sections below to learn the various functions and tools Pandas offers.

Sections:

1. Pandas Data Structures

2. Dropping

3. Sort & Rank

4. Retrieving Series/DataFrame Information

5. DataFrame Summary

6. Selection

7. Applying Functions

8. Data Alignment

9. In/Out

Pandas Data Structures

There are two main data structures that the Pandas library is centered around. The first is a one-dimensional labeled array called a Series, and the second is a two-dimensional labeled table called a DataFrame.

Series: a one-dimensional labeled array

>>> import pandas as pd
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
>>> s
a    3
b   -5
c    7
d    4
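To make the difference between labels and positions concrete, here is a small sketch that rebuilds the same Series and reads it back both ways. The second Series `t` is a hypothetical example added only to show the default integer index pandas assigns when none is given.

```python
import pandas as pd

# Same values and labels as the Series example above
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

print(s.index.tolist())   # ['a', 'b', 'c', 'd']
print(s.values)           # the underlying NumPy array
print(s['c'])             # access by label -> 7
print(s.iloc[1])          # access by position -> -5

# Hypothetical Series without an explicit index: pandas numbers the rows 0..n-1
t = pd.Series([10, 20, 30])
print(t.index)            # RangeIndex(start=0, stop=3, step=1)
```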

DataFrame: a two-dimensional labeled data structure

>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
...         'Population': ['111907', '1303021', '208476']}
>>> df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
>>> df
   Country    Capital  Population
0  Belgium   Brussels      111907
1    India  New Delhi     1303021
2   Brazil   Brasilia      208476
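A short sketch, reusing the same `data` dictionary, showing that each DataFrame column is itself a Series that shares the DataFrame's row index. Note that this example stores Population as strings; the `astype(int)` conversion below illustrates one way to get numbers out of it and is not part of the original example.

```python
import pandas as pd

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
        'Population': ['111907', '1303021', '208476']}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

# Selecting one column returns a Series with the same row index as df
capital = df['Capital']
print(type(capital).__name__)   # Series
print(capital.tolist())         # ['Brussels', 'New Delhi', 'Brasilia']

# Population holds strings, so convert it explicitly before doing arithmetic
pop = df['Population'].astype(int)
print(pop.sum())                # 1623404
```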

Dropping
In this section, you’ll learn how to remove specific values from a Series, and how to remove columns or rows from a DataFrame.

s and df in the code below are used as examples of a Series and DataFrame throughout this section.

>>> s
a    6
b   -5
c    7
d    4

>>> df
   Country    Capital  Population
0  Belgium   Brussels      111907
1    India  New Delhi     1303021
2   Brazil   Brasilia      208476

Drop values from rows (axis=0)

>>> s.drop(['a', 'c'])
b   -5
d    4

Drop values from columns (axis=1)

>>> df.drop('Country', axis=1)
     Capital  Population
0   Brussels      111907
1  New Delhi     1303021
2   Brasilia      208476
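The examples above drop Series entries and a DataFrame column. A minimal sketch of the remaining cases, rebuilding the same `df`: dropping a DataFrame row by its index label, the equivalent `columns=` keyword form, and the fact that `drop()` returns a new object rather than modifying `df`.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': ['111907', '1303021', '208476']})

# axis=0 (the default) drops rows by index label
no_india = df.drop(1)
print(no_india['Country'].tolist())   # ['Belgium', 'Brazil']

# columns= is an explicit alternative to passing axis=1
no_country = df.drop(columns=['Country'])
print(no_country.columns.tolist())    # ['Capital', 'Population']

# drop() returns a new DataFrame; the original df keeps all rows and columns
print(df.shape)                       # (3, 3)
```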

Sort & Rank

In this section, you’ll learn how to sort DataFrames by index or by column values, and how to rank column values.

df in the code below is used as an example DataFrame throughout this section.

>>> df
   Country    Capital  Population
0  Belgium   Brussels      111907
1    India  New Delhi     1303021
2   Brazil   Brasilia      208476

Sort by labels along an axis

>>> df.sort_index()
   Country    Capital  Population
0  Belgium   Brussels      111907
1    India  New Delhi     1303021
2   Brazil   Brasilia      208476

Sort by values along an axis

>>> df.sort_values(by='Country')
   Country    Capital  Population
0  Belgium   Brussels      111907
2   Brazil   Brasilia      208476
1    India  New Delhi     1303021

Assign ranks to entries

>>> df.rank()
   Country  Capital  Population
0      1.0      2.0         1.0
1      3.0      3.0         2.0
2      2.0      1.0         3.0
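A sketch of a few common variations, rebuilding the same `df`. Because Population holds strings in this example, `rank()` orders it as text, which is why the output above gives India's 1303021 a lower rank than Brazil's 208476; converting to integers first gives a numeric ranking.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': ['111907', '1303021', '208476']})

# Descending sort by a column
desc = df.sort_values(by='Country', ascending=False)
print(desc['Country'].tolist())     # ['India', 'Brazil', 'Belgium']

# Sort the columns alphabetically by their labels (axis=1)
by_label = df.sort_index(axis=1)
print(by_label.columns.tolist())    # ['Capital', 'Country', 'Population']

# Text ranking versus numeric ranking of the same column
text_rank = df['Population'].rank()
num_rank = df['Population'].astype(int).rank()
print(text_rank.tolist())           # [1.0, 2.0, 3.0]
print(num_rank.tolist())            # [1.0, 3.0, 2.0]
```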

Retrieving Series/DataFrame Information

In this section, you’ll learn how to retrieve information about a DataFrame, including its dimensions, column names, column types, and index range.

df in the code below is used as an example DataFrame throughout this section.

>>> df
   Country    Capital  Population
0  Belgium   Brussels      111907
1    India  New Delhi     1303021
2   Brazil   Brasilia      208476

(rows, columns)

>>> df.shape
(3, 3)

Describe index

>>> df.index
RangeIndex(start=0, stop=3, step=1)

Describe DataFrame columns

>>> df.columns
Index(['Country', 'Capital', 'Population'], dtype='object')

Info on DataFrame (the exact layout and memory figure vary between pandas versions)

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
Country       3 non-null object
Capital       3 non-null object
Population    3 non-null object
dtypes: object(3)
memory usage: 152.0+ bytes

Number of non-NA values

>>> df.count()
Country       3
Capital       3
Population    3
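A sketch of two related checks, rebuilding the same `df`: unpacking the `shape` tuple, and how `count()` differs from `len()` once a value is missing. The extra 'Peru' row with a missing Population is hypothetical data added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': ['111907', '1303021', '208476']})

rows, cols = df.shape               # shape is a (rows, columns) tuple
print(rows, cols)                   # 3 3

# Hypothetical row with a missing Population value
df.loc[3] = ['Peru', 'Lima', None]
print(len(df))                      # 4 rows in total
print(df.count()['Population'])     # 3: count() skips the missing value
print(df.count()['Country'])        # 4: no missing values in this column
```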

DataFrame Summary
In this section, you’ll learn how to retrieve summary statistics of a DataFrame, such as the sum, minimum, maximum, and mean of each column.

df in the code below is used as an example DataFrame throughout this section.

>>> df = pd.DataFrame({'Even': [2, 4, 6], 'Odd': [1, 3, 5]})
>>> df
   Even  Odd
0     2    1
1     4    3
2     6    5

Sum of values

>>> df.sum()
Even    12
Odd      9

Cumulative sum of values

>>> df.cumsum()
   Even  Odd
0     2    1
1     6    4
2    12    9

Minimum value

>>> df.min()
Even    2
Odd     1

Maximum value

>>> df.max()
Even    6
Odd     5

Summary statistics

>>> df.describe()
       Even  Odd
count   3.0  3.0
mean    4.0  3.0
std     2.0  2.0
min     2.0  1.0
25%     3.0  2.0
50%     4.0  3.0
75%     5.0  4.0
max     6.0  5.0

Mean of values

>>> df.mean()
Even    4.0
Odd     3.0

Median of values

>>> df.median()
Even    4.0
Odd     3.0
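The statistics above all reduce down each column. A small sketch, rebuilding the same Even/Odd `df`, showing the `axis=1` form that reduces across each row instead, and how to read individual numbers out of `describe()`.

```python
import pandas as pd

df = pd.DataFrame({'Even': [2, 4, 6], 'Odd': [1, 3, 5]})

# By default, reductions work down each column (axis=0)
print(df.sum().tolist())          # [12, 9]

# axis=1 reduces across each row instead
print(df.sum(axis=1).tolist())    # [3, 7, 11]
print(df.mean(axis=1).tolist())   # [1.5, 3.5, 5.5]

# describe() returns a DataFrame, so single statistics can be read with .loc
stats = df.describe()
print(stats.loc['mean', 'Even'], stats.loc['std', 'Odd'])  # 4.0 2.0
```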

Selection
In this section, you’ll learn how to retrieve specific values from a Series and a DataFrame.

s and df in the code below are used as examples of a Series and DataFrame throughout this section.

>>> s
a    6
b   -5
c    7
d    4

>>> df
   Country    Capital  Population
0  Belgium   Brussels      111907
1    India  New Delhi     1303021
2   Brazil   Brasilia      208476

Get one element

>>> s['b']
-5

Get a subset of a DataFrame

>>> df[1:]
   Country    Capital  Population
1    India  New Delhi     1303021
2   Brazil   Brasilia      208476

Select a single value by row and column position

>>> df.iloc[0, 0]
'Belgium'

Select a single value by row and column labels

>>> df.loc[0, 'Country']
'Belgium'

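The older .ix indexer was deprecated in pandas 0.20 and removed in pandas 1.0; use .loc for labels and .iloc for positions.

Select a single row or a subset of rows

>>> df.iloc[2]
Country         Brazil
Capital       Brasilia
Population      208476
Name: 2, dtype: object

Select a single column or a subset of columns

>>> df.loc[:, 'Capital']
0     Brussels
1    New Delhi
2     Brasilia
Name: Capital, dtype: object

Select rows and columns

>>> df.loc[1, 'Capital']
'New Delhi'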

Use a boolean filter to subset a DataFrame

>>> df[df['Population'].astype(int) > 120000]  # Population is stored as strings here
   Country    Capital  Population
1    India  New Delhi     1303021
2   Brazil   Brasilia      208476

Set the value at index 'a' of Series s to 6

>>> s['a'] = 6
>>> s
a    6
b   -5
c    7
d    4
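A sketch of combining the selection tools above. One assumption differs from the example `df`: Population is stored as integers here, so the comparisons work without any conversion.

```python
import pandas as pd

# Assumption: integer Population values (the example df above stores strings)
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
                   'Population': [111907, 1303021, 208476]})

# .loc takes a boolean row mask plus a list of column labels
big = df.loc[df['Population'] > 200000, ['Country', 'Capital']]
print(big['Country'].tolist())            # ['India', 'Brazil']

# Combine conditions with & (and) or | (or), wrapping each in parentheses
mask = (df['Population'] > 200000) & (df['Country'].str.startswith('B'))
print(df.loc[mask, 'Capital'].tolist())   # ['Brasilia']

# .iloc takes integer positions: the first two rows and the last two columns
print(df.iloc[:2, 1:].shape)              # (2, 2)
```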

Applying Functions
In this section, you’ll learn how to apply a function to all values of a DataFrame or to a specific column.

df in the code below is used as an example DataFrame throughout this section.

>>> df = pd.DataFrame({'Even': [2, 4, 6], 'Odd': [1, 3, 5]})
>>> df
   Even  Odd
0     2    1
1     4    3
2     6    5

Apply function

>>> df.apply(lambda x: x * 2)
   Even  Odd
0     4    2
1     8    6
2    12   10
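`apply()` passes each column to the function by default. This sketch, rebuilding the same Even/Odd `df`, shows the row-wise form with `axis=1` and applying a function to a single column. The new column name `Odd_squared` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Even': [2, 4, 6], 'Odd': [1, 3, 5]})

# Default: the function receives each column as a Series
col_ranges = df.apply(lambda col: col.max() - col.min())
print(col_ranges.tolist())          # [4, 4]

# axis=1: the function receives each row as a Series
row_sums = df.apply(lambda row: row['Even'] + row['Odd'], axis=1)
print(row_sums.tolist())            # [3, 7, 11]

# To transform one column, call apply() on that Series
df['Odd_squared'] = df['Odd'].apply(lambda x: x ** 2)
print(df['Odd_squared'].tolist())   # [1, 9, 25]
```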

Data Alignment
In this section, you’ll learn how to add, subtract, and divide two Series whose indexes differ from one another.

s and s3 in the code below are used as examples of Series throughout this section.

>>> s
a    6
b   -5
c    7
d    4

>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s3
a    7
c   -2
d    3

Internal Data Alignment

>>> s + s3
a    13.0
b     NaN
c     5.0
d     7.0

# NaN values are introduced at indices that don't overlap

Arithmetic Operations with Fill Methods

fill_value stands in for a label that is missing from one of the two Series before the operation is applied.

>>> s.add(s3, fill_value=0)
a    13.0
b    -5.0
c     5.0
d     7.0

>>> s.sub(s3, fill_value=2)
a   -1.0
b   -7.0
c    9.0
d    1.0

>>> s.div(s3, fill_value=4)
a    0.857143
b   -1.250000
c   -3.500000
d    1.333333
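A sketch that makes the alignment rules explicit, using the same `s` and `s3` values as above: the result index is the union of the two indexes, and `reindex()` is an alternative way to fill in missing labels before doing plain arithmetic.

```python
import pandas as pd

s = pd.Series([6, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# The result's index is the union of both indexes
total = s + s3
print(total.index.tolist())           # ['a', 'b', 'c', 'd']
print(total.isna().sum())             # 1: only 'b' is missing from s3

# fill_value replaces the missing s3['b'] with 0 before adding
print(s.add(s3, fill_value=0)['b'])   # -5.0

# Alternative: align s3 to s explicitly, then use plain arithmetic
s3_aligned = s3.reindex(s.index, fill_value=0)
print((s + s3_aligned).tolist())      # [13, -5, 5, 7]
```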

In/Out
In this section, you’ll learn how to read a CSV file, an Excel file, and the result of a SQL query into Python using Pandas. You will also learn how to export a DataFrame from Pandas to a CSV file, an Excel file, and a SQL table.

Read CSV file

>>> pd.read_csv('file.csv')

Write to CSV file

>>> df.to_csv('myDataFrame.csv')

Read Excel file

>>> pd.read_excel('file.xlsx')

Write to Excel file

>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')

Read multiple sheets from the same file

>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')

Read SQL query or database table

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql('SELECT * FROM my_table;', engine)
>>> pd.read_sql_table('my_table', engine)

Write to SQL table

>>> df.to_sql('myDF', engine)
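To try the read and write calls without touching the file system, here is a sketch that round-trips a small DataFrame through an in-memory CSV buffer and an in-memory SQLite database. It uses Python's built-in `sqlite3` module instead of SQLAlchemy (pandas accepts a `sqlite3` connection directly); the Excel functions are left out because they need an extra engine such as openpyxl.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasilia']})

# CSV round trip through an in-memory text buffer instead of a file on disk
buf = io.StringIO()
df.to_csv(buf, index=False)   # index=False: don't write row labels as a column
buf.seek(0)                   # rewind so read_csv starts at the beginning
df_csv = pd.read_csv(buf)
print(df_csv.equals(df))      # True

# SQL round trip using an in-memory SQLite database
conn = sqlite3.connect(':memory:')
df.to_sql('myDF', conn, index=False)
df_sql = pd.read_sql("SELECT * FROM myDF WHERE Country = 'India';", conn)
print(df_sql['Capital'].tolist())  # ['New Delhi']
conn.close()
```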

Python is the top dog when it comes to data science, now and for the foreseeable future. Knowledge of Pandas, one of its most powerful libraries, is often a requirement for data scientists today.

Use this cheat sheet as a guide in the beginning and come back to it when needed, and you’ll be well on your way to mastering the Pandas library.
