Intro to Pandas for Data Science
Andres Vourakis
Data Scientist & Mentor
@andresvourakis
Understanding Pandas will allow you to manipulate and analyze data efficiently, enabling you to derive insights quickly and prepare data for advanced analysis.
Core Pandas Concepts
Here is what we’ll cover:
1. DataFrames and Series
2. Data Selection and Filtering
3. Data Cleaning and Transformation
4. Merging and Joining DataFrames
5. Aggregation and Grouping
6. Working with Dates and Times
DataFrames and Series
Pandas has two core structures: a Series is a single labeled column of values, and a DataFrame is a table whose columns are Series that share one index.
Series
   Visits
0  20
1  40
2  10
3  90

Series
   Page
0  /home
1  /about
2  /contact
3  /projects

DataFrame
   Page       Visits
0  /home      20
1  /about     40
2  /contact   10
3  /projects  90
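The tables above can be rebuilt in a few lines. This is a minimal sketch; the variable names (visits, pages, df) are chosen here for illustration and are not from the slides.

```python
import pandas as pd

# A Series is one labeled column of values.
visits = pd.Series([20, 40, 10, 90], name="Visits")
pages = pd.Series(["/home", "/about", "/contact", "/projects"], name="Page")

# A DataFrame lines up several Series on a shared index (0..3 here).
df = pd.DataFrame({"Page": pages, "Visits": visits})

print(df)
# Selecting one column of a DataFrame gives back a Series.
print(type(df["Visits"]))
```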
Data Selection and Filtering
Use .loc[] to select data by row and column labels or by boolean conditions, and .iloc[] to select data by integer row and column positions.
Before Filtering
   Page       Visits
0  /home      20
1  /about     40
2  /contact   10
3  /projects  90

After Filtering
   Page
1  /about
3  /projects
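One way to get from the "before" table to the "after" table is sketched below. The slide does not state the filter condition, so the threshold of 30 visits is an assumption that happens to keep rows 1 and 3.

```python
import pandas as pd

df = pd.DataFrame({
    "Page": ["/home", "/about", "/contact", "/projects"],
    "Visits": [20, 40, 10, 90],
})

# .loc: a boolean condition picks the rows, a column label picks the columns.
# The "> 30" threshold is assumed for illustration.
busy_pages = df.loc[df["Visits"] > 30, ["Page"]]

# .iloc: the same cells addressed by integer position (rows 1 and 3, column 0).
same_cells = df.iloc[[1, 3], [0]]

print(busy_pages)
```

Note that the original row labels (1 and 3) are kept after filtering, which is why the "after" table is not renumbered from 0.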
Data Cleaning and Transformation
Handle missing data and transform values with functions like:
fillna() to replace NaNs.
dropna() to remove rows or columns that contain NaNs.
apply() and map() for transformations.
Pandas simplifies the process of preparing your data for analysis.
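The functions listed above might be combined as in the following sketch. The missing value, the fill value of 0, the "Section" and "Traffic" columns, and the cutoff of 40 are all made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Page": ["/home", "/about", "/contact", "/projects"],
    "Visits": [20, float("nan"), 10, 90],  # one missing value, added for the example
})

# fillna(): replace NaN in the Visits column with 0.
filled = df.fillna({"Visits": 0})

# dropna(): drop the rows where Visits is NaN instead.
dropped = df.dropna(subset=["Visits"])

# map(): transform a Series element by element (strip the slashes).
filled["Section"] = filled["Page"].map(lambda p: p.strip("/"))

# apply(): run a function on each value of a Series
# (on a DataFrame it runs on each column or row).
filled["Traffic"] = filled["Visits"].apply(lambda v: "high" if v >= 40 else "low")

print(filled)
```

Both fillna() and dropna() return a new object by default, so df itself is left unchanged here.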
Merging and Joining DataFrames
Combine datasets using:
merge() for relational joins.
concat() to concatenate DataFrames.
join() for combining on index.
Pandas provides powerful functions to work with multiple datasets.
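A small sketch of the three functions side by side. The two input tables and their contents are hypothetical, chosen so that each call gives a visibly different result.

```python
import pandas as pd

visits = pd.DataFrame({
    "Page": ["/home", "/about", "/contact"],
    "Visits": [20, 40, 10],
})
categories = pd.DataFrame({
    "Page": ["/home", "/about", "/projects"],
    "Category": ["Main", "Main", "Main"],
})

# merge(): relational join on a key column; "inner" keeps only matching keys.
merged = visits.merge(categories, on="Page", how="inner")

# concat(): stack DataFrames on top of each other and renumber the rows.
more_visits = pd.DataFrame({"Page": ["/projects"], "Visits": [90]})
stacked = pd.concat([visits, more_visits], ignore_index=True)

# join(): combine on the index; a left join keeps every row of the left table.
joined = visits.set_index("Page").join(categories.set_index("Page"), how="left")

print(merged)
print(stacked)
print(joined)
```

In the left join, /contact has no match in categories, so its Category is NaN.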
Aggregation and Grouping
Summarize data with:
groupby() to group data based on one or more columns.
Aggregation functions like mean(), sum(), count(), etc.
Before Grouping
   Category  Page       Visits
0  Main      /home      20
1  Main      /about     40
2  Support   /contact   10
3  Main      /projects  90

After Grouping
   Category  Visits
0  Main      150
1  Support   10
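The grouping shown above can be sketched as follows. Using as_index=False is one way to get Category back as a regular column with a 0, 1 index, as in the "after" table; the slides do not say which approach was used.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Main", "Main", "Support", "Main"],
    "Page": ["/home", "/about", "/contact", "/projects"],
    "Visits": [20, 40, 10, 90],
})

# Group rows by Category and sum the Visits in each group.
totals = df.groupby("Category", as_index=False)["Visits"].sum()

# Several aggregations at once; here Category becomes the index.
summary = df.groupby("Category")["Visits"].agg(["mean", "sum", "count"])

print(totals)
print(summary)
```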
Working with Dates and Times
Pandas supports powerful time series functionality:
to_datetime() for converting strings to datetime objects.
resample() for time-based groupings.
Before
   Event    Date
0  Event 1  2024-08-22
1  Event 2  2024-08-23
2  Event 3  2024-08-24

After
   Event    Date        Day of Week
0  Event 1  2024-08-22  Thursday
1  Event 2  2024-08-23  Friday
2  Event 3  2024-08-24  Saturday
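A sketch of how the Day of Week column could be derived, plus a small resample() example. The slides do not show the code, so the use of dt.day_name() and the weekly frequency are assumptions made for illustration.

```python
import pandas as pd

events = pd.DataFrame({
    "Event": ["Event 1", "Event 2", "Event 3"],
    "Date": ["2024-08-22", "2024-08-23", "2024-08-24"],
})

# to_datetime(): turn the date strings into datetime values.
events["Date"] = pd.to_datetime(events["Date"])

# The .dt accessor exposes datetime helpers such as day_name().
events["Day of Week"] = events["Date"].dt.day_name()

# resample() needs a datetime index; "W" groups rows into weeks ending on Sunday.
weekly_counts = events.set_index("Date").resample("W")["Event"].count()

print(events)
print(weekly_counts)
```

All three dates fall in the week ending Sunday 2024-08-25, so the weekly count has a single bin.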
Bonus Tip
PandasAI
PandasAI is a Python library that integrates generative artificial intelligence capabilities into pandas, making dataframes conversational.