PPT for Assignment-3 (Final_Pandas_Lab)

Pandas is a Python library designed for data manipulation and analysis, providing tools for cleaning, exploring, and analyzing large datasets. It features two primary data structures, DataFrame and Series, which allow users to work with tabular data and perform operations like filtering, slicing, and statistical analysis. Additionally, Pandas can read and write data from CSV files, making it a versatile tool in data science.

Uploaded by

skaushal1be23

Pandas
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and
manipulating data.
Why Use Pandas?
• Pandas allows us to analyze large data sets and draw conclusions
based on statistics.
• Pandas can clean messy data sets, and make them readable
and relevant.
• Relevant data is very important in data science.
What Can Pandas Do?
• Pandas gives you answers about the data, such as:
– Is there a correlation between two or more columns?
– What is the average value?
– What is the maximum value?
– What is the minimum value?
• Pandas can also delete rows that are not relevant or that
contain wrong values, such as empty or NULL values. This is
called cleaning the data.
• CSV and Excel files can be loaded into a DataFrame.
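The questions above map onto a handful of DataFrame and Series methods. A minimal sketch, assuming a small made-up table (the column names and values are illustrative and do not come from the slides):

```python
import pandas as pd

# Hypothetical sample data; None marks a missing value.
df = pd.DataFrame({
    "duration": [60, 45, 30, None],
    "calories": [400.0, 300.0, 200.0, 250.0],
})

# Cleaning: drop rows that contain empty (NaN) values.
clean = df.dropna()

# Correlation between two columns (Pearson by default).
corr = clean["duration"].corr(clean["calories"])

# Average, maximum, and minimum of one column.
avg_cal = clean["calories"].mean()
max_cal = clean["calories"].max()
min_cal = clean["calories"].min()

print(len(df), len(clean))
print(corr, avg_cal, max_cal, min_cal)
```

dropna() returns a new DataFrame here, so the original df still holds the incomplete row.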
Pandas Data Structures
• Data structures: Pandas offers two primary data structures,
DataFrame and Series.
Pandas DataFrames
• A Pandas DataFrame is a two-dimensional data structure, like a
two-dimensional array or a table with rows and columns.
• It is size-mutable and heterogeneous: rows and columns can be
added or removed, and each column can hold a different data type
(similar to a table in a relational database or an Excel spreadsheet).
Pandas DataFrames
• Example: create a simple Pandas DataFrame from a dictionary. Each key
becomes a column name, and each list holds that column's values, one per row.

import pandas as pd  # import the pandas library

data = {
    'calories': [420, 380, 390],
    'duration': [50, 40, 45]
}

# load data into a DataFrame object
df = pd.DataFrame(data)

print(df)

• Output:
   calories  duration
0       420        50
1       380        40
2       390        45
Example
import pandas as pd

mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)

• Output:
    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
Dataframe with one column
• X = df[['Length']]
Dataframe with multiple columns
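A sketch of both kinds of selection, assuming a hypothetical table with 'Artist', 'Length', and 'Released' columns (only 'Length' appears in the slides; the other names and all values are placeholders). Double brackets pass a list of column names and return a DataFrame, while single brackets return a Series.

```python
import pandas as pd

# Hypothetical table; names and values are placeholders.
df = pd.DataFrame({
    "Artist": ["A", "B", "C"],
    "Length": [3.5, 4.2, 5.0],
    "Released": [1979, 1984, 1991],
})

x = df[["Length"]]              # DataFrame with one column
y = df[["Artist", "Length"]]    # DataFrame with multiple columns
s = df["Length"]                # single brackets: a Series

print(type(x).__name__, x.shape)
print(type(y).__name__, y.shape)
print(type(s).__name__, s.shape)
```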
Pandas DataFrames
• df.head(): returns the first n rows of the DataFrame (default is 5 rows).
• df.tail(): returns the last n rows of the DataFrame (default is 5 rows).
• df.info(): prints a concise summary of the DataFrame, including
the number of non-null entries, column names, and data types.
• df.describe(): returns summary statistics for the numeric columns of the DataFrame:
– count: the number of non-empty values
– mean: the average (mean) value
– std: the standard deviation
– min: the minimum value
– 25%: the 25th percentile
– 50%: the 50th percentile (the median)
– 75%: the 75th percentile
– max: the maximum value
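The inspection methods listed above can be tried on the small calories/duration table from the earlier example. This is a minimal sketch; the variable names first_two, last_one, and stats are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

first_two = df.head(2)   # first 2 rows (default is 5)
last_one = df.tail(1)    # last row (default is 5)
df.info()                # prints column names, non-null counts, and dtypes
stats = df.describe()    # one row per statistic, one column per numeric column

# Individual statistics can be read from the result with loc.
print(stats.loc["mean", "calories"])
print(stats.loc["50%", "duration"])
```

describe() returns a DataFrame, so its values can be used in later calculations, whereas info() only prints its summary.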
Pandas DataFrames
• Locate Row:
– As the printed results above show, the DataFrame is like a table
with rows and columns.
– Pandas uses the loc attribute to return one or more specified row(s).
Pandas DataFrames
• Example: return row 0:
#refer to the row index:
print(df.loc[0])
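What loc returns depends on how the rows are specified. A short sketch using the calories/duration table from the earlier example: a single label gives a Series, while a list of labels gives a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

row = df.loc[0]          # one label: the row comes back as a Series
rows = df.loc[[0, 1]]    # list of labels: the rows come back as a DataFrame

print(row)
print(rows)
```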
Pandas Series
• A one-dimensional labeled array, essentially a single column or row of
data.
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.
• Example: create a simple Pandas Series from a list:
import pandas as pd

a = [1, 7, 2]

data1 = pd.Series(a)

print(data1)
Labels
• If nothing else is specified, the values are labeled with their
index number. First value has index 0, second value has index 1
etc.
• This label can be used to access a specified value.
• With the index argument, you can name your own labels.
Labels
• Example: create your own labels:
import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

• Output:
x    1
y    7
z    2
dtype: int64
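Labels are what you use to read a value back out of a Series. A minimal sketch comparing default and custom labels (the variable names default_labels and custom_labels are illustrative):

```python
import pandas as pd

a = [1, 7, 2]

default_labels = pd.Series(a)                        # labels 0, 1, 2
custom_labels = pd.Series(a, index=["x", "y", "z"])  # labels x, y, z

print(default_labels[0])     # value labeled 0
print(custom_labels["y"])    # value labeled "y"
```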
df.loc['b':'e', 'Artist'] # Accesses the 'Artist' column for rows labeled 'b' through 'e' (inclusive)
Navigating DataFrame – loc and iloc
• Both are used for accessing rows and columns in a DataFrame.
Label-based Indexing (loc)
• Accesses rows and columns by their labels
• Includes both the start and end positions in slicing
• df.loc[row_labels, column_labels]
Integer location-based Indexing (iloc)
• Accesses rows and columns by their integer positions
• Excludes the end position in slicing
• Does not accept a Boolean Series as a mask (loc does)
• df.iloc[row_positions, column_positions]

df.loc[1, 'column_name'] # Accesses the row with label 1 and column 'column_name'
df.loc[1:3, 'col1':'col3'] # Accesses rows 1 to 3 and columns 'col1' to 'col3'

df.iloc[1, 0] # Accesses the second row (index 1) and first column (index 0)
df.iloc[1:3, 0:2] # Accesses rows at index 1 and 2, and columns at index 0 and 1
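The difference in slice endpoints is easiest to see when labels and positions are not the same numbers. A sketch under that assumption, using a hypothetical table indexed by letters (the index and values are made up; col1 to col3 follow the naming in the lines above):

```python
import pandas as pd

# Hypothetical table with letter labels so labels and positions differ.
df = pd.DataFrame(
    {"col1": [10, 20, 30, 40], "col2": [1, 2, 3, 4], "col3": [5, 6, 7, 8]},
    index=["a", "b", "c", "d"],
)

by_label = df.loc["b":"c", "col1":"col2"]   # end labels "c" and "col2" are included
by_position = df.iloc[1:3, 0:2]             # end positions 3 and 2 are excluded

print(by_label)
print(by_position)
print(by_label.equals(by_position))
```

Both expressions select the same two rows and two columns, even though the loc slice names its last element and the iloc slice stops one short of its end position.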
Navigating Data Frame
• iloc exclusively uses integer positions for accessing data.
• This makes it particularly useful when dealing with data where labels
might be unknown or irrelevant.

• df.iloc[row number/slice]
• df.iloc[4], df.iloc[1:4], df.iloc[:]
• df.iloc[1:4, 5:8] (slicing rows and columns)
Dataframe Slicing
Navigating Data Frame
• df.iloc[4]: selects the 5th row (position 4) from the
DataFrame df. It returns a single row as a Series.
• df.iloc[1:4]: selects the rows at positions 1 to 3
(position 4 is excluded) from df. It returns multiple rows as a
DataFrame.
Navigating Data Frame
• df.iloc[:]: selects all rows and columns from df. The result
has the same contents as df itself.
• df.iloc[1:4, 5:8]: selects the rows at positions 1 to 3
(excluding 4) and the columns at positions 5 to 7 (excluding 8). It
returns the specified subset as a DataFrame.
Navigating Data Frame
• df.iloc[:, 2]: selects all rows (:) of the column at position 2
(the 3rd column) and returns that entire column as a Series.
• This is how to extract a whole column with iloc without
targeting individual rows.
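The return types described in this section can be checked on a small table. A minimal sketch, assuming a made-up 3x3 DataFrame (column names a, b, c are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

col = df.iloc[:, 2]         # all rows, 3rd column -> Series
col_df = df.iloc[:, [2]]    # a list of positions keeps the result a DataFrame
row = df.iloc[1]            # 2nd row -> Series
block = df.iloc[0:2, 1:3]   # rows 0-1, columns 1-2 -> DataFrame

print(col.tolist())
print(row.tolist())
print(col_df.shape, block.shape)
```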
Listing Unique Values
Create a new DataFrame consisting of songs from the 1980s and after
• When you use a Boolean Series as an index (e.g., df[boolean_series]), pandas returns a new
DataFrame containing only the rows where the Boolean Series is True.
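A sketch of both ideas in this section, listing unique values and filtering with a Boolean Series. The table, the 'Released' column, and the name df1 are assumptions made for illustration; the slides' own data set is not reproduced here.

```python
import pandas as pd

# Hypothetical music table; artists and years are placeholders.
df = pd.DataFrame({
    "Artist": ["A", "B", "A", "C"],
    "Released": [1976, 1982, 1987, 1992],
})

unique_artists = df["Artist"].unique()   # distinct values, in order of appearance
mask = df["Released"] >= 1980            # Boolean Series, one True/False per row
df1 = df[mask]                           # keeps only the rows where mask is True

print(unique_artists)
print(mask.tolist())
print(df1)
```

The filtered DataFrame keeps the original row labels, so df1 here is indexed 1, 2, 3 rather than starting again from 0.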
Pandas Read CSV
• A simple way to store big data sets is to use
CSV (comma-separated values) files.
• CSV files contain plain text in a well-known
format that can be read by almost any tool, including
Pandas.
• In our examples we will be using a CSV file
called 'data.csv'.
Pandas Read CSV
• Example: load the CSV into a DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())       #show only first 5 rows
print(df.to_string())  #show all the rows
Pandas Read CSV
• The pd.read_csv() function reads the data from the
data.csv file into a DataFrame.
• df.to_string() converts the entire DataFrame df into a string
representation, showing all rows and columns.
• If you print a large DataFrame directly with print(df), Pandas
shows only the first 5 rows and the last 5 rows.
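The difference between the truncated view and to_string() can be sketched without a file on disk. The example below assumes an in-memory stand-in for 'data.csv' built with io.StringIO; the column names and the 100 generated rows are made up.

```python
import io
import pandas as pd

# Stand-in for 'data.csv': an in-memory file with made-up rows.
csv_text = "Duration,Calories\n" + "\n".join(f"{i},{i * 10}" for i in range(100))
df = pd.read_csv(io.StringIO(csv_text))

short_view = repr(df)        # the text print(df) would show; truncated for large frames
full_view = df.to_string()   # every row, no truncation

print(df.shape)
print(len(short_view.splitlines()), len(full_view.splitlines()))
```

With pandas' default display settings the first count is much smaller than the second, because the middle rows are replaced by an ellipsis line.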
Delete a column from Dataset
• You can delete a column or feature from a dataset:
– df.drop(labels, axis, index, columns, inplace)
• Method 1: dropping either rows or columns
– Labels = index or column labels to drop
– Axis = whether to drop labels from the index (0 or 'index') or columns (1 or 'columns')
– df.drop(labels=['Age'], axis=1, inplace=True)
• Method 2: dropping both rows and columns
– Index = row labels to drop
– Columns = column labels to drop
– df.drop(index=[1], columns=['Salary'], inplace=True)
Delete a column from Dataset

– df.drop(df.columns[1], axis=1, inplace=True)

• Column selection: df.columns[1] selects the second column.
• Axis parameter: axis=1 specifies that you are dropping a column. For rows, use axis=0.
• inplace=True: modifies the DataFrame in place; the call returns None.
• inplace=False (the default): leaves the DataFrame unchanged and returns a new DataFrame without the column.
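The effect of the inplace parameter can be seen by comparing the two calls side by side. A minimal sketch, assuming a made-up table that reuses the 'Age' and 'Salary' column names from the slides (the 'Name' column and all values are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [30, 25], "Salary": [100, 90]})

# inplace=False (the default): df is unchanged, a new DataFrame is returned.
without_age = df.drop(labels=["Age"], axis=1)
columns_after_copy = list(df.columns)

# inplace=True: df itself is modified and the call returns None.
result = df.drop(df.columns[1], axis=1, inplace=True)

print(list(without_age.columns))
print(columns_after_copy)
print(list(df.columns), result)
```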
Delete a row from Dataset
• You can delete a row (an observation) from a dataset:
– df.drop(1, axis=0, inplace=True)

• Row selection: the first parameter, 1, is the index label of the row to drop. With the default 0-based index, this is the second row.
• Axis parameter: axis=0 specifies that you are dropping a row.
• inplace=True: modifies the DataFrame in place.
• inplace=False (the default): leaves the DataFrame unchanged and returns a new DataFrame with the row removed.
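Because drop works on labels, the value 1 refers to the row labeled 1 rather than to the first row. A short sketch, assuming a made-up three-row table with the default 0, 1, 2 index:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [30, 25, 41]})

# Drop the row labeled 1 (the second row under the default index).
dropped = df.drop(1, axis=0)

# Drop a row and a column in one call using the index= and columns= keywords.
both = df.drop(index=[0], columns=["Age"])

print(dropped.index.tolist())
print(both)
```

Neither call uses inplace=True, so df keeps all three rows and both columns.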
