What is pandas

Pandas is a widely-used Python library for data manipulation and analysis, providing tools for handling structured data like DataFrames. Key functionalities include data cleaning, transformation, analysis, and visualization, along with built-in functions for creating, inspecting, and manipulating DataFrames. To become proficient in Pandas, one should learn its data structures, data import/export methods, and various functions for data exploration, cleaning, transformation, and aggregation.

Uploaded by

Atif Firoz
PANDAS

• What is pandas, and what is it used for?

Ans: Pandas is a popular Python library for data manipulation and analysis. It provides
powerful tools for working with structured data such as tables (DataFrames), which makes data
easier to handle, clean, and analyze.

Why is it used?

• Data handling: It stores and organizes data in a table format (rows and columns).
• Data cleaning: You can remove or fix missing or incorrect data.
• Data transformation: It lets you manipulate data (e.g., sorting, filtering, and grouping).
• Analysis: You can easily calculate statistics such as averages or sums.
• Visualization: Pandas integrates with plotting libraries such as Matplotlib to visualize trends in the data.

• What should you learn to become proficient in pandas?

Ans: Start with the basics below, then work through the built-in functions by category.

Basics of Pandas

• Data Structures:
  o Series: A one-dimensional labeled array (similar to a list or array).
  o DataFrame: A two-dimensional table (like a spreadsheet or SQL table).

DataFrame: A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous
tabular data structure: rows and columns can be added or removed, and each column can hold
a different data type.
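To make the two data structures concrete, here is a minimal sketch. The names and values are sample data chosen for illustration, not anything prescribed by pandas.

```python
import pandas as pd

# A Series: one-dimensional values, each with an index label.
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')

# A DataFrame: a table whose columns may have different types.
people = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago']},
    index=['Alice', 'Bob', 'Charlie'],
)

# Selecting a single column of a DataFrame gives back a Series.
age_column = people['Age']

print(ages['Bob'])       # look up a value by its label
print(type(age_column))  # a pandas Series
print(people.shape)      # (rows, columns)
```

Each column of a DataFrame is itself a Series, which is why most Series methods also work on a selected column.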

How We Can Use DataFrames to Manipulate Data:

• Creating a DataFrame: A DataFrame can be created from various data sources, such as
dictionaries, lists, or external files like CSV or Excel.

Code:

import pandas as pd

# Creating a DataFrame from a dictionary of equal-length lists
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)

print(df)

• Reading a file: pd.read_csv() loads a CSV file into a DataFrame.

Code:

df = pd.read_csv('data.csv')  # Load data from a CSV file

• Accessing and Selecting Data: You can access specific rows, columns, or cells using
labels or integer positions.

Code:

df['Age']   # Select a column by name

df.iloc[1]  # Select a row by integer position (the second row)

df.loc[1]   # Select a row by index label (here the default labels are 0, 1, 2)
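With the default index, iloc[1] and loc[1] return the same row, which hides the difference between them. The sketch below uses a made-up index of 10, 20, 30 (an assumption for illustration only) so that positions and labels no longer coincide.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical non-default labels
)

by_position = df.iloc[1]      # second row, whatever its label is
by_label = df.loc[20]         # row whose index label is 20
one_cell = df.loc[30, 'Age']  # single value: row label 30, column 'Age'

print(by_position['Name'], by_label['Name'], one_cell)
```

Here df.loc[1] would raise a KeyError, because no row carries the label 1.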

• Filtering Data: You can filter rows based on conditions.

Code:

df[df['Age'] > 30]  # Rows where Age is greater than 30

• Handling Missing Data: DataFrames have built-in methods to handle missing values (NaN).

Code:

# Fill missing ages with the column mean; assign the result back to the column
df['Age'] = df['Age'].fillna(df['Age'].mean())

# Drop any rows that still contain missing values
df = df.dropna()

• Aggregating and Summarizing Data: You can perform statistical operations such as sum, mean,
median, etc.

Code:

df['Age'].mean()  # Average of the Age column

df.groupby('City')['Age'].mean()  # Mean Age within each City

• Sorting Data: You can sort the data by one or more columns.

Code:

df.sort_values(by='Age', ascending=False)  # Oldest first; returns a new DataFrame

• Merging DataFrames: If you have multiple data sources, you can combine DataFrames using
merge or concat.

Code:

df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Salary': [50000, 60000],
})

# Join on the shared Name column; by default only matching names are kept
merged_df = pd.merge(df, df2, on='Name')
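The merge above keeps only names found in both tables. The sketch below contrasts that with a left join and with concat. The more_people table is a hypothetical extra row added for illustration.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
salaries = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# Inner join (the default): only names present in both tables.
inner = pd.merge(people, salaries, on='Name')

# Left join: keep every row of people; missing salaries become NaN.
left = pd.merge(people, salaries, on='Name', how='left')

# concat stacks tables with the same columns on top of each other.
more_people = pd.DataFrame({'Name': ['Dana'], 'Age': [40]})  # hypothetical row
stacked = pd.concat([people, more_people], ignore_index=True)

print(inner)
print(left)
print(stacked)
```

In short, merge matches rows by key (like a SQL JOIN), while concat simply appends rows or columns.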

Inbuilt functions

Pandas provides a wide array of built-in functions that are essential for data analysis. Here's a
breakdown of the most important ones:

1. DataFrame Creation & Inspection

• pd.DataFrame(): Creates a DataFrame object.
• pd.read_csv(): Reads a CSV file into a DataFrame.
• pd.read_excel(): Reads an Excel file into a DataFrame.
• head(): Returns the first n rows of the DataFrame (default n=5).
• tail(): Returns the last n rows of the DataFrame (default n=5).
• info(): Prints a summary of the DataFrame, including column types and non-null counts.
• shape: Returns a tuple (rows, columns) giving the dimensions of the DataFrame. It is an attribute, so it is written without parentheses.
• columns: Returns the column names of the DataFrame.
• describe(): Provides summary statistics of numerical columns.
• dtypes: Shows the data type of each column.
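A short sketch of the inspection tools listed above, run on a small sample table (the data is made up for illustration). Note that shape, columns, and dtypes are attributes, while the others are method calls.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

first_two = df.head(2)          # first 2 rows
last_one = df.tail(1)           # last row
n_rows, n_cols = df.shape       # attribute: no parentheses
column_names = list(df.columns)
summary = df.describe()         # statistics for numeric columns only

df.info()                       # prints column types and non-null counts
print(df.dtypes)
print(summary)
```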

2. Data Cleaning & Transformation

• dropna(): Removes rows (or columns) that contain missing values.
• fillna(): Fills missing values with a specified value.
• replace(): Replaces specified values with other values.
• astype(): Casts a column (or a whole DataFrame) to a specified type.
• rename(): Renames columns or index labels.
• apply(): Applies a function along an axis of the DataFrame (row-wise or column-wise).
• map(): Applies a function or mapping to each element of a Series.
• pd.cut(): Bins continuous data into discrete intervals.
• merge(): Merges two DataFrames on a specific column or index (similar to SQL JOIN).
• pd.concat(): Concatenates two or more DataFrames along a particular axis.
• pivot(): Reshapes a DataFrame by turning the unique values of one column into columns.
• melt(): Unpivots a DataFrame from wide to long format.
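Several of these functions are often chained in one cleaning pass. The following is a sketch on made-up data; the column names, the city abbreviations, and the bin edges are all assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': ['25', '30', '35'],       # ages stored as text
    'city': ['NYC', 'LA', 'Chicago'],
})

df['age'] = df['age'].astype(int)                           # text -> integer
df['city'] = df['city'].replace({'NYC': 'New York', 'LA': 'Los Angeles'})
df = df.rename(columns={'name': 'Name', 'age': 'Age', 'city': 'City'})

# Bin ages into labeled intervals: (0, 29] and (29, 39].
df['AgeGroup'] = pd.cut(df['Age'], bins=[0, 29, 39], labels=['20s', '30s'])

# map() applies a function to every element of a Series.
df['NameLength'] = df['Name'].map(len)

# melt() turns the two numeric columns into (variable, value) rows.
long_form = df.melt(id_vars='Name', value_vars=['Age', 'NameLength'])

print(df)
print(long_form)
```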

3. Data Selection & Indexing

• loc[]: Label-based indexing for selecting rows and columns by labels.
• iloc[]: Integer-location based indexing for selecting by position.
• at[]: Accesses a single value by row label and column label.
• iat[]: Accesses a single value by row position and column position.
• query(): Filters a DataFrame using a query string.
• set_index(): Sets one or more columns as the index.
• reset_index(): Resets the index back to default integer indexing.
• get(): Returns the column for a given key, or a default value if the key does not exist.
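The selection tools above can be sketched on a small sample table. The data and the 'Salary' column name are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

indexed = df.set_index('Name')          # Name becomes the row label

bob_age = indexed.at['Bob', 'Age']      # single value by labels
first_cell = indexed.iat[0, 0]          # single value by positions
over_28 = df.query('Age > 28')          # filter with a query string
missing = df.get('Salary', default=None)  # no KeyError for an absent column
restored = indexed.reset_index()        # Name is an ordinary column again

print(bob_age, first_cell)
print(over_28)
```

at[] and iat[] are meant for single values, while loc[] and iloc[] also accept slices and lists.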

4. Data Aggregation & Grouping

• groupby(): Groups data by one or more columns so that aggregation functions can be applied per group.
• agg(): Applies one or more aggregation functions to grouped data.
• sum(): Computes the sum of values in each column or group.
• mean(): Computes the mean (average) of values.
• count(): Counts the number of non-null entries.
• min() / max(): Finds the minimum/maximum values.
• std(): Computes the standard deviation.
• cumsum(): Cumulative sum of values.
• cumprod(): Cumulative product of values.
• rank(): Ranks the values.
• pivot_table(): Creates a pivot table for aggregating data.
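As a sketch of how grouping and aggregation fit together, the example below uses a made-up sales table (cities, years, and revenue figures are all assumptions).

```python
import pandas as pd

sales = pd.DataFrame({
    'City': ['NY', 'NY', 'LA', 'LA'],
    'Year': [2023, 2024, 2023, 2024],
    'Revenue': [100, 150, 80, 120],
})

# One aggregate per group.
totals = sales.groupby('City')['Revenue'].sum()

# Several aggregates at once with agg().
stats = sales.groupby('City')['Revenue'].agg(['mean', 'max'])

# Cumulative sum within each group, aligned to the original rows.
sales['RunningTotal'] = sales.groupby('City')['Revenue'].cumsum()

# Pivot table: one row per city, one column per year.
table = sales.pivot_table(index='City', columns='Year',
                          values='Revenue', aggfunc='sum')

print(totals)
print(stats)
print(table)
```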

5. String Manipulation

• str.contains(): Checks if a substring exists in each element.
• str.replace(): Replaces occurrences of a substring with another.
• str.split(): Splits each string in the column into a list of strings.
• str.lower() / str.upper(): Converts strings to lowercase/uppercase.
• str.strip(): Removes leading and trailing whitespace.
• str.len(): Returns the length of each string.
• str.extract(): Extracts specific patterns using regular expressions.
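The string methods are reached through the .str accessor of a Series. A minimal sketch follows; the e-mail addresses are invented sample values.

```python
import pandas as pd

emails = pd.Series(['  Alice@Example.com', 'bob@test.org ', 'CHARLIE@example.com'])

clean = emails.str.strip().str.lower()                   # trim, then lowercase
is_example = clean.str.contains('example', regex=False)  # plain substring test
parts = clean.str.split('@')                             # list per element
domains = clean.str.extract(r'@(.+)$')[0]                # regex capture group
lengths = clean.str.len()

print(clean.tolist())
print(is_example.tolist())
print(domains.tolist())
```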

6. Time Series & Date Manipulation

• pd.to_datetime(): Converts a string or other object to datetime format.
• dt: Accessor for datetime properties such as year, month, day, etc.
• resample(): Resamples time series data to a different frequency (e.g., daily to monthly).
• pd.Timedelta: Represents a duration, i.e. the difference between two dates/times.
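A small sketch of the date tools, using invented dates and sales figures. The 'MS' frequency (month start) is chosen here as one possible monthly frequency.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2024-01-15', '2024-01-20', '2024-02-10'],
    'sales': [100, 50, 70],
})

df['date'] = pd.to_datetime(df['date'])  # text -> datetime
df['month'] = df['date'].dt.month        # extract a datetime property

# Resample to monthly totals; resample() needs a datetime index.
monthly = df.set_index('date')['sales'].resample('MS').sum()

# Subtracting two timestamps gives a Timedelta.
gap = df['date'].iloc[1] - df['date'].iloc[0]

print(monthly)
print(gap)
```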

7. Data Visualization

• plot(): Basic plotting function to plot data directly from a DataFrame (requires matplotlib).
• hist(): Creates a histogram of a column.
• boxplot(): Generates boxplots.
• plot.scatter(): Creates a scatter plot of one column against another.

8. Handling Duplicates

• duplicated(): Returns a boolean Series indicating duplicate rows.
• drop_duplicates(): Removes duplicate rows from a DataFrame.
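A quick sketch of both functions on a made-up table in which the third row repeats the first:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice'],
    'City': ['New York', 'Chicago', 'New York'],
})

flags = df.duplicated()            # True for rows that repeat an earlier row
unique_rows = df.drop_duplicates() # keeps the first occurrence by default

# Judge duplicates on one column only, keeping the last occurrence.
unique_cities = df.drop_duplicates(subset='City', keep='last')

print(flags.tolist())
print(unique_rows)
```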

9. Statistical Functions

• corr(): Calculates the correlation between columns.
• cov(): Computes covariance between columns.
• skew(): Computes the skewness of the data.
• kurt(): Computes the kurtosis of the data.
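The sketch below runs these functions on made-up columns: y is exactly twice x, and z has one large outlier.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [2, 4, 6, 8, 10],   # exactly 2 * x
    'z': [1, 1, 1, 1, 10],   # one large outlier
})

corr_xy = df['x'].corr(df['y'])  # perfectly linear, so close to 1.0
cov_xy = df['x'].cov(df['y'])    # sample covariance
skew_z = df['z'].skew()          # positive: long tail to the right
kurt_x = df['x'].kurt()          # excess kurtosis

print(df.corr())                 # full correlation matrix
print(corr_xy, cov_xy, skew_z, kurt_x)
```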

10. Window Functions

• rolling(): Provides rolling window calculations (e.g., moving average).
• expanding(): Cumulative calculations across an expanding window.
• ewm(): Exponentially weighted functions for smoothing and analysis.
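The three window types differ in which rows each calculation sees. A minimal sketch on a small series follows; the window size and span of 3 are arbitrary choices for the example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Rolling: each value uses the current row and the 2 before it.
# The first two results are NaN because the window is not yet full.
moving_avg = s.rolling(window=3).mean()

# Expanding: each value uses every row seen so far.
running_sum = s.expanding().sum()

# Exponentially weighted: recent rows count more than older ones.
smoothed = s.ewm(span=3).mean()

print(moving_avg.tolist())
print(running_sum.tolist())
print(smoothed.tolist())
```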

11. Exporting Data

• to_csv(): Exports a DataFrame to a CSV file.
• to_excel(): Exports a DataFrame to an Excel file.
• to_sql(): Exports a DataFrame to an SQL database.
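As a sketch of exporting and reading back, the example below writes to an in-memory text buffer and an in-memory SQLite database instead of real files, so it leaves nothing on disk. The table name 'people' and the data are assumptions for illustration.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# CSV: write to a text buffer (a file path works the same way).
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the row labels
csv_text = buffer.getvalue()
from_csv = pd.read_csv(io.StringIO(csv_text))

# SQL: write to a temporary in-memory SQLite database.
conn = sqlite3.connect(':memory:')
df.to_sql('people', conn, index=False)
from_sql = pd.read_sql('SELECT * FROM people', conn)
conn.close()

print(csv_text)
print(from_sql)
```

to_excel() follows the same pattern but needs an Excel engine such as openpyxl to be installed.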

FINAL SHOT

1. Basics of Pandas

• Data Structures:
  o Series: A one-dimensional labeled array (similar to a list or array).
  o DataFrame: A two-dimensional table (like a spreadsheet or SQL table).
• Creating DataFrames: Learn how to create DataFrames from lists, dictionaries, CSVs,
Excel files, or other sources.

2. Data Import & Export

• Reading data:
  o pd.read_csv(), pd.read_excel(), pd.read_sql(), etc.
• Exporting data:
  o to_csv(), to_excel(), to_sql().

3. Data Exploration

• Viewing data:
  o head(), tail() (First and last n rows).
• Summary:
  o info(), describe(), shape, dtypes.
• Indexing & Selecting Data:
  o loc[], iloc[], at[], iat[] (label-based and integer-based indexing).
• Filtering:
  o Conditional filtering: Selecting data based on certain conditions (e.g.,
    df[df['column'] > 5]).

4. Data Cleaning

• Handling missing data:
  o dropna(), fillna(), isna().
• Removing duplicates:
  o drop_duplicates().
• Replacing data:
  o replace() (Replace values in a column).
• Renaming columns:
  o rename().

5. Data Transformation

• Changing data types:
  o astype() (e.g., convert a column to int or float).
• Applying functions:
  o apply(), map() (Apply a function to rows, columns, or individual elements).
• Sorting:
  o sort_values(), sort_index() (Sort data based on values or index).
• Combining data:
  o concat(), merge(), join() (Merging, concatenating, or joining multiple
    DataFrames).
• Binning data:
  o cut() (Divide continuous data into bins).

6. Grouping & Aggregation

• Grouping data:
  o groupby() (Group by one or more columns).
• Aggregation:
  o sum(), mean(), count(), min(), max(), agg() (Apply aggregation functions to
    groups).
• Pivot tables:
  o pivot_table() (Create pivot tables for summarizing data).

7. Handling Time Series Data

• Date and time manipulation:
  o to_datetime(), dt (Extracting year, month, day, etc., from datetime objects).
• Resampling:
  o resample() (Change the frequency of time series data, e.g., daily to monthly).
• Shifting data:
  o shift(), diff() (Shifting or calculating differences in time series).
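shift() and diff() are not covered in the function lists earlier, so here is a minimal sketch on a made-up daily price series:

```python
import pandas as pd

prices = pd.Series(
    [100, 110, 105],
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
)

previous = prices.shift(1)  # yesterday's value on today's row; first is NaN
change = prices.diff()      # today's value minus yesterday's

print(previous.tolist())
print(change.tolist())
```

diff() gives the same result as prices - prices.shift(1).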

8. String Manipulation

• String operations:
  o str.contains(), str.replace(), str.split(), str.lower(), str.upper(),
    str.strip(), str.len() (Manipulate text data in columns).

9. Windowing & Rolling Operations

• Rolling windows:
  o rolling() (Used for moving averages, moving sums, etc. over a fixed-size window).
• Expanding windows:
  o expanding() (Cumulative calculations over all rows seen so far).

10. Data Visualization (Basic)

• Plotting:
  o plot() (Generate simple plots directly from a DataFrame).
• Histograms and Boxplots:
  o hist(), boxplot() (Visualize distributions and summary statistics).

11. Advanced Features

• MultiIndex:
  o Learn to handle hierarchical indexing, e.g. by passing several columns to
    set_index() and flattening again with reset_index().
• Concatenating along axes:
  o concat() (Append rows/columns to existing DataFrames).
• Merging:
  o merge() (Perform SQL-style joins between DataFrames).
• Pivoting & Melting:
  o pivot(), melt() (Reshape data).
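A MultiIndex can be sketched by setting two columns as the index at once. The sales table below is made-up sample data.

```python
import pandas as pd

sales = pd.DataFrame({
    'City': ['LA', 'LA', 'NY', 'NY'],
    'Year': [2023, 2024, 2023, 2024],
    'Revenue': [80, 120, 100, 150],
})

# Two columns as the index gives a hierarchical (Multi)Index.
indexed = sales.set_index(['City', 'Year'])

ny = indexed.loc['NY']                      # all NY rows, indexed by Year
one = indexed.loc[('LA', 2024), 'Revenue']  # one value via a full key
flat = indexed.reset_index()                # back to ordinary columns

# pivot() reshapes the same data into one column per year.
wide = sales.pivot(index='City', columns='Year', values='Revenue')

print(indexed)
print(wide)
```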
