
DL_EDA_process

Uploaded by nickn1390
Copyright © All Rights Reserved

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) in machine learning is the process of analyzing and
visualizing datasets to understand their main characteristics, identify patterns, detect anomalies,
and check assumptions. EDA helps prepare data for modeling by revealing insights and guiding
feature selection, data transformation, and pre-processing steps. Here’s an overview of EDA’s
purpose and components:

1. Understanding Data Structure

● Data Summary: Get a high-level overview of the data, including the number of rows,
columns, and data types.
● Data Types and Format: Identify the data types (e.g., numerical, categorical) to
determine which statistical or visualization techniques to apply.
● Null Values: Check for missing values, e.g., with pandas' df.isna().sum(), which counts
the missing entries in each column, and decide on strategies for handling them
(imputation, deletion, etc.).
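The three checks above can be sketched in pandas as follows. This is a minimal illustration, assuming pandas is installed; the DataFrame and its column names ("age", "income", "city") are hypothetical.

```python
import pandas as pd

# Hypothetical toy dataset, used only for illustration.
df = pd.DataFrame({
    "age": [34, 45, None, 29, 52],
    "income": [52000.0, 61000.0, 58000.0, None, 75000.0],
    "city": ["Austin", "Boston", "Austin", None, "Denver"],
})

# Data summary: number of rows and columns.
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# Data types: "age" becomes float64 because the missing value is stored as NaN.
print(df.dtypes)

# Split columns by kind to decide which statistics and plots apply.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Null values: isna() marks missing cells, sum() counts them per column.
missing_counts = df.isna().sum()
missing_share = df.isna().mean()  # fraction of missing cells per column
print(missing_counts)
```

Looking at the missing share alongside the raw count helps choose between imputation (few gaps) and dropping a column (mostly empty).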

2. Descriptive Statistics

● Central Tendency: Examine measures like mean, median, and mode to understand the
central values of each feature.
● Spread and Range: Check variance, standard deviation, and range to understand how
data points are spread out.
● Distribution: Visualize distributions using histograms, box plots, or density plots to spot
skewness, kurtosis, and outliers.
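As a rough sketch of these descriptive statistics in pandas, assuming a hypothetical numeric feature named "response_time_ms" that contains one large value to create skew:

```python
import pandas as pd

# Hypothetical numeric feature with one large value (40) to show skew.
s = pd.Series([10, 12, 12, 13, 15, 16, 40], name="response_time_ms")

# Central tendency
mean = s.mean()
median = s.median()
mode = s.mode().tolist()  # mode() returns a Series because there may be ties

# Spread and range
variance = s.var()   # sample variance (ddof=1) by default
std = s.std()        # sample standard deviation (ddof=1)
value_range = s.max() - s.min()

# Distribution shape: positive skew means a long right tail.
# pandas' kurt() reports excess kurtosis (a normal distribution gives 0).
skewness = s.skew()
kurtosis = s.kurt()

# count, mean, std, min, quartiles and max in a single call
print(s.describe())
```

Here the mean (about 16.9) sits well above the median (13), which is itself a quick sign of right skew. For the plots mentioned above, s.plot.hist() and s.plot.box() draw a histogram and a box plot if matplotlib is installed.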

3. Data Relationships

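● Correlation Analysis: Use correlation matrices or heatmaps to identify relationships
between numerical features (Pearson correlation, the usual default, captures only linear
relationships). Strongly correlated pairs can inform feature selection and flag
multicollinearity concerns.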
● Categorical Analysis: Analyze counts and distributions of categorical features using bar
charts, pie charts, and value_counts().
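A minimal sketch of both analyses, assuming pandas and a hypothetical housing-style DataFrame; the 0.8 cut-off for "highly correlated" is an illustrative assumption, not a universal rule:

```python
import pandas as pd

# Hypothetical dataset: "sqft" and "rooms" are built to be strongly related.
df = pd.DataFrame({
    "sqft": [850, 1200, 1500, 1800, 2100, 2500],
    "rooms": [2, 3, 3, 4, 5, 6],
    "age": [30, 5, 22, 12, 40, 8],
    "type": ["flat", "house", "flat", "house", "house", "house"],
})

# Pearson correlation matrix over the numeric columns only.
corr = df.corr(numeric_only=True)

# Flag pairs above an assumed |r| threshold as possible multicollinearity.
threshold = 0.8
cols = list(corr.columns)
high_pairs = [
    (cols[i], cols[j], float(corr.iloc[i, j]))
    for i in range(len(cols))
    for j in range(i + 1, len(cols))  # upper triangle: skip self-pairs and repeats
    if abs(corr.iloc[i, j]) > threshold
]

# Categorical analysis: absolute and relative frequencies.
type_counts = df["type"].value_counts()
type_share = df["type"].value_counts(normalize=True)

print(corr.round(2))
print(high_pairs)
print(type_counts)
```

If seaborn is available, seaborn.heatmap(corr, annot=True) renders the same matrix as a heatmap, and type_counts.plot.bar() gives the bar chart.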

4. Identifying Outliers and Anomalies

● Outlier Detection: Use box plots, scatter plots, and z-scores (e.g., flagging points more
than 3 standard deviations from the mean) to detect unusual data points that may need
addressing.
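Two common numeric rules behind these checks can be sketched as below, assuming pandas and a hypothetical series of readings with one suspicious value. The z-score cut-off of 3 and the 1.5 × IQR fences are conventional choices rather than fixed requirements.

```python
import pandas as pd

# Hypothetical readings: twenty ordinary values plus one suspicious value (95).
s = pd.Series([12, 13, 14, 15] * 5 + [95], name="reading")

# z-score rule: flag points more than 3 standard deviations from the mean.
# The outlier itself inflates the mean and std, so this rule loses
# sensitivity on small samples.
z = (s - s.mean()) / s.std()
z_outliers = s[z.abs() > 3]

# IQR rule: flag points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
# Points beyond these fences are what matplotlib box plots draw as fliers by default.
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = s[(s < lower) | (s > upper)]

print(z_outliers.tolist(), iqr_outliers.tolist())
```

The IQR rule is based on quartiles, so a single extreme value barely moves its fences; that is one reason it is often preferred over z-scores for skewed data.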

5. Feature Engineering Insights

● Identifying Useful Transformations: Based on data distributions, you may identify
opportunities for transformations (e.g., log transformation, normalization, or encoding of
categorical variables).
● Creating New Features: EDA can reveal patterns suggesting new feature combinations
or aggregations.
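The transformations named above can be sketched with pandas and NumPy. This assumes a hypothetical DataFrame with a right-skewed "income" column and a categorical "plan" column; the "income_per_person" ratio is a hypothetical example of a combined feature.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a right-skewed "income" column and a categorical "plan".
df = pd.DataFrame({
    "income": [20000, 35000, 40000, 52000, 250000],
    "household_size": [1, 2, 2, 4, 3],
    "plan": ["basic", "pro", "basic", "enterprise", "pro"],
})

# Log transformation: log1p(x) = log(1 + x) compresses the long right tail
# and stays defined at zero.
df["log_income"] = np.log1p(df["income"])

# Normalization (min-max scaling) to the [0, 1] range.
inc = df["income"]
df["income_minmax"] = (inc - inc.min()) / (inc.max() - inc.min())

# One-hot encoding: replaces "plan" with one 0/1 column per category.
df = pd.get_dummies(df, columns=["plan"], prefix="plan", dtype=int)

# New feature from a combination of existing ones (hypothetical ratio).
df["income_per_person"] = df["income"] / df["household_size"]

print(df.round(2))
```

In a real pipeline, the min and max used for scaling should come from the training split only, so that no information leaks from the test data.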

6. Data Cleaning

● Address missing values, incorrect or inconsistent data, and outliers based on insights
gained from EDA.
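A minimal cleaning pass might look like the sketch below. It assumes pandas and a hypothetical raw table. Median imputation, the "Unknown" label, and the 0 to 100 age range are illustrative choices that EDA findings would normally justify case by case.

```python
import pandas as pd

# Hypothetical raw data with inconsistent labels, a duplicated row,
# missing values, and an implausible age (230).
raw = pd.DataFrame({
    "city": ["Austin", "austin ", "Boston", "Boston", None, "Denver"],
    "age": [34, 41, 29, 29, None, 230],
})

df = raw.copy()

# Inconsistent data: normalize whitespace and capitalization.
df["city"] = df["city"].str.strip().str.title()

# Duplicates: drop exact repeated rows (the second "Boston, 29").
df = df.drop_duplicates()

# Missing values: median for the numeric column, a label for the categorical one.
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna("Unknown")

# Outliers: cap values outside an assumed plausible range.
df["age"] = df["age"].clip(lower=0, upper=100)

print(df)
```

Whether to cap, drop, or keep an outlier depends on whether it is a data-entry error or a genuine rare value. Capping is shown here only as one option.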

Benefits of EDA in Machine Learning

EDA provides critical information for building effective models by:

● Highlighting patterns that might affect modeling.
● Guiding the choice of algorithms and model parameters.
● Improving model performance by informing data preparation steps.

Overall, EDA is a foundational step in the machine learning pipeline, setting the stage for more
reliable, accurate models.
