Chapter 2. Pre-Processing Data
Content
# Replace every NA in the data frame with 0
data[is.na(data)] <- 0
# Replace NA in a specific column with that column's mean
data$ColumnName[is.na(data$ColumnName)] <- mean(data$ColumnName, na.rm = TRUE)
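The two replacements above can be tried on a small example. The following is only a sketch: the data frame and its column names (id, score, age) are made up for illustration, and it assumes all columns are numeric, since assigning 0 into character or factor columns may not behave as intended.

```r
# Hypothetical toy data frame; the column names are illustrative only.
data <- data.frame(
  id    = 1:5,
  score = c(10, NA, 30, NA, 50),
  age   = c(21, 35, NA, 40, 28)
)

# Count missing values per column before imputing.
na_before <- colSums(is.na(data))

# Replace NA in one column with that column's mean (computed without the NAs).
data$score[is.na(data$score)] <- mean(data$score, na.rm = TRUE)

# Replace every remaining NA in the data frame with 0.
data[is.na(data)] <- 0

# Count missing values again to confirm none are left.
na_after <- colSums(is.na(data))
print(data)
```

The order matters here: the column mean is filled in first, so the blanket replacement with 0 only affects columns that were not already imputed.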
# Verify by plotting (requires the ggplot2 package)
library(ggplot2)
ggplot(data, aes(y = value)) +
  geom_boxplot(fill = "skyblue", color = "black") +
  labs(title = "Boxplot After Imputing Outliers", y = "Value") +
  theme_minimal()
Summary
• Identify Outliers: Use summary statistics and visualizations like boxplots.
• Process Outliers: Choose a method based on your analysis goals:
    • Remove Outliers: Exclude them from your dataset.
    • Transform Data: Apply transformations to mitigate their impact.
    • Impute Outliers: Replace them with a more representative value.
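The steps in this summary can be sketched in base R. This is a minimal illustration, not a prescribed method: the vector `value` is invented, the 1.5 × IQR rule is one common convention for flagging outliers, the log transform assumes strictly positive data, and the median of the non-outlying points is just one possible replacement value.

```r
# Hypothetical numeric vector with two extreme values (95 and 1).
value <- c(12, 14, 15, 15, 16, 17, 18, 19, 20, 95, 1)

# Identify: flag points more than 1.5 * IQR beyond the quartiles.
q <- unname(quantile(value, probs = c(0.25, 0.75)))
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
is_outlier <- value < lower | value > upper

# Option 1 (remove): keep only the points inside the fences.
removed <- value[!is_outlier]

# Option 2 (transform): a log transform compresses large values;
# this assumes every value is strictly positive.
transformed <- log(value)

# Option 3 (impute): replace flagged points with the median of the rest.
imputed <- value
imputed[is_outlier] <- median(value[!is_outlier])

print(summary(value))
print(summary(imputed))
```

Removal shrinks the dataset, while transformation and imputation keep every row, so the choice depends on whether the extreme values are errors or genuine observations.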