Prac3 AAM

This document outlines practical steps for data preprocessing in data analysis and machine learning, including reading datasets in various formats (text, CSV, JSON, XML), identifying numeric and categorical attributes, and handling missing data. It emphasizes the importance of rescaling and encoding data, as well as performing feature selection based on correlation analysis to prepare datasets for modeling. Overall, these steps are crucial for ensuring data accuracy and readiness for advanced analysis.

Uploaded by

Khan Rahil Ahmed
Copyright
© All Rights Reserved

Practical – 3:

Aim: Perform the following operations:

a. Write a program to read a dataset (Text, CSV, JSON, XML):
b. Which of the attributes are numeric and which are categorical?
c. Perform Data Cleaning, Handle Missing Data, Remove Null Data:
d. Rescaling Data and Encoding Data:
e. Feature Selection:

Introduction:
In data analysis and machine learning, the quality of the results depends on how well the dataset is prepared. This
practical goes through reading data in several formats, identifying attribute types, handling missing values,
rescaling and encoding attributes, and selecting relevant features. These steps prepare the data for analysis or
model building and help ensure that it is accurate and ready for use.

Dataset Overview:
The dataset comprises information about cars, encompassing both numeric and categorical attributes. A
thorough understanding of these attributes is crucial before proceeding with any analysis.

Data Analysis:
1. Reading Data:
Text Format: The dataset in text format was read line by line, and each line was processed to extract relevant
information about the cars.
Code:
# Example code to read text file
with open("cars_dataset.txt", "r") as file:
    for line in file:
        # Process each line to extract information
        pass
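
The loop body above is left as a placeholder. As a minimal sketch of what processing a line might look like, the example below assumes each line holds three comma-separated values (name, mpg, origin). That layout, the helper name parse_car_line, and the sample lines are hypothetical; the actual format of cars_dataset.txt is not given in this report.

```python
def parse_car_line(line):
    """Split one comma-separated line into a dict (hypothetical field layout)."""
    name, mpg, origin = [part.strip() for part in line.split(",")]
    return {"name": name, "mpg": float(mpg), "origin": origin}


# Two in-memory sample lines stand in for the contents of the text file.
sample_lines = ["Ford Pinto, 25.0, USA\n", "Toyota Corolla, 31.0, Japan\n"]

# Skip blank lines and parse the rest into one dict per car.
cars = [parse_car_line(line) for line in sample_lines if line.strip()]
print(cars[0])
```

In the real program the list comprehension would iterate over the open file object instead of sample_lines.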

CSV Format: The dataset in CSV format was read into a Python environment using the Pandas library.
Code:
# Example code to read CSV file
import pandas as pd
df = pd.read_csv("cars_dataset.csv")

JSON Format: The dataset in JSON format was loaded into memory using Python's built-in JSON library.
Code:
# Example code to read JSON file
import json
with open("cars_dataset.json", "r") as file:
    data = json.load(file)
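
Once loaded, data is an ordinary Python object. The sketch below shows one way to read fields from it, assuming the file holds a list of objects with one object per car. The JSON string and the keys "name" and "horsepower" are hypothetical stand-ins, since the structure of cars_dataset.json is not shown in this report.

```python
import json

# A JSON string stands in for cars_dataset.json (hypothetical structure and keys).
raw = '[{"name": "Ford Pinto", "horsepower": 88}, {"name": "Toyota Corolla", "horsepower": 75}]'
data = json.loads(raw)

# json.loads returns a list of dicts here, so fields are read by key.
names = [record["name"] for record in data]
mean_horsepower = sum(record["horsepower"] for record in data) / len(data)
print(names, mean_horsepower)
```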
XML Format: The dataset in XML format was parsed using lxml, a third-party XML library for Python.
Code:
# Example code to read XML file
from lxml import etree
tree = etree.parse("cars_dataset.xml")
root = tree.getroot()
# Traverse XML structure to extract information
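
The traversal step is only indicated by a comment. A minimal sketch of it follows, using the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra packages. The element names car, name, and mpg and the sample XML are hypothetical, since the structure of cars_dataset.xml is not shown in this report.

```python
import xml.etree.ElementTree as ET

# An XML string stands in for cars_dataset.xml (hypothetical element names).
xml_text = """
<cars>
  <car><name>Ford Pinto</name><mpg>25.0</mpg></car>
  <car><name>Toyota Corolla</name><mpg>31.0</mpg></car>
</cars>
""".strip()
root = ET.fromstring(xml_text)

# Visit every <car> element and collect its child values into a dict.
records = []
for car in root.iter("car"):
    records.append({
        "name": car.findtext("name"),
        "mpg": float(car.findtext("mpg")),
    })
print(records)
```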
2. Attribute Types: The attributes in the dataset were categorized into two types:
• Numeric Attributes
• Categorical Attributes
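
One way to make this split programmatically is to look at the column dtypes of the DataFrame. The sketch below assumes pandas is used, as in the CSV step; the column names and values are hypothetical and are not taken from the actual cars dataset.

```python
import pandas as pd

# A tiny hand-made frame stands in for the cars dataset (hypothetical columns).
df = pd.DataFrame({
    "name": ["Ford Pinto", "Toyota Corolla", "Fiat 128"],
    "mpg": [25.0, 31.0, 29.0],
    "cylinders": [4, 4, 4],
    "origin": ["USA", "Japan", "Europe"],
})

# Columns with a numeric dtype are treated as numeric attributes;
# everything else is treated as categorical.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```

The dtype is only a heuristic. An integer-coded column such as cylinders is reported as numeric here, although it may be more natural to treat it as categorical.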

Data Preprocessing:
Handling Missing Data: Missing data can hinder analysis and modeling. To address this issue, rows containing
null values were removed from the dataset.
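
The row removal described above can be sketched with pandas as follows. The sample frame and its column names are hypothetical; only the isnull and dropna calls illustrate the step.

```python
import pandas as pd

# Hypothetical sample with two missing values (None is stored as a missing value).
df = pd.DataFrame({
    "mpg": [25.0, None, 29.0, 31.0],
    "origin": ["USA", "Japan", None, "Japan"],
})

# Count missing values per column before deciding how to handle them.
missing_per_column = df.isnull().sum()

# Drop every row that contains at least one null value.
df_clean = df.dropna().reset_index(drop=True)
print(missing_per_column)
print(df_clean)
```

Dropping rows is the simplest option but discards data. When many rows are affected, filling the gaps (for example with the column mean) is a common alternative.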
Rescaling and Encoding: To prepare the data for analysis and modeling, rescaling and encoding were
performed:
• Rescaling Data: Numeric attributes were rescaled using min-max scaling to bring them within a
common range (typically 0 to 1), ensuring fair comparison between attributes.
• Encoding Data: Categorical attributes were encoded using one-hot encoding to convert them into a
numerical format suitable for machine learning algorithms.
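
Both steps can be sketched with pandas. The example below applies the min-max formula x' = (x - min) / (max - min) directly and uses pd.get_dummies for one-hot encoding. The sample frame and column names are hypothetical, and the report does not state which library was actually used for these steps.

```python
import pandas as pd

# Hypothetical sample: one numeric and one categorical attribute.
df = pd.DataFrame({
    "mpg": [20.0, 25.0, 30.0],
    "origin": ["USA", "Japan", "USA"],
})

# Min-max scaling: x' = (x - min) / (max - min), mapping values into [0, 1].
mpg_min, mpg_max = df["mpg"].min(), df["mpg"].max()
df["mpg_scaled"] = (df["mpg"] - mpg_min) / (mpg_max - mpg_min)

# One-hot encoding: one indicator column per category value.
encoded = pd.get_dummies(df, columns=["origin"])
print(encoded)
```

Note that the formula divides by zero when a column is constant (max equals min), so such columns need separate handling.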

Feature Selection:
Feature selection is crucial for building accurate predictive models. In this report, feature
selection was performed based on correlation analysis:
• Correlation Analysis: The correlation matrix was computed to identify features highly correlated with
the target variable. Features with correlation coefficients above a threshold (e.g., 0.5) were selected for
further analysis.
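
The correlation-based selection can be sketched as follows. The sample values, the column names, and the choice of mpg as the target are hypothetical. The sketch also compares the absolute value of each coefficient with the threshold, which is an assumption: the text only says "above a threshold", but a strongly negative correlation is usually just as informative as a strongly positive one.

```python
import pandas as pd

# Hypothetical numeric sample; "mpg" plays the role of the target variable.
df = pd.DataFrame({
    "weight": [2000, 2500, 3000, 3500, 4000],
    "horsepower": [70, 90, 110, 150, 180],
    "model_year": [72, 70, 73, 71, 72],
    "mpg": [35.0, 30.0, 26.0, 20.0, 15.0],
})
target = "mpg"
threshold = 0.5  # the example cutoff mentioned in the text

# Pearson correlation of every feature with the target.
correlations = df.corr()[target].drop(target)

# Keep features whose absolute correlation exceeds the threshold
# (assumption: negative correlations count as well).
selected = correlations[correlations.abs() > threshold].index.tolist()
print(correlations)
print(selected)
```

In this sample, weight and horsepower are both strongly negatively correlated with mpg and are kept, while model_year is only weakly correlated and is dropped.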

Conclusion:
In this practical, we explored various steps involved in preprocessing a dataset for analysis and modeling tasks.
By reading data in different formats such as text, CSV, JSON, and XML, we gained insights into the dataset's
structure. We identified numeric and categorical attributes, which provided a foundation for subsequent data
cleaning and preprocessing steps. By handling missing data and applying techniques like rescaling and
encoding, we ensured the dataset was ready for analysis. Additionally, feature selection based on correlation
analysis allowed us to focus on relevant attributes for predictive modeling. Overall, these preprocessing steps
are essential for ensuring data accuracy and readiness for advanced analysis techniques in data science and
machine learning.
