Prac3 AAM

Uploaded by Khan Rahil Ahmed


Practical – 3:

Aim: Perform the following operations:

a. Write a program to read a dataset (Text, CSV, JSON, XML).
b. Identify which attributes are numeric and which are categorical.
c. Perform data cleaning: handle missing data and remove null data.
d. Rescale and encode the data.
e. Perform feature selection.

Introduction:
In data analysis and machine learning, working with datasets is a central task. In this practical, we go through
reading data, identifying attribute types, handling missing values, and selecting important features. These steps
prepare the data for analysis or model building and help ensure that it is accurate and ready for use.

Dataset Overview:
The dataset comprises information about cars, encompassing both numeric and categorical attributes. A
thorough understanding of these attributes is crucial before proceeding with any analysis.

Data Analysis:
1. Reading Data:
Text Format: The dataset in text format was read line by line, and each line was processed to extract relevant
information about the cars.
Code:
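# Example code to read a text file line by line
records = []
with open("cars_dataset.txt", "r") as file:
    for line in file:
        # Process each line: strip the newline and split it into fields
        # (assumes comma-separated values; adjust the delimiter to the actual file)
        fields = line.strip().split(",")
        records.append(fields)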

CSV Format: The dataset in CSV format was read into a Python environment using the Pandas library.
Code:
# Example code to read CSV file
import pandas as pd
df = pd.read_csv("cars_dataset.csv")

JSON Format: The dataset in JSON format was loaded into memory using Python's built-in json module.
Code:
# Example code to read a JSON file
import json

with open("cars_dataset.json", "r") as file:
    data = json.load(file)

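If the JSON file holds a list of car records, the loaded object can be flattened into a pandas DataFrame so that the later steps work the same way as for the CSV file. This is a minimal sketch that assumes a list-of-records layout; the field names below (`make`, `price`, `engine`) are hypothetical sample data, not taken from the original dataset.

```python
import pandas as pd

# Hypothetical sample records standing in for the contents of cars_dataset.json
data = [
    {"make": "Toyota", "price": 20000, "engine": {"hp": 150}},
    {"make": "Honda", "price": 22000, "engine": {"hp": 160}},
]

# json_normalize flattens nested objects into dotted column names,
# e.g. {"engine": {"hp": 150}} becomes a column called "engine.hp"
df_json = pd.json_normalize(data)
print(df_json)
```

With the real file, `data` would simply be the object returned by `json.load(file)`.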
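XML Format: The dataset in XML format was parsed using the third-party lxml library.
Code:
# Example code to read an XML file (requires: pip install lxml)
from lxml import etree

tree = etree.parse("cars_dataset.xml")
root = tree.getroot()
# Traverse the XML structure to extract information
# (assumes each child of the root element is one car record)
records = []
for car in root:
    records.append({child.tag: child.text for child in car})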
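lxml is a third-party package. If it is not installed, Python's standard-library `xml.etree.ElementTree` module offers a similar parse-and-traverse interface. The sketch below assumes each child of the root element is one car record whose sub-elements hold its attributes; the tag names and values are hypothetical. The XML is parsed from a string so the example is self-contained; for a file, `ET.parse("cars_dataset.xml").getroot()` returns the root in the same way.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical XML standing in for cars_dataset.xml
xml_text = """
<cars>
  <car><make>Toyota</make><price>20000</price></car>
  <car><make>Honda</make><price>22000</price></car>
</cars>
"""

root = ET.fromstring(xml_text)

# Traverse the tree: one dict per <car>, keyed by each child's tag name
records = [{child.tag: child.text for child in car} for car in root]

# XML text is always read as strings, so convert numeric fields explicitly
df_xml = pd.DataFrame(records)
df_xml["price"] = pd.to_numeric(df_xml["price"])
print(df_xml)
```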
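
2. Attribute Types: The attributes in the dataset were categorized into two types:
• Numeric attributes: values that are numbers on which arithmetic is meaningful, such as measurements or counts.
• Categorical attributes: values drawn from a fixed set of labels or classes.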
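One way to answer aim (b) with pandas is to check each column's dtype: numeric columns have an integer or float dtype, and everything else (text labels, categories) can be treated as categorical. This is a minimal sketch on a small hypothetical frame; with the real data, `df` would be the DataFrame read from cars_dataset.csv. Dtype is only a first pass: a numeric code such as a number of doors may still be better treated as categorical.

```python
import pandas as pd

# Hypothetical sample; in the practical, df comes from pd.read_csv("cars_dataset.csv")
df = pd.DataFrame({
    "make": ["Toyota", "Honda", "Ford"],
    "fuel_type": ["petrol", "diesel", "petrol"],
    "price": [20000, 22000, 18000],
    "horsepower": [150.0, 160.0, 140.0],
})

# Numeric attributes: integer and float columns
numeric_cols = df.select_dtypes(include="number").columns.tolist()
# Categorical attributes: every non-numeric column
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```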

Data Preprocessing:
Handling Missing Data: Missing data can distort analysis, and many machine learning algorithms cannot process
null values at all. To address this, rows containing null values were removed from the dataset.
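Removing rows with nulls can be done with `DataFrame.dropna()`; counting the missing values per column first shows how much data the removal will cost. This is a minimal sketch on hypothetical data. When many rows contain nulls, imputation (for example, filling with the column median) is a common alternative, although the practical uses removal.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values (None / NaN)
df = pd.DataFrame({
    "make": ["Toyota", None, "Ford", "Honda"],
    "price": [20000, 22000, np.nan, 21000],
    "horsepower": [150, 160, 140, np.nan],
})

# Count missing values in each column before cleaning
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Remove every row that has at least one null value, then renumber the rows
df_clean = df.dropna().reset_index(drop=True)

print(f"Rows before: {len(df)}, after: {len(df_clean)}")
```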
Rescaling and Encoding: To prepare the data for analysis and modeling, rescaling and encoding were
performed:
• Rescaling Data: Numeric attributes were rescaled using min-max scaling, which maps each attribute to the
common range [0, 1] so that attributes measured on different scales can be compared fairly.
• Encoding Data: Categorical attributes were encoded using one-hot encoding, which turns each category into
its own binary (0/1) column, a numerical format suitable for machine learning algorithms.
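Min-max scaling maps each numeric value x to (x - min) / (max - min), so every rescaled column lies in [0, 1]; one-hot encoding replaces a categorical column with one 0/1 indicator column per category. The sketch below does both with pandas alone on hypothetical data; scikit-learn's `MinMaxScaler` and `OneHotEncoder` are common alternatives when the steps are part of a modeling pipeline.

```python
import pandas as pd

# Hypothetical cleaned sample (no missing values)
df = pd.DataFrame({
    "fuel_type": ["petrol", "diesel", "petrol"],
    "price": [18000, 20000, 22000],
    "horsepower": [140, 150, 170],
})

numeric_cols = ["price", "horsepower"]

# Min-max scaling: (x - min) / (max - min) for each numeric column.
# A constant column would give max == min (division by zero), so check for that on real data.
col_min = df[numeric_cols].min()
col_max = df[numeric_cols].max()
df[numeric_cols] = (df[numeric_cols] - col_min) / (col_max - col_min)

# One-hot encoding: one 0/1 column per category of fuel_type
df = pd.get_dummies(df, columns=["fuel_type"], dtype=int)

print(df)
```

Scaling before encoding keeps the indicator columns out of the min-max step; they are already in [0, 1].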

Feature Selection:
Feature selection is crucial for building accurate predictive models. In this report, feature selection was
performed based on correlation analysis:
• Correlation Analysis: The correlation matrix was computed to identify features highly correlated with
the target variable. Features with correlation coefficients above a threshold (e.g., 0.5) were selected for
further analysis.
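Correlation-based selection can be sketched with `DataFrame.corr()`, which computes Pearson coefficients by default. This minimal example assumes `price` is the target (the original does not name the target variable) and uses hypothetical data. It compares the absolute value of each coefficient with the 0.5 threshold, so strongly negative correlations are kept as well; that is a choice made here, not necessarily the rule used in the practical.

```python
import pandas as pd

# Hypothetical numeric data; "price" is assumed to be the target variable
df = pd.DataFrame({
    "horsepower": [100, 120, 150, 170, 200],
    "weight": [1000, 1200, 1300, 1500, 1600],
    "mpg": [40, 35, 30, 27, 22],
    "doors": [4, 2, 4, 2, 4],
    "price": [15000, 18000, 22000, 25000, 30000],
})

target = "price"
threshold = 0.5

# Pearson correlation of every column with the target, excluding the target itself
corr_with_target = df.corr()[target].drop(target)

# Keep features whose correlation with the target is strong in either direction
selected = corr_with_target[corr_with_target.abs() > threshold].index.tolist()

print(corr_with_target.round(2))
print("Selected features:", selected)
```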

Conclusion:
In this practical, we explored various steps involved in preprocessing a dataset for analysis and modeling tasks.
By reading data in different formats such as text, CSV, JSON, and XML, we gained insights into the dataset's
structure. We identified numeric and categorical attributes, which provided a foundation for subsequent data
cleaning and preprocessing steps. By handling missing data and applying techniques like rescaling and
encoding, we ensured the dataset was ready for analysis. Additionally, feature selection based on correlation
analysis allowed us to focus on relevant attributes for predictive modeling. Overall, these preprocessing steps
are essential for ensuring data accuracy and readiness for advanced analysis techniques in data science and
machine learning.
