Final 1

Uploaded by

Aryan Mishra
Copyright © All Rights Reserved

Indian Institute of Technology, Delhi

Project Report on: "Impact of Feature Engineering on the Performance of Machine Learning Model"

Submitted To:
Prof. Brejesh Lall,
Department of Electrical Engineering,
Indian Institute of Technology, Delhi
Submitted By:

Aryan Mishra,
Roll No.: 22117023,
Semester: V,
Department of Electrical Engineering,
National Institute of Technology, Raipur
CERTIFICATE

This is to certify that the minor project report entitled "Impact of Feature
Engineering on the Performance of Machine Learning Model",
submitted by Aryan Mishra, is the bona fide work completed under
my supervision and guidance during his research internship at the
Indian Institute of Technology, Delhi.

...........
Prof. Brejesh Lall,
Department of Electrical Engineering,
Indian Institute of Technology, Delhi

ACKNOWLEDGEMENTS

Success is rarely the result of one person's effort alone; it rests on the constant support of
people who encourage you and inspire you at each step of the way. I would like to express my
sincere gratitude to Prof. Brejesh Lall for providing me with the opportunity to undertake this
internship project. I would also like to extend my heartfelt thanks to Amit Oberoi Sir for his
continuous guidance and mentorship throughout the internship. His knowledge, expertise, and
patience have been instrumental in shaping my research and deepening my understanding
of the subject. I am grateful for his constant support, insightful discussions, and
valuable suggestions, which have greatly enriched my learning experience.

Aryan Mishra
Roll No.: 22117023
Semester: V
B.Tech. (Electrical Engineering)
National Institute of Technology Raipur

ABSTRACT

This internship report focuses on the analysis and preprocessing of datasets, along with the
study of machine learning algorithms and neural networks for pattern recognition. Chapter 2
begins by exploring the importance and limitations of datasets, then examines the different
types of data and the techniques for converting categorical data to numeric form and
continuous data to discrete form. Correlation, covariance, and outlier detection methods are
then discussed, along with strategies for treating outliers. Finally, feature scaling and the
application of Principal Component Analysis (PCA) are explored.
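Chapter 2 covers two feature-scaling methods, min-max normalization and z-score standardization. As a minimal sketch of what these transformations do, assuming plain Python with only the standard library, the snippet below applies both to a hypothetical feature column. The helper names and sample values are illustrative assumptions, not code or data from the project.

```python
import statistics

# Illustrative helper names; not taken from the report.

def min_max_normalize(values):
    """Rescale values linearly to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def z_score_standardize(values):
    """Center on the mean and divide by the population standard deviation: z = (x - mu) / sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # Same guard as above: a constant column has zero spread.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical feature column (e.g. annual income) on a much larger scale than other features.
incomes = [25_000, 32_000, 47_000, 51_000, 150_000]

print(min_max_normalize(incomes))    # every value now lies in [0, 1]
print(z_score_standardize(incomes))  # mean 0, standard deviation 1
```

The population standard deviation (`statistics.pstdev`) is used here because it matches the convention of scikit-learn's `StandardScaler`; the sample standard deviation would shift the result only slightly. In a real pipeline the minimum, maximum, mean, and standard deviation would be computed on the training split and reused unchanged on the test split.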

Chapter 3 introduces various machine learning algorithms, including regression
and classification models such as linear regression, support vector regression, logistic
regression, and support vector classification. The report further presents an internship
project that analyzes the impact of feature engineering on the performance of
different machine learning models, with a focus on logistic regression and support vector
classification. The project involves analysis and preprocessing of the dataset, followed by
prediction using these supervised machine learning algorithms.

Overall, this internship report covers the preprocessing of datasets, the application of
machine learning algorithms, and the fundamentals of neural networks. The practical project
demonstrates the impact of feature engineering on model performance, while the study of
neural networks extends the discussion to pattern recognition. Together, these provide a
foundation for further work in data analysis and machine learning.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
  1.1 Introduction
  1.2 Need of Project
  1.3 Objective of Project
CHAPTER 2: Dataset Analysis
  2.1 Importance of Dataset
  2.2 Limitation of Dataset
  2.3 Types of Data in a Dataset
  2.4 Converting Categorical Data to Numeric Data
    2.4.1 One Hot Encoding
    2.4.2 Ordinal Encoding
    2.4.3 Dummy Variable Encoding
  2.5 Converting Continuous Data to Discrete Data
    2.5.1 Uniform Discretization
    2.5.2 K-means Discretization
    2.5.3 Quantile Discretization
  2.6 Correlation and Covariance in Dataset
    2.6.1 Correlation
    2.6.2 Covariance
  2.7 Detecting the Outliers in the Dataset
    2.7.1 Using Graphical Analysis
    2.7.2 Using Z-score Analysis
    2.7.3 Using Inter-Quartile Range Analysis
  2.8 Treating the Outliers
    2.8.1 Trimming/Removing the Outliers
    2.8.2 Mean/Median Imputation
  2.9 Feature Scaling or Transformation of Dataset
    2.9.1 Min-Max Normalization
    2.9.2 Standardization / Z-Score Normalization
  2.10 Principal Component Analysis (PCA)
    2.10.1 Working of PCA
    2.10.2 Steps for PCA Algorithm
    2.10.3 Advantages of PCA
    2.10.4 Disadvantages of PCA
    2.10.5 Python Code
  2.11 Summary
CHAPTER 3: Machine Learning Algorithms
  3.1 Introduction
  3.2 Regression Models
    3.2.1 Linear Regression
    3.2.2 Support Vector Regression
  3.3 Classification Models
    3.3.1 Logistic Regression
    3.3.2 Support Vector Classification
    3.3.3 Summary
