
PROJECT REPORT

ON
Predicting House Prices with Machine Learning

SUBMITTED BY: DIVYANSHU MISHRA

YASH PARCHA

SAGAR

DEVANSH GARG

INSTITUTE OF TECHNOLOGY AND SCIENCE


MOHAN NAGAR, GHAZIABAD
SINCE 1995

INDEX
1. Introduction to Python and Machine Learning

2. Importing Data

3. Gathering Basic Information

4. Identification of Outliers using Box Plots

5. Resetting Index after Removing Outliers

6. Dataset Trend Visualization

7. Test and Train Splitting

8. Logistic Regression

9. Accuracy and Prediction

10. Decision Tree

11. Random Forest

12. K-Nearest Neighbor

13. Gaussian Naive Bayes

14. Data Comparison
Introduction to Python and Machine Learning
Introduction to Python:
Python is a general-purpose, dynamically typed, high-level, interpreted, garbage-collected programming language that supports procedural, object-oriented, and functional programming. Created by Guido van Rossum and first released in 1991, Python emphasizes code readability and allows programmers to express concepts in fewer lines of code than languages like C++ or Java.

Why learn Python?


o Easy to Use and Learn: Python has a simple, easy-to-understand syntax compared with traditional languages like C, C++, and Java, making it approachable for beginners.

o Interpreted Language: Python code does not require a separate compilation step; an interpreter executes it directly, which allows rapid development and testing.

o Object-Oriented Language: It supports object-oriented programming (inheritance, encapsulation, polymorphism, abstraction), making it easy to write reusable and modular code.

o Extensive Libraries: Python has a rich ecosystem of libraries and frameworks, such as NumPy, Pandas, and Matplotlib, which simplify tasks like data manipulation and visualization.

Python Popular Frameworks and Libraries


o Mathematics and Data Handling: NumPy, Pandas, etc.

o REST Framework: a toolkit for building RESTful APIs

o Machine Learning: scikit-learn, etc.

o Visualization: Seaborn, Matplotlib, etc.

Where is Python used?


o Data Science: Python is important in this field because it is easy to use and has powerful tools
for data analysis and visualization like NumPy, Pandas, and Matplotlib.

o Machine Learning: Python is widely used for machine learning due to its simplicity, ease of
use, and availability of powerful machine learning libraries.

Introduction to Machine Learning
Machine learning (ML) is a subfield of artificial intelligence (AI) that involves the development of
algorithms and statistical models enabling computers to perform tasks without explicit instructions.
Instead, these systems learn patterns and make decisions based on data. Machine learning is
transforming various industries by automating complex processes, providing insights from large datasets,
and creating new opportunities for innovation.

Definition and Scope


Machine learning leverages computational methods to improve performance on a given task over time with
experience. This process involves:

1. Data Collection: Gathering large and diverse datasets.

2. Data Preprocessing: Cleaning and formatting data to be suitable for analysis.

3. Model Selection: Choosing an appropriate algorithm or model based on the task.

4. Training: Feeding the data into the model to learn patterns.

5. Evaluation: Assessing the model's performance using metrics and validation techniques.

6. Deployment: Implementing the model in real-world applications.

7. Maintenance: Continuously updating and refining the model as new data becomes available.
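The steps above can be sketched end to end with scikit-learn. The data here is synthetic and the model choice (logistic regression) is purely illustrative, not the report's actual pipeline:

```python
# Minimal end-to-end sketch of the seven steps, on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # 1. data collection (synthetic here)
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # labels derived from the features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=11)     # 2. preprocessing / splitting
model = LogisticRegression()                  # 3. model selection
model.fit(X_train, y_train)                   # 4. training
acc = accuracy_score(y_test, model.predict(X_test))   # 5. evaluation
print(acc)
```

Deployment and maintenance (steps 6 and 7) happen outside this snippet, once the evaluated model is judged good enough.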

Types of Machine Learning


Machine learning techniques can be broadly categorized into three types:

1. Supervised Learning: The model is trained on a labeled dataset, meaning that each training example is paired
with an output label. Common algorithms include:

o Linear Regression

o Decision Trees

o Support Vector Machines (SVM)

o Neural Networks

2. Unsupervised Learning: The model is provided with unlabeled data and must find inherent patterns or
groupings. Common algorithms include:

o Clustering (e.g., K-Means, Hierarchical Clustering)

o Association Rules (e.g., Apriori, Eclat)

o Principal Component Analysis (PCA)

3. Reinforcement Learning: The model learns by interacting with an environment, receiving rewards or penalties
based on its actions, and aims to maximize cumulative rewards. Key concepts include:

o Markov Decision Processes (MDP)

o Q-Learning

o Deep Q-Networks (DQN)

IMPORTING DATA
Libraries

Pandas provides data structures like DataFrames and Series to handle and analyze data efficiently. NumPy is a library for numerical computing in Python. Seaborn is a statistical data visualization library. Matplotlib is a plotting library for graphs, histograms, scatter plots, and customized visualizations.

Dataset Read

pd.read_csv('Housing.csv') is a Pandas function that reads a comma-separated values (CSV) file into the DataFrame df. Here it reads the housing dataset.
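Since Housing.csv itself is not reproduced in this report, a minimal sketch can use an inline CSV string (column names are illustrative) to show how pd.read_csv builds a DataFrame:

```python
# Sketch of pd.read_csv; Housing.csv is not bundled here, so an inline
# CSV string stands in (values are illustrative).
import pandas as pd
from io import StringIO

csv_text = """price,area,bedrooms
13300000,7420,4
12250000,8960,4
9870000,8100,4
"""
df = pd.read_csv(StringIO(csv_text))   # pd.read_csv('Housing.csv') in the report
print(df.shape)   # (3, 3)
```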

Gathering Basic information
df.shape

df.shape returns a tuple representing the dimensionality of the DataFrame. This dataset contains 545 rows and 13 columns.

df.info()

df.info() returns the row count, null status, and datatype of each column. For example, the column price has 545 non-null values with datatype int64.

df.count()

df.count() returns the number of non-null rows in each column of the dataset. For example, area has 545 rows of data.

df.min() and df.max()

df.min() returns the minimum value of each column; for example, area has 1650 as its minimum. df.max() returns the maximum value of each column; for example, area has 16200 as its maximum.

df.describe()

df.describe() returns the measures of central tendency and the five-point summary.
df.head()

df.head() returns the top 5 rows of the dataset.

df.tail()

df.tail() returns the bottom 5 rows of the dataset.

df.isnull().sum()

df.isnull().sum() chains isnull() and sum() to show the number of null values in each column.

df.duplicated().sum()

df.duplicated().sum() returns the total number of duplicate rows in the dataset; this dataset has 0 duplicates.

df.drop_duplicates(inplace=True) drops all duplicate rows.
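The inspection calls above can be tried on a small illustrative DataFrame (the values here are made up; the real dataset has 545 rows):

```python
# The basic-information calls from this section, on a toy DataFrame.
import pandas as pd

df = pd.DataFrame({"price": [100, 200, 300], "area": [50, 60, 70]})

print(df.shape)               # (3, 2) -> (rows, columns)
print(df.count())             # non-null rows per column
print(df.min())               # minimum value per column
print(df.isnull().sum())      # null values per column: all 0 here
print(df.duplicated().sum())  # duplicate rows: 0 here
df.drop_duplicates(inplace=True)
```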

Identification of Outliers using Box Plots

→ Figure: Boxplot for price.

→ Figure: Boxplot for area.

→ Figure: Boxplot for bedrooms.

→ Figure: Boxplot for bathrooms.

→ Figure: Boxplot for stories.

→ Figure: Boxplot for parking.
RESETTING INDEX AFTER REMOVING OUTLIERS

Outliers are data points that deviate significantly from the rest of the
dataset. They can arise due to measurement errors, data entry errors,
or inherent variability in the data. Outliers can skew the analysis results,
leading to inaccurate conclusions. Therefore, it is essential to identify
and handle outliers before performing statistical analysis.
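The report's removal code is not reproduced here; a common approach, assumed for this sketch, is the 1.5 × IQR rule, followed by reset_index so that row labels are contiguous again:

```python
# Assumed approach: drop rows outside 1.5 * IQR, then reset the index.
import pandas as pd

df = pd.DataFrame({"price": [100, 110, 105, 98, 5000]})   # 5000 is an outlier

q1, q3 = df["price"].quantile(0.25), df["price"].quantile(0.75)
iqr = q3 - q1
mask = df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

df = df[mask].reset_index(drop=True)   # re-number rows 0..n-1 after the drop
print(len(df))   # 4
```

Without `reset_index(drop=True)`, the surviving rows would keep their old, gappy index labels.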

Dataset Trend Visualization
PAIRPLOTS

Pair plots are particularly useful in the context of outlier detection and data preprocessing. They
provide a clear visual representation of how each variable interacts with others, making it easier
to spot anomalies that do not follow the general pattern of the data.

SCATTER PLOT

The image shows a scatter plot generated using the sns.scatterplot() function from the Seaborn library. Scatter plots are useful for visualizing the relationship between two continuous variables. They help identify trends, patterns, and potential outliers in the data.

REGRESSION PLOT

A regression plot is a graphical representation of the relationship between two or more variables, typically used to show how a dependent variable changes as an independent variable changes.

BAR PLOT

A bar plot (or bar chart) is a graphical display of data using bars of
different heights. It is commonly used to compare quantities across
different categories. Here’s an explanation of how to create and
interpret a bar plot.

HISTOGRAM

A histogram is a type of bar chart that represents the distribution of a dataset. It is used to show the frequency (or count) of data points that fall within specified ranges (bins).

LINE PLOT

A line plot (or line chart) is a type of chart used to display information
as a series of data points called 'markers' connected by straight line
segments. It is commonly used to visualize trends over time. Here’s a
step-by-step explanation of how to create and interpret a line plot.

COUNT PLOT

A count plot is used to visualize the count of observations in each category of a categorical variable. It is particularly useful for understanding the distribution of categorical data and comparing the frequencies of different categories.

CORRELATION HEATMAP

The image shows a correlation heatmap generated using the sns.heatmap() function from the Seaborn library. Correlation heatmaps are useful for visualizing the strength and direction of relationships between pairs of variables in a dataset.
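A compact sketch of the Seaborn calls named in this section, using a tiny synthetic frame (column names are illustrative); the Agg backend renders off-screen:

```python
# Sketch of the plot calls from this section, on a tiny synthetic frame.
import matplotlib
matplotlib.use("Agg")                # render plots off-screen
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"price": [100, 150, 130, 170],
                   "area": [50, 70, 60, 80],
                   "furnished": ["yes", "no", "yes", "yes"]})

ax = sns.scatterplot(data=df, x="area", y="price")     # two continuous variables
plt.figure()
sns.countplot(data=df, x="furnished")                  # category frequencies
plt.figure()
sns.heatmap(df[["price", "area"]].corr(), annot=True)  # pairwise correlations
plt.close("all")
```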

Test and Train Splitting

The x and y variables are separated into independent and dependent values using `iloc`. The feature columns are assigned to x, while the price column, the target being predicted, is assigned to y as the dependent variable.
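A sketch of the separation on an illustrative frame, assuming price is the last column; train_test_split then carves out the 20% test set with random_state=11:

```python
# Sketch: feature/target separation with iloc, then an 80/20 split.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"area": [50, 60, 70, 80, 90],
                   "bedrooms": [1, 2, 2, 3, 3],
                   "price": [100, 130, 150, 170, 200]})

x = df.iloc[:, :-1]   # every column except the last -> independent variables
y = df.iloc[:, -1]    # last column (price) -> dependent variable

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=11)
print(len(x_train), len(x_test))   # 4 1
```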

LOGISTIC REGRESSION

The x and y variables are further divided into `x_train`, `y_train`, `x_test`, and `y_test`. The `x_train` and `y_train` subsets are used for training the model, while `x_test` and `y_test` are used for evaluating it. The test size is set to 0.2, meaning 20% of the dataset is used for testing. The random state is set to 11 to ensure reproducibility by controlling the selection of training rows.
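On synthetic labeled data, the split-then-fit pattern described above looks roughly like this (illustrative only, not the report's exact code):

```python
# Sketch: the split-then-fit pattern with LogisticRegression on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)                 # synthetic binary labels

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=11)
clf = LogisticRegression().fit(x_train, y_train)
print(clf.score(x_test, y_test))              # fraction of correct predictions
```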

ACCURACY AND PREDICTION

Prediction: the act of using a model to forecast outcomes based on input data.

Accuracy: a measure of how many of those predictions were correct, often used as a performance metric for classification models.
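A tiny worked example of accuracy_score: four of five predictions match the true labels, so accuracy is 4/5 = 0.8:

```python
# Worked example: 4 of 5 predictions match the true labels.
from sklearn.metrics import accuracy_score

y_test = [1, 0, 1, 1, 0]   # actual labels
y_pred = [1, 0, 0, 1, 0]   # predicted labels (one mistake)
print(accuracy_score(y_test, y_pred))   # 0.8
```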

DECISION TREE

`DecisionTreeClassifier` is imported from `sklearn` and assigned to the variable `treemodel`,
with a maximum depth of 2. The model is then trained using `x_train` and `y_train`.

A tree plot with a maximum depth of 2 is drawn from the fitted decision tree model. It visually represents the structure of the decision tree, illustrating how decisions are made based on feature values, and shows the tree's nodes, branches, and leaves, detailing the splits and outcomes at each node.
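A sketch of the depth-capped classifier and its tree plot, on synthetic data (the real model is trained on the housing features):

```python
# Sketch: a DecisionTreeClassifier capped at depth 2, plus a tree plot.
import matplotlib
matplotlib.use("Agg")                # render the tree plot off-screen
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeClassifier, plot_tree

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # synthetic features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # synthetic labels

treemodel = DecisionTreeClassifier(max_depth=2).fit(X, y)
plot_tree(treemodel)            # nodes, branches, and leaves of the fitted tree
plt.close("all")
```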

RANDOM FOREST

The following are imported: `roc_curve`, `auc`, `classification_report`, `GridSearchCV`, and `RandomForestClassifier`, along with the `time` library. A `RandomForestClassifier` model is then instantiated.

Parameters like `max_depth`, `bootstrap`, `max_features`, and `criterion` are set to optimize the
accuracy of the dataset. The best parameters are determined using `GridSearchCV`. The `cv_rf`
(a `GridSearchCV` instance) is used to fit the model on `x_train` and `y_train`.
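A minimal GridSearchCV sketch over a deliberately tiny parameter grid (the report's full grid also includes `bootstrap` and `max_features`; trimmed here for speed, on synthetic data):

```python
# Sketch of GridSearchCV over a RandomForestClassifier (small grid for speed).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = (X[:, 0] > 0).astype(int)                 # synthetic labels

params = {"max_depth": [2, 4], "criterion": ["gini", "entropy"]}
cv_rf = GridSearchCV(RandomForestClassifier(n_estimators=10, random_state=0),
                     params, cv=3)
cv_rf.fit(X, y)                               # tries every parameter combination
print(cv_rf.best_params_)
```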

The values of `x_test` are used to generate predictions, and these predictions are compared
with `y_test` to calculate the accuracy.

K - Nearest Neighbor

The `KNeighborsClassifier` is imported and instantiated with `n_neighbors` set to 10, assigning it to the variable `knn`.

The `confusion_matrix` function is imported, and a confusion matrix is generated using `y_test` and `y_prediction` to identify and analyse the errors in the model's predictions. The values of `y_test` and `y_prediction` are compared to determine the model's accuracy.
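A sketch of the KNN model and its confusion matrix on synthetic data (not the report's housing features):

```python
# Sketch: KNeighborsClassifier with n_neighbors=10, plus a confusion matrix.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)                 # synthetic binary labels

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=11)
knn = KNeighborsClassifier(n_neighbors=10).fit(x_train, y_train)
y_prediction = knn.predict(x_test)

cm = confusion_matrix(y_test, y_prediction)   # rows: actual, cols: predicted
print(cm)
```

The diagonal of `cm` counts correct predictions; off-diagonal cells are the errors the section describes.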

Gaussian Naive Bayes

The `GaussianNB` class is imported from `sklearn`, and the model is fitted using `x_train` and `y_train`. The predicted values for `x_test` are assigned to a variable named `pred`.

A heatmap of true versus predicted values visualizes the performance of a classification model by showing how often each combination of actual and predicted classes occurs. It helps in understanding the distribution of prediction errors and correct classifications.

Using the `GaussianNB` model, the predicted values (`y_pred`) are compared with the actual values (`y_test`).
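A sketch of the fit-predict-compare flow with GaussianNB on synthetic data (illustrative, not the report's exact code):

```python
# Sketch: GaussianNB fitted on synthetic data; predictions go into `pred`.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 1] > 0).astype(int)                 # synthetic binary labels

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=11)
pred = GaussianNB().fit(x_train, y_train).predict(x_test)
print(accuracy_score(y_test, pred))
```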

Data Comparison

A table summarizing the different algorithms and their corresponding accuracies.

END OF REPORT
