Report of Profit Prediction

This document describes a machine learning model that uses linear regression to predict company profit based on R&D spend, administration cost, and marketing spend. It discusses data science, machine learning, issues with existing methods, and describes the proposed model and algorithm in detail.

Uploaded by

Harsh Garg 24601


Exposys Data Labs

Bengaluru, Karnataka, 560064

Internship report on
PROFIT PREDICTION OF 50 COMPANIES

A Dissertation work submitted in partial fulfilment of the requirement for the award of the degree of
Internship
By

Name- Harsh Garg

College- Indian Institute of Technology (Indian School of Mines), Dhanbad

Under the guidance of

Exposys Data Labs

Downloaded by Harsh Garg 24601 ([email protected])

Abstract

In today's highly competitive business world, companies need
to optimize their resources to maximize their profits. This ML
model aims to predict the profit value of a company based on
its R&D Spend, Administration Cost, and Marketing Spend,
providing insights for decision-making processes. The model
employs a linear regression algorithm that fits the
relationship between the independent variables (R&D Spend,
Administration Cost, and Marketing Spend) and the dependent
variable (Profit). The model was trained on 70% of a dataset
of 50 companies and tested on the remaining 30%, where the
evaluation metrics indicated a good fit. The results
suggest that this model can aid companies in
making informed decisions about their resource allocation
strategies and achieving their financial goals.

Table of Contents

Abstract
1 Introduction
1.1 Data Science
1.2 Machine Learning
2 Existing Methods
2.1 Issues in Existing Systems
3 Proposed Method
3.1 Algorithm
4 Methodology
4.1 Data Collection
4.2 Data Preprocessing
4.3 Feature Selection
4.4 Split Data into Train and Test Sets
4.5 Train the Model
4.6 Evaluate the Model
4.7 Optimize the Model
4.8 Deploy the Model
5 Implementation
5.1 Source Code
6 Conclusion
7 References

1. Introduction

1.1 Data Science

Data science is a multidisciplinary field that utilizes scientific methods,
processes, algorithms, and systems to extract knowledge and insights
from structured and unstructured data. It combines elements from
mathematics, statistics, computer science, and domain knowledge to
uncover patterns, trends, and relationships within vast amounts of
information. By employing techniques such as data mining, machine
learning, and predictive modeling, data scientists can identify valuable
insights and make informed decisions.

Data science plays a crucial role in various industries, including
finance, healthcare, marketing, and technology. It enables organizations
to leverage data-driven strategies, optimize operations, and improve
decision-making processes. Data scientists employ various tools and
programming languages, such as Python, R, and SQL, to collect, clean,
analyze, and visualize data.

Moreover, data science has the potential to address complex problems
and make significant contributions to society. From predicting disease
outbreaks and optimizing transportation systems to improving
renewable energy and enhancing customer experiences, data science
has the power to drive innovation and create positive societal impact.

As data continues to grow exponentially, data science will remain at the
forefront of technological advancements, driving innovation and
transforming industries across the globe.

1.2 Machine Learning

Machine learning is a branch of artificial intelligence that focuses on
developing algorithms and models capable of learning and making
predictions or decisions without being explicitly programmed. It
enables computers to learn from data and improve their performance
through experience.

Machine learning algorithms analyze vast amounts of data, identify
patterns, and make predictions or take actions based on those patterns.
The process involves training the algorithm on a labeled dataset, where
it learns to recognize patterns and make accurate predictions. The
algorithm's performance is then evaluated using test data to measure its
effectiveness.

Machine learning has applications in various fields, including
healthcare, finance, marketing, and robotics. It enables personalized
recommendations, fraud detection, image and speech recognition,
autonomous vehicles, and many other intelligent systems.

With the advancements in computing power, availability of large
datasets, and the development of sophisticated algorithms, machine
learning has gained significant attention and is poised to revolutionize
industries. It has the potential to unlock valuable insights from data,
automate processes, and drive innovation across sectors, making it a
crucial component of the technology-driven future.

2. Existing Methods

There may be several existing systems that attempt to predict the profit
value of a company based on its expenses such as R&D spend,
administration cost, and marketing spend. However, many of these
systems may rely on manual calculations or basic statistical techniques
that may not accurately capture the complex relationships between
these variables.

Machine learning models, on the other hand, can learn from data and
make accurate predictions based on patterns in the data. In this context,
linear regression models have been widely used for predicting
continuous target variables such as profit. The model estimates the
relationship between the independent variables and the dependent
variable by fitting a linear equation to the data.

However, many existing linear regression models may not be optimized
for the specific features of the data, and thus may not perform
optimally. Therefore, there is a need for an ML model that is
specifically designed to accurately predict the profit value of a company
based on its expenses, taking into account all relevant features of the
data.

2.1 Issues in existing systems

1. Limited Accuracy

2. Overfitting and Underfitting

3. Limited Scope

3. Proposed Method

The proposed system is an ML model that utilizes a linear regression
algorithm to predict the profit value of a company based on its R&D
Spend, Administration Cost, and Marketing Spend. The model takes in
a dataset of previous company financial records, which includes the
independent variables of R&D Spend, Administration Cost, and
Marketing Spend, and the dependent variable of Profit.

The proposed system addresses the drawbacks of the existing system by
incorporating a more accurate and efficient algorithm for prediction.
Additionally, the model includes data preprocessing steps, such as
normalization and feature scaling, to ensure the accuracy of the
prediction. The model is also evaluated using various performance
metrics, such as Mean Squared Error (MSE) and R-squared (R2), to
validate its accuracy.

The proposed system offers a more accurate and efficient method for
predicting company profits, which can be useful for businesses in
making informed financial decisions. The model can also be further
improved by incorporating additional relevant variables or using more
advanced algorithms, such as neural networks or decision trees.

3.1 Algorithm

1. Load the dataset containing the company's R&D Spend, Administration
Cost, Marketing Spend, and Profit.

2. Split the dataset into training and testing sets.

3. Train the linear regression model on the training set.

4. Predict the profit values for the testing set using the trained model.

5. Evaluate the performance of the model using evaluation metrics such
as mean squared error, mean absolute error, and R-squared score.

6. If the performance of the model is not satisfactory, tune the model by
adjusting the

The linear regression algorithm is a simple yet powerful algorithm that
can predict the target variable (Profit in this case) based on the input
variables (R&D Spend, Administration Cost, and Marketing Spend). It
works by fitting a linear function of the inputs (a straight line for a
single input, a hyperplane for the three inputs used here) that
minimizes the sum of squared errors between the predicted values and
the actual values. The fitted equation is given by:

y = b0 + b1*x1 + b2*x2 + b3*x3

where y is the predicted value of Profit, x1, x2, and x3 are the input
variables (R&D Spend, Administration Cost, and Marketing Spend),
and b0, b1, b2, and b3 are the coefficients that are learned during
training.
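
As a worked illustration of the equation above, the sketch below evaluates it for a single company. The coefficient values and the spend figures are hypothetical, chosen only to make the arithmetic easy to follow; they are not the coefficients learned by the model in this report.

```python
# Hypothetical coefficients (illustrative only, not learned from the report's data).
b0 = 50000.0   # intercept
b1 = 0.80      # weight of R&D Spend
b2 = -0.03     # weight of Administration Cost
b3 = 0.03      # weight of Marketing Spend


def predict_profit(rd_spend, administration, marketing_spend):
    """Evaluate y = b0 + b1*x1 + b2*x2 + b3*x3 for one company."""
    return b0 + b1 * rd_spend + b2 * administration + b3 * marketing_spend


# Hypothetical inputs: x1 = 100000, x2 = 120000, x3 = 200000
profit = predict_profit(100_000.0, 120_000.0, 200_000.0)
print(round(profit, 2))
```

Each coefficient can be read as the change in predicted Profit for a one-unit change in the corresponding spend, with the other two inputs held fixed.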

During training, the linear regression algorithm adjusts the coefficients
to minimize the sum of squared errors between the predicted values and
the actual values. This can be done iteratively with an optimization
algorithm such as gradient descent, or directly by solving the ordinary
least-squares problem, which is what the scikit-learn LinearRegression
class used in the Implementation section does. Once the coefficients are
learned, the model can be used to predict the profit values for new
companies based on their R&D Spend, Administration Cost, and Marketing
Spend.
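
The coefficient fitting described above can be sketched with NumPy. This is a minimal illustration on synthetic data, not the report's dataset: the data, the function name fit_gradient_descent, the learning rate and the iteration count are all assumptions. It fits the coefficients by batch gradient descent and compares them with the direct least-squares solution.

```python
import numpy as np

# Synthetic data (hypothetical): 50 rows, 3 features, known coefficients plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_b = np.array([2.0, 0.5, -1.0, 3.0])      # b0, b1, b2, b3
y = true_b[0] + X @ true_b[1:] + rng.normal(scale=0.1, size=50)

# Prepend a column of ones so b0 is learned like any other coefficient.
A = np.column_stack([np.ones(len(X)), X])


def fit_gradient_descent(A, y, learning_rate=0.1, n_iters=2000):
    """Minimize the mean squared error by batch gradient descent."""
    b = np.zeros(A.shape[1])
    for _ in range(n_iters):
        residual = A @ b - y
        gradient = 2.0 * A.T @ residual / len(y)
        b -= learning_rate * gradient
    return b


b_gd = fit_gradient_descent(A, y)
# Direct least-squares solution for comparison.
b_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

print("gradient descent:", np.round(b_gd, 3))
print("least squares:   ", np.round(b_ls, 3))
```

Both approaches minimize the same objective, so they should agree closely here. With unscaled monetary inputs, which can be large, gradient descent would typically need feature scaling or a much smaller learning rate to converge.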

4. Methodology

The methodology for building an ML model that can predict the profit
value using linear regression can be broken down into the following
steps:

4.1 Data Collection: Collect data from various sources such as
company financial records, public financial records, and other relevant
sources.

4.2 Data Preprocessing: Clean and preprocess the data to ensure it is
in a format suitable for training an ML model. This may include tasks
such as removing missing or inconsistent data, normalizing the data,
and encoding categorical variables.
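
The preprocessing tasks named in step 4.2 could look like the following pandas sketch. The records, the column names and the categorical 'State' column are hypothetical, made up to show one missing value, one duplicate row and one categorical variable; min-max scaling is used here as one possible form of normalization.

```python
import pandas as pd

# Hypothetical raw records; the 'State' column and all values are made up
# to show a missing value, a duplicate row, and a categorical variable.
raw = pd.DataFrame({
    "R&D Spend": [1000.0, 2000.0, None, 4000.0, 4000.0],
    "Administration": [500.0, 700.0, 600.0, 900.0, 900.0],
    "Marketing Spend": [3000.0, 1000.0, 2000.0, 5000.0, 5000.0],
    "State": ["A", "B", "A", "C", "C"],
    "Profit": [9000.0, 8000.0, 7000.0, 12000.0, 12000.0],
})

# 1. Remove rows with missing values and exact duplicates.
clean = raw.dropna().drop_duplicates().reset_index(drop=True)

# 2. Min-max normalize the numeric input columns to the range [0, 1].
features = ["R&D Spend", "Administration", "Marketing Spend"]
col_min = clean[features].min()
col_max = clean[features].max()
clean[features] = (clean[features] - col_min) / (col_max - col_min)

# 3. One-hot encode the categorical column.
clean = pd.get_dummies(clean, columns=["State"])

print(clean)
```

In a full pipeline the scaling statistics would normally be computed on the training set only and then reused for the test set, so that no information from the test data leaks into training.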

4.3 Feature Selection: Determine which features are most relevant for
predicting the profit value of a company. In this case, the selected
features are R&D Spend, Administration Cost, and Marketing Spend.
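
One simple way to judge feature relevance, sketched below under stated assumptions, is to rank the inputs by the absolute value of their Pearson correlation with Profit. The data is synthetic and the column names are assumed; Profit is deliberately constructed to depend mostly on R&D Spend, so the ranking shown is a property of this made-up data, not a result from the report.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset (hypothetical values and column names).
rng = np.random.default_rng(1)
n = 50
data = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 150_000, n),
    "Administration": rng.uniform(50_000, 180_000, n),
    "Marketing Spend": rng.uniform(0, 450_000, n),
})
# Profit is constructed to depend strongly on R&D Spend, weakly on Marketing Spend.
data["Profit"] = (
    40_000
    + 0.8 * data["R&D Spend"]
    + 0.05 * data["Marketing Spend"]
    + rng.normal(0, 5_000, n)
)

# Absolute Pearson correlation of each feature with the target, strongest first.
relevance = (
    data.corr()["Profit"]
    .drop("Profit")
    .abs()
    .sort_values(ascending=False)
)
print(relevance)
```

Correlation only captures pairwise linear association, so a feature with a low score can still be useful in combination with others.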

4.4 Split Data into Train and Test Sets: Split the data into a training
set and a test set. The training set will be used to train the linear
regression model, while the test set will be used to evaluate the model's
performance.

4.5 Train the Model: Train a linear regression model using the training
data.

4.6 Evaluate the Model: Evaluate the performance of the model using
the test data. This may involve metrics such as mean squared error or
R-squared.
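
To make the evaluation metrics concrete, the sketch below computes mean squared error, mean absolute error and R-squared from their definitions with NumPy. The actual and predicted values are hypothetical numbers for four test companies.

```python
import numpy as np

# Hypothetical actual and predicted profits for four test companies.
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 195.0, 260.0])

errors = y_true - y_pred
mse = np.mean(errors ** 2)                        # mean squared error
mae = np.mean(np.abs(errors))                     # mean absolute error
ss_res = np.sum(errors ** 2)                      # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1.0 - ss_res / ss_tot                        # R-squared

print(mse, mae, r2)
```

MSE and MAE are symmetric in the two arrays, but R-squared is not, because the total sum of squares is computed from the actual values only. The actual values therefore have to be passed in the "true" position.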

4.7 Optimize the Model: Optimize the model by adjusting
hyperparameters such as regularization strength or learning rate.
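
Plain ordinary least squares in scikit-learn has no regularization strength or learning rate to tune, so the sketch below swaps in ridge regression, a regularized variant of linear regression, purely to illustrate what tuning a regularization strength could look like. The synthetic data, the candidate alpha values and the choice of 5-fold cross-validation are assumptions, not part of the report's implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data (hypothetical): 35 rows, 3 features.
rng = np.random.default_rng(2)
x_train = rng.normal(size=(35, 3))
y_train = 1.0 + x_train @ np.array([3.0, 0.0, 0.5]) + rng.normal(scale=0.2, size=35)

# Scale the features, then fit ridge regression; alpha is the regularization strength.
pipeline = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(
    pipeline,
    param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(x_train, y_train)

print("best alpha:", search.best_params_["ridge__alpha"])
print("cross-validated MSE:", -search.best_score_)
```

Selecting alpha by cross-validation on the training set keeps the test set untouched for the final evaluation.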

4.8 Deploy the Model: Once the model has been optimized, it can be
deployed for use in predicting the profit value of a company based on
R&D Spend, Administration Cost, and Marketing Spend.
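
The report does not describe a deployment mechanism. One common minimal approach, sketched here as an assumption, is to save the fitted model to a file with joblib and load it wherever predictions are needed. The training data, the file name and the new company's figures are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for the trained model (synthetic data, hypothetical values).
rng = np.random.default_rng(3)
x_train = rng.uniform(0, 100, size=(35, 3))
y_train = 10.0 + x_train @ np.array([0.8, -0.1, 0.2])
model = LinearRegression().fit(x_train, y_train)

# Save the fitted model to disk; the file name is an arbitrary choice.
model_path = os.path.join(tempfile.mkdtemp(), "profit_model.joblib")
joblib.dump(model, model_path)

# Later, for example inside a service, load the model and predict for a new company.
loaded = joblib.load(model_path)
new_company = np.array([[50.0, 20.0, 30.0]])   # R&D, Administration, Marketing
print(loaded.predict(new_company))
```

A saved model should be loaded with the same scikit-learn version that produced it, and any preprocessing applied during training has to be applied identically to new inputs.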

5. Implementation

5.1 Source Code

Import the necessary libraries

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import sklearn

Loading and Analyzing Data

dataset = pd.read_csv('50_Startups.csv')

dataset.head()
dataset.tail()
dataset.describe()
print('There are', dataset.shape[0], 'rows and', dataset.shape[1], 'columns in the dataset')

print('There are', dataset.duplicated().sum(), 'duplicate values in the dataset')

dataset.isnull().sum()

dataset.info()

c=dataset.corr()
c

sns.heatmap(c, annot=True, cmap='Blues')
plt.show()
outliers = ['Profit']
plt.rcParams['figure.figsize'] = [8, 8]
sns.boxplot(data=dataset[outliers], orient='v', palette='Set2', width=0.7)
plt.title('Outliers Variables Distribution')
plt.ylabel('Profit Range')
plt.xlabel('Continuous Variable')
plt.show()

# histplot replaces distplot, which is deprecated in recent seaborn releases
sns.histplot(dataset['Profit'], bins=5, kde=True)

plt.show()

sns.pairplot(dataset)
plt.show()

Model Development and Training

x = dataset.iloc[:,:-1].values
y = dataset.iloc[:,3].values

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,train_size=0.7 ,
random_state =0)
x_train

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(x_train,y_train)

Testing

y_pred = model.predict(x_test)

testing_data_model_score = model.score(x_test,y_test)

df = pd.DataFrame(data={'Predicted value': y_pred.flatten(), 'Actual value': y_test.flatten()})

Model Evaluation

from sklearn.metrics import r2_score

# scikit-learn metrics take the actual values first, then the predictions
r2 = r2_score(y_test, y_pred)
print('R2 score of the model is', r2)

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, y_pred)
print('Mean squared error of the model is', mse)

import numpy as np
rmse = np.sqrt(mse)
print('Root mean squared error of the model is', rmse)

from sklearn.metrics import mean_absolute_error

mae = mean_absolute_error(y_test, y_pred)
print('Mean absolute error of the model is', mae)

6. Conclusion

In conclusion, the Linear Regression model developed in this project
predicts the profit value of a company based on R&D Spend,
Administration Cost, and Marketing Spend. The model was trained on a
dataset containing information about 50 companies and their respective
profits. It was evaluated using metrics such as Mean Squared Error and
R-squared, which showed that it is a good fit for the data and can be
used to make accurate predictions.

The proposed system has several advantages over the existing systems,
as it uses more relevant features and a better machine learning
algorithm. This model can be used by investors and businesses to make
more informed decisions about where to invest their money and how to
improve their profits.

Overall, the project has been successful in developing an ML model
that can predict the profit value of a company based on R&D Spend,
Administration Cost, and Marketing Spend with high accuracy, which
can have significant practical applications in the business world.
