B5 Project Report Format SEM I 2022

The document is a project report on a 'Phishing Website Detector Using ML' submitted by students for their Bachelor of Technology in Computer Engineering. It outlines the project's objectives, methodology, and the use of machine learning algorithms like Decision Trees, Random Forest, and Support Vector Machines to detect phishing URLs. The report includes acknowledgments, a literature survey, and future scope for enhancing phishing detection techniques.

Uploaded by

sairaj jadhav

A

PROJECT PHASE-I REPORT


ON

Phishing Website Detector Using ML

SUBMITTED IN THE PARTIAL FULFILLMENT OF THE REQUIREMENTS


FOR THE AWARD OF THE DEGREE

OF

BACHELOR OF TECHNOLOGY (COMPUTER ENGINEERING)


SUBMITTED BY

Shoeb Akhtar Shah Exam No :(BCOB90)


Safin Tamboli Exam No :(BCOB81)
Sarthak Pawar Exam No :(BCOB85)
Sairaj Jadhav Exam No :(BCOB82)

Under the Guidance of


Mrs. Gayatree Bedre

DEPARTMENT OF COMPUTER ENGINEERING

G. H. RAISONI COLLEGE OF ENGINEERING AND MANAGEMENT


(An Autonomous Institute affiliated to SPPU)
WAGHOLI, PUNE-402207

SAVITRIBAI PHULE PUNE UNIVERSITY


2022 -2023
G. H. Raisoni College of Engineering and Management, Wagholi- Pune 402207
(An Autonomous Institute affiliated to SPPU)

CERTIFICATE

This is to certify that the project report entitled


“Phishing Website Detector Using ML”

Submitted by

Safin Tamboli Exam No :(BCOB81)


Sairaj Jadhav Exam No :(BCOB82)
Sarthak Pawar Exam No :(BCOB85)
Shoeb Akhtar Shah Exam No :(BCOB90)

are bona fide students of this institute and the work has been carried out by them under the
supervision of Mrs. Nivedita Kadam, and it is approved for the partial fulfillment of the
requirements of Savitribai Phule Pune University for the award of the degree of Bachelor of
Technology (Computer Engineering).

Mrs. Nivedita Kadam Mrs. Gayatree Bedre


Guide Project Coordinator

Dr. Simran Khiani


HOD Director
Place : Pune
Date :

ACKNOWLEDGEMENT

We would like to thank Dr. Simran Khiani, our Head of Department (CS), and Prof.
Nivedita Kadam for their support and guidance in completing our project, Phishing
Website Detector Using ML. We, Safin Tamboli, Sarthak Pawar, Sairaj Jadhav, and
Shoeb Akhtar Shah, would also like to take this opportunity to express our gratitude
to each and every member of our group. The project would not have been successful
without their cooperation and input.

NAME OF THE STUDENTS & SIGNS


Safin Tamboli Exam No :(BCOB81)

………………………

Sairaj Jadhav Exam No :(BCOB82)

………………………

Sarthak Pawar Exam No :(BCOB85)

………………………

Shoeb Akhtar Shah Exam No :(BCOB90)

………………………
ABSTRACT

A phishing attack is one of the simplest ways to obtain sensitive information from
unsuspecting users. The aim of phishers is to acquire critical information such as
usernames, passwords and bank account details. Cyber security professionals are
looking for reliable and stable techniques for detecting phishing websites. This
project applies machine learning to the detection of phishing URLs by extracting and
analyzing various features of legitimate and phishing URLs. Decision Tree, Random
Forest and Support Vector Machine algorithms are used to detect phishing websites.
TABLE OF CONTENTS

CHAPTER NO.   TITLE

1     Introduction
      1.1 Problem Statement
2     Literature Survey Table
3     Advantages & Disadvantages of Existing Systems
      3.1 Problem Statement
4     Proposed System
      4.1 Objective
      4.2 Block Diagram
5     Relevant Mathematics Associated with the Project
      5.1 Expected Outcome
6     Methodology/Planning for Project
7     Modules
8     Tools to Use
9     References
10    List of Abbreviations
11    List of Tables
Introduction:

We live in a technology-driven world, and as technology advances we face major
problems such as phishing: attackers obtain users' or customers' personal data by
creating fake websites that look almost identical to the original websites. These
attackers are able to steal banking credentials and other data associated with a
user's email and device. Because phishing attacks succeed largely due to a lack of
user awareness, they are very difficult to counter, so there is a strong need to
enhance phishing detection techniques.

Problem Definition-
Phishing attacks are one of the simplest ways to obtain sensitive information from
unsuspecting users. The goal of phishers is to acquire critical information such as
usernames, passwords, and bank account details. Everyone is now looking for reliable
and stable techniques for detecting phishing websites. This project applies
machine learning to the detection of phishing URLs by extracting and
analyzing various features of legitimate and phishing URLs. Decision Trees, Random
Forests, and Support Vector Machines are the algorithms used to detect phishing
websites.

Literature Survey Table-

Sr. No. 1
Paper title & its author: Detecting phishing websites using machine learning technique. Author: Ashit Kumar Dutta
Methodology: The proposed framework employs RNN-LSTM to identify the properties Pm and Pl in an order to declare an URL as malicious or legitimate.
Advantages: The outcome of this study reveals that the proposed method presents superior results rather than the existing deep learning methods.
Future Scope: The future direction of this study is to develop an unsupervised deep learning method to generate insight from a URL.

Sr. No. 2
Paper title & its author: Phishing Website Detection using Machine Learning Algorithms. Author: Rishikesh Mahajan
Methodology: URLs of benign websites were collected from www.alexa.com and the URLs of phishing websites were collected from www.phishtank.com. And Classifies using Decision Tree as Splitter.
Advantages: We have implemented python program to extract features from URL. Below are the features that we have extracted for detection of phishing URLs.
Future Scope: In future hybrid technology will be implemented to detect phishing websites more accurately, for which random forest algorithm of machine learning technology and blacklist method will be used.

Sr. No. 3
Paper title & its author: Phishing Website Detection Using Machine Learning Classifiers Optimized by Feature Selection. Authors: Dželila Mehanović, Jasmin Kevrić
Methodology: To select features, we used the Weka tool and its algorithms for feature selection. To perform phishing websites detection, in this work we applied K-Nearest Neighbor (KNN)
Advantages: In our approach, to find most valuable features we used multiple feature selection filters. The outputs of these filters are analyzed and features that are proposed as most important.
Future Scope: This is important, as we hope with a decrease in the number of features, we decreased time needed to build a model.

Sr. No. 4
Paper title & its author: An approach for detecting phishing attacks using machine learning techniques. Author: K. Venkateshwara Rao
Methodology: It consists of the parallel decision tree which take the input and produce a specific class. Thus, n number of trees produce different classes.
Advantages: Support vector machine gives an accuracy of 91.3% on test data set. This helps in providing accuracy.
Future Scope: In the future, we can find a better way to find a phishing website by using advanced features of the URL.

Advantages & Disadvantages of existing systems-
Advantages-
1. Secure email gateways

2. Detect threats and data-theft fraud

3. Minimize the risk of users handing over their credentials

Disadvantages-
1. Weak anti-malware protection

2. Users' and customers' inability to detect phishing emails and messages

3. Insufficient communication between managing organizations and their users or customers


Problem Statement :
Nowadays, phishing is a major concern for security researchers because it is not
difficult to create a fake website that closely resembles a legitimate one. Experts
can identify fake websites, but not all users can, and such users become victims of
phishing attacks. The main aim of the attacker is to steal bank account credentials.
Phishing attacks succeed because of a lack of user awareness. Since they exploit
users' weaknesses, they are very difficult to mitigate, which makes it all the more
important to enhance phishing detection techniques.
Proposed System-

Objective-
Machine learning comprises many algorithms that learn from past data in order to
make decisions or predictions on future data. Using this technique, the algorithm
will analyze various blacklisted and legitimate URLs and their features in order to
detect phishing websites accurately, including zero-hour ones (phishing sites too
new to appear on any blacklist).
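
To make the idea of analyzing URL features concrete, the following is a minimal Python sketch of lexical feature extraction. The function name extract_url_features and the particular features chosen (URL length, dots and hyphens in the host, an '@' symbol, a raw IP address used as host, use of HTTPS) are illustrative assumptions and not the feature set fixed by this report.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few simple lexical features of a URL (illustrative choice only)."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a commonly cited phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    # Hypothetical example URL, not taken from any dataset.
    print(extract_url_features("http://192.168.0.1/login@secure"))
```

Each URL becomes a fixed-length record of numbers and flags, which is the form of input that classifiers such as Decision Trees or Random Forest expect.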

Block Diagram-

[Block diagram: Detecting Phishing Websites using ML Algorithms. Blocks shown: Legitimate URLs; PhishTank Malicious URLs; Crawler Data; Emails/SMS/Enterprises; Feature Extraction; RNN Training Phase; RNN & Random Forest Testing Phase; Evaluating the Result.]


Relevant Mathematics associated with the Project :

Expected Outcome-

• System Description: Detecting Websites/URLs

• Input: URLs, random websites, transaction IDs, suspicious mails

• Output: Safe for Browsing (Continue) / Unsafe for Browsing (Block Website)

• Possible Success Conditions: Developing a cautious way of browsing the internet, and checking random URLs forwarded to us by mail or on social media.

• Failure Conditions: New formats of phishing websites may go undetected.
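
The failure condition above, a phishing website that goes undetected, corresponds to a false negative when the detector is evaluated. The following is a small Python sketch of how the Safe/Unsafe outputs could be scored against known labels. The function names and the label strings "phishing" and "legitimate" are assumptions made for illustration, and the example labels are invented, not project results.

```python
def confusion_counts(y_true, y_pred, positive="phishing"):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # phishing site correctly blocked
            else:
                fp += 1  # legitimate site wrongly blocked
        else:
            if truth == positive:
                fn += 1  # phishing site that went undetected
            else:
                tn += 1  # legitimate site correctly allowed
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Return accuracy, precision and recall; a ratio with a zero denominator is reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Invented labels for illustration only.
    truth = ["phishing", "phishing", "legitimate", "legitimate", "phishing"]
    predicted = ["phishing", "legitimate", "legitimate", "phishing", "phishing"]
    print(summarize(truth, predicted))
```

Recall is the measure most directly tied to the failure condition, since it falls whenever a phishing URL is passed as safe.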


Methodology/Planning for Project-
1) Collection of Data-
We first collected data about heterogeneous phishing websites and how they are
created by hackers or fake companies. We use this data in the proposed project to
build a phishing website detector using a machine learning approach.
Phishing Websites being used are from
2) Processing of Data-
Our project uses a classification technique to differentiate between a fake and a
genuine website based on the features of the website. Classification is a data
mining technique in which we analyse a set of data and generate a set of
grouping rules that can be used to classify future data, using Decision Trees,
the Random Forest algorithm, linear programming and statistics.
3) Algorithms and Methods Used-
1. Random Forest Algorithm-
Random forest is one of the most powerful algorithms in machine
learning and is built on the decision tree algorithm. It creates a forest
of many decision trees, each trained on a random sample of the data, and
combines their votes. Using more trees generally makes the prediction more
stable, although the gain in accuracy levels off beyond a certain number of trees.
2. RNN (Recurrent Neural Networks)-
A recurrent neural network is a type of artificial neural network
commonly used in speech recognition and natural language
processing. Recurrent neural networks take sequential data as input
and use the patterns in the sequence to predict the next likely
scenario.
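
The voting idea behind the Random Forest algorithm described above can be sketched in plain Python. This is a deliberately simplified stand-in and not the project's implementation: each "tree" here is a single-split decision stump rather than a full decision tree, and the names train_stump, train_forest and predict, the feature layout and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_indices):
    """Pick the one feature/threshold split with the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            # Each side predicts its majority label; an empty right side reuses the left label.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # number of features each stump may look at
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) examples with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        features = rng.sample(range(n_features), k)
        forest.append(train_stump(sample_rows, sample_labels, features))
    return forest


def predict(forest, row):
    """Return the label that receives the most votes across all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy rows: [has_at_symbol, host_is_ip, num_dots]; invented for illustration.
    rows = [[0, 0, 1], [0, 0, 2], [0, 0, 1], [0, 0, 2],
            [1, 1, 4], [1, 1, 5], [1, 1, 4], [1, 1, 5]]
    labels = ["legitimate"] * 4 + ["phishing"] * 4
    forest = train_forest(rows, labels)
    print(predict(forest, [1, 1, 5]))
```

Because every stump sees a different random sample and feature subset, individual stumps can be wrong while the majority vote remains correct, which is the reason a forest tends to be more stable than a single tree.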
Modules to be included in the project-

• Login- After successful registration, the user/admin can enter their credentials
to log in to the system.

• Add to Blacklist- Here, the system administrator adds a malicious website
to the blacklist.

• Check Website- Here, the user checks whether a website is blacklisted by entering
its URL.

• Change Password- The admin can change their password for security purposes by
entering the old and new passwords.
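
The Add to Blacklist and Check Website modules can be illustrated with a short Python sketch. The class name Blacklist, its methods and the returned strings are hypothetical; a real system would also need the login step and persistent storage, both of which are left out here.

```python
from urllib.parse import urlparse


class Blacklist:
    """In-memory blacklist keyed by hostname (storage and login are omitted)."""

    def __init__(self):
        self._hosts = set()

    @staticmethod
    def _host(url: str) -> str:
        # Accept URLs with or without a scheme and compare hosts case-insensitively.
        parsed = urlparse(url if "://" in url else "http://" + url)
        return (parsed.hostname or "").lower()

    def add(self, url: str) -> None:
        """Add to Blacklist: record the host of a malicious URL."""
        host = self._host(url)
        if host:
            self._hosts.add(host)

    def check(self, url: str) -> str:
        """Check Website: report whether the URL's host is on the blacklist."""
        return "Unsafe" if self._host(url) in self._hosts else "Not blacklisted"


if __name__ == "__main__":
    blacklist = Blacklist()
    blacklist.add("http://Bad-Site.example/login")  # hypothetical malicious URL
    print(blacklist.check("bad-site.example/other"))
    print(blacklist.check("https://good.example"))
```

Matching on the hostname rather than the full URL means that a different page on an already blacklisted site is still reported as unsafe. A URL that is not blacklisted is not necessarily safe, which is why the blacklist is paired with the machine learning classifier.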
Tools to Use :

1) Hardware :
• Operating System : Windows 7 or Higher

• Processor : Core i3

• RAM : 1GB

2) Software :

1) Jupyter Notebook

2) Visual Studio

3) HTML & CSS

Future Scope-

1. Creating a safe, user-friendly environment that can detect illegitimate activities.

2. Reporting and blocking a hacker by using the phishing website's URL, and tracing
the location of such anonymous hackers.

3. Creating awareness among users by displaying the types of phishing URLs in
circulation, particularly those that can cause more harm to a system, such as
zero-hour phishing websites.
References-
1. Anti-Phishing Working Group (APWG),
https://docs.apwg.org//reports/apwg_trends_report_q4_2019.pdf

2. Jain A.K., Gupta B.B., “PHISH-SAFE: URL Features-Based Phishing
Detection System Using Machine Learning”, Cyber Security. Advances in
Intelligent Systems and Computing, vol. 729, 2018,
https://doi.org/10.1007/978-981-10-8536-9_44

3. Purbay M., Kumar D., “Split Behavior of Supervised Machine Learning
Algorithms for Phishing URL Detection”, Lecture Notes in Electrical
Engineering, vol. 683, 2021, https://doi.org/10.1007/978-981-15-6840-4_40

4. Gandotra E., Gupta D., “An Efficient Approach for Phishing Detection using
Machine Learning”, Algorithms for Intelligent Systems, Springer, Singapore,
2021, https://doi.org/10.1007/978-981-15-8711-5_12

5. Hung Le, Quang Pham, Doyen Sahoo, and Steven C.H. Hoi, “URLNet:
Learning a URL Representation with Deep Learning for Malicious URL
Detection”, Conference’17, Washington, DC, USA, arXiv:1802.03162, July
2017.

6. Hong J., Kim T., Liu J., Park N., Kim SW, “Phishing URL Detection with
Lexical Features and Blacklisted Domains”, Autonomous Secure Cyber
Systems. Springer, https://doi.org/10.1007/978-3-030-33432-1_12

7. J. Kumar, A. Santhanavijayan, B. Janet, B. Rajendran and B. S.
Bindhumadhava, “Phishing Website Classification and Detection Using
Machine Learning,” 2020 International Conference on Computer
Communication and Informatics (ICCCI), Coimbatore, India, 2020, pp. 1–6,
10.1109/ICCCI48352.2020.9104161.
LIST OF ABBREVIATIONS
ABBREVIATION   FULL FORM

RNN   Recurrent Neural Network
SMS   Short Message Service
URL   Uniform Resource Locator
ML    Machine Learning
LIST OF TABLES
TABLE NO.   TITLE

1   Literature Survey
