ML Report 9 PDF

The document describes a project to build a password strength checker using machine learning in Python. It discusses a dataset of 700,000 passwords labeled as weak, medium, or strong based on three commercial password checking algorithms. The project aims to develop a machine learning model that can more accurately evaluate password strength by analyzing various factors. Python is chosen as the programming language due to its versatility and machine learning libraries. The methodology involves analyzing the dataset using data visualization tools in Python before developing machine learning algorithms to classify new passwords by strength.

Uploaded by

Ronak Shaik
Copyright
© All Rights Reserved

PASSWORD STRENGTH CHECKER

USING MACHINE LEARNING


A PROJECT REPORT
Submitted by
SASWAT KUMAR PANDEY (220301120423)
SOUMYA RANJAN BEHERA (220301130015)
N DIBYANSU DIBYARANJAN (220301120404)
PRIYANSU BARIK (220301120419)
in partial fulfilment for the award of the degree
of
BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE ENGINEERING

CENTURION UNIVERSITY OF TECHNOLOGY AND MANAGEMENT
BHUBANESWAR, ODISHA
APRIL 2023-MAY 2023
DEPARTMENT OF COMPUTER SCIENCE
ENGINEERING
CENTURION UNIVERSITY OF TECHNOLOGY & MANAGEMENT
BHUBANESWAR 752050

BONAFIDE CERTIFICATE
Certified that this project report PASSWORD STRENGTH CHECKER USING MACHINE
LEARNING IN PYTHON is the bona fide work of SASWAT KUMAR PANDEY (220301120423),
SOUMYA RANJAN BEHERA (220301130015), N DIBYANSU DIBYARANJAN (220301120404) and
PRIYANSU BARIK (220301120419), who carried out the project work under my supervision. This is
to further certify, to the best of my knowledge, that this project has not been carried out earlier in
this institute and the university.

SIGNATURE
(Prof SWARNA PRABHA JENA)
Professor of ECE Engg.
Certified that the above-mentioned project has been duly carried out as per the norms of the college
and statutes of the university.

SIGNATURE
(DR. RAJKUMAR MAHANTA)
Professor of CSE Engg.
ACKNOWLEDGEMENTS

We wish to express our profound and sincere gratitude to Prof. SWARNA PRABHA JENA,
Department of ELECTRONICS AND COMMUNICATION Engineering, CUTM, BHUBANESWAR,
who guided us through the intricacies of this project with matchless magnanimity.

We thank DR. RAJKUMAR MAHANTA, Head of the Dept. of ELECTRONICS AND
COMMUNICATION Engineering, CUTM, BHUBANESWAR, and DR. SUJATA CHAKRAVARTY,
DEAN, SOET CUTM, for extending their support during the course of this investigation.

We would be failing in our duty if we did not acknowledge the co-operation rendered during the
various stages of image interpretation by Prof. SWARNA PRABHA JENA.

We are highly grateful to Prof. SWARNA PRABHA JENA, who took keen interest in and gave
invaluable support to the progress and successful completion of our project work.

SASWAT KUMAR PANDEY (220301120423)
SOUMYA RANJAN BEHERA (220301130015)
N DIBYANSU DIBYARANJAN (220301120404)
PRIYANSU BARIK (220301120419)
Table of Contents

Chapter 1. INTRODUCTION

1.i- Background

Chapter 2. LITERATURE REVIEW

Chapter 3. METHODOLOGY

3.i- Dataset

3.ii- Data analysis

3.iii- Libraries used

3.iv- Algorithms used

Chapter 4. RESULT

Chapter 5. CONCLUSION

REFERENCE
CHAPTER 1. INTRODUCTION
1.i- Background
A password strength checker is a tool that determines the security level of a password.
Passwords are used to protect sensitive information, and a strong password is essential for
keeping that information secure. Password strength is typically evaluated based on various
factors such as length, complexity, and uniqueness. A password strength checker can help users
to create strong passwords by providing feedback on the strength of their current passwords or
suggesting more secure alternatives. Password strength checkers use various techniques such
as rule-based systems, pattern matching, and machine learning algorithms to determine the
strength of a password. Machine learning-based password strength checkers use a dataset of
passwords and their corresponding strengths to train a model that can accurately predict the
strength of new passwords. These models can take into account a wide range of factors that
contribute to password strength, including character types, patterns, and length. Overall, a
password strength checker is an important tool for ensuring the security of sensitive
information. By giving users feedback on the strength of their passwords, it can
encourage the use of stronger passwords that are less susceptible to hacking attempts.
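To make the rule-based approach mentioned above concrete, the following is a minimal sketch of such a checker. The function name `rule_based_strength` and the length and character-class thresholds are illustrative assumptions, not rules taken from this report or from any commercial meter.

```python
import string


def rule_based_strength(password: str) -> int:
    """Return 0 (weak), 1 (medium) or 2 (strong) from hand-written rules.

    The thresholds are illustrative assumptions, not values from this report.
    """
    # Count how many of the four character classes the password uses.
    variety = sum([
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ])
    if len(password) >= 12 and variety >= 3:
        return 2
    if len(password) >= 8 and variety >= 2:
        return 1
    return 0


print(rule_based_strength("abc"))           # 0
print(rule_based_strength("password1"))     # 1
print(rule_based_strength("Password123!"))  # 2
```

Rules of this kind rate "Password123!" as strong even though it follows a very predictable pattern, which is one reason to consider a model trained on labelled passwords instead.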
CHAPTER 2. LITERATURE REVIEW
The motivation for building a password strength checker using Python and machine learning
is to create a more effective and efficient tool for evaluating the strength of passwords.
Password strength is an essential aspect of cybersecurity: weak passwords are easily
compromised, which can lead to significant security breaches. Using machine learning, a
password strength checker can analyse various factors that contribute to the strength of a
password, such as length, complexity, use of special characters, and other patterns. The model
learns from a large dataset of example passwords, both strong and weak, to identify patterns
and correlations that help it evaluate the strength of a password accurately.

Python is a popular programming language for machine learning due to its simplicity,
versatility, and wide range of libraries and frameworks that support machine learning tasks.
Using Python, developers can build a password strength checker that evaluates passwords in
real time, providing immediate feedback to users and helping them to choose stronger and
more secure passwords. The overall aim is to enhance cybersecurity and protect against
security breaches caused by weak passwords.

A password strength checker using Python and machine learning could make a significant
contribution to enhancing the security of online accounts and reducing the likelihood of data
breaches. Here are a few ways in which such a tool could be beneficial:

Improved password strength assessment: Machine learning models could be trained on a large
dataset of passwords to identify common patterns and characteristics of weak and strong
passwords. This information could then be used to develop a more accurate password strength
checker that evaluates a password based on factors such as length, complexity, and uniqueness.

Real-time password strength feedback: With a machine learning-based password strength
checker, users could receive feedback on the strength of their password as they type it in. This
feedback could include suggestions for making the password stronger, such as adding special
characters or increasing the length.

Customized password recommendations: A machine learning-based password strength checker
could analyse a user's previous password choices and provide customized recommendations
for creating strong passwords that the user is more likely to remember.

Security alerts: Machine learning algorithms could be trained to recognize patterns of
suspicious activity, such as multiple failed login attempts or attempts to log in from unusual
locations. If such activity is detected, the system could alert the user to change their password
to a stronger one.

Overall, a password strength checker using Python and machine learning could help improve
the security of online accounts and reduce the risk of data breaches by providing users with
more accurate and personalized feedback on the strength of their passwords.
CHAPTER 3. METHODOLOGY
3.i- Dataset
The passwords used in our analysis are from the 000webhost leak, which is available online.
How did we figure out which passwords were stronger and which were weaker? There is a tool
called PARS, from Georgia Tech, which has all the commercial password meters integrated
into it. All I did was give that tool all the passwords, and it produced a new file for each
commercial password strength meter. Each file contained the passwords with one more
column, i.e. their strength according to that meter. The commercial password strength
algorithms I used are those of Twitter, Microsoft and Battle. How is this algorithm different
from these strength meters? First, it is based entirely on machine learning rather than on rules.
Second, I kept only those passwords that were given the same label (weak, medium or strong)
by all three strength meters. This means that every password in the dataset was unanimously
rated as weak, medium or strong.

I had a total of 3 million passwords, but after taking the intersection of the classifications from
the three commercial meters, I was left with 0.7 million passwords. The reduction is because I
kept only passwords that were placed in the same category by all three algorithms.
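The intersection step described above can be sketched as follows. This is only an illustration: the function name `consensus_labels`, the dictionary layout and the example passwords are assumptions, and the actual files produced by PARS are not reproduced here.

```python
def consensus_labels(*meter_outputs: dict[str, int]) -> dict[str, int]:
    """Keep only passwords that every meter scored, and scored identically."""
    if not meter_outputs:
        return {}
    first, *rest = meter_outputs
    return {
        password: label
        for password, label in first.items()
        if all(other.get(password) == label for other in rest)
    }


# Hypothetical per-meter outputs: password -> 0 (weak), 1 (medium), 2 (strong).
twitter = {"abc123": 0, "hello2020": 1, "Tr0ub4dor&3xyz": 2}
microsoft = {"abc123": 0, "hello2020": 0, "Tr0ub4dor&3xyz": 2}
battle = {"abc123": 0, "hello2020": 1, "Tr0ub4dor&3xyz": 2}

dataset = consensus_labels(twitter, microsoft, battle)
print(dataset)  # {'abc123': 0, 'Tr0ub4dor&3xyz': 2}
```

Here "hello2020" is dropped because one meter disagrees with the other two, which is the same effect that reduced the 3 million passwords to 0.7 million.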
3.ii- Data analysis

After analysing this data in a bar graph, we found that the dataset has more than six lakh
(600,000) records, divided into three levels shown as the bars of the graph:
0 indicates - Easy (weak) password
1 indicates - Medium password
2 indicates - Strong password
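The counts behind such a bar graph can be obtained with a few lines of Python. The sketch below uses a small, made-up list of labels; `LABEL_NAMES`, `class_distribution` and `sample_labels` are hypothetical names introduced for illustration only.

```python
from collections import Counter

LABEL_NAMES = {0: "Easy", 1: "Medium", 2: "Strong"}


def class_distribution(labels):
    """Count how many passwords fall into each strength level (0, 1, 2)."""
    counts = Counter(labels)
    return {LABEL_NAMES[level]: counts.get(level, 0) for level in sorted(LABEL_NAMES)}


# Hypothetical labels; the real dataset has more than six lakh of them.
sample_labels = [1, 1, 0, 2, 1, 1, 0, 1]
distribution = class_distribution(sample_labels)

# Print a rough text version of the bar graph.
for name, count in distribution.items():
    print(f"{name:<7}{'#' * count} {count}")
```

Passing the names and counts to Matplotlib's `pyplot.bar` would draw the corresponding bar graph.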
3.iii- Libraries used
• NumPy
• Pandas
• Matplotlib

3.iv- Algorithms used

i- RandomForestClassifier- Random Forest Classifier is a popular machine learning
algorithm used for classification tasks. It is an ensemble learning method that combines
multiple decision trees and typically produces a more accurate and stable prediction than a
single decision tree.
In a random forest classifier, a set of decision trees is built using different subsets of the
training data and different subsets of the features. Each tree makes a prediction, and the final
prediction is made by taking the majority vote of all the trees. This approach reduces
overfitting and increases the accuracy and robustness of the model.
The random forest classifier is commonly used for a wide range of applications, such as in
finance for fraud detection, in medicine for disease diagnosis, in image recognition, and in
many other fields. It is a powerful and flexible algorithm that can handle both binary and
multi-class classification problems.
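The two ideas in the description above, training each tree on a different subset of the data and combining the trees by majority vote, can be sketched in plain Python. The helper names `bootstrap_sample` and `majority_vote` and the example values are hypothetical; no actual decision trees are trained here.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as one tree's training subset."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
rows = ["abc123", "hello2020", "Tr0ub4dor&3xyz", "qwerty"]
subset = bootstrap_sample(rows, rng)  # some rows may repeat, others may be missing

# Hypothetical predictions from five trees for one password.
votes = [2, 1, 2, 2, 0]
print(majority_vote(votes))  # 2
```

Note that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities rather than by counting hard votes, so the sketch shows the general idea and not that library's exact procedure.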
ii- TfidfVectorizer- The TF-IDF (Term Frequency-Inverse Document Frequency)
vectorizer is a technique used to transform text data into a numerical format that
can be used by machine learning algorithms. It is commonly used for text classification,
information retrieval, and natural language processing.
The TF-IDF vectorizer assigns a weight to each word in a document based on how
frequently it appears in that document (term frequency) and how rarely it appears across the
other documents (inverse document frequency). This weighting scheme helps to identify the
most important words in a document and reduces the importance of common words like "the"
or "and".
The TF-IDF vectorizer creates a vector for each document, where the length of the vector is
the number of unique words in the corpus and each entry is the TF-IDF weight of the
corresponding word in that document. This vector representation can then be used as input to
machine learning algorithms.

The TF-IDF vectorizer is widely used and is available in popular machine learning libraries
such as Scikit-learn and TensorFlow. It is particularly useful for tasks like sentiment analysis,
text classification, and topic modelling.
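The weighting described above can be worked through by hand. The sketch below assumes that each password is treated as a "document" and each character as a "word", which this report does not state explicitly. It uses the textbook weight tf × log(N / df); scikit-learn's `TfidfVectorizer` uses a smoothed variant with normalization by default, so its numbers would differ. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using weight = tf * log(N / df).

    Each document is a non-empty list of tokens; here the tokens are characters.
    """
    n_docs = len(documents)
    vocabulary = sorted({token for doc in documents for token in doc})
    # Document frequency: in how many documents does each token occur?
    doc_freq = Counter(token for doc in documents for token in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([
            (counts[token] / len(doc)) * math.log(n_docs / doc_freq[token])
            for token in vocabulary
        ])
    return vocabulary, vectors


passwords = ["aab", "abc", "a1!"]
vocab, vectors = tfidf_vectors([list(p) for p in passwords])
print(vocab)  # ['!', '1', 'a', 'b', 'c']
for password, vector in zip(passwords, vectors):
    print(password, [round(weight, 3) for weight in vector])
```

The character "a" occurs in every password, so its weight is zero everywhere, while the characters that occur in only one password receive the largest weights. This is the same effect that down-weights "the" or "and" in ordinary text.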

iii- train_test_split- Train-test splitting is a technique used in machine learning to
evaluate the performance of a model. It involves splitting a dataset into two
separate sets: a training set and a testing set.
The training set is used to train the model, while the testing set is used to evaluate the model's
performance on unseen data. The goal is to build a model that generalizes well to new,
unseen data rather than one that just memorizes the training data.
The split is made by randomly dividing the dataset into the two parts. The most common
splits are 80/20 and 70/30, where the training set contains 70-80% of the data and the testing
set contains the remaining 20-30%.
The model is fitted to the training set by optimizing its parameters. The testing set is then
used to calculate metrics such as accuracy, precision, recall, and F1-score, which show how
well the model performs on unseen data.
It is important to note that the test set should be used only for evaluation, not for model
training or parameter tuning. Using it for those purposes can lead to overfitting and an overly
optimistic estimate of the model's performance on new, unseen data.
In summary, train-test splitting is a crucial step in the machine learning pipeline, as it allows
us to estimate how well our model will perform on new, unseen data.
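A random 80/20 split of the kind described above can be sketched with the standard library alone. The function name `split_train_test`, the seed and the example data are assumptions made for illustration.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of samples and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (password, label) pairs.
data = [(f"password{i}", i % 3) for i in range(10)]
train, test = split_train_test(data, test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

scikit-learn's `train_test_split` offers the same behaviour through its `test_size` and `random_state` parameters.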
CHAPTER 4. RESULT

Sl No.   Algorithm                Accuracy
1.       RandomForestClassifier   95.5%
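Assuming the figure above is plain classification accuracy, i.e. the fraction of test passwords whose predicted level matches the true level, it can be computed as in the sketch below. The labels shown are made up for illustration and are not the results of this project; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical true and predicted strength levels for ten test passwords.
y_true = [0, 1, 1, 2, 1, 0, 2, 1, 1, 1]
y_pred = [0, 1, 1, 2, 1, 1, 2, 1, 1, 1]

print(accuracy(y_true, y_pred))             # 0.9
print(precision_recall(y_true, y_pred, 0))  # (1.0, 0.5)
```

In this example the overall accuracy is high even though half of the weak passwords are missed, which is why per-class precision and recall are worth reporting alongside accuracy when the classes are unevenly sized.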
CHAPTER 5. CONCLUSION
We evaluated the model with 80% of the data used for training and 20% for testing. On the more
than six lakh passwords in the dataset, the RandomForestClassifier reached an accuracy of
95.5% (Chapter 4).
How well a machine learning model judges password strength depends on various factors, such
as the quality and length of the passwords and the choice of machine learning model. In general,
the accuracy of such a model can vary from about 60% to 90%, depending on the complexity of
the problem and the quality of the data.
A password strength checker using machine learning in Python can be a useful tool for
evaluating the strength of passwords. By analyzing the features of a password, such as length,
character types, and patterns, a machine learning model can make predictions about the
password's strength.
To build a password strength checker using machine learning in Python, you would first need
to collect a dataset of password samples with known strengths. You could then pre-process the
data and extract relevant features before training a machine learning algorithm, such as a
decision tree or a neural network.
Once the model is trained, you can use it to predict the strength of new passwords that are
entered into the system. You could also use the model to suggest ways to improve weak
passwords or to enforce password strength requirements in your application or system.
It is important to note that no password strength checker can guarantee the security of a
password. However, a password strength checker can be a useful tool for encouraging users to
choose stronger passwords and for identifying weak passwords that may be vulnerable to
attacks.
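The steps outlined above (collect labelled passwords, extract features, train, predict) can be sketched end to end with scikit-learn, using the three components named in Chapter 3. This is a minimal sketch and not the code of this project: the tiny dataset is made up, and the character-level analyzer, `lowercase=False`, the number of trees and the random seeds are all assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Tiny hypothetical dataset: (password, label), 0 = weak, 1 = medium, 2 = strong.
samples = [
    ("abc123", 0), ("qwerty", 0), ("hello1", 0), ("pass12", 0), ("iloveu", 0),
    ("hello2020", 1), ("sunshine77", 1), ("dragon2021", 1),
    ("monkey1234", 1), ("flower4you", 1),
    ("Tr0ub4dor&3xyz", 2), ("G7#kLp9!zQw2Xy", 2), ("xV9$mN2@pL5&rT", 2),
    ("Zq8!Wf3#Hn6%Bd", 2), ("Lm4@Rt7$Ks1^Pv", 2),
]
passwords = [password for password, _ in samples]
labels = [label for _, label in samples]

# 80/20 split, keeping the class proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    passwords, labels, test_size=0.2, random_state=0, stratify=labels
)

# Assumption: each character is a token, and case is kept because
# passwords are case-sensitive.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", lowercase=False),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
predictions = model.predict(["letmein", "N3w!Passw0rd#2024"])
print("test accuracy:", test_accuracy)
print("predicted levels:", predictions)
```

With only fifteen passwords the printed accuracy says nothing about real performance; the sketch is meant to show how the pieces fit together, not to reproduce the result in Chapter 4.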
REFERENCE
1. https://www.kaggle.com/datasets/bhavikbb/password-strength-classifier-dataset
2. https://youtu.be/BzManFSX5lg
3. https://youtu.be/x3GfMmzHJa8
