Project 2


CREDIT CARD FRAUD DETECTION
USING RANDOM FOREST AND CART ALGORITHM

A Report Submitted in partial fulfillment of the requirements
for the award of the degree of

BACHELOR OF TECHNOLOGY
in

ELECTRICAL AND ELECTRONICS ENGINEERING


Submitted by,
B VAISHNAVI [23J41A0204]

B MANOJ KIRAN [23J41A0205]

B SIVA KRISHNA [23J41A0206]

Under the Supervision

of Dr.T.Rajesh
(Professor)

DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING
MALLA REDDY ENGINEERING COLLEGE
(An Autonomous Institution)
Maisammaguda, Secunderabad, Telangana, India 500100

September – 2024
MALLA REDDY ENGINEERING COLLEGE
Maisammaguda, Secunderabad, Telangana, India 500100

CERTIFICATE

This is to certify that the report “Credit Card Fraud Detection Using Random Forest
and CART Algorithm” submitted by B VAISHNAVI [23J41A0204], B MANOJ
KIRAN [23J41A0205] and B SIVA KRISHNA [23J41A0206] is work done by
them and submitted during the 2024 – 2025 academic year, in partial fulfillment
of the requirements for the award of the degree of BACHELOR OF
TECHNOLOGY in ELECTRICAL AND ELECTRONICS
ENGINEERING, at MALLA REDDY ENGINEERING COLLEGE (An
Autonomous Institution), Maisammaguda, Secunderabad, Telangana, India
500100.

SIGNATURE
Dr.T.Rajesh
INTERNSHIP COORDINATOR
(Professor)
Department of EEE
Malla Reddy Engineering College
Secunderabad, 500 100

SIGNATURE
Dr.M.Kondalu
HOD
(Department of EEE)
Malla Reddy Engineering College
Secunderabad, 500 100

ACKNOWLEDGEMENT

First, I would like to thank Dr.T.Rajesh (Professor) for giving me the opportunity to do
an internship within the organization. I am highly indebted to Dr.A.Ramaswamy Reddy
(Principal) for the facilities provided to accomplish this internship. I would like to thank
Dr.M.Kondalu (HOD) for his constructive criticism throughout my internship.

It is indeed with a great sense of pleasure and immense sense of gratitude that I acknowledge
the help of these individuals.

I am extremely grateful to my department staff members and friends who helped me in
the successful completion of this internship.

B VAISHNAVI [23J41A0204]

B MANOJ KIRAN [23J41A0205]


B SIVA KRISHNA [23J41A0206]

ABSTRACT

The project is mainly focused on credit card fraud detection in the real world.
Phenomenal growth in the number of credit card transactions has recently led to a
considerable rise in fraudulent activities. The purpose of such fraud is to obtain goods
without paying, or to obtain unauthorized funds from an account. Implementation of
efficient fraud detection systems has become imperative for all credit card issuing banks
to minimize their losses. One of the most crucial challenges for the business is that
neither the card nor the cardholder needs to be present when the purchase is being made.
This makes it impossible for the merchant to verify whether the customer making a
purchase is the authentic cardholder or not. With the proposed scheme, which uses the
random forest algorithm, the accuracy of detecting fraud can be improved. The
classification process of the random forest algorithm is used to analyse the data set and
the user's current dataset, and the accuracy of the resulting predictions is then optimized.
Processing of some of the provided attributes identifies fraudulent transactions and
provides a graphical model visualization. The performance of the techniques is
evaluated based on accuracy, sensitivity, specificity, and precision.

INDEX

S.No. Contents

ACKNOWLEDGEMENT
ABSTRACT
1. Objectives of the Internship
2. Technical Observations and Learnings from Internship Program
3. Outcome of the Internship
4. Appendices
5. Conclusion and Future Scope
6. References

1. OBJECTIVES OF THE INTERNSHIP

 Explore Career Alternatives: Gain hands-on experience in the field of data
science and machine learning, specifically in credit card fraud detection, to
explore potential career paths before graduation.
 Integrate Theory and Practice: Apply the theoretical knowledge acquired from
coursework to a real-world project, enhancing understanding of machine
learning algorithms like Random Forest and CART.
 Assess Interests and Abilities: Evaluate personal strengths, interests, and skills
in data analysis, model building, and algorithm optimization, particularly within
the finance and security sector.
 Understand the Economic Role of Work: Recognize the significance of
effective fraud detection systems in maintaining financial stability and their
broader impact on the economy.
 Develop Professional Work Habits: Cultivate essential work habits such as
time management, attention to detail, and problem-solving, which are critical for
success in professional settings.
 Enhance Communication and Interpersonal Skills: Improve the ability to
communicate technical concepts clearly and collaborate effectively with team
members, supervisors, and stakeholders.
 Build Professional Experience: Document the practical experience gained
during the internship, which can be highlighted in a professional portfolio or
resume.
 Establish Professional Contacts: Network with industry professionals and
acquire contacts that could lead to full-time employment opportunities post-
graduation.

2. TECHNICAL OBSERVATIONS AND LEARNINGS FROM
INTERNSHIP PROGRAM

 Process Overview: The credit card fraud detection system involves several
stages starting from data collection to model deployment.

 Data Collection: Gathering anonymized transaction data from historical
records.
 Data Preprocessing: Cleaning, normalizing, and transforming data for analysis.
 Feature Engineering: Selecting and creating relevant features that help identify
fraudulent patterns.
 Model Development: Implementing and training the Random Forest and CART
algorithms.
 Model Testing and Validation: Assessing the performance of the models using
cross-validation and testing datasets.
 Deployment: Integrating the finalized model into the production environment
for real-time fraud detection.
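
The stages above can be sketched in a few lines of scikit-learn. This is only a minimal illustration, not the organization's actual pipeline: the data is synthetic (generated with make_classification), and the roughly 2% fraud rate, the 70/30 split and the hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for anonymized transactions (assumption: ~2% fraud).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.98, 0.02], random_state=0,
)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Preprocessing (normalization) and the model combined in one pipeline.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)

# Validation: 3-fold cross-validation on the training data only.
cv_scores = cross_val_score(model, X_train, y_train, cv=3, scoring="f1")

# Final fit, then evaluation on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(cv_scores, test_accuracy)
```

Keeping the scaler inside the pipeline means it is refitted on each training fold, so no information from the validation fold leaks into preprocessing.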

Materials and Technologies Used

 Materials: The primary data used consisted of anonymized credit card
transaction records, including features such as transaction amount, merchant
category, and cardholder location.
 Technologies: The organization utilized Python, Scikit-learn, and SQL for data
handling and model development. Additionally, cloud-based platforms were
used for storing and processing large datasets.

REQUIREMENT ANALYSIS

The project involved analyzing the design of a few applications so as to make the
application more user-friendly. To do so, it was important to keep the navigation from
one screen to the next well ordered while reducing the amount of typing the user needs
to do. To make the application more accessible, the browser version had to be chosen
so that it is compatible with most browsers.

REQUIREMENT SPECIFICATION

Functional Requirements

 Graphical User interface with the User.

Software Requirements

The following are the software requirements for developing the application:

 Operating systems supported: Windows
 Technologies and languages used to develop: Python

HARDWARE REQUIREMENTS

For developing the application the following are the Hardware Requirements:

 Processor: Pentium IV or higher
 RAM: 256 MB
 Space on Hard Disk: minimum 512 MB

SYSTEM SPECIFICATION:
HARDWARE REQUIREMENTS:
 System : Pentium IV 2.4 GHz.

 Hard Disk : 40 GB.

 Floppy Drive : 1.44 MB.

 Monitor : 14" Colour Monitor.

 Mouse : Optical Mouse.

 RAM : 512 MB.

SOFTWARE REQUIREMENTS:

 Operating system : Windows 7 Ultimate.

 Coding Language : Python.

 Front-End : Python.

 Designing : HTML, CSS, JavaScript.

 Data Base : MySQL.

Credit Card Fraud Detection Using Random Forest and CART

Introduction

 Credit card fraud is a significant concern for financial institutions and customers
alike, with billions lost annually due to fraudulent activities. Detecting fraud is
challenging due to the evolving nature of fraudulent tactics and the need to
distinguish between legitimate and fraudulent transactions. Machine learning
algorithms, particularly Random Forest and CART (Classification and Regression
Trees), have been shown to be effective in identifying suspicious transactions based
on historical data patterns.

What is Random Forest?

 Random Forest is a versatile machine learning algorithm used for both
classification and regression tasks. It is an ensemble learning method that builds
multiple decision trees during training and merges their outputs to make the final
prediction. The primary goal of Random Forest is to reduce overfitting and improve
accuracy.

 How Random Forest Works:

1. Ensemble of Decision Trees: Random Forest consists of many decision trees
(hence the name “forest”). Each tree is trained on a random subset of the data and a
random subset of features.
2. Bootstrap Aggregation (Bagging): It uses a technique called bootstrapping, where
multiple subsets of data are created by sampling with replacement. Each subset is
used to train a different decision tree.
3. Random Feature Selection: At each split in a tree, a random selection of features is
chosen to determine the best split. This reduces correlation between trees and
prevents overfitting.
4. Voting Mechanism: For classification tasks like fraud detection, each tree votes for
a class (fraud or no fraud), and the class with the majority votes is chosen as the
final prediction.
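
The four steps can be made concrete with a small hand-built forest. The sketch below is illustrative only: the data is synthetic, the number of trees is arbitrary, and the ensemble is assembled manually from scikit-learn's DecisionTreeClassifier so that the bootstrap sampling and the majority vote are visible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic, illustrative data (not the internship dataset).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a majority vote can never tie
trees = []
for i in range(n_trees):
    # Bagging: draw a bootstrap sample (same size, with replacement).
    idx = rng.integers(0, len(X), size=len(X))
    # Random feature selection: consider sqrt(n_features) features per split.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X[idx], y[idx]))

# Voting: every tree predicts a class and the majority wins.
votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
fraud_votes = votes.sum(axis=0)                  # trees voting for class 1
y_pred = (fraud_votes > n_trees / 2).astype(int)

print("training accuracy of the ensemble:", (y_pred == y).mean())
```

In practice RandomForestClassifier wraps these steps in a single estimator. Note that scikit-learn's implementation combines trees by averaging their predicted class probabilities rather than counting hard votes, so its output can differ slightly from the strict majority vote shown here.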

 Advantages of Random Forest for Fraud Detection

• Predictive accuracy.
• Robust to Overfitting: The use of random samples and features makes Random Forest
robust to overfitting, especially in high-dimensional datasets typical in fraud detection.
• Feature Importance: It provides insights into which features (like transaction amount,
frequency, location, etc.) are most important in predicting fraud.
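
As a rough illustration of the feature-importance point, a fitted scikit-learn forest exposes its impurity-based importances through the feature_importances_ attribute. The feature names below are hypothetical labels attached to synthetic columns, so the ranking says nothing about real transactions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; the data itself is synthetic.
feature_names = ["amount", "frequency", "location_score", "hour_of_day"]

# shuffle=False keeps the two informative columns first, the noise columns last.
X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one non-negative value per feature, summing to 1.
importances = forest.feature_importances_
for name, score in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name:15s} {score:.3f}")
```

Impurity-based importances are computed on the training data and can favour features with many distinct values, so they are best read as a first indication rather than a final ranking.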

 What is CART?

CART (Classification and Regression Trees) is a decision tree algorithm used for
creating a model that predicts the value of a target variable by learning simple decision
rules inferred from the data features. CART builds binary decision trees where each
node represents a decision on a single feature.

 How CART Works:

1. Splitting Criteria: The CART algorithm splits the dataset into two subsets based on the
value of a specific feature, choosing the split that most reduces impurity (for
classification, the split with the minimum Gini impurity).
2. Recursive Binary Splits: This process is repeated recursively for each subset, creating
a binary tree structure. Each node represents a feature and a decision rule, while each
branch represents an outcome of that rule.
3. Stopping Criteria: The tree continues to split until it reaches a stopping criterion (e.g.,
maximum tree depth, minimum number of samples per leaf, or purity of nodes).
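
To make the splitting criterion concrete, the sketch below computes the Gini impurity of a node and searches one feature for the binary split with the lowest weighted impurity. It is a simplified illustration: the amounts and labels are made up, the helper names gini and best_split are introduced here, and thresholds are tried at observed values, whereas library implementations typically use midpoints between neighbouring values.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a label array."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    best = (None, gini(labels))  # start from the unsplit node
    for threshold in np.unique(values)[:-1]:
        left = labels[values <= threshold]
        right = labels[values > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Toy example: transaction amounts and labels (1 = fraud), made up for illustration.
amounts = np.array([12.0, 25.0, 40.0, 310.0, 500.0, 950.0])
is_fraud = np.array([0, 0, 0, 1, 1, 1])

print(gini(is_fraud))                 # 0.5 for a 50/50 node
print(best_split(amounts, is_fraud))  # the split at 40.0 leaves two pure children
```

Applying best_split recursively to each child, and stopping at a depth or leaf-size limit, yields the binary tree described above. In scikit-learn the corresponding settings on DecisionTreeClassifier are criterion="gini", max_depth and min_samples_leaf.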

 How Random Forest and CART Work for Credit Card Fraud Detection

Credit card fraud detection involves classifying transactions as either “fraudulent” or
“legitimate” based on patterns learned from historical data. Both Random Forest and
CART algorithms are particularly effective in this domain due to their ability to handle
large datasets, manage imbalanced data, and provide high accuracy.
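
One common way to account for class imbalance in a tree ensemble is to reweight the classes during training. The sketch below uses the class_weight="balanced" option of scikit-learn's RandomForestClassifier on synthetic data with an assumed 3% fraud share; whether reweighting actually helps depends on the dataset, so it is shown as an option rather than as the method used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 3% "fraud" (an assumption for the example).
X, y = make_classification(
    n_samples=3000, n_features=10, weights=[0.97, 0.03], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# class_weight="balanced" weights samples inversely to class frequency,
# so errors on the rare fraud class count more during training.
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=1
).fit(X_train, y_train)

# Recall on the fraud class: the share of actual frauds that were flagged.
fraud_recall = recall_score(y_test, clf.predict(X_test))
print(f"fraud share in training data: {y_train.mean():.3f}")
print(f"fraud recall on test data:    {fraud_recall:.3f}")
```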

Quality Planning and Control Activities

 Quality Assurance: The organization has a rigorous quality control process,
especially in the deployment of fraud detection models. Regular model
validation, performance monitoring, and updates are conducted to ensure the
system's accuracy and reliability.
 Control Activities: Continuous monitoring of the model's performance in
production, including tracking false positives and adjusting the model as needed.
My Role and Experiences

 Role: My team was responsible for implementing and optimizing the Random
Forest algorithm for detecting fraudulent transactions. Additionally, I contributed
to the integration of the CART algorithm to enhance anomaly detection.
 Experiences: Through this internship, I gained hands-on experience with
machine learning model development, data preprocessing, and the practical
challenges of deploying models in a real-world environment.
Comparison of Theory and Practice

 Theory: In academic settings, we learn about machine learning algorithms in a
controlled environment with well-prepared datasets.
 Practice: In the workplace, I encountered challenges such as handling imbalanced
data, real-time processing requirements, and the need for continuous model
improvement, which are often not emphasized in classroom settings.
Work Samples

UML DIAGRAMS

Class diagram

User

UploadCreditCardDataset()
GenerateTrainAndTestModel()
RunRandomForestTree()
DetectFraudFromTestData()
CleanAndFraudTransactionGraph()
Exit()

Use case Diagram

Actor: User

Use cases:
Upload Credit Card Dataset
Generate Train and Test Model
Run Random Forest Algorithm
Detect Fraud From Test Data
Clean And Fraud Transaction Graph
Exit

Sequence Diagram:

Participants: User, Application

1. Upload Credit Card Dataset
2. Generate Train And Test Model
3. Run Random Forest Tree Algorithm
4. Detect Fraud From Test Data
5. Clean And Fraud Transaction Detection Graph
6. Exit

Collaboration Diagram

Participants: User, Application

1: Upload Credit Card Dataset
2: Generate Train And Test Model
3: Run Random Forest Tree Algorithm
4: Detect Fraud From Test Data
5: Clean And Fraud Transaction Detection Graph
6: Exit

 Model Performance: I developed a graph comparing the precision, recall,
and F1-score of the Random Forest model before and after tuning
hyperparameters.
 Process Diagram: Included is a process diagram of the fraud detection
system that highlights each stage from data collection to model deployment.
 Data Visualization: I created visualizations to showcase the distribution
of fraudulent vs. legitimate transactions in the dataset.
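
For reference, the metrics named in this report (precision, recall or sensitivity, specificity and F1-score) can be computed from a confusion matrix as sketched below. The ten labels are made up for illustration and are not results from the internship.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Made-up labels for ten transactions (1 = fraud, 0 = legitimate).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn), also called sensitivity
specificity = tn / (tn + fp)                 # share of legitimate transactions passed
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(precision, recall, specificity, f1)
```

Computing the same four numbers for a model before and after hyperparameter tuning gives the values needed for a comparison graph of the kind described above.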

3. OUTCOME OF THE INTERNSHIP

A. Skills and Qualifications Gained

During the internship, I acquired a range of technical and professional skills that
have significantly enhanced my qualifications:

 Machine Learning Expertise: I gained practical experience in implementing and
optimizing machine learning models, particularly using Random Forest and
CART (Classification and Regression Trees) algorithms for fraud detection. This
experience has deepened my understanding of these algorithms and their
application in the financial sector.
 Data Handling and Preprocessing: I developed advanced skills in data
preprocessing, including managing large datasets, dealing with imbalanced data,
and performing feature engineering. These skills are critical for ensuring the
accuracy and effectiveness of machine learning models.
 Technical Proficiency: My proficiency in Python, Scikit-learn, and SQL
improved significantly, enabling me to efficiently handle data and develop robust
models.
 Professional Skills: Beyond technical skills, I enhanced my problem-
solving abilities, critical thinking, and teamwork, which are essential for
success in a professional environment.

B. Responsibilities Undertaken

Throughout the internship, I was entrusted with several important responsibilities:

 Model Development and Optimization: I was primarily responsible for
developing and optimizing the Random Forest model for fraud detection.
Additionally, I integrated the CART algorithm to enhance the detection accuracy.
 Data Management: I handled the preprocessing of transaction data, ensuring
its quality and readiness for model training. This included tasks such as data
cleaning, normalization, and feature selection.

 Collaboration and Communication: I actively participated in team meetings,
collaborated with data scientists and engineers, and contributed to the
decision-making process. I also documented the development process and findings
for future reference.
 Quality Assurance: I was involved in the evaluation and validation of the
models, ensuring they met the required performance standards before
deployment.

C. Influence on Future Career Plans

The internship has had a significant impact on my future career aspirations:

 Career Direction: The hands-on experience in credit card fraud detection has
reinforced my interest in pursuing a career in data science, particularly within the
financial technology sector. I now feel more confident in my ability to contribute
meaningfully to this field.
 Professional Growth: The skills and experience gained during this internship will be
instrumental in advancing my career. I am now better prepared to take on more
challenging roles in data science and machine learning.
 Networking Opportunities: The connections and relationships I built during the
internship will be valuable as I navigate my career path, opening the way to
mentorship and job opportunities in the future.

D. Correlation with Classroom Knowledge

 Application of Theory: Concepts such as machine learning algorithms, data
preprocessing techniques, and model evaluation metrics, which were introduced in
academic courses, were directly applied during the internship. This hands-on
application helped solidify my understanding of these concepts.
 Real-World Challenges: The internship exposed me to challenges that are not
typically encountered in a classroom setting, such as handling real-time data,
optimizing models for deployment, and ensuring model accuracy in a live
environment. These experiences provided a more comprehensive understanding of the
complexities involved in data science projects.

4. APPENDICES
 To run the project, double-click the ‘run.bat’ file to get the screen below.

 In the above screen, click the ‘Upload Credit Card Dataset’ button to upload the dataset.

 After uploading the dataset, the screen below appears.

 Now click ‘Generate Train & Test Model’ to generate the training model for the Random
Forest classifier.

 In the above screen, after generating the model, we can see the total number of records
available in the dataset and how many records the application uses for training and how
many for testing. Now click the ‘Run Random Forest Algorithm’ button to generate the
Random Forest model on the train and test data.

 In the above screen, we can see that Random Forest achieves 99.78% accuracy while
building the model on the train and test data. Now click the ‘Detect Fraud From Test Data’
button to upload test data and predict whether the test data contains normal or fraudulent
transactions.

 In the above screen, I am uploading the test dataset; after uploading the test data, the
prediction details below appear.

 In the above screen, beside each test record the application displays whether the
transaction contains clean or fraud signatures. Now click the ‘Clean & Fraud Transaction
Detection Graph’ button to see the total test transactions with clean and fraud signatures
in graphical format. See the screen below.

 In the above graph, we can see the total test data and the number of normal and fraud
transactions detected. The x-axis represents the transaction type and the y-axis represents
the count of clean and fraud transactions.
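
The counts behind such a graph can be derived directly from the model's predictions. The sketch below uses a hypothetical list of eight predictions and prints a text version of the chart; it does not reproduce the application's own plotting code, which is not shown in this report.

```python
from collections import Counter

# Hypothetical model output for eight test transactions (1 = fraud, 0 = clean).
predictions = [0, 0, 1, 0, 0, 0, 1, 0]

labels = {0: "Clean", 1: "Fraud"}
counts = Counter(labels[p] for p in predictions)

# x-axis: transaction type, y-axis: number of transactions of that type.
for kind in ("Clean", "Fraud"):
    print(f"{kind:5s} | {'#' * counts[kind]} ({counts[kind]})")
```

The same two counts could be passed to a plotting library to draw the bar chart.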

5. CONCLUSION AND FUTURE SCOPE

The Random Forest algorithm will perform better with a larger amount of training data,
but speed during testing and application will suffer. Applying more pre-processing
techniques would also help. The SVM algorithm still suffers from the imbalanced dataset
problem and requires more preprocessing to give better results; the results shown by
SVM are good, but they could have been better if more preprocessing had been done on
the data.

Future work includes the integration of more advanced preprocessing techniques to further
improve model accuracy, particularly in handling imbalanced datasets. Optimizing the
Random Forest algorithm for real-time fraud detection and exploring hybrid models with
other machine learning techniques could enhance performance. Additionally, focusing on
scalability, model interpretability, and deployment in cloud-based environments will
ensure that the system remains robust and effective as transaction volumes grow. These
advancements will help create a more reliable and efficient fraud detection system capable
of adapting to evolving challenges in the financial sector.

6. REFERENCES

 G. Sudhamathy, “Credit Risk Analysis and Prediction Modelling of Bank Loans
Using R”, vol. 8, no. 5, pp. 1954-1966.
 Li Changjian and Hu Peng, “Credit Risk Assessment for Rural Credit Cooperatives
based on Improved Neural Network”, International Conference on Smart Grid and
Electrical Automation, vol. 60, no. 3, pp. 227-230, 2017.
 Wei Sun, Chen-Guang Yang and Jian-Xun Qi, “Credit Risk Assessment in
Commercial Banks Based On Support Vector Machines”, vol. 6, pp. 2430-2433,
2006.
 Amlan Kundu, Suvasini Panigrahi and Shamik Sural, “BLAST-SSAHA
Hybridization for Credit Card Fraud Detection”, vol. 6, no. 4, pp. 309-315, 2009.
 Y. Sahin and E. Duman, “Detecting Credit Card Fraud by Decision Trees and
Support Vector Machines”, Proceedings of International Multi Conference of
Engineers and Computer Scientists, vol. I, 2011.
