
Classification Using Decision Tree On Audit Dataset Through R

This document describes using decision tree and Bayesian classification algorithms on an audit dataset in R. It discusses applying decision tree classification to the audit data to predict fraudulent firms. The methodology section describes using the R tool to test the audit dataset and output a decision tree. The dataset contains 777 firms' data from 46 cities across 14 sectors. Key attributes include risk scores and financial values. The conclusion is that decision tree was able to classify the data but may not be stable, so neural networks could be explored in the future.

Uploaded by

Anushka Jangir
Copyright
© © All Rights Reserved

Classification Using Decision Tree & Bayesian Classification on Audit Dataset Through R

Submitted To: Dr. Monika Rathore (Associate Professor)

Submitted By: Anushka Jangir (Under Guidance of: Dr. Monika Rathore)


Contents

Abstract

Introduction

Methodology

Dataset Description

Procedure (Coding Snapshot)

Result Snapshot

Conclusion

References
1. Abstract:
Classification is a type of supervised learning. It is a data mining technique that specifies the
class to which data elements belong and predicts a class for a given input. It is mainly used
when the output takes a finite set of discrete values. Classification is used to predict group
membership for data instances. It has a wide range of applications, such as medical disease
diagnosis, credit card rating, artificial intelligence, and document categorization.

The Decision Tree algorithm is a supervised learning method, which is part of machine
learning. It is a classic tree-structured algorithm used for both classification and regression.
It can be used with both categorical and numerical data: categorical data represent labels
such as name or gender, whereas numerical data represent quantities such as age or
temperature.
For a decision tree, the input data is given to the algorithm and a corresponding tree structure
is generated, which helps in decision making. Decision trees are easy to understand, useful in
data exploration, and require relatively little data cleaning.
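As a minimal illustration of the two data types, the Python sketch below (not the paper's R code) shows how a single tree node might test a numerical attribute against a threshold and a categorical attribute against a set of values. The attribute names `age` and `gender` and the helper functions `numeric_test` and `categorical_test` are hypothetical, chosen only for this example.

```python
def numeric_test(record, attribute, threshold):
    """Numerical split: the record goes to the left branch if value <= threshold."""
    return record[attribute] <= threshold

def categorical_test(record, attribute, values):
    """Categorical split: the record goes to the left branch if its value is in `values`."""
    return record[attribute] in values

# Hypothetical record with one categorical and two numerical attributes.
person = {"age": 42, "gender": "F", "temperature": 36.8}
print(numeric_test(person, "age", 30))            # False, because 42 > 30
print(categorical_test(person, "gender", {"F"}))  # True
```

A numerical attribute is split on an ordered threshold, while a categorical attribute is split on set membership, which is why a tree can mix both kinds of attribute in the same model.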

This research paper uses the Audit dataset for classification. It applies the Decision Tree
algorithm to the Audit dataset to find the results and obtain the decision tree for the given
dataset. Because decision trees are suitable for both categorical and numerical data, this
research mainly focuses on the Decision Tree classification algorithm applied to the Audit
dataset.

Keywords:
Data mining, Naïve Bayes, Decision tree, Classification, Classification Techniques.

2. Introduction:
To discover useful knowledge in a given dataset, data miners apply data mining algorithms,
which can reveal various kinds of information underlying the data. A decision tree is a
flowchart-like tree structure in which each internal node represents a test on an attribute,
each branch represents an outcome of that test, and each external (leaf) node holds the
predicted outcome (class label). Given a tuple X, its attribute values are tested against the
decision tree: a path is traced from the root node to a leaf node, and that leaf gives the
prediction for X. A decision tree is easy to convert into classification rules. It is a predictive
model that maps observations about an item to conclusions about the item's target value.
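To make the root-to-leaf path and the conversion into rules concrete, here is a small Python sketch, assuming a binary tree stored as nested dictionaries. The attributes `score` and `history`, their thresholds, and the labels are hypothetical placeholders, not values learned from the Audit dataset; `classify` and `to_rules` are illustrative helper names.

```python
# Internal nodes test one attribute against a threshold; leaves hold a class label.
# All attributes, thresholds and labels here are hypothetical.
TREE = {
    "attr": "score", "threshold": 2.0,
    "left": {"label": "not fraud"},            # score <= 2.0
    "right": {                                 # score > 2.0
        "attr": "history", "threshold": 0.0,
        "left": {"label": "not fraud"},        # history <= 0.0
        "right": {"label": "fraud"},           # history > 0.0
    },
}

def classify(node, x):
    """Trace a path from the root to a leaf and return the leaf's label."""
    while "label" not in node:
        branch = "left" if x[node["attr"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

def to_rules(node, conditions=()):
    """Convert every root-to-leaf path into one IF ... THEN ... rule."""
    if "label" in node:
        lhs = " AND ".join(conditions) or "TRUE"
        return [f"IF {lhs} THEN {node['label']}"]
    a, t = node["attr"], node["threshold"]
    return (to_rules(node["left"], conditions + (f"{a} <= {t}",))
            + to_rules(node["right"], conditions + (f"{a} > {t}",)))

print(classify(TREE, {"score": 3.5, "history": 1}))  # fraud
for rule in to_rules(TREE):
    print(rule)
```

Each rule corresponds to exactly one leaf, so a tree with k leaves yields k classification rules; here the three leaves give three rules.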
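There are two types of decision tree, depending on the type of the target variable:

• Binary Variable Decision Tree (the target variable is categorical, e.g. yes/no)
• Continuous Variable Decision Tree (the target variable is continuous)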
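One simple way to see the difference between the two types is in what a leaf predicts. The Python sketch below assumes the usual CART-style convention (the majority class for a categorical target, the mean value for a continuous target); the paper does not state which convention its R implementation used, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def leaf_prediction_binary(labels):
    """Binary (categorical) target: predict the most frequent class in the leaf."""
    return Counter(labels).most_common(1)[0][0]

def leaf_prediction_continuous(values):
    """Continuous target: predict the mean target value of the records in the leaf."""
    return mean(values)

print(leaf_prediction_binary([1, 0, 1, 1]))         # 1
print(leaf_prediction_continuous([2.0, 3.0, 7.0]))  # 4.0
```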
Terminology related to Decision Tree:
Some basic terms used in decision trees are:
• Root Node: represents the entire population (sample).
• Splitting: the process of dividing a node into two or more sub-nodes.
• Decision Node: a sub-node that splits into further sub-nodes.
• Leaf/Terminal Node: a node that does not split further.
• Pruning: the process of removing sub-nodes from the tree.
• Branch/Sub-Tree: a subsection of the entire tree.
• Parent and Child Node: a node that is divided into sub-nodes is called the parent node of
those sub-nodes, and the sub-nodes are its child nodes.
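The terms above map naturally onto a small node data structure. The following Python sketch is illustrative only: the `Node` class, its `split` and `prune` methods, and the `count_leaves` helper are hypothetical names chosen for this example, not part of any R package.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    label: Optional[str] = None                      # set on leaf/terminal nodes
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        """Leaf/terminal node: a node with no sub-nodes."""
        return not self.children

    def split(self, *children: "Node") -> None:
        """Splitting: divide this node into two or more child nodes."""
        self.children.extend(children)

    def prune(self, label: str) -> None:
        """Pruning: remove this node's sub-nodes and turn it into a leaf."""
        self.children.clear()
        self.label = label

def count_leaves(node: Node) -> int:
    return 1 if node.is_leaf() else sum(count_leaves(c) for c in node.children)

root = Node("root")                          # root node: the entire population
left, right = Node("A", label="no"), Node("B")
root.split(left, right)                      # root is now the parent of A and B
right.split(Node("B1", label="no"), Node("B2", label="yes"))  # B becomes a decision node
print(count_leaves(root))                    # 3
right.prune(label="yes")                     # cut the branch (sub-tree) rooted at B
print(count_leaves(root))                    # 2
```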

3. Methodology:

This research was implemented using the R tool: a decision tree is applied to the Audit dataset
to predict the result set of the given dataset.
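The paper's R code is not reproduced in the text, so the exact package and splitting criterion it used are unknown. As a hedged, language-neutral sketch of what fitting a classification tree involves, the Python code below greedily chooses, at each node, the threshold split with the lowest weighted Gini impurity (the default criterion for classification in R's `rpart` package) and stops when a node is pure, no split lowers the impurity, or a maximum depth is reached. The toy rows and labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k**2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) with the lowest weighted Gini impurity,
    or None if no split is strictly better than leaving the node unsplit."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send records to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively grow a tree of nested dicts; leaves hold the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"label": Counter(labels).most_common(1)[0][0]}
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the tests from the root to a leaf and return its label."""
    while "label" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["label"]

# Hypothetical toy data: each row has two numeric features; labels are 0/1.
rows = [[1.0, 0.0], [2.0, 0.0], [3.0, 1.0], [4.0, 1.0], [5.0, 1.0]]
labels = [0, 0, 1, 1, 1]
tree = grow(rows, labels)
print(tree["feature"], tree["threshold"])  # 0 2.0
print(predict(tree, [2.5, 1.0]))           # 1
```

In practice the fitted tree would be evaluated on records held out from training; `max_depth` plays a role similar to pruning by limiting how far the tree can grow.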
4. Dataset Description:

The dataset file is named “audit_data.csv”.


The dataset contains data on 777 firms from 46 different cities of a state, listed by the
auditors as targets for the next field-audit work. The target offices belong to 14 different
sectors.

Columns:

Sector Score, Location ID, PARA_A,
Score_A, Risk_A, Score_B,
Risk_B, Total, Numbers,
Score_C, Risk_C, Money Value,
Score_MV, Risk_D, District Loss,
Prob, Risk_E, History,
PROB, Risk_F, SCORE,
Inherent Risk, Control Risk, Detection Risk,
Audit Risk, Risk
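A minimal sketch of loading the file and separating the features from the class label, using only Python's standard `csv` module. It assumes the class label is stored in a column named `Risk`; the exact header spellings in `audit_data.csv` are not shown in the paper, so `target` may need adjusting. The helper names are hypothetical, and the two-row sample is a placeholder with made-up columns and values, not real audit records.

```python
import csv
import io

def parse_value(text):
    """Convert a CSV cell to float when possible; otherwise keep the raw value
    (for example an identifier stored as text, or an empty cell)."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return text

def load_audit(fileobj, target="Risk"):
    """Split CSV records into feature dicts X and class labels y.
    ASSUMPTION: the class label is in the column named by `target`."""
    X, y = [], []
    for row in csv.DictReader(fileobj):
        label = row.pop(target)
        X.append({name: parse_value(value) for name, value in row.items()})
        y.append(parse_value(label))
    return X, y

# For the real file:  with open("audit_data.csv", newline="") as f: X, y = load_audit(f)
# Placeholder sample with made-up column names and values (not real records).
sample = io.StringIO("col_a,col_b,Risk\n0.5,1.0,1\n0.1,,0\n")
X, y = load_audit(sample)
print(len(X), y)  # 2 [1.0, 0.0]
```

Keeping unparseable cells as text, instead of failing, leaves the decision about how to encode or impute them to the modelling step.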

[Figure: Snapshot of the Audit dataset]

[Figure: Remaining columns of the dataset]


5. Procedure (Coding Snapshot):

6. Result Snapshot:
7. Conclusion:
This research paper presents a case study of an audit company in India. The case study applies
machine learning techniques (classification) to predict fraudulent firms at the time of audit
planning. Specifically, it applies the Decision Tree algorithm to the Audit dataset and obtains
a decision tree as the output for the given dataset.
The Decision Tree algorithm is not stable, because small changes in the data can cause large
changes in the tree structure; therefore, future work will use neural networks.
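The instability described above can be demonstrated on a toy example. The Python sketch below uses hypothetical data and a hypothetical helper `root_split` that picks the root test by weighted Gini impurity; flipping the label of a single record is enough to change which feature the root node tests.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def root_split(rows, labels):
    """Return (feature_index, threshold) of the lowest weighted-Gini split."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
                if score < best_score:
                    best, best_score = (j, t), score
    return best

# Hypothetical toy data: two numeric features per record.
rows = [(1, 5), (2, 1), (3, 6), (4, 2), (5, 3), (6, 4)]
labels_a = [0, 0, 0, 1, 1, 1]
labels_b = [0, 1, 0, 1, 1, 1]  # identical except that record 1's label is flipped

print(root_split(rows, labels_a))  # (0, 3): the root tests feature 0
print(root_split(rows, labels_b))  # (1, 4): the root now tests feature 1
```

Because every node below the root is grown from the root's partition, a change at the root propagates to the whole tree structure.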

8. References:

[1] Hooda, Nishtha, Seema Bawa, and Prashant Singh Rana. "Fraudulent Firm Classification: A
Case Study of an External Audit." Applied Artificial Intelligence 32.1 (2018): 48-64.

[2] Liaw, Andy, and Matthew Wiener. "Classification and regression by randomForest." R News
2.3 (2002): 18-22.

[3] Haykin, Simon. "Neural Networks: A Comprehensive Foundation." (2004).

[4] Hearst, Marti A., et al. "Support vector machines." IEEE Intelligent Systems and their
Applications 13.4 (1998): 18-28.

[5] Collins, Michael, Robert E. Schapire, and Yoram Singer. "Logistic regression, AdaBoost and
Bregman distances." Machine Learning 48.1-3 (2002): 253-285.

[6] Zhao, Yongheng, and Yanxia Zhang. "Comparison of decision tree methods for finding active
objects." Advances in Space Research 41.12 (2008): 1955-1959.

[7] Rish, Irina. "An empirical study of the naive Bayes classifier." IJCAI 2001 Workshop on
Empirical Methods in Artificial Intelligence. Vol. 3. No. 22. New York: IBM, 2001.
