
(IJCSIS) International Journal of Computer Science and Information Security,

Vol. 14, No. 7, July 2016

Rule Extraction Algorithm for Deep Neural Networks: A Review
Tameru Hailesilassie
Department of Computer Science and Engineering
National University of Science and Technology (MISiS)
Moscow, Russia
[email protected]

Abstract—Despite achieving the highest classification accuracy in a wide variety of application areas, artificial neural networks have one disadvantage: the way a network arrives at a decision is not easily comprehensible. This lack of explanatory ability reduces the acceptability of neural networks in data mining and decision systems, and it is the reason researchers have proposed many rule extraction algorithms. Recently, the Deep Neural Network (DNN) has been achieving profound results over the standard neural network in classification and recognition problems; it is an active machine learning area that has proven both useful and innovative. This paper thoroughly reviews various rule extraction algorithms according to the classification scheme of decompositional, pedagogical, and eclectic approaches. It also evaluates these algorithms based on the neural network structure with which each algorithm is intended to work. The main contribution of this review is to show that the study of rule extraction from DNNs is still limited.

Keywords: Artificial neural network; Deep neural network; Rule extraction; Decompositional; Pedagogical; Eclectic.

I. INTRODUCTION

The dramatic advances in technology nowadays require computing features that are not available on Von Neumann computers. Those characteristics include learning ability, generalization ability, and adaptivity [1]. Researchers are doing much work to meet this demand by mimicking the biological nervous system to model a modern computer system that is much closer to the human nervous system. Digital computers surpass humans in solving computational problems, yet a person can solve complex perceptual problems; accordingly, understanding the way the brain solves such problems benefits computing systems enormously. Table I compares a digital computer and the human brain.

Artificial neural networks are inspired by the human nervous system and are formed by interconnecting neurons, each a single processing unit. The thousands and tens of thousands of transistors in a computer are analogous to the neurons in a neural network. Nevertheless, neural networks have a significant number of connections, whereas computers have few [2]. High noise tolerance, adaptive learning, fault tolerance, highly accurate classification of large datasets, and self-organization make a neural network fit easily into a range of application areas. These include, but are not limited to, industrial process control, sensory data recognition, medical diagnosis, sales forecasting, customer research, and pattern recognition.

In the case of recognition, achieving excellent performance requires a large dataset to train a neural network. However, training also requires a model with high learning ability and deep processing capacity. Research has shown that deep processing also takes place in the human brain [3]; consequently, there is a need for deep processing, and the deep neural network (DNN) is intended to meet it. A feedforward network has at most one or two hidden layers, whereas a DNN has a stack of more than two hidden layers, as depicted in Figure 1. The depth, that is, the number of hidden layers, allows a DNN to work with complex problems more efficiently than a shallow artificial neural network. Recently, DNNs have drawn researchers' attention, becoming dominant for video, audio, and image classification and retrieval [4], [5], [6]. Significant results have also been achieved for object detection [7], [8].

TABLE I. COMPARISON OF A STANDARD COMPUTER (CIRCA 1994) AND THE HUMAN BRAIN [9]

                        Computer                          Human Brain
Computational units     1 CPU, 10^5 gates                 10^11 neurons
Storage units           10^9 bits RAM, 10^10 bits disk    10^11 neurons, 10^14 synapses
Cycle time              10^-8 sec                         10^-3 sec
Bandwidth               10^9 bits/sec                     10^14 bits/sec
Neuron updates/sec      10^5                              10^14

Notwithstanding that neural networks attain high classification performance, they are not easily understandable [10]. All the promising results achieved by neural networks come from a black-box approach that is not comprehensible to humans. The shallow artificial neural network and the DNN have many applications in safety- and mission-critical systems. Some of them are: industrial process control and fault detection, power generation and transmission, aircraft icing (weather forecasting to assist pilots), consumer products such as LogiCook (the first neural network microwave oven), medical diagnosis, vehicle health monitoring, and many others [11]. However, safety-critical systems require reliability validation, otherwise they may lead to danger [12]. Hence, to turn the neural network from a black-box system into a white-box system, rule extraction techniques for artificial neural networks have been studied in depth over the previous two decades. Rule extraction is an approach to reveal the hidden knowledge of a network and to help explain how the network reaches a final decision, so that users can understand it better. Moreover, extracted rules can be used for hazard mitigation, traceability, system stability and fault tolerance, operational verification and validation, and more [13].

Deep learning has been showing astonishing results lately. Researchers report that DNNs outperform standard artificial neural networks in several areas of machine learning. Additionally, extracting comprehensible rules from a DNN enhances the power and acceptability of DNN products by combining comprehensibility with accuracy. Many rule extraction algorithms have been proposed by researchers. Our review focuses on rule extraction algorithms for neural networks; we do not discuss algorithms that merely use a neural network as a tool for rule extraction.

Figure 1. Sample DNN architecture

The aims of this paper are:

A. to discuss various rule extraction techniques for neural networks;
B. to analyze the applicability of the existing rule extraction algorithms to DNNs;
C. to address the challenges of rule extraction from DNNs and to provide future research directions.

II. RULE EXTRACTION ALGORITHM

According to [14], rule extraction is defined as follows: "…given a trained neural network and the data on which it was trained, produce a description of the network's hypothesis that is comprehensible yet closely approximates the network's predictive behavior." Rule extraction algorithms are useful for experts to verify and cross-check neural network systems. Extracted rules take different forms; an overview of some of the common logical rule forms is given below.

IF-THEN rules: This is a conditional statement model that is easily comprehensible. The general form of an IF-THEN rule is:

    IF X THEN Y = y    (1)

If the given condition is true, in this case if X is a member of a set S, then the output is labeled with a particular class. As a simple example, a single neuron in a neural network with a linear activation function can be modeled by an IF-THEN logical rule. The weighted sum s of the neuron is calculated as:

    s = Σ_i (x_i · w_i)    (2)

where x_i is an input and w_i is the corresponding weight of the connection between input i and the neuron. The output Y of the neuron is a function of this weighted sum:

    Y = f(s)    (3)

So the output can be expressed by a simple IF-THEN rule as follows:

    IF f(s) ≥ α THEN Y = 1 ELSE Y = 0,

where α is a threshold value.
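To make the rule above concrete, the following minimal sketch (our own illustration, not taken from any of the surveyed algorithms) evaluates a single linear neuron as the IF-THEN rule of equations (1)-(3); the inputs, weights, and threshold α are hypothetical values chosen only for demonstration.

```python
import numpy as np

def neuron_as_rule(x, w, alpha):
    """IF f(s) >= alpha THEN Y = 1 ELSE Y = 0, with s = sum_i x_i * w_i."""
    s = np.dot(x, w)      # weighted sum, Eq. (2)
    y = s                 # linear activation: f(s) = s, Eq. (3)
    return 1 if y >= alpha else 0

# Hypothetical inputs, weights, and threshold, for illustration only.
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.6, -0.4, 0.3])
print(neuron_as_rule(x, w, alpha=0.5))   # prints 1, since s = 0.9 >= 0.5
```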

M-of-N rules: This form expresses a rule as a Boolean condition that is satisfied when at least M of the N conditions in a given set hold. The rule has the following form:

    IF M of {N} THEN Z

This method is efficient and general [15], and an M-of-N rule can easily be converted into simple IF-THEN rules.
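As a small illustration of how an M-of-N rule fires, the sketch below simply counts how many of N Boolean conditions hold; the conditions and the value of M are hypothetical and chosen only for demonstration.

```python
def m_of_n_fires(conditions, m):
    """The rule 'IF m of {conditions} THEN Z' fires when at least m hold."""
    return sum(bool(c) for c in conditions) >= m

# Hypothetical conditions: the rule fires because 2 of the 3 are satisfied.
age, income, tenure = 45, 28000, 7
print(m_of_n_fires([age > 40, income > 30000, tenure > 5], m=2))   # True
```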
Decision tree: This is one of the most widely used tree-structured classifiers in machine learning and data mining. The model classifies an instance by starting at the root of the tree and following the branches down to a leaf. A decision tree is a white-box model that is easy to explain.
A simple decision tree diagram is shown in Figure 2.

Andrews, Diederich, and Tickle proposed a multidimensional taxonomy for the various rule extraction algorithms [12]. The first dimension of the classification is the expressive power of the extracted rules, which refers to the form of the rule (e.g., IF-THEN rules or fuzzy rules). The second dimension considers the relationship between the extracted rules and the architecture of the trained neural network. Accordingly, there are three categories. Rule extraction algorithms that work at the neuron level, rather than at the level of the whole network architecture, are called decompositional techniques. If the artificial neural network is treated as a black box irrespective of its architecture, the algorithm falls into the pedagogical category. The third category, which combines the decompositional and pedagogical approaches, is called eclectic.

Figure 2. Sample Decision Tree

A. DECOMPOSITIONAL APPROACH

Decompositional algorithms work by splitting the network down to the neuron level. The results obtained from the individual neurons are then aggregated to represent the network as a whole. Özbakır, Baykasoğlu, and Kulluk [16] have introduced an algorithm called DIFACONN-miner that can generate IF-THEN rules from an artificial neural network. Differential evolution (DE) and touring ant colony optimization (TACO) are used for training and rule extraction, respectively, and rule generation takes place in each iteration of DE. Before this proposal, rule extraction was a sequential process; what makes this work different is that network training and rule generation are performed simultaneously, which saves the considerable additional time otherwise spent on rule extraction. However, the algorithm addresses only feedforward artificial neural networks and does not consider rule extraction from DNNs. An algorithm called CRED [17] extracts both continuous and discrete rules by building decision trees from a pre-trained ANN. Algorithms that work only with discrete inputs require discretization of continuous attributes, which in turn affects the accuracy of the extracted rules; CRED therefore does not employ discretization. The authors have also proposed a rule simplification algorithm based on the J-measure, and the method showed good results on UCI datasets. Even though this algorithm is independent of the network structure and training algorithm, it cannot be applied directly to DNNs. FERNN [18] is an algorithm proposed to generate both M-of-N and IF-THEN rules from a feedforward neural network with a single hidden layer. Because there is no pruning or retraining of the network, the extraction process is fast; however, the applicability of the algorithm to DNNs is not even mentioned. Fu has proposed an algorithm called KT [19] that extracts IF-THEN rules. It follows a layer-by-layer approach and applies a tree search to find the rules. In spite of the fact that the extracted rules are robust, in some cases performing even better than the neural network itself, DNNs are not taken into consideration. Tsukimoto [20] has introduced a rule extraction algorithm that works with different types of neural network whose activation functions are monotonic, such as recurrent neural networks and the multilayer perceptron (MLP). It extracts IF-THEN rules that are applicable to both continuous and discrete values. Also, the computational complexity of this algorithm is polynomial, whereas KT is of exponential order.
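To give a flavor of what neuron-level (decompositional) extraction involves, the following sketch brute-forces, in the spirit of KT-style search, the minimal sets of Boolean inputs that guarantee a neuron's activation regardless of the remaining inputs. It is our own simplified illustration with hypothetical weights, bias, and threshold, not an implementation of KT itself.

```python
from itertools import combinations

def confirming_rules(weights, bias, theta):
    """Search for minimal sets of Boolean inputs that guarantee activation.

    A subset S (inputs assumed ON, i.e. equal to 1) confirms the neuron if
    the weighted sum stays >= theta even in the worst case for the
    unspecified inputs: positive-weight inputs OFF, negative-weight inputs ON.
    """
    n = len(weights)
    rules = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            rest = [i for i in range(n) if i not in subset]
            worst = bias + sum(weights[i] for i in subset) \
                         + sum(weights[i] for i in rest if weights[i] < 0)
            # keep only minimal rules: skip supersets of rules already found
            if worst >= theta and not any(set(r) <= set(subset) for r in rules):
                rules.append(subset)
    return rules

# Hypothetical neuron: four Boolean inputs, weights, bias, and threshold.
weights = [2.0, 1.5, -1.0, 0.5]
print(confirming_rules(weights, bias=0.0, theta=2.4))
# -> [(0, 1)] : IF x0 AND x1 THEN the neuron is active, whatever x2 and x3 are.
```

Real decompositional algorithms replace this exhaustive search with a pruned tree search and repeat the procedure layer by layer, which is exactly where the cost grows for deep networks.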

B. PEDAGOGICAL APPROACH

In pedagogical rule extraction algorithms, the neural network is treated as a black-box system. The focus of this approach is on finding the outputs that correspond to given inputs; the weights and internal structure of the artificial neural network are not subjected to analysis [21]. There are different techniques within this approach, among them validity interval analysis (VIA) [22], sampling approaches, and reverse engineering of the neural network.
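As a minimal sketch of the sampling idea behind pedagogical techniques, the code below trains a network, queries it for labels on newly sampled inputs, and fits a decision tree to the network's input-output behavior; the branches of the tree then serve as comprehensible rules. This is a generic illustration using scikit-learn with hypothetical data and network sizes, not a reproduction of TREPAN or any other specific surveyed algorithm.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Hypothetical training data: two classes in four dimensions.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)

# The "black box": a trained neural network.
net = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000,
                    random_state=0).fit(X, y)

# Pedagogical step: sample new inputs, query the network for labels,
# and fit a comprehensible surrogate (a decision tree) to its answers.
X_query = rng.normal(size=(2000, 4))
y_net = net.predict(X_query)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_query, y_net)

# The tree's branches are human-readable IF-THEN rules describing the net.
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(4)]))
print("fidelity to the network:", surrogate.score(X_query, y_net))
```

The fraction of query points on which the surrogate agrees with the network (its fidelity) is the usual measure of how faithfully such extracted rules mimic the black box.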
Craven and Shavlik [23] have proposed a pedagogical algorithm called TREPAN. It extracts a decision tree with M-of-N split points from an ANN by utilizing a query-and-sampling approach. Learning with queries is introduced to retrieve the information essential to the learning process. The neural network architecture used in their experiments has only one hidden layer. Saad and Wunsch [24] have introduced a pedagogical method called HYPINV, based on a network inversion technique. This algorithm extracts hyperplane rules, expressed as conjunctions and disjunctions of hyperplanes. The authors used a standard MLP for the experiment; notwithstanding that the algorithm is independent of the MLP architecture, DNNs are not contemplated. Taha and Ghosh [25] have proposed three rule extraction algorithms. The first is a pedagogical approach named BIO-RE, which extracts binary rules from a neural network trained on binary inputs. However, the algorithm was tested only with a shallow MLP of four input, six hidden, and three output nodes.

Sethi, D. Mishra, and B. Mishra [26] have proposed KDRuleEx, which generates a decision table in the form of a two-dimensional matrix. A training example set and the trained artificial neural network are the inputs to this algorithm. It can work with both discrete and continuous inputs and, unlike most other algorithms, it can handle non-binary inputs. The authors did not report the architecture of the artificial neural network used for the test, and DNNs are not mentioned at all. A reverse engineering algorithm called RxREN, proposed by Augasta and Kathirvalavakumar [27], extracts IF-THEN rules from a neural network. Reverse engineering techniques analyze the outputs in order to trace back the components that cause the final result. The authors reported that this algorithm searches for rules quickly; however, only a conventional feedforward neural network was used in the experiments. Schmitz, Aldrich, and Gouws [28] have proposed ANN-DT. This algorithm uses a sampling technique to extract a binary decision tree from a feedforward neural network with both discrete and continuous data.

In contrast with pedagogical algorithms, the decompositional approach is much more transparent. However, the decompositional technique works layer by layer, so it may be tedious and time-consuming. Regarding computational cost and execution time, the pedagogical approach is better than the decompositional one [29]; it also has the advantage of flexibility with respect to the ANN architecture.

C. ECLECTIC APPROACH

Hruschka and Ebecken [30] have proposed an algorithm based on the RX algorithm of Lu, Setiono, and Liu [31]. The technique incorporates both the decompositional and the pedagogical approach. A clustering genetic algorithm (CGA) is employed for hidden unit clustering, and logical rules are then extracted from the relationship between the inputs and the generated clusters. This algorithm is designed for shallow MLP neural networks. An eclectic algorithm that uses an artificial immune system (AIS) has been proposed by Kahramanli and Allahverdi [32]. It can generate rules from a trained shallow feedforward neural network, and the authors report that it achieves high accuracy.

D. RULE EXTRACTION FROM DEEP NEURAL NETWORK

Zilke [33] has proposed a rule extraction algorithm called DeepRED that extends the decompositional algorithm CRED to DNNs. The proposed algorithm uses additional decision trees and intermediate rules for every hidden layer. Rule extraction from the DNN is a step-wise, divide-and-conquer process that describes each layer in terms of the previous layer; the intermediate results are then merged to obtain a final rule set that explains the whole DNN. The author found the divide-and-conquer approach helpful in reducing memory usage and computation time. Moreover, to facilitate the rule extraction process, FERNN is used to prune the less important components of the network. It is also reported that experiments on different datasets showed a high level of accuracy. The author states that it might be the first rule extraction algorithm for DNNs.
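To illustrate the layer-wise flavor of this strategy, the simplified sketch below fits one decision tree that explains the network's output in terms of its last hidden layer's binarized activations, and another that explains one hidden unit in terms of the raw inputs; DeepRED additionally does this for every relevant hidden unit and layer and merges the intermediate rules by substitution, which is omitted here. The code is our own hypothetical illustration built on scikit-learn, not Zilke's implementation, and the data and network sizes are invented.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))                      # hypothetical inputs
y = ((X[:, 0] > 0) & (X[:, 2] < 0.5)).astype(int)   # hypothetical target

net = MLPClassifier(hidden_layer_sizes=(8, 8), activation="relu",
                    max_iter=3000, random_state=1).fit(X, y)

def hidden_activations(net, X):
    """Forward pass through the hidden layers of a trained MLPClassifier."""
    a = X
    for W, b in zip(net.coefs_[:-1], net.intercepts_[:-1]):
        a = np.maximum(0, a @ W + b)                 # ReLU hidden layers
    return a                                         # last hidden layer

H = hidden_activations(net, X)
H_bin = (H > 0).astype(int)                          # binarized activations

# Step 1: a tree describing the network's output by the last hidden layer.
tree_out = DecisionTreeClassifier(max_depth=3, random_state=1)
tree_out.fit(H_bin, net.predict(X))
print(export_text(tree_out, feature_names=[f"h{i}" for i in range(H.shape[1])]))

# Step 2: a tree describing one hidden unit (here h0) by the raw inputs.
tree_h0 = DecisionTreeClassifier(max_depth=3, random_state=1)
tree_h0.fit(X, H_bin[:, 0])
print(export_text(tree_h0, feature_names=[f"x{i}" for i in range(X.shape[1])]))
# DeepRED would repeat step 2 for every hidden unit used in step 1 and
# substitute the input-level rules into the output-level rules.
```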
Allahverdi’s MLP
the divide and conquer approach helpful to reduce memory
Algorithm
usage and computational time. Moreover, to facilitate the
process of rule extraction, FERNN is used for pruning less
DeepRED DNN Decompositional IF-THEN
important components of a neural network. Also, it is reported
that the experimental test with different datasets showed a

From Table II, we can see that the target of most rule extraction algorithms is the standard MLP neural network; so far there has been only limited study concerning DNNs. The challenge posed by a DNN is the complexity of its many hidden layers. However, one could develop an algorithm to extract comprehensible rules from a DNN by taking advantage of the pedagogical approach, since, unlike the decompositional approach, a pedagogical technique is not affected by the number of hidden layers. The recent astonishing ability of DNNs to solve a variety of complex problems can be further leveraged by extracting understandable rules, so that DNNs can be used in problem domains where validation is essential. This paper argues that rule extraction from DNNs is important in order to take full advantage of their performance, and that the existing work is not adequate; therefore, this area still requires particular attention.

IV. CONCLUSION

This paper has attempted to provide a review of several rule extraction algorithms for artificial neural networks. Some of the state-of-the-art algorithms were discussed from each of the categories named decompositional, pedagogical, and eclectic. Currently, deep learning provides an acceptable solution for many problems and is believed to move machine learning a step ahead. The review implies that, surprisingly, little work has been done targeting DNNs; they remain black-box systems. Even though the DNN architecture is complex, a pedagogical algorithm can be used to advantage irrespective of the number of hidden layers, because pedagogical algorithms do not depend on the network architecture. Thus, they might fill this gap. Extracting comprehensible rules from DNNs enhances the real-world usability of the promising solutions DNNs offer, and it can remove the uncertainty problems associated with neural network software.

REFERENCES

[1] A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: A tutorial," Computer, vol. 29, no. 3, pp. 31–44, Mar. 1996.
[2] P. Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. United States: Basic Civitas Books, 2015.
[3] G. Montavon, "On layer-wise representations in deep neural networks," Ph.D. dissertation, Technische Universität Berlin, 2013.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, 2012, pp. 1106–1114.
[5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 1–9.
[6] H. Lee, P. Pham, Y. Largman, and A. Y. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Advances in Neural Information Processing Systems 22 (NIPS 2009), 2009, pp. 1096–1104.
[7] C. Szegedy, A. Toshev, and D. Erhan, "Deep neural networks for object detection," in Advances in Neural Information Processing Systems 26, 2013, pp. 2553–2561.
[8] C. Szegedy, S. Reed, D. Erhan, and D. Anguelov, "Scalable, high-quality object detection," arXiv preprint arXiv:1412.1441, 2014.
[9] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. United Kingdom: Prentice Hall, 1994.
[10] W. Duch, R. Setiono, and J. M. Zurada, "Computational intelligence methods for rule-based data understanding," Proceedings of the IEEE, vol. 92, no. 5, pp. 771–805, May 2004.
[11] P. Lisboa, Industrial Use of Safety-Related Artificial Neural Networks. HSE Books, 2001.
[12] R. Andrews, J. Diederich, and A. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, vol. 8, no. 6, pp. 373–389, 1995.
[13] B. J. Taylor and M. A. Darrah, "Rule extraction as a formal method for the verification and validation of neural networks," in Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005.
[14] M. W. Craven, "Extracting comprehensible models from trained neural networks," Ph.D. dissertation, Department of Computer Sciences, University of Wisconsin-Madison, 1996.
[15] G. G. Towell and J. W. Shavlik, "Extracting refined rules from knowledge-based neural networks," Machine Learning, vol. 13, no. 1, pp. 71–101, Oct. 1993.
[16] L. Özbakır, A. Baykasoğlu, and S. Kulluk, "A soft computing-based approach for integrated training and rule extraction from artificial neural networks: DIFACONN-miner," Applied Soft Computing, vol. 10, no. 1, pp. 304–317, Jan. 2010.
[17] M. Sato and H. Tsukimoto, "Rule extraction from neural networks via decision tree induction," in Proceedings of the International Joint Conference on Neural Networks, Washington, DC, 2001, pp. 1870–1875, vol. 3.
[18] R. Setiono and W. K. Leow, "FERNN: An algorithm for fast extraction of rules from neural networks," Applied Intelligence, vol. 12, no. 1/2, pp. 15–25, 2000.
[19] L. Fu, "Rule generation from neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1114–1124, 1994.
[20] H. Tsukimoto, "Extracting rules from trained neural networks," IEEE Transactions on Neural Networks, vol. 11, no. 2, pp. 377–389, Mar. 2000.
[21] K. Kumar Sethi, D. Kumar Mishra, and B. Mishra, "Extended taxonomy of rule extraction techniques and assessment of KDRuleEx," International Journal of Computer Applications, vol. 50, no. 21, pp. 25–31, Jul. 2012.
[22] S. Thrun, "Extracting rules from artificial neural networks with distributed representations," in Advances in Neural Information Processing Systems, 1995, pp. 505–512.
[23] M. Craven and J. Shavlik, "Using sampling and queries to extract rules from trained neural networks," in Machine Learning: Proceedings of the Eleventh International Conference, San Francisco, CA, 1994.
[24] E. W. Saad and D. C. Wunsch, "Neural network explanation using inversion," Neural Networks, vol. 20, no. 1, pp. 78–93, Jan. 2007.
[25] I. A. Taha and J. Ghosh, "Symbolic interpretation of artificial neural networks," IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 3, pp. 448–463, 1999.
[26] K. Sethi, D. Mishra, and B. Mishra, "KDRuleEx: A novel approach for enhancing user comprehensibility using rule extraction," Kota Kinabalu, 2012, pp. 55–60.
[27] M. Augasta and T. Kathirvalavakumar, "Reverse Engineering
the Neural Networks for Rule Extraction in Classification
Problems", Neural Process Lett, vol. 35, no. 2, pp. 131-150,
2011.
[28] G. P. J. Schmitz, C. Aldrich, and F. S. Gouws, "ANN-DT: An
algorithm for extraction of decision trees from artificial neural
networks," IEEE Transactions on Neural Networks, vol. 10, no.
6, pp. 1392–1401, 1999.
[29] M. Augasta and T. Kathirvalavakumar, "Rule extraction from
neural networks — A comparative study", in International
Conference on Pattern Recognition, Informatics and Medical
Engineering (PRIME-2012), Salem, Tamilnadu, 2012, pp. 404 -
408.
[30] E. R. Hruschka and N. F. F. Ebecken, "Extracting rules from
multilayer perceptrons in classification problems: A clustering-
based approach," Neurocomputing, vol. 70, no. 1-3, pp. 384–
397, Dec. 2006.
[31] H. Lu, R. Setiono, and H. Liu, "Effective data mining using
neural networks," IEEE Transactions on Knowledge and Data
Engineering, vol. 8, no. 6, pp. 957–961, 1996.
[32] H. Kahramanli and N. Allahverdi, "Rule extraction from trained
adaptive neural networks using artificial immune systems,"
Expert Systems with Applications, vol. 36, no. 2, pp. 1513–1522,
Mar. 2009.
[33] J. Zilke, "Extracting Rules from Deep Neural Networks", M.S.
thesis, Computer Science Department, Technische Universität
Darmstadt, 2015.

AUTHORS PROFILE

Tameru Hailesilassie received a BSc degree in Electrical and Computer Engineering (Computer Engineering focus area) from the University of Gondar, Institute of Technology, with honors in 2015. He is pursuing a Master's degree at the Department of Computer Science and Engineering, National University of Science and Technology (MISiS). His research interests include Software Engineering, Computer Engineering, Machine Learning, and Computer Vision.
