Rule Extraction Algorithm for Deep Neural Networks: A Review
Artificial neural networks are inspired by the human nervous system, which is formed by an interconnection of neurons, each a single processing unit. The thousands or tens of thousands of transistors in a computer are analogous to the neurons in a neural network. Nevertheless, neural networks have a significant number of connections, whereas computers have

[Table residue: rows "Bandwidth (bits/sec)" and "Neuron updates/sec"; the column headings and the exponents of the values are not recoverable.]
https://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 14, No. 7, July 2016
Although neural networks attain high classification performance, they are not easily understandable [10]. All the promising results achieved by neural networks come from a black box that is not comprehensible to humans. Shallow artificial neural networks and DNNs have many applications in safety- and mission-critical systems. Some of them are: industrial process control and fault detection, power generation and transmission, aircraft icing (weather forecasting to assist pilots), consumer products such as LogiCook (the first neural network microwave oven), medical diagnosis, vehicle health monitoring, and many others [11]. However, safety-critical systems require reliability validation; otherwise they can be dangerous [12]. Hence, to turn the neural network black box into a white box, rule extraction techniques for artificial neural networks have been studied in depth over the previous two decades. Rule extraction is an approach that reveals the hidden knowledge of the network and helps to explain how the neural network arrives at a final decision, so that the user can understand it better. Moreover, extracted rules can be used for hazard mitigation, traceability, system stability and fault tolerance, operational verification and validation, and more [13].

Deep learning has recently shown astonishing results. Researchers report that DNNs outperform standard artificial neural networks in several areas of machine learning. Additionally, extracting comprehensible rules from a DNN enhances the power and acceptability of DNN products, since it combines comprehensibility with accuracy. Many rule extraction algorithms have been proposed by researchers. Our review focuses on algorithms that extract rules from neural networks. In this work, we do not discuss algorithms that use a neural network as a tool for rule extraction.

Figure 1. Sample DNN architecture

The aim of this paper is:

A. to discuss various rule extraction techniques from neural networks
B. to analyze the applicability of the existing rule extraction algorithms to DNN
C. to address the challenges of rule extraction from DNN and to provide future research directions

II. RULE EXTRACTION ALGORITHM

According to [14], rule extraction is defined as "…given a trained neural network and the data on which it was trained, produce a description of the network's hypothesis that is comprehensible yet closely approximates the network's predictive behavior." Rule extraction algorithms are useful for experts to verify and cross-check neural network systems. Extracted rules have different forms. We present an overview of some of the logical rules below.

IF-THEN rules: This is a conditional statement model, which is easily comprehensible. The general form of an IF-THEN rule is:

IF $X \in S$ THEN $Y = y$    (1)

If the given condition is true, in this case if X is a member of S, then the output is labeled with a particular class. As a simple example, a single neuron in a neural network with a linear activation function can be modeled by an IF-THEN logical rule. The weighted sum of a neuron is calculated as:

$s = \sum_i w_i x_i$    (2)

where $x_i$ is an input and $w_i$ is the corresponding weight of the connection between $x_i$ and the neuron. The output Y of a neuron is a function of the weighted sum, given as:

$Y = f(s)$    (3)

M-of-N rules:

IF M of {N} THEN Z

This method is efficient and general [15]. It can also be easily converted to a simple IF-THEN rule.

Decision tree: This is one of the most widely used tree-structured classifiers in machine learning and data mining. The model classifies an instance by starting at the root of the tree and following the branches down to a leaf. A decision tree is a white box system
that is easy to explain. A simple decision tree diagram is shown in Figure 2.

Andrews, Diederich, and Tickle proposed a multidimensional taxonomy for the various rule extraction algorithms [12]. The first dimension of classification is based on the expressive power of the extracted rules, which refers to the form of the rule (e.g. IF-THEN rule and fuzzy rule). The second dimension considers the relationship between the extracted rules and the architecture of the trained neural network. Accordingly, there are three categories. Rule extraction algorithms that work at the neuron level, rather than at the level of the whole neural network architecture, are called decompositional techniques. If the artificial neural network is treated as a black box irrespective of its architecture, the algorithm falls into the pedagogical category. The third category is a combination of the decompositional and pedagogical approaches and is called eclectic.

discrete inputs requires discretization of continuous attributes to accommodate continuous data, which in turn affects the accuracy of the extracted rules. Thus, CRED does not employ discretization. The authors have also proposed an algorithm for rule simplification based on the J-measure. The algorithm showed good results when tested on the UCI database. Even though the algorithm is independent of the network structure and the training algorithm, it cannot be applied directly to DNN. FERNN [18] is an algorithm proposed to generate both M-of-N and IF-THEN rules from a feedforward neural network with a single hidden layer. It requires no pruning and retraining of the network; as a result, the extraction process is fast. The applicability of the algorithm to DNN is not mentioned. Fu has proposed an algorithm called KT [19] that extracts IF-THEN rules. It follows a layer-by-layer approach and applies a tree search to find the rules. Although the extracted rules are robust, in some cases performing even better than the neural network itself, DNN is not taken into consideration. Tsukimoto [20] has introduced a rule extraction algorithm that works with different types of neural network, such as recurrent neural networks and multilayer perceptrons (MLP), whose activation function is monotonic. It extracts IF-THEN rules that are applicable to both continuous and discrete values. Also, the computational complexity of this algorithm is polynomial, whereas that of KT is exponential.
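To illustrate how a decompositional method reads rules off the weights of a single neuron, and why an exhaustive search over input subsets grows exponentially, the following is a minimal sketch. It is not Fu's KT algorithm or any other published method. It assumes binary inputs and a step activation with a fixed threshold, and the function names, weights, and threshold are hypothetical values chosen for illustration.

```python
from itertools import combinations, product


def neuron_fires(weights, threshold, x):
    """Weighted sum of the inputs followed by a step activation."""
    s = sum(weights[name] * x[name] for name in weights)
    return s > threshold


def extract_rules(weights, threshold):
    """Return minimal sets of inputs that, when all 1, guarantee firing.

    Inputs are assumed binary. A candidate premise is accepted only if the
    neuron still fires in the worst case: every negative-weight input is 1
    and every other positive-weight input is 0.
    """
    positive = sorted(n for n, w in weights.items() if w > 0)
    worst_negative = sum(w for w in weights.values() if w < 0)
    rules = []
    # Exhaustive subset search: up to 2**len(positive) candidates.
    for size in range(1, len(positive) + 1):
        for subset in combinations(positive, size):
            if any(set(rule) <= set(subset) for rule in rules):
                continue  # a smaller accepted rule already covers this one
            if sum(weights[n] for n in subset) + worst_negative > threshold:
                rules.append(subset)
    return rules


def is_sound(rule, weights, threshold):
    """Brute-force check: the neuron fires for every input matching the rule."""
    names = sorted(weights)
    for values in product((0, 1), repeat=len(names)):
        x = dict(zip(names, values))
        if all(x[n] == 1 for n in rule) and not neuron_fires(weights, threshold, x):
            return False
    return True


# Hypothetical weights and threshold, chosen only for illustration.
weights = {"x1": 3.0, "x2": 2.0, "x3": 2.0, "x4": -1.0}
threshold = 2.5

rules = extract_rules(weights, threshold)
for rule in rules:
    print("IF " + " AND ".join(rule) + " THEN Y = 1")
```

With these illustrative values the search accepts the premises {x1, x2}, {x1, x3}, and {x2, x3}. Taken together they are equivalent to the single M-of-N rule "IF 2 of {x1, x2, x3} THEN Y = 1", which shows how an M-of-N rule expands into plain IF-THEN rules.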
B. PEDAGOGICAL APPROACH
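The pedagogical idea, in which the trained network is treated purely as a black box that is queried for labels, can be sketched as follows. This is an illustrative toy and not one of the reviewed algorithms. The `black_box` function stands in for a trained network with hypothetical hand-set weights, the inputs are assumed binary, and the extractor is a plain entropy-based decision tree grown on the network's own answers.

```python
import math
from itertools import product


def black_box(x):
    """Stand-in for a trained network; only its input/output behaviour is used."""
    h1 = math.tanh(4.0 * x[0] + 4.0 * x[1] - 6.0)
    h2 = math.tanh(4.0 * x[2] - 2.0)
    return 1 if 3.0 * h1 + 3.0 * h2 + 1.0 > 0 else 0


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / len(labels)
        result -= p * math.log2(p)
    return result


def build_tree(samples, features):
    """Grow a tree on (input, network label) pairs; leaves are class labels."""
    labels = [y for _, y in samples]
    majority = max(set(labels), key=labels.count)
    if len(set(labels)) == 1 or not features:
        return majority

    def split_entropy(f):
        total = 0.0
        for v in (0, 1):
            part = [y for x, y in samples if x[f] == v]
            if part:
                total += len(part) / len(samples) * entropy(part)
        return total

    best = min(features, key=split_entropy)
    rest = [f for f in features if f != best]
    branches = {}
    for v in (0, 1):
        subset = [(x, y) for x, y in samples if x[best] == v]
        branches[v] = build_tree(subset, rest) if subset else majority
    return (best, branches)


def tree_to_rules(tree, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN rule."""
    if not isinstance(tree, tuple):
        premise = " AND ".join(f"x{f}={v}" for f, v in conditions) or "TRUE"
        return [f"IF {premise} THEN class {tree}"]
    feature, branches = tree
    rules = []
    for v in (0, 1):
        rules += tree_to_rules(branches[v], conditions + ((feature, v),))
    return rules


def predict(tree, x):
    """Follow the branches that match x until a leaf is reached."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[x[feature]]
    return tree


# Query the network on every binary input; no weights are inspected.
inputs = list(product((0, 1), repeat=3))
samples = [(x, black_box(x)) for x in inputs]
tree = build_tree(samples, [0, 1, 2])
rules = tree_to_rules(tree)
fidelity = sum(predict(tree, x) == y for x, y in samples) / len(samples)

for rule in rules:
    print(rule)
print("fidelity:", fidelity)
```

Because the extractor only calls `black_box`, the same code would apply unchanged to a network with any number of hidden layers. Enumerating every input is feasible only for a handful of binary features, so a practical method has to sample the input space instead.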
Sethi, D. Mishra, and B. Mishra [26] have proposed KDRuleEx, which generates a two-dimensional decision table. The training example set and the trained artificial neural network are the inputs to this algorithm. It can work with both discrete and continuous inputs. Also, unlike most other algorithms, it can handle non-binary inputs. The authors did not report the architecture of the artificial neural network used for the test. Moreover, DNN is not mentioned at all. A reverse engineering algorithm called RxREN, proposed by Augasta and Kathirvalavakumar [27], extracts IF-THEN rules from a neural network. Reverse engineering techniques analyze the output to trace back the components that cause the final result. The authors reported that this algorithm is fast at searching for rules. However, only a conventional feedforward neural network was used in the experiments. Schmitz, Aldrich, and Gouws [28] have proposed ANN-DT. This algorithm uses a sampling technique. It extracts a binary decision tree from a feedforward neural network with both discrete and continuous data.

In contrast with pedagogical algorithms, a decompositional approach is much more translucent. However, the decompositional technique works layer by layer. As a result, this method may be tedious and time-consuming. With regard to computational cost and execution time, the pedagogical approach is better than the decompositional one [29]. It also has the advantage of flexibility in terms of ANN architecture.

C. ECLECTIC APPROACH

Zilke [33] has proposed a rule extraction algorithm for DNN called DeepRED by extending the decompositional algorithm CRED. The proposed algorithm has additional decision trees as well as intermediate rules for every hidden layer. Rule extraction from DNN is a step-wise process. It is a divide-and-conquer method that describes each layer by the previous layer. The results are then merged to obtain the final rule set that explains the whole DNN. The author found the divide-and-conquer approach helpful in reducing memory usage and computation time. Moreover, to facilitate the process of rule extraction, FERNN is used to prune less important components of the neural network. It is also reported that experimental tests with different datasets showed a high level of accuracy. The author said it might be the first rule extraction algorithm for DNN.

III. FUTURE DIRECTION

The review presented in the previous section clearly demonstrates that a lot of work has been devoted to rule extraction from artificial neural networks. Table II summarizes the rule extraction algorithms along with the form of the extracted rules and the type of neural network used in the experiments.

TABLE II. SUMMARY OF ALGORITHMS

Algorithm                               Used ANN type          Algorithm type    Extracted rule form
DIFACONN-miner                          Standard MLP           Decompositional   IF-THEN
CRED                                    Standard MLP           Decompositional   Decision tree
FERNN                                   Standard MLP           Decompositional   M-of-N, IF-THEN
KT                                      Standard MLP           Decompositional   IF-THEN
Tsukimoto's Algorithm                   Standard MLP and RNN   Decompositional   IF-THEN
ANN-DT                                  Standard MLP           Pedagogical       Binary decision tree
RX                                      Standard MLP           Eclectic          IF-THEN
Kahramanli and Allahverdi's Algorithm   Standard MLP           Eclectic          IF-THEN
DeepRED                                 DNN                    Decompositional   IF-THEN
From Table II, we can see that the target of most rule extraction algorithms is the standard MLP neural network. So far there has been only limited study concerning DNN. The challenge of DNN is the complexity of its hidden layers. However, one could develop an algorithm to extract comprehensible rules from DNN by taking advantage of the pedagogical approach. Unlike the decompositional approach, a pedagogical technique is not affected by the number of hidden layers. The recent astonishing ability of DNN to solve a variety of complex problems can be further improved by extracting understandable rules. Consequently, we can use DNN in different problem domains where validation is essential. This paper argues that rule extraction from DNN is important in order to take advantage of its performance. Besides, the existing work is not adequate. Therefore, this area still requires particular attention.

IV. CONCLUSION

This paper attempts to provide a review of several algorithms for rule extraction from artificial neural networks. Some of the state-of-the-art algorithms are discussed from each category, namely decompositional, pedagogical, and eclectic. Currently, deep learning provides an acceptable solution for many problems. It is a new machine learning area which is believed to move machine learning a step ahead. The review shows that, surprisingly, little work has been done targeting DNN. It is still a black box system. Even though the DNN architecture is complex, a pedagogical algorithm can be used irrespective of the number of hidden layers. Pedagogical algorithms do not depend on the architecture of the network. Thus, they might fill this gap. Extracting comprehensible rules from DNN enhances the real-world usability of the promising solutions of DNN. It can also remove uncertainty problems associated with neural network software.

REFERENCES

[1] A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: A tutorial," Computer, vol. 29, no. 3, pp. 31–44, Mar. 1996.
[2] P. Domingos, The master algorithm: How the quest for the ultimate learning machine will remake our world. United States: Basic Civitas Books, 2015.
[3] M. Grégoire, "On layer-wise representations in deep neural networks," Ph.D. dissertation, Technische Universität, Berlin, 2013.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems 25, 2012, pp. 1106–1114.
[5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 1–9.
[6] H. Lee, P. Pham, Y. Largman, and A. Y. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Advances in Neural Information Processing Systems 22 (NIPS 2009), 2009, pp. 1096–1104.
[7] C. Szegedy, A. Toshev, and D. Erhan, "Deep Neural Networks for Object Detection," in Advances in Neural Information Processing Systems 26, 2013, pp. 2553–2561.
[8] C. Szegedy, S. Reed, D. Erhan, and D. Anguelov, "Scalable, high-quality object detection," arXiv preprint arXiv:1412.1441, Dec. 2014.
[9] S. J. Russell and P. Norvig, Artificial intelligence: A modern approach. United Kingdom: Prentice Hall, 1994.
[10] W. Duch, R. Setiono, and J. M. Zurada, "Computational intelligence methods for rule-based data understanding," Proceedings of the IEEE, vol. 92, no. 5, pp. 771–805, May 2004.
[11] P. Lisboa, Industrial use of safety-related artificial neural networks. HSE Books, 2001.
[12] R. Andrews, J. Diederich, and A. Tickle, "Survey and critique of techniques for extracting rules from trained artificial neural networks," Knowledge-Based Systems, vol. 8, no. 6, pp. 373–389, 1995.
[13] B. J. Taylor and M. A. Darrah, "Rule extraction as a formal method for the verification and validation of neural networks," Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.
[14] M. W. Craven, "Extracting Comprehensible Models from Trained Neural Networks," Ph.D. dissertation, Department of Computer Sciences, University of Wisconsin-Madison, 1996.
[15] G. G. Towell and J. W. Shavlik, "Extracting refined rules from knowledge-based neural networks," Machine Learning, vol. 13, no. 1, pp. 71–101, Oct. 1993.
[16] L. Özbakır, A. Baykasoğlu, and S. Kulluk, "A soft computing-based approach for integrated training and rule extraction from artificial neural networks: DIFACONN-miner," Applied Soft Computing, vol. 10, no. 1, pp. 304–317, Jan. 2010.
[17] M. Sato and H. Tsukimoto, "Rule extraction from neural networks via decision tree induction," in International Joint Conference On Neural Network, Washington, DC, 2001, pp. 1870–1875 vol. 3.
[18] R. Setiono and W. K. Leow, "FERNN: An algorithm for fast extraction of rules from neural networks," Applied Intelligence, vol. 12, no. 1/2, pp. 15–25, 2000.
[19] L. Fu, "Rule generation from neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 8, pp. 1114–1124, 1994.
[20] H. Tsukimoto, "Extracting rules from trained neural networks," IEEE Transactions on Neural Networks, vol. 11, no. 2, pp. 377–389, Mar. 2000.
[21] K. Kumar Sethi, D. Kumar Mishra, and B. Mishra, "Extended Taxonomy of rule extraction techniques and assessment of KDRuleEx," International Journal of Computer Applications, vol. 50, no. 21, pp. 25–31, Jul. 2012.
[22] S. Thrun, "Extracting rules from artificial neural networks with distributed representations," Advances in Neural Information Processing Systems, 1995, pp. 505–512.
[23] M. Craven and J. Shavlik, "Using Sampling and Queries to Extract Rules from Trained Neural Networks," in Machine Learning: Proceedings of the 11th International Conference, San Francisco, CA, 1994.
[24] E. W. Saad and D. C. Wunsch, "Neural network explanation using inversion," Neural Networks, vol. 20, no. 1, pp. 78–93, Jan. 2007.
[25] I. A. Taha and J. Ghosh, "Symbolic interpretation of artificial neural networks," IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 3, pp. 448–463, 1999.
[26] K. Sethi, D. Mishra, and B. Mishra, "KDRuleEx: A novel approach for enhancing user Comprehensibility using rule extraction," in KDRuleEx: A Novel Approach for Enhancing User Comprehensibility Using Rule Extraction, Kota Kinabalu, 2012, pp. 55–60.
[27] M. Augasta and T. Kathirvalavakumar, "Reverse Engineering the Neural Networks for Rule Extraction in Classification Problems," Neural Process Lett, vol. 35, no. 2, pp. 131–150, 2011.
[28] G. P. J. Schmitz, C. Aldrich, and F. S. Gouws, "ANN-DT: An algorithm for extraction of decision trees from artificial neural networks," IEEE Transactions on Neural Networks, vol. 10, no. 6, pp. 1392–1401, 1999.
[29] M. Augasta and T. Kathirvalavakumar, "Rule extraction from neural networks – A comparative study," in International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012), Salem, Tamilnadu, 2012, pp. 404–408.
[30] E. R. Hruschka and N. F. F. Ebecken, "Extracting rules from multilayer perceptrons in classification problems: A clustering-based approach," Neurocomputing, vol. 70, no. 1-3, pp. 384–397, Dec. 2006.
[31] H. Lu, R. Setiono, and H. Liu, "Effective data mining using neural networks," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 957–961, 1996.
[32] H. Kahramanli and N. Allahverdi, "Rule extraction from trained adaptive neural networks using artificial immune systems," Expert Systems with Applications, vol. 36, no. 2, pp. 1513–1522, Mar. 2009.
[33] J. Zilke, "Extracting Rules from Deep Neural Networks," M.S. thesis, Computer Science Department, Technische Universität Darmstadt, 2015.
AUTHORS PROFILE