Classification of Holy Quran Translation
methods of feature reduction should be applied to remove the irrelevant features from the initial feature set, in order to enhance the effectiveness of the NN classifier by increasing its sensitivity to the remaining attributes, which are fed to the classifier to support learning of the verse classification task. One method of feature reduction is feature selection, which is performed by applying a feature weighting scheme.

The feature weighting scheme applied in this research is the Term Frequency (TF) technique (Babu et al., 2014). It ranks the features in the initial feature vector, and a number of the highest-scoring features are then chosen as a new feature subset (the vector representation of the verses), which is regarded as the set of distinctive attributes for classifying the verses. These feature vectors are then used to train the neural network. Based on the outcomes of the experiments conducted in this study, and according to the best result attained in classifying the verses, the following tasks in the string to word vector filter were adjusted as illustrated in Fig. 2.

Tokenizer: Tokenization is the process of dividing a sequence of text in the verses into words or phrases, based on the n-gram technique.

Lower case tokens: All word tokens are converted to lowercase before being added to the feature set. In the experiments, this normalization technique improved the classification accuracy (Patki and Kelkar, 2013; Uysal and Gunal, 2014).

Stop words handler: Many words in the verses are repeated frequently and carry essentially no information (Aggarwal and Zhai, 2012). Their presence in verse classification therefore hinders a proper understanding of the content of the verses.

Stemmer: The aim of stemming is to reduce words to their roots, which can then be used easily to differentiate the words.

Term Frequency (TF) transform: This technique is the feature selection method responsible for generating the feature set from the initial feature set, depending on the word frequencies, by calculating their weights based on Eq. 1:

(1)

where fij is the frequency of word i in verse j (instance). According to the significance of the terms in the verses, this process assigns each term a weight that indicates the relative importance of the term in a verse, which depends on the word frequency in the verses. Therefore, the most repeated words in a document (those whose term frequency is high) are regarded as more significant in that document (Wang et al., 2012). If a word (other than a stop word) appears frequently within a particular category, the word should be considered a feature or discriminator of that category. For example, “Fasting” or “Ramadan” appears frequently in the Fasting category.
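
The preprocessing and weighting tasks described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the configuration used in the study: tokens are unigrams (n = 1), the stop-word list and the suffix stripper are toy stand-ins, and the weight is assumed to take the form log(1 + fij), which is how Weka's StringToWordVector filter documents its TF transform. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "is", "of", "in", "for", "and", "a", "to", "you"}


def naive_stem(token):
    """Toy suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(verse):
    """Tokenize (unigrams), lowercase, drop stop words, then stem."""
    tokens = re.findall(r"[A-Za-z']+", verse)
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


def tf_weights(tokens):
    """Assumed form of Eq. 1: weight = log(1 + fij) for word i in verse j."""
    counts = Counter(tokens)
    return {word: math.log(1 + f) for word, f in counts.items()}


def select_top_features(verses, k):
    """Rank words by total TF weight over all verses and keep the top k."""
    totals = Counter()
    for verse in verses:
        for word, weight in tf_weights(preprocess(verse)).items():
            totals[word] += weight
    return [word for word, _ in totals.most_common(k)]


def vectorize(verse, vocabulary):
    """Represent one verse as a vector of TF weights over the selected words."""
    weights = tf_weights(preprocess(verse))
    return [weights.get(term, 0.0) for term in vocabulary]
```

Under these assumptions, calling select_top_features on a list of verse translations yields the reduced vocabulary, and vectorize turns each verse into the feature vector that would be passed to the neural network.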