
Classification of Holy Quran Translation

The document summarizes techniques for feature reduction when classifying text documents using neural networks. It discusses removing stop words, stemming words to their roots, and using term frequency to assign weights and select important features. Term frequency counts word frequencies and regards more frequent words as more significant features for classification. Feature reduction techniques like this can enhance neural network effectiveness by increasing susceptibility and leading to better understanding of the classification learning process.


J. Eng. Applied Sci., 13 (12): 4468-4475, 2018

Fig. 2: String to word vector filter

methods of feature reduction should be applied to remove the irrelevant features from the initial feature set in order to enhance the effectiveness of the NN classifier by increasing its susceptibility: only the relevant attributes are fed to the classifier, which leads to a better understanding of the learning process of classifying verses. One method of feature reduction is feature selection, which is performed by applying a feature weighting scheme.

The feature weighting scheme applied in this research is the Term Frequency (TF) technique (Babu et al., 2014). It is used to rank the features in the initial feature vector; a number of the highest-scoring features are then chosen as a new feature subset (the vector representation of the verses), which is considered to hold the distinctive attributes for classifying the verses. These feature vectors are then used to train the neural network. Based on the outcomes of the experiments conducted in this study, and according to the best verse-classification result among them, the following tasks in the string to word vector filter were adjusted, as illustrated in Fig. 2.

Tokenizer: Tokenization is the process of dividing the text of the verses into words or phrases based on the n-gram technique.

Lower case tokens: All word tokens are converted to lowercase before being added to the feature set. Based on the experiments, this normalization technique improved the classification accuracy (Patki and Kelkar, 2013; Uysal and Gunal, 2014).

Stop words handler: Many words in the verses are repeated frequently yet carry essentially no information (Aggarwal and Zhai, 2012). Their presence in verse classification therefore hinders a proper understanding of the content of the verses.

Stemmer: The aim of stemming is to reduce words to their roots, which can then be used more easily to differentiate the words.

Term Frequency (TF) transform: This technique implements the feature selection method. It generates the feature set from the initial feature set based on word frequencies, calculating each word's weight with Eq. 1:

$$\mathrm{TF}_{ij} = \log\left(1 + f_{ij}\right) \qquad (1)$$

where $f_{ij}$ is the frequency of word $i$ in verse $j$ (instance). Because terms differ in significance across the verses, this process assigns each term a weight that indicates its relative importance in a verse, and this weight depends on the word's frequency in the verses. Therefore, the most repeated words in a document (those with a high term frequency) are regarded as more significant in that document (Wang et al., 2012). If a word (other than a stop word) appears frequently within a particular category, it should be considered a feature, or discriminator, of that category. For example, “Fasting” or “Ramadan” frequently appears in the Fasting category.
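The preprocessing tasks of the string to word vector filter (tokenizer, lower case tokens, stop words handler, stemmer) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the study's implementation: the stop-word list `STOP_WORDS` and the suffix-stripping `stem` function are hypothetical stand-ins (a real setup would use a full stop list and an established stemmer such as Porter's), and building the n-grams after cleaning, rather than before, is a simplification chosen here.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for",
              "o", "you", "who", "have", "be", "it"}

# Hypothetical suffixes for a crude stemmer; this is NOT the Porter
# stemmer and not necessarily the stemmer used in the study.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, n=1):
    """Tokenize, lowercase, drop stop words, stem, then form word n-grams."""
    words = re.findall(r"[A-Za-z]+", text)               # tokenizer: split on non-letters
    words = [w.lower() for w in words]                   # lower case tokens
    words = [w for w in words if w not in STOP_WORDS]    # stop words handler
    words = [stem(w) for w in words]                     # stemmer
    # n-grams are built from the cleaned words (n=1 gives single words)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


text = "O you who have believed, fasting is prescribed for you"
print(preprocess(text))       # ['believ', 'fast', 'prescrib']
print(preprocess(text, n=2))  # ['believ fast', 'fast prescrib']
```

The illustrative sentence is made up for this sketch. Note how stemming maps "fasting" to the same root as "fast", so both forms count toward one feature.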

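TF weighting followed by selection of the highest-scoring features can be sketched as follows. This assumes the per-verse weight has the form log(1 + f_ij), which is how Weka's StringToWordVector filter defines its TF transform; the natural logarithm is used, and changing the base only rescales the weights without changing the ranking. How the study scores a feature across all verses and how many features it keeps are not stated here, so the summed-weight score, the alphabetical tie-break, and the cutoff `k` are assumptions, and `tf_weights`, `select_features` and `vectorize` are hypothetical names.

```python
import math
from collections import Counter


def tf_weights(tokens):
    """Map each word i of one verse j to log(1 + f_ij), f_ij = raw count."""
    counts = Counter(tokens)
    return {word: math.log(1 + f) for word, f in counts.items()}


def select_features(corpus, k):
    """Score each word by its TF weight summed over all verses; keep the top k.

    Summing is an assumed ranking rule. Ties are broken alphabetically so
    the selected subset is deterministic.
    """
    scores = Counter()
    for tokens in corpus:
        scores.update(tf_weights(tokens))  # adds weights word by word
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def vectorize(tokens, features):
    """Represent one verse as TF weights over the selected feature subset."""
    weights = tf_weights(tokens)
    return [weights.get(word, 0.0) for word in features]


# Hypothetical, already preprocessed verses (lists of stemmed tokens).
corpus = [
    ["fast", "ramadan", "fast"],
    ["ramadan", "month", "fast"],
    ["pray", "prayer"],
]
features = select_features(corpus, k=3)
print(features)  # ['fast', 'ramadan', 'month']
print([vectorize(v, features) for v in corpus])
```

The resulting fixed-length vectors are the kind of input a neural network classifier would be trained on; words such as "fast" and "ramadan" score highest because they recur within the same (Fasting-like) verses.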
