
International Conference on Advances in Computing, Communication Control and Networking (ICACCCN2018)

Trend Analysis in Machine Learning Research Using Text Mining

Deepak Sharma (1), Bijendra Kumar (1), Satish Chand (2)
(1) Department of Computer Engineering, Netaji Subash Institute of Technology, New Delhi, India
(2) School of Computer & Systems Sciences, Jawaharlal Nehru University, New Delhi, India
{deepak.btg,bizender,schand20}@gmail.com

Abstract—This paper aims to identify the trends in machine learning research using text mining. Research articles contain significant knowledge and research results; however, they are long and contain much noise, so analyzing them manually takes considerable human effort. Text mining can be used to analyze and extract useful information from a large number of research articles quickly and automatically. Text mining is the method of deriving innovative, previously unseen knowledge from unstructured, semi-structured, and structured textual data; this knowledge amounts to important information that can be drawn from text. In this paper, text mining methods are applied to detect the trends of terms that occur in research articles and how they vary over time. We collected 21,906 scientific papers from six top journals in the field of machine learning published in the period 1988-2017 and analyzed them using text mining. Our analysis shows the changing trend of various terms in machine learning research over three decades and can help upcoming researchers explore significant research areas of machine learning.

Keywords—text mining; machine learning; research trend analysis; data analysis

I. INTRODUCTION

Text mining denotes a process of mining meaningful, non-trivial patterns or knowledge from a set of unstructured texts [1]. Uncovering trends from large volumes of textual data is an essential task [2]. In particular, the advent of high-speed internet generates large amounts of textual data in a variety of forms [3]. As an aspect of this trend, research utilizing text mining techniques is actively being carried out to find patterns and extract implicit information from large volumes of data in various fields such as academic articles and news articles [1,4,5]. The goal of text mining is to determine hidden knowledge that was not known earlier [6]. In [7], text mining is described as a group of techniques employed to identify trends and produce knowledge from data.

Text mining techniques derive the frequencies of important terms in the content of textual data such as internet chat rooms, articles, or web pages, and identify associations between features [8]. Text mining often turns unorganized text into an effective collection of data suitable for data mining and thorough investigation [9]. Similar work on various research areas has been performed using text mining. In [10], text mining was applied to analyze trends in consumer policy. In [11], text mining was employed to identify the primary trends of Big Data in marketing; the outcome helped direct business efforts for Big Data in the marketing arena. In [12], text mining was applied for knowledge discovery in academic research.

This paper proposes to identify the trends in machine learning research from articles published in well-established mainstream journals over the past three decades, i.e., 1988~2017. The prominent journals included in this work as primary data sources are IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE-PAMI), Journal of Machine Learning Research (JMLR), ScienceDirect Pattern Recognition (ScD-PR), IEEE Transactions on Neural Networks (IEEE-NN), Springer Machine Learning (Sp-ML), and ScienceDirect Neural Networks (ScD-NN). Text mining techniques are employed in a framework for determining the trends of machine learning research articles published over these three decades. The articles include the title, abstract, and complete contents [13]; however, the data used for processing in this study were only the titles and abstracts. Analyzing the title and abstract of a research article is appropriate because they summarize the article's objective while pruning unneeded components such as figures and tables [13]. This approach may be helpful to new researchers for further exploration of their research area. The remainder of the paper is arranged as follows: Section 2 describes the data collection and preprocessing steps, Section 3 presents the result analysis, and Section 4 concludes the paper.

II. METHODOLOGY

In this section, we discuss the method of data preparation, the description of the corpus, and the data preprocessing applied to the corpus before text transformation. Fig. 1 shows the methodology for trend analysis in machine learning.

Fig. 1. Methodology for trend analysis

A. Data Preparation

The research data were collected from well-established journals that publish high-quality research articles in machine learning: IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE-PAMI), Journal of Machine Learning Research (JMLR), ScienceDirect Pattern Recognition (ScD-PR), IEEE Transactions on Neural Networks (IEEE-NN), Springer Machine Learning (Sp-ML), and ScienceDirect Neural Networks (ScD-NN). The data used for processing in this study were only the titles and abstracts of the research articles from these journals. Recognizing their significant contribution to research, we included journal articles only. The results cover the time period from 1988 to 2017. Table I shows the count of journal articles included in this study for each selected journal. Each dataset is treated as a separate corpus.

TABLE I. THE NUMBER OF ARTICLES INCLUDED IN THIS STUDY

S.No. | Journal Name              | Duration  | #Years | #Articles Published
1     | IEEE-PAMI                 | 1988~2017 | 30     | 4,630
2     | IEEE-NN                   | 1990~2017 | 28     | 4,349
3     | JMLR                      | 2000~2017 | 18     | 1,755
4     | ScD-PR                    | 1988~2017 | 30     | 6,567
5     | ScD-NN                    | 1988~2017 | 30     | 3,294
6     | Springer-Machine Learning | 1988~2017 | 30     | 1,311
      | Total                     |           |        | 21,906

B. Description of Corpus

Data were gathered by preparing a list of relevant articles from the well-established machine learning journals above. The corpus was prepared by collecting these articles and dividing them into three datasets: the first dataset (a.k.a. Decade1), the second dataset (a.k.a. Decade2), and the third dataset (a.k.a. Decade3) contain the data for the periods 1988~1997, 1998~2007, and 2008~2017, respectively. Table II shows the distribution of the research articles in our study across the three decades. The titles and abstracts of the research articles were extracted from the above-mentioned journals.

TABLE II. DISTRIBUTION OF NUMBER OF ARTICLES IN EACH DECADE

Journals/Year   | 1988~1997 | 1998~2007 | 2008~2017
IEEE-PAMI       | 1,245     | 1,509     | 1,876
IEEE-NN         | 878       | 1,476     | 1,995
JMLR            | 0         | 474       | 1,281
ScD-PR          | 1,267     | 2,105     | 3,195
ScD-NN          | 797       | 1,077     | 1,420
Springer-ML     | 293       | 443       | 575
#Articles Count | 4,480     | 7,084     | 10,342     (total = 21,906)

C. Text Preprocessing

The preprocessing stage comprises the removal of unwanted words and characters from the corpus and is performed with the following steps. Initially, the titles and abstracts of the articles are converted into tokens, and the produced tokens are transformed into lowercase words for each document. Commas, exclamation points, quotation marks, other punctuation characters, apostrophes, question marks, and hyphens are removed, and numerical values are eliminated so that only textual tokens remain. Then, the standard English stop words specified in the NLTK Python package [14] and a customized stop-word list [15] are removed from the literature dataset. Afterwards, to prepare a useful literature dataset, word forms are stemmed to their root form using the Porter stemming algorithm [16]; stemming is applied to the tokens of each document and converts them into their root terms. Finally, the documents are transformed into sparse vectors. The text files in the corpus contain the titles and abstracts of the articles, and the bag-of-words document representation is used to convert them into vectors. In this representation, each article is represented by one vector whose elements are (word, word count) pairs; the mapping between the words and their word counts is called a dictionary. The sparse vectors are created by counting merely the number of occurrences of each distinct word and converting each word to its integer word_id. These steps transform a corpus into a vector representation for text mining.
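As an illustration of the preprocessing pipeline described above, the following Python sketch tokenizes a title-plus-abstract string with NLTK, removes stop words and numeric tokens, applies the Porter stemmer, and builds the dictionary and sparse bag-of-words vectors with Gensim. It is a minimal sketch, not the authors' code: the sample documents are invented, and only the standard NLTK stop-word list is used, without the customized list of [15].

import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from gensim.corpora import Dictionary

# Assumes the NLTK data packages 'punkt' and 'stopwords' have been downloaded.
# Illustrative input: each document is a title concatenated with its abstract.
documents = [
    "Trend analysis in machine learning research using text mining ...",
    "A support vector machine approach to statistical pattern recognition ...",
]

stop_words = set(stopwords.words("english"))   # standard English stop words [14]
stemmer = PorterStemmer()                      # Porter stemming algorithm [16]

def preprocess(text):
    tokens = word_tokenize(text.lower())                  # tokenize and lowercase
    tokens = [re.sub(r"[^a-z]", "", t) for t in tokens]   # drop punctuation and digits
    tokens = [t for t in tokens if t and t not in stop_words]
    return [stemmer.stem(t) for t in tokens]              # reduce words to root forms

processed = [preprocess(doc) for doc in documents]
dictionary = Dictionary(processed)                           # word -> integer word_id
bow_corpus = [dictionary.doc2bow(doc) for doc in processed]  # sparse (word_id, count) vectors

Each entry of bow_corpus is the sparse vector described above: a list of (word_id, word count) pairs for one article.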
D. Text Transformation

In this phase, the sparse vectors are transformed into TfIdf (term frequency–inverse document frequency) vectors. Transforming the articles from one vector representation into another serves two purposes: first, it brings out hidden structure in the corpus, revealing relationships between the words and describing the documents more semantically; second, it makes the document representation more compact. The terms are filtered using two parameters, word frequency and inverse document frequency: terms with low occurrence in the corpus and low occurrence in each document are removed. In the bag-of-words representation, each word is represented as a separate variable with a numeric weight.

In this step, the sparse vectors of the corpus are converted to TfIdf vectors using Eq. (1), the formula for the TfIdf weight of term i in document j, in a corpus of D documents.

    weight(i, j) = frequency(i, j) x log2(D / documentfreq(i))        (1)

where frequency(i, j) is the number of occurrences of word i in document j (term frequency), documentfreq(i) is the number of documents containing word i (document frequency), D is the count of all documents, and weight(i, j) is the relative significance of word i in document j.
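A minimal sketch of this transformation, continuing from the dictionary and bow_corpus built above. Gensim's TfidfModel applies the same term-frequency times inverse-document-frequency idea; its default normalization is not guaranteed to match the log2 form of Eq. (1) exactly, so the weight() function below implements Eq. (1) directly for comparison.

from math import log2
from gensim.models import TfidfModel

# Gensim route: learn document frequencies from the bag-of-words corpus,
# then map each sparse count vector to a TfIdf-weighted vector.
tfidf = TfidfModel(bow_corpus)
tfidf_corpus = [tfidf[doc] for doc in bow_corpus]

# Direct implementation of Eq. (1): weight(i, j) = frequency(i, j) x log2(D / documentfreq(i)).
def weight(frequency_ij, documentfreq_i, D):
    return frequency_ij * log2(D / documentfreq_i)

D = len(bow_corpus)
doc_j = bow_corpus[0]
eq1_vector = [(word_id, weight(freq, dictionary.dfs[word_id], D))
              for word_id, freq in doc_j]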
E. Feature and Attribute Selection

In this phase, a subgroup of the features is picked to depict a text document. The selected features produce an enhanced textual description compared to using many features that carry very little information about the data. The number of indicator variables is reduced by eliminating stop words and by stemming the terms to their root form. The terms are then filtered using two parameters, word frequency and inverse document frequency: terms with low occurrence in the corpus and low occurrence in each document are removed. Features are thus selected for classification, and the few insignificant attributes are eliminated.

III. RESULT AND DISCUSSION

In this section, we discuss the classification of the research articles in the dataset and the contribution of each research area in each decade. Finally, we present the trend analysis of machine learning research.
A. Research Article Classification

The Gensim package is used to perform text mining on the titles and abstracts of the collected articles. It is built around the idea of handling substantial unstructured text corpora, document after document, in a memory-independent fashion. It also implements Vector Space Model (VSM) algorithms [17] and includes corpus transformations such as TfIdf, LSI, and random projection. For the experiments, Gensim is used as a Python library to implement the trend analysis and document streaming [18]. The articles are classified into 14 machine learning areas by applying the steps discussed in Section 2. Table III shows the classification produced by the term frequencies and weights: the descriptive terms (1-14) serve as classification labels with their corresponding terms, and the table also shows the count of articles retrieved in each decade for the terms identified using the TfIdf model.

TABLE III. ARTICLE CLASSIFICATION WITH TERMS FROM 1988~2017

S.No. | Descriptive Terms | Terms | 1988~1997 | 1998~2007 | 2008~2017
1  | Artificial Neural Network and Deep learning | belief, boltzmann, convolutional, deep, forward, learning, logic, network, propagation, recurrent | 1459 | 1052 | 2703
2  | Bayesian statistics | base, knowledge, average, bayesian, dependence, estimator, gaussian, multinomial, naive, network | 410 | 635 | 945
3  | Classifiers | binary, classifier, discriminant, hierarchical, linear, machine, multi, naive, probability, support | 560 | 1020 | 1363
4  | Cluster analysis | birch, dbscan, fuzzy, hierarchical, mean, algorithm, cluster, group, optics, expectation | 721 | 1214 | 1711
5  | Decision tree algorithm | c4.5, c5.0, decision, detect, id3, iterative, random, sliq, stump, tree | 659 | 860 | 1135
6  | Dimensionality reduction | component, correlation, discriminant, extraction, factor, feature, least, mapping, principal, stochastic | 754 | 1572 | 2432
7  | Ensemble learning | ada, aggregate, average, boost, ensemble, forest, gradient, machine, random, tree | 248 | 464 | 714
8  | Instance-based learning | algorithm, base, learn, map, near, object, organize, quant, self, vector | 309 | 926 | 1082
9  | Regression Analysis | adapt, least, linear, logistic, multi, ordinary, regression, spline, step, variable | 158 | 457 | 800
10 | Regularization algorithm | absolute, angle, elastic, least, net, operator, regression, ridge, select, square | 88 | 271 | 526
11 | Reinforcement learning | action, advance, algorithm, automata, difference, learn, prior, reward, state, temporal | 201 | 269 | 372
12 | Semi-supervised learning | active, density, generate, graph, learn, method, model, separate, train, trans | 577 | 723 | 713
13 | Supervised learning | algorithm, annova, boost, classify, hidden, learn, model, near, support, target | 1720 | 2347 | 3052
14 | Unsupervised learning | expect, maximize, algorithm, generate, map, method, text, mine, group, vector | 503 | 649 | 600
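The paper does not spell out the exact assignment rule that maps an article to one of the 14 areas. One plausible reading, sketched below under that assumption, scores each article against each area's descriptive terms (after stemming) using the article's TfIdf weights and assigns the article to the highest-scoring area; the keyword lists are abbreviated from Table III.

# Hedged sketch: the max-TfIdf-score rule and the abbreviated keyword lists are assumptions.
area_terms = {
    "Artificial Neural Network and Deep learning": ["belief", "convolutional", "deep", "network", "propagation", "recurrent"],
    "Bayesian statistics": ["bayesian", "gaussian", "multinomial", "naive"],
    "Cluster analysis": ["cluster", "dbscan", "fuzzy", "hierarchical"],
    # ... remaining areas and terms from Table III ...
}

# Map each area's keywords to stemmed word_ids present in the dictionary.
area_ids = {
    area: {dictionary.token2id[stemmer.stem(t)] for t in terms if stemmer.stem(t) in dictionary.token2id}
    for area, terms in area_terms.items()
}

def classify(tfidf_vector):
    weights = dict(tfidf_vector)   # word_id -> TfIdf weight for one article
    scores = {area: sum(weights.get(i, 0.0) for i in ids) for area, ids in area_ids.items()}
    return max(scores, key=scores.get)

labels = [classify(vec) for vec in tfidf_corpus]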
B. Research Contribution In Each Decade

In this subsection, we discuss the contribution of the descriptive terms in each decade. Fig. 2 shows the percentage of research contributed by descriptive terms (1-14) in each decade from 1988~2017. The donut chart for each descriptive term represents the percentage contribution for decade1 (i.e., 1988~1997), decade2 (i.e., 1998~2007), and decade3 (i.e., 2008~2017). The research contribution in each decade is calculated using Eq. (2).

    CED(t, d) = ac_t / (sum_{t=1..n} ac_t)        (2)

where CED(t, d) is the research contribution of descriptive term t in decade d, ac_t is the article count for descriptive term t, n is the total number of descriptive terms, and sum_{t=1..n} ac_t is the sum of the article counts of all descriptive terms in that decade.
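A small sketch of Eq. (2): within one decade, the contribution of a descriptive term is its article count divided by the sum of the article counts of all descriptive terms in that decade. The counts below are a subset of the decade1 column of Table III; in the full computation the denominator runs over all 14 terms.

# Eq. (2): CED(t, d) = ac_t / sum over all descriptive terms of ac_t, within one decade.
def contribution_each_decade(article_counts):
    total = sum(article_counts.values())
    return {term: count / total for term, count in article_counts.items()}

# Decade1 (1988~1997) article counts from Table III (subset shown for brevity).
decade1_counts = {
    "Supervised learning": 1720,
    "Artificial Neural Network and Deep learning": 1459,
    "Dimensionality reduction": 754,
    "Cluster analysis": 721,
    "Decision tree algorithm": 659,
}
ced_decade1 = contribution_each_decade(decade1_counts)   # shares shown in the Fig. 2 donuts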
The percentage contribution of each descriptive term is shown in the donut charts. The top five research areas in decade1 were supervised learning, artificial neural network, dimensionality reduction, cluster analysis, and decision tree algorithm. Similarly, in decade2 the top five areas were supervised learning, dimensionality reduction, cluster analysis, artificial neural network, and classifiers. Finally, in decade3 the top five areas were supervised learning, artificial neural network and deep learning, dimensionality reduction, cluster analysis, and classifiers.
dimensionality reduction, cluster analysis, and classifiers.

(a)

(b)

(c)

Fig. 2. (a-c) Percentage of research contributed for each descriptive terms in


each decade

C. Research Contribution Across Decades

In this subsection, we discuss the contribution of the descriptive terms across decades. Fig. 3 shows the percentage of research contributed by descriptive terms (1-14) across the decades from 1988~2017. The donut chart for each descriptive term shows the percentage contribution in red for decade1 (i.e., 1988~1997), blue for decade2 (i.e., 1998~2007), and green for decade3 (i.e., 2008~2017). The research contribution across decades is calculated using Eq. (3).

    CAD(t, d) = CED(t, d) / (sum_{k=1..d} CED(t, k))        (3)

where CAD(t, d) is the research contribution of descriptive term t across decade d, CED(t, d) is the research contribution of term t in decade d, and sum_{k=1..d} CED(t, k) is the sum of the per-decade contributions of term t.
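A matching sketch of Eq. (3): for a single descriptive term, each decade's CED value is divided by the sum of that term's CED values over the decades considered. The CED numbers below are illustrative placeholders, not values reported in the paper.

# Eq. (3): CAD(t, d) = CED(t, d) / sum over decades k = 1..d of CED(t, k), for one term t.
def contribution_across_decades(ced_per_decade):
    total = sum(ced_per_decade.values())
    return {decade: value / total for decade, value in ced_per_decade.items()}

# Illustrative per-decade CED values for one descriptive term (placeholder numbers).
ced_term = {"Decade1": 0.17, "Decade2": 0.09, "Decade3": 0.15}
cad_term = contribution_across_decades(ced_term)   # shares plotted in the Fig. 3 donuts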

Fig. 3. Percentage of research contributed for each descriptive term across decades

The percentage contribution of each descriptive term across decades is shown in Fig. 3. The top five research areas across decade1 were artificial neural network, semi-supervised learning, unsupervised learning, decision tree algorithm, and supervised learning. Similarly, across decade2 the top five areas were instance-based learning, regression analysis, classifiers, dimensionality reduction, and unsupervised learning. Finally, across decade3 the top five areas were regularization algorithm, regression analysis, dimensionality reduction, ensemble learning, and deep learning.
D. Trend Analysis

Trend analysis is employed to detect trends in research articles accumulated over a period [13]. Uncovering trends from large volumes of textual data is an essential task in several fields [2,19]. The appearances of specific terms across the decades are used to understand the trends and research patterns of the areas under study. The research trends are presented in two figures for better visibility of each research area: Fig. 4 and Fig. 5 show the research trends of descriptive terms (1-7) and (8-14), respectively. The research trend of the artificial neural network area was very high in decade1 due to the advent of the backpropagation algorithm for training neural networks, went down in decade2 due to a deficiency of computational resources, and rose again in decade3 due to the evolution of deep learning and the availability of high-computation graphical processing units (GPUs).

Fig. 4. Trend analysis of descriptive terms (1-7) in each decade

Fig. 5. Trend analysis of descriptive terms (8-14) in each decade

The research areas showing a consistent increase in their research trends since decade1 are Bayesian statistics, cluster analysis, dimensionality reduction, ensemble learning, regression analysis, and regularization algorithm. The research areas showing a consistent decrease since decade1 are decision tree algorithm, reinforcement learning, semi-supervised learning, supervised learning, and unsupervised learning. Finally, the research areas showing an increasing trend from decade1 to decade2 and a decreasing trend in the next decade are classifiers and instance-based learning.

Fig. 6 shows the percentage increase of each descriptive term between decade1 (1988~1997) and decade2 (1998~2007). During this period, research in the artificial neural network area decreased significantly, while research in regularization algorithm, instance-based learning, and regression analysis increased substantially in decade2 as compared to decade1.

Fig. 6. Percentage increase of each descriptive term between Decade2 and Decade1

Fig. 7 shows the percentage increase of each descriptive term between decade2 (1998~2007) and decade3 (2008~2017). During this period, research in unsupervised learning and semi-supervised learning decreased significantly, while the study of artificial neural networks showed the highest increase amongst the descriptive terms.

Fig. 7. Percentage increase of each descriptive term between Decade3 and Decade2
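The decade-to-decade comparisons behind Figs. 6-8 reduce to a relative change between two values for the same descriptive term. A minimal sketch follows; treating the change as (later - earlier) / earlier is an assumption about how the percentages in the figures were computed, and the counts are taken from the regularization algorithm row of Table III.

# Percentage change of a descriptive term between two decades.
def percentage_increase(earlier, later):
    return (later - earlier) / earlier * 100.0

# Regularization algorithm article counts from Table III: 88, 271, 526 across the three decades.
print(percentage_increase(88, 271))    # decade1 -> decade2
print(percentage_increase(271, 526))   # decade2 -> decade3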

Fig. 8 shows the percentage increase of each descriptive term over the three decades. Every descriptive term increased significantly relative to the earlier decades, but the regularization algorithm and regression analysis showed the highest rise in decade3.

Fig. 8. Percentage increase of each descriptive term over three decades

This concludes the result analysis of machine learning research trends using text mining.

IV. CONCLUSION

In this paper, a text mining technique is utilized to perform trend analysis of the machine learning research area over three decades. The content collection is prepared from research articles published in six well-established journals. To capture scholars' research interest in the descriptive terms over the previous 30 years, the dataset is split into three sets covering the time spans 1988~1997, 1998~2007, and 2008~2017. This study can help upcoming researchers in the area of machine learning get an intuition of the trends in their area of interest. The framework can also be applied to identify trends in research areas associated with other fields of study.

REFERENCES

[1] J.-L. Hung and K. Zhang, "Examining mobile learning trends 2003–2008: a categorical meta-trend analysis using text mining techniques," Journal of Computing in Higher Education, vol. 24, no. 1, pp. 1–17, Oct. 2011.
[2] A. Kao and S. R. Poteet, Natural Language Processing and Text Mining. London: Springer, 2010.
[3] I. O. R. Patterns and T. T. Mining, "Identification of Research Patterns and Trends through Text Mining," International Journal of Information and Education Technology, vol. 2, no. 3, pp. 233–235, 2012.
[4] S. Lee et al., "Using Patent Information for New Product Development: Keyword-Based Technology Roadmapping Approach," 2006 Technology Management for the Global Future - PICMET 2006 Conference, Istanbul, 2006, pp. 1496-1502.
[5] A. Balahur and R. Steinberger, "Rethinking Sentiment Analysis in the News: from Theory to Practice and back," Proceedings of the 1st Workshop on Opinion Mining and Sentiment Analysis, University of Sevilla, pp. 1-12, 2009.
[6] V. Gupta and G. S. Lehal, "A survey of text mining techniques and applications," Journal of Emerging Technologies in Web Intelligence, vol. 1, no. 1, pp. 60-76, 2009.
[7] L. Francis and M. Flynn, "Text Mining Handbook," Casualty Actuarial Society E-Forum, Spring, pp. 1-61, 2006.
[8] L. Francis, "Taming Text: An Introduction to Text Mining," Casualty Actuarial Society Forum, Winter, pp. 51-88, 2010.
[9] P. Cerrito, "Inside text mining. Text mining provides a powerful diagnosis of hospital quality rankings," Health Management Technology, vol. 25, no. 3, pp. 28-31, 2004.
[10] M.-J. Kim, K. Ohk, and C.-S. Moon, "Trend Analysis by Using Text Mining of Journal Articles Regarding Consumer Policy," New Physics: Sae Mulli, vol. 67, no. 5, pp. 555–561, 2017.
[11] A. Amado, P. Cortez, P. Rita, and S. Moro, "Research trends on Big Data in Marketing: A text mining and topic modeling based literature analysis," European Research on Management and Business Economics, vol. 24, no. 1, pp. 1–7, 2018.
[12] A. K. Ojo and A. B. Adeyemo, "Knowledge Discovery in Academic Electronic Resources Using Text Mining," International Journal of Computer Science and Information Security, vol. 11, no. 2, pp. 1-10, 2013.
[13] Z. Shaik, S. Garia, and G. Chakraborty, "SAS® Since 1976: An Application of Text Mining to Reveal Trends," Proceedings of the SAS Global Forum 2012 Conference, Data Mining and Text Analytics, SAS Institute Inc., Cary.
[14] S. Bird, "NLTK: the natural language toolkit," in Proceedings of the COLING/ACL on Interactive Presentation Sessions, Association for Computational Linguistics, pp. 69-72, 2006.
[15] Stop-word list, available for download from ftp://ftp.cs.cornell.edu/pub/smart/english.stop.
[16] M. Porter, "An algorithm for suffix stripping," Program, vol. 14, no. 3, pp. 130-137, 1980.
[17] G. Salton, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613-620, 1975.
[18] R. Řehůřek and P. Sojka, "Software Framework for Topic Modeling with Large Corpora," in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta: University of Malta, pp. 46-50, 2010.
