Business Analytics (A Case-Study Approach Using LDA Topic Modeling)
Strategic decision-making in business is improved through business analytics, which can be descriptive, diagnostic, predictive or prescriptive. Business analytics applies statistical methods to historical data to gain new insights for better policymaking. Topic modelling is a statistical, unsupervised machine learning technique for text mining that can decipher themes from a corpus such as social media posts, annual reports, news coverage, related articles, and trends in the domain. In this research, topic modelling is applied to the Business Digital Economy Dataset, which hosts around 2400 titles, abstracts, and keywords from different authors on various digital economy topics. Of the various approaches to topic modelling, Latent Dirichlet Allocation (LDA) is the most widely used. This paper explores research articles on emerging trends in the business economy to extract the concealed semantic structures and generate word clouds. The research procedure comprises introduction of the dataset, data pre-processing, and building and visualizing the model.

Keywords: Business Analytics, Topic modelling, Latent Dirichlet Allocation (LDA)

gain new insights into their businesses through various statistical methods and technologies using historical data.

This paper considers the problem of modelling a digital economy corpus of nearly 2000 articles. LDA-based topic modelling is used to cluster the topics by calculating the probability distribution over a set of words. The results are presented as word clouds for the various topics, showing statistical relationships that are useful for tasks such as cataloguing, innovation, summarization, and understanding the importance of the topics.

The rest of the paper is structured as follows. Section 2 provides a literature survey on business analytics, language processing and text mining, language modelling, and topic modelling in the field of business analytics. Section 3 gives a detailed description of the proposed methodology. Section 4 demonstrates our approach and reports empirical findings. Conclusions and future work are detailed in Section 5.
Authorized licensed use limited to: University of Prince Edward Island. Downloaded on June 07,2021 at 23:32:41 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the Fifth International Conference on Computing Methodologies and Communication (ICCMC 2021)
IEEE Xplore Part Number: CFP21K25-ART
recognition, audio to text conversion, sentiment analysis, and document summarization. Language modelling is of two types: 1) the statistical model, a probabilistic model that predicts the next word in a sequence; and 2) the neural language model, based on neural networks, used for speech recognition or machine translation.

Topic modelling methods are powerful techniques that are widely applied in natural language processing for topic discovery and semantic mining from unordered documents. According to Blei et al. [11], it is a frequently used unsupervised text-mining tool for discovering hidden semantic structures or topics in a large corpus of texts. Fig. 1 illustrates the idea of the topic modelling strategy.

system enabled approaches. There is a growing trend in BI research, applications, and concepts.

Meaningful insights can be retrieved from a corpus of data using information retrieval based on a natural language model. Joby [14] uses latent semantic analysis to extract substantial information from the user's query or from the corpus.

Because data keeps growing in volume and dimensionality with advances in computer and web technologies, Kherwa et al. [15] stated that topic modelling can be used to extract hidden concepts, latent variables, and prominent features of data, based on the framework.

Unsupervised document classification using web scraping, support vector machines, and LDA topic modelling was proposed by Thielmann et al. [16], yielding accurate results for different data sets.
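To make the idea of discovering hidden topics concrete, the following is a minimal Python sketch of LDA fitted by collapsed Gibbs sampling. It is an illustration only and not the implementation used in this paper: the function name `lda_gibbs`, the prior values `alpha` and `eta`, the iteration count, and the four-document toy corpus are all assumptions chosen for the example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, eta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (illustrative).

    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]               # tokens in doc d with topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word v has topic k
    topic_total = [0] * num_topics                             # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k_old = assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k_old] -= 1
                topic_word[k_old][v] -= 1
                topic_total[k_old] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + eta) / (topic_total[k] + vocab_size * eta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][v] += 1
                topic_total[k_new] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + eta) / (topic_total[k] + vocab_size * eta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


# Toy corpus (hypothetical): two documents per underlying theme.
docs = [
    "digital economy platform market growth".split(),
    "platform market digital trade economy".split(),
    "topic model corpus word distribution".split(),
    "word distribution topic corpus model".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(phi):
    top_words = [w for _, w in sorted(zip(dist, vocab), reverse=True)[:3]]
    print(f"topic {k}: {top_words}")
```

Each row of `theta` is a document's topic mixture and each row of `phi` is a topic's distribution over the vocabulary; the highest-probability words per topic are the ones a word cloud would draw largest. On a real corpus, an established library implementation would normally be preferred over a hand-written sampler.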
high frequency words in the form of word clouds, as shown in Fig. 3. In Fig. 3, a word with a larger representation has higher prominence and frequency.

The next step is building an N-gram language model at the word level, i.e., bigram and trigram models, to find the probability distribution over word sequences. This process removes noise (spelling errors, special characters, non-standard word forms, grammar mistakes, and so on) by identifying items such as invalid dictionary words from their probabilities in the text. To focus on the words that carry more meaning for the topics, commonly used words, also known as stopwords, such as "a", "an", "the", "is", and "was", are removed. This is followed by stemming and lemmatization, which normalize the words to prepare them for further processing. Fig. 4 shows the words after this process. For these words' corpora

θ is the document-topic distribution parameter
θ_{d,k} is the topic proportion for topic k in document d
z is the word-topic assignment parameter
z_{d,n} is the topic assignment for the nth word in document d
w is the observed word
w_{d,n} is the nth word in document d

With these notations, the joint distribution of the hidden and observed variables of LDA is as follows [8]:
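For reference, the joint distribution of LDA in its standard formulation [11], [12] is usually written as below. This is a sketch under stated assumptions: β_k is taken to denote the word distribution of topic k, K the number of topics, D the number of documents, and N the number of words in document d; these symbols are not defined in the notation list above and are introduced here only for the example.

```latex
p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})
  = \prod_{k=1}^{K} p(\beta_k)
    \prod_{d=1}^{D} p(\theta_d)
    \left( \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\,
           p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right)
```

Read from left to right: each topic draws a word distribution β_k, each document draws topic proportions θ_d, and each word position n in document d first draws a topic assignment z_{d,n} from θ_d and then the observed word w_{d,n} from the word distribution of the assigned topic.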
IV. CONCLUSION & FUTURE WORK
In this research study, a framework is created that enables scholars to use topic modelling to conduct a related literature review, reducing the need to read articles manually, and to analyse a large corpus in a faster and more reliable manner. The framework, based on LDA topic modelling, is implemented on a business economy dataset to identify topics and categorize articles under them. The framework includes pre-processing of data for cleaning and cross-validating the data, LDA topic modelling, and post-processing to create the various topics along with their frequency of occurrence. The model is analyzed by calculating the perplexity and coherence scores of the inferred topics.

Future work can include an improved framework that can be applied to different corpora to obtain practical insights for further development. The proposed LDA topic modelling framework could also be integrated with classifiers such as SVM and KNN for better classification.

REFERENCES
[1] Pröllochs, N., & Feuerriegel, S. (2020). Business analytics for strategic management: Identifying and assessing corporate challenges via topic modelling. Information & Management, 57(1), 103070.
[2] Qiang, J., Qian, Z., Li, Y., Yuan, Y., & Wu, X. (2020). Short text topic modeling techniques, applications, and performance: a survey. IEEE Transactions on Knowledge and Data Engineering.
[3] Ayitey, W. (2010). A Simple Approach to Strategic Management, Methodist Book Depot Ltd. Available from https://www.researchgate.net/publication/279958992_A_Simple_Approach_to_Strategic_Management
[4] Cheng, C., & Havenvid, M. (2017). Investigating strategy tools from an interactive perspective. The IMP Journal, 11(1), 127–149.
[5] Laamanen, T., Mantere, S., & Vaara, E. (2018). Strategy Processes and Practices: Dialogues and Intersections. Strategic Management Journal, 39.
[6] G.G. Dess, A.B. Eisner, G.T. Lumpkin, Strategic Management: Text and Cases, 4th ed., McGraw-Hill/Irwin, New York, NY, 2008.
[7] Hamed Jelodar, Yongli Wang, Chi Yuan, Xia Feng, Xiahui Jiang, Yanchao Li, & Liang Zhao. (2018). Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey.
[8] Ponweiser, M. (2012). Latent Dirichlet allocation in R, Vienna University of Business and Economics.
[9] Tong, Z., & Zhang, H. (2016). A text mining research based on LDA topic modelling. In International Conference on Computer Science, Engineering and Information Technology (pp. 201–210).
[10] https://insights.daffodilsw.com/blog/what-are-language-models-in-nlp
[11] Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.
[12] Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, pp. 77–84.
[13] Rouhani, S., Asgari, S., & Mirhosseini, S. (2012). Review study: Business intelligence concepts and approaches. American Journal of Scientific Research, 50, 62–75.
[14] Joby, P. P. (2020). Expedient Information Retrieval System for Web Pages Using the Natural Language Modeling. Journal of Artificial Intelligence, 2(02), 100–110.
[15] Kherwa, P., & Bansal, P. (2020). Topic Modeling: A Comprehensive Review. EAI Endorsed Transactions on Scalable Information Systems, 7(24), e2.
[16] Thielmann, A., Weisser, C., Krenz, A., & Säfken, B. (2020). Unsupervised Document Classification integrating Web Scraping, One-Class SVM and LDA Topic Modelling.
[17] http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV1011/oneata.pdf