Data Mining Applications in Tourism: Key Word Analysis
Faculty of Economics & Business, University of Zagreb, Trg J.F. Kennedyja 6, 10000 Zagreb, Croatia
{mpejic}@efzg.hr
Faculty of Organization and Informatics, University of Zagreb, Pavlinska 2, 42000 Varaždin, Croatia
{markus.schatten}@foi.hr
Abstract. The paper reviews applications of data mining in tourism, in particular from the perspective of tourism, tourism intermediaries, and tourism suppliers. Web of Science, Scopus, and in particular key tourism journals were searched using appropriate keywords. The literature search revealed 88 papers that present applications of data mining in tourism. Keyword and conceptual network analysis was conducted with the Wordle and LaNet-vi tools. Papers from tourism related journals and ICT related journals were analysed separately. In order to detect historic trends, the analysis was conducted separately for two periods: until 2005, and since 2006. The conclusion of the paper is that tourism is on a path to becoming both people-driven and data-driven, using data mining as leverage towards increased competitiveness and profitability.
Keywords. data mining, conceptual network, review, tourism, keywords
1 Introduction
Data mining emerged in the 1980s from the fields of machine learning, statistics, and databases, and has since become one of the most important tools for extracting additional value from information gathered in organizational databases [1]. Its range of application is large, and it usually results in specific knowledge that is a basis for action. Examples are segmentation, lifetime value, credit scoring and churn prediction. In most cases data mining explores transaction data, such as sales transactions, web logs, and process generated transactions, using machine learning and statistical techniques [3].
Modern technology has had a great impact on tourism since the 1990s, when Internet technologies became the dominant communication channels [2]. Changes occurred on both the demand and the supply side. Tourists' expectations relate not only to tourism services, but also to technology. Tourists expect tourist companies to actively implement modern technologies as part of the value chain. Modern trends and technologies have generated a whole set of new tools, such as recommendation systems [5]. In addition, new technologies help tourist companies to establish one-to-one connections with tourists, thus increasing customer loyalty [8]. A number of review papers have investigated the use of modern technologies in tourism, but to our knowledge, attempts to review data mining in tourism are rare. The goal of this paper is to review data mining applications in tourism based on keyword analysis.
2 Methodology
The following steps were performed in order to collect literature that investigates applications of data mining in tourism. First, the Scopus and Web of Science databases were searched using the following keywords: data mining, knowledge discovery in databases, tourism, tourist, destination, travel and hotel. Since the words tourism and tourist have many derivatives, the lemmatization feature of Web of Science was turned on in order to include similar words. We set the timeframe to the period from 1995 to 2013. In addition, tourism journals included in Web of Science and Scopus were also searched using the above keywords. We limited the search results to peer-reviewed journals. Only a few articles were found that contain the above keywords but do not actually address the topic of this paper, and these were excluded from the analysis. Based on this approach, we found 88 papers in journals indexed in Web of Science and/or Scopus that deal with data mining applications in tourism.
In order to conduct keyword and conceptual network analysis, we used the Wordle and LaNet-vi tools. Three goals of the research were set: (i) to detect the main concepts in tourism data mining applications, (ii) to detect historic trends, and (iii) to detect differences between ICT related and tourism related journals. Keywords were first analysed through conceptual network visualization using LaNet-vi. Then, keyword analysis was conducted for two periods (until 2005, and since 2006) and for two groups of journals (ICT related and tourism related).
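The grouping step of the methodology can be illustrated with a minimal sketch. It assumes each paper is available as a record of publication year, journal group and author keywords; the sample records, the helper names `period_of` and `keyword_counts`, and the choice to place 2005 in the earlier period are assumptions made for illustration only, not the data or procedure actually used in the study.

```python
from collections import Counter

# Hypothetical records: (publication year, journal group, author keywords).
papers = [
    (2003, "tourism", ["market segmentation", "neural networks"]),
    (2004, "ict", ["data mining", "forecasting"]),
    (2009, "ict", ["recommender systems", "data mining"]),
    (2011, "tourism", ["segmentation", "data mining"]),
]


def period_of(year):
    """Label a year as belonging to the earlier or the later period.

    Assumption: 2005 itself is counted in the earlier period.
    """
    return "until 2005" if year <= 2005 else "since 2006"


def keyword_counts(records, key):
    """Count keyword phrases per subset; key(record) names the subset."""
    counts = {}
    for record in records:
        subset = counts.setdefault(key(record), Counter())
        subset.update(kw.lower() for kw in record[2])
    return counts


# One set of counts per period, and one per journal group.
by_period = keyword_counts(papers, lambda r: period_of(r[0]))
by_group = keyword_counts(papers, lambda r: r[1])

for name, counts in by_period.items():
    print(name, counts.most_common(3))
```

Passing the grouping rule as a function keeps the same counting code usable for both the period split and the journal-group split.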
3 Results
3.1 Conceptual network visualisation
In order to gain insight into the interconnections between the keywords of papers that investigate data mining applications in tourism, we constructed a conceptual network based on co-affiliation, i.e. two keywords are connected if they are used in the same article, which is similar to the approaches given in [4,6,7]. The conceptual network was visualized using LaNet-vi (footnote 1) and is shown in figure 1.
Figure 1. Conceptual network visualization
The size of a node (shown on the left node size scale in the image) visualizes the number of connections the node has, whilst the color of the node depicts its interconnection with other nodes in the same core (shown on the right scale of node colors in the image). As can be seen in the image, there are 6 cores, each representing a certain area of mainstream research, as well as an outer, sparsely connected residue representing non-mainstream areas of research. The six cores, from inside to outside, can be recognized as: (i) data mining and forecasting; (ii) machine learning and personalization; (iii) tourism management; (iv) tourism systems (recommender systems, geographic information systems, mobile systems etc.); (v) segmentation, and (vi) advanced techniques (support vector regression, multi agent systems, particle swarm optimization etc.).
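The construction of the conceptual network and its core structure can be sketched in a few lines. The sketch links two keywords whenever they appear in the same article and then assigns core numbers by repeatedly peeling off the least connected node, which is the standard k-core decomposition that LaNet-vi uses for its layout. The sample keyword lists and the function names `cooccurrence_network` and `core_numbers` are hypothetical; this is not the code or data used to produce figure 1.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(keyword_lists):
    """Return an adjacency map linking keywords that share an article."""
    graph = defaultdict(set)
    for keywords in keyword_lists:
        unique = sorted({kw.lower() for kw in keywords})
        for kw in unique:
            graph[kw]  # register the keyword even if it has no partner
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def core_numbers(graph):
    """Assign each node the largest k for which it belongs to the k-core."""
    degree = {node: len(nbrs) for node, nbrs in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the fewest remaining connections.
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in graph[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Hypothetical keyword lists, one per article.
articles = [
    ["data mining", "forecasting", "tourism"],
    ["data mining", "forecasting", "neural networks"],
    ["tourism", "forecasting", "data mining", "segmentation"],
    ["recommender systems", "tourism"],
]

graph = cooccurrence_network(articles)
core = core_numbers(graph)
for keyword in sorted(core, key=core.get, reverse=True):
    print(core[keyword], len(graph[keyword]), keyword)
```

In this toy input, the four keywords that all co-occur with one another end up in the innermost core, while a keyword used in a single article with one partner stays in the outer shell.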
1 http://lanet-vi.soic.indiana.edu/
2 http://www.wordle.net/
The most important keywords in papers published until 2005 were tourism, segmentation, neural, information, mining, marketing, networks and forecasting.
Figure 2. Visualization of keywords until 2005
Figure 3 depicts the visualization of keywords in papers published since 2006. The most important keywords here were tourism, segmentation, analysis, systems, data, mining, fuzzy, forecasting, travel, and recommender.
Figure 4. Visualization of keywords in ICT related publications
The visualization of keywords in tourism related publications, on the other hand, is quite different (figure 5). Here the most important keywords include segmentation, analysis, mining, tourism, and data.
Figure 5. Visualization of keywords in tourism related publications
When compared, it becomes obvious that ICT related publications are especially focused on various types of (software) systems and their implementation, while tourism related publications are more interested in the analysis of actual data.
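The single-word entries reported above (for example neural, networks and mining) suggest that keyword phrases were counted word by word, as a word cloud tool typically does. The following minimal sketch shows such a word-level count and a simple contrast between two groups of publications. The sample keyword lists and the helper names `word_counts` and `distinctive_words` are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter


def word_counts(keyword_phrases):
    """Split keyword phrases into lowercase words and count each word."""
    words = []
    for phrase in keyword_phrases:
        words.extend(re.findall(r"[a-z]+", phrase.lower()))
    return Counter(words)


def distinctive_words(counts_a, counts_b, top=3):
    """Return the top words of counts_a that never occur in counts_b."""
    return [w for w, _ in counts_a.most_common(top) if w not in counts_b]


# Hypothetical keyword phrases for the two groups of journals.
ict_keywords = ["recommender systems", "expert systems", "data mining"]
tourism_keywords = ["market segmentation", "data analysis", "data mining"]

ict_words = word_counts(ict_keywords)
tourism_words = word_counts(tourism_keywords)

print(ict_words.most_common(3))
print(tourism_words.most_common(3))
print(distinctive_words(ict_words, tourism_words, top=1))
```

Counting by word rather than by phrase merges related phrases such as recommender systems and expert systems under a shared word, which makes broad themes visible but loses the distinction between the original phrases.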
Our research opens an interesting area of data mining applications in tourism through keyword analysis based on Web of Science and Scopus journal articles. However, this approach also determines the limitations of our study. First, a more elaborate, in-depth analysis of the articles should be conducted in order to gain more insight into which decisions in tourism are being driven by data mining, and what methodology, samples and time frames are being used in the process. In addition, further sources, such as case study reports and papers from highly respected, practically oriented conferences, should also be considered as candidates for future studies.
References
[1] Chou, D.C.; Chou, A.Y. A Manager's Guide to Data Mining. Information Systems Management, 16(4):33-41, 1999.
[2] Hjalager, A.M.; Nordin, S. User-driven Innovation in Tourism - A Review of Methodologies. Journal of Quality Assurance in Hospitality & Tourism, 12(4):289-315, 2011.
[3] Lee, S.J.; Siau, K. A review of data mining techniques. Industrial Management & Data Systems, 101(1):41-46, 2001.
[23] García-Crespo, A; López-Cuadrado, JL; Colomo-Palacios, R; González-Carrasco, I; Ruiz-Mezcua, B. Sem-Fit: A semantic based expert system to provide recommendations in the tourism domain. Expert Systems with Applications, 38(10):13310-13319, 2011.
[32] Hsu, CHC; Kang, SK. CHAID-based segmentation: international visitors' trip characteristics and perceptions. Journal of Travel Research, 46(2):207-216, 2008.
A web based planner for tourism and leisure. Expert Systems with Applications, 38(8):10085-10093, 2011.
[74] Sung, HH. Classification of Adventure Travelers: Behavior, Decision Making, and Target Markets. Journal of Travel Research, 42:343-356, 2004.
[77] Uysal, M. Advancement in Computing: Implications for Tourism and Hospitality. Scandinavian Journal of Hospitality and Tourism, 4(3):208-224, 2004.