Data Preprocessing and Apriori Algorithm Improvement in Medical Data Mining
2021 6th International Conference on Communication and Electronics Systems (ICCES) | 978-1-6654-3587-1/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICCES51350.2021.9489242

Abstract— In recent years, medical and health information systems have come into wide use, and hospitals have accumulated a large amount of medical data. The rise of mobile medicine has further digitized medical information, and the medical industry has entered a genuine era of big data. These data are extremely valuable for the diagnosis and treatment of diseases and for medical research. Unfortunately, most hospitals currently only collect and store patient medical data and lack in-depth analysis and utilization of it. Data mining can discover regularities in massive medical data, which gives medical personnel a new way to acquire knowledge. Among medical data mining methods, intelligent methods such as association rules, artificial neural networks, and rough set theory show unique advantages. Among them, association rule mining takes a set of data items and a set of transactions and analyzes how frequently itemsets occur in those transactions.

Keywords— Medical Image, Data Mining, Preprocessing, Association Rules, Medical Diagnosis

I. INTRODUCTION

In the past few years, information technology has developed rapidly, and people can collect large amounts of data with modern data collection tools. At the same time, industries across society have collected a wide variety of information on production, management, operations, sales, scientific research, and so on, which has led to a continuous increase in the amount of data stored worldwide [1-4]. The means of generating and collecting data keep multiplying, which has led to explosive growth in the amount of data. Traditional data processing methods cannot meet the higher demands now placed on data processing. Therefore, finding the data of interest within a large volume of data, and discovering the internal relationships between data and transaction phenomena, are the problems faced in data processing [5-8].

On the other hand, as the concept of "Internet +" continues to deepen, our country is gradually moving toward high-end medical systems such as smart medical care. As digitization and informatization deepen, a large amount of medical data on patients with chronic diseases has been generated, including physical examination data, medical diagnosis data, treatment medication lists, and so on. Much important information is often hidden in these massive data sets [9-12]. Since the term KDD (knowledge discovery in databases) was proposed, data mining technology has been developing and updating at an unprecedented rate every year. Data mining applications have also spread to all walks of life. People process data stored on servers or in data warehouses to perform tasks such as trend analysis, status quo interpretation, and even disease classification and prediction in the medical field. At present, the most representative data mining research directions are cluster analysis, decision tree classification, time series prediction, feature extraction, and association rule mining. Among them, time series analysis and association rule mining are the most widely used in the medical industry [13-16].

Association rule mining takes a set of data items and a set of transactions and analyzes how frequently itemsets occur in those transactions. By using association rule mining to analyze the medical data of patients with chronic diseases and identify the related risk factors, patients can take timely preventive measures and strengthen their own health management. The association rule algorithm is the core of association rule mining, and its performance directly affects the mining results. Therefore, research on association rule mining algorithms is necessary.

Because medical standardization is relatively advanced in developed countries abroad, their medical information technology is comparatively ahead [17-20]. The Apriori algorithm is a classic algorithm in association rule mining. It has been continuously studied and improved over the years as data mining technology has developed. It remains a mainstream research topic among scholars and data mining enthusiasts and is widely used in enterprise business decision-making. In 1993, in connection with research on the placement of "beer-diapers" in supermarkets, R. Agrawal et al. first proposed the concept of association rules and applied them to customer transaction databases. In the following years, the Apriori algorithm gradually became the core, classic algorithm in the field of association rule mining because it is simple, intuitive, and easy to understand. With the advent of high-performance concurrent processing systems, processing massive amounts of data has become faster and more convenient. In the field of medical and health care, association rule algorithms are constantly being improved and applied by scholars and researchers, with satisfactory results. This confirms that data mining technology has broad application prospects in disease diagnosis and chronic disease prediction [21-24].
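To make the notions of frequent itemsets and association rules from the introduction concrete, the following is a minimal sketch of the classic, unoptimized Apriori procedure in Python. It is not the improved algorithm studied in this paper. The function names `apriori` and `rules`, the toy patient records, and the thresholds 0.6 and 0.7 are all illustrative assumptions, not data or parameters from the source.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in tsets if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in tsets for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, confidence) tuples."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                # confidence(A -> B) = support(A and B) / support(A)
                conf = sup / frequent[ante]
                if conf >= min_confidence:
                    out.append((ante, itemset - ante, conf))
    return out


# Hypothetical toy records: each set lists findings for one patient.
transactions = [
    {"hypertension", "obesity", "diabetes"},
    {"hypertension", "obesity"},
    {"obesity", "diabetes"},
    {"hypertension", "diabetes", "obesity"},
    {"smoking", "hypertension"},
]

frequent = apriori(transactions, min_support=0.6)
found_rules = rules(frequent, min_confidence=0.7)
for ante, cons, conf in found_rules:
    print(sorted(ante), "->", sorted(cons), round(conf, 2))
```

The prune step relies on the Apriori property that every subset of a frequent itemset is itself frequent, which is also why `frequent[ante]` is always defined when the rules are generated. The sketch rescans all transactions for every candidate, which is the cost that improved variants of the algorithm aim to reduce.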
making any predictions. Each method of data mining has its