
Journal of Intelligent & Fuzzy Systems 40 (2021) 6731–6742

DOI:10.3233/JIFS-189507
IOS Press

Research on data mining system based on artificial intelligence and improved genetic algorithm
Shi Ruifeng∗
College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot,
Inner Mongolia, China

Abstract. Due to the explosive growth in data scale, traditional database management technology can no longer process and analyze these data adequately. Data mining is a tool that can process such data effectively. Research on data mining has produced many new concepts and methods, which enrich and improve data mining technology and establish its theoretical system. Association rule mining is an important branch of data mining and one of its most important research fields. Genetic algorithms have been widely used to mine association rules, but traditional genetic algorithms are prone to premature convergence. Therefore, how to apply an improved genetic algorithm to mine association rules is the key problem dealt with in this paper.

Keywords: Artificial intelligence, improved genetic algorithm, genetic BP neural network, data mining

1. Introduction

With the rapid development of information technology, the ability to collect, store and process data keeps increasing. As a result, databases gradually expand and diversify, and their scope of application widens. The maturity of data management technology contributes to the computerization of society and the modernization of public and business services, and the rapid growth of information on the network produces a large amount of data. To analyze this data, databases have been expanded and are widely used in business and science. As database technology has developed, the scale of databases has also increased. Traditional database management techniques cannot uncover and analyze the knowledge hidden in these data because of their explosive expansion. Database management technology alone is clearly no longer able to process large data sets. Data mining is therefore the effective tool needed to process large volumes of data, and we hope that it will help us to better understand the state of the data.

Data mining is usually defined as a process through which information is obtained from a large amount of collected data. At present, data mining plays an increasingly important role in all sectors, not only to describe how past data developed, but also to determine models that can predict future results. In fact, from any point of view, knowledge is implicit information, and wherever there is data it is necessary to classify it. Therefore, with the development of science and technology and the increase of information, the classification of massive data is becoming more and more important.

∗ Corresponding author. Shi Ruifeng, College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China, E-mail: [email protected].

ISSN 1064-1246/$35.00 © 2021 – IOS Press. All rights reserved.


6732 S. Ruifeng / Research on data mining system based on artificial intelligence and improved genetic algorithm

Research on data mining has produced many new concepts and methods, which enrich and improve data mining technology and establish its theoretical system. Association rule mining is an important branch of data mining research. Genetic algorithms have been widely used to mine association rules, but traditional genetic algorithms are prone to premature convergence. Therefore, how to better apply genetic algorithms to mine association rules is the key issue addressed in this paper.

2. Related work

Regarding improved genetic algorithms, literature [1] analyzed the premature convergence of traditional genetic algorithms. The reasons are mainly related to the selection operation, the population distribution and the problem itself. It identifies individuals that can have a significant impact on the convergence of genetic algorithms and introduces the concept of the joint evolution of these individuals and the environment. Literature [2] treats the traditional genetic algorithm as a static optimization process driven by selection, crossover and mutation; the schema theorem cannot guarantee that traditional genetic algorithms reach the optimum when solving optimization problems. In literature [3], the inherent parallelism of genetic algorithms is studied, the ways to exploit it are summarized, and the possible fields of application of parallel genetic algorithms are discussed. Literature [4] designed a selection operator with control parameters, introduced the concept and mathematical basis of the design, analyzed the relationship between the selection operator and the control parameters, compared the weaknesses of fitness scaling and global convergence in the selection operator, and adopted the "power gradient method" to control the selection operator. In literature [5], some suggestions for improvement are put forward, which use the information accumulated by the traditional genetic algorithm in its early stage to recognize the direction of evolution and use this index to guide chromosome adaptation. A modified genetic algorithm is developed in literature [6]. This algorithm proposes a particle swarm optimization method and combines it with a real-coded genetic algorithm. A chaotic sequence is used to produce the initial population, the selection process uses nonlinear ranking, the progressive method is based on competitive selection, and the genetic algorithm is improved by using particle swarm optimization to create new individuals. In literature [7], the particle swarm, synthetic mass, ant colony and artificial immune algorithms are analyzed; the genetic algorithm is combined with the particle swarm algorithm and the artificial immune algorithm to form a hybrid genetic algorithm whose advantage is that the convergence speed does not easily decline, and a test function is used to verify the effectiveness of the particle swarm and artificial immune components. Literature [8] explores the relationship between the crossover and mutation probabilities and individual fitness, and designs crossover and mutation probability functions for individuals. During the whole operation, highly fit individuals are protected from damage, and the tests give good results. Literature [1] improved the formulas of the crossover and mutation probabilities to address their shortcomings and adapt to their change, since the mutation probability evolves relatively slowly in the early stage of evolution. Literature [9] describes other genetic operations and macro-operations of biological populations and genetic algorithms, including mutations, dominant genes, diploid and polyploid structures, etc.

As to research on association rule mining systems, literature [1] focuses on the use of association rule algorithms in large databases. Through analysis and improvement, these algorithms are successfully applied to the mining of massive data. Literature [10] presents the basic concepts and processes of data mining, as well as the usual methods and techniques. Literature [11] describes the programming of association rule algorithms and how to mine association rules by computer. Literature [12] provides detailed information on the use of WEKA software (a set of machine learning algorithms for data mining tasks) for data preprocessing, classification, regression and clustering. Literature [13] introduces the methods and techniques used in data mining, and analyzes these methods in combination with biological data mining cases. Literature [14] provides an overview of new data mining research areas and a detailed description of the data mining process and related data mining methods. Literature [1] studied a method of mining association rules with a genetic algorithm, improving the structure of the fitness function and the data coding to avoid premature convergence. In literature [1], a mining algorithm based on a multi-level association genetic algorithm is proposed. According
to the common characteristics of multi-level data, a preliminary self-defined threshold is proposed to improve both the efficiency and the accuracy of mining. In literature [1], an improved simulated annealing genetic algorithm is incorporated: to make it applicable to association rules, a new mining algorithm is proposed based on the improved simulated annealing genetic algorithm, within which the crossover and mutation probabilities are selected by dynamic adjustment. Literature [15] suggests improvements to the genetic algorithm that raise its efficiency. Literature [16] incorporates the simulated annealing algorithm into the genetic algorithm and introduces a new fitness function for association rule mining that takes support and confidence into account. In literature [17], an interest measure is adopted in the association rule algorithm, which effectively avoids misleading rules, and the crossover and mutation probabilities are adjusted dynamically.

In short, as the amount of data in the database increases, the plain application of association rule algorithms becomes inefficient. Searching with a genetic algorithm can greatly increase the efficiency. Almost all researchers improve the global search ability of the genetic algorithm and then use it in the mining of association rules.

3. Basic analysis of traditional algorithm

3.1. Analysis of common algorithm for data mining

Data mining methods are as follows:
(1) Statistical methods: statistical analysis of large-scale data in databases using statistical principles. Statistics plays a very important role in the whole data mining process and provides a series of traditional methods, including regression analysis (multiple regression, autoregression, etc.), discriminant analysis (Bayes discrimination, Fisher discrimination, nonparametric discrimination, etc.), cluster analysis (systematic clustering, dynamic clustering, etc.), exploratory analysis (principal component analysis, correlation analysis), etc.
(2) Neural network method: a nonlinear prediction model that imitates neural structure and is built through learning. The usual algorithms include the feed-forward neural network (BP algorithm) and the self-organizing neural network.
(3) Decision tree method: the decision tree method uses information gain from information theory to find the most informative attribute field in the database for splitting. It is a very important classification and mining method; the best-known methods include ID3 and IBE.
(4) Fuzzy mathematical method: fuzzy mathematics is an important newer branch of mathematics. In the field of data mining it is mainly used for fuzzy comprehensive evaluation, fuzzy association, fuzzy clustering and fuzzy classification.
(5) Genetic algorithms: algorithms that simulate the natural selection of biological populations, mainly through the iterative application of selection, crossover and mutation operators; they overcome the problems of nonlinear optimization and of getting trapped in local extrema.

3.1.1. An overview of association rule algorithm
For a long time, association rules have been an important field of data mining research, because they can find the relationships between attributes in a database, and they are one of the first problems to be considered in the mining process. The scope of application of association rules is not limited to sales data; they are increasingly used to discover relationships between various things and their meaning. See Table 1 for the basic concepts.

Table 1
Basic concepts of association rules
Concept                   Analysis
Item                      A field in each transaction of the database; each item is an attribute value
Itemset                   A set of items, each of which is an attribute value
Support                   Measures the frequency of an itemset
Confidence                The conditional probability that a transaction containing A also contains B
Minimum support           Indicates that the user is only interested in itemsets and rules whose support is not less than this threshold
Minimum confidence        Indicates that the user is only interested in rules that hold with a high probability
Frequent itemsets         Itemsets whose support is not less than the minimum support threshold
Strong association rules  Association rules that satisfy both the minimum support and the minimum confidence
The support of the itemset X = {A, B} is the ratio of the number of transactions containing both A and B to the total number of transactions |D|. Sup(X) denotes the number of occurrences of the itemset X, so the support is the probability P(A ∪ B):

$$\mathrm{Support}(A \Rightarrow B) = P(A \cup B) = \frac{\mathrm{Sup}(X)}{|D|} = \frac{\mathrm{Sup}(A \cup B)}{|D|} \tag{1}$$

The confidence of A ⇒ B is the ratio of the number of transactions containing both A and B to the number of transactions containing A, that is, the support of the itemset {A, B} divided by the support of A. This is the conditional probability P(B | A):

$$\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{Sup}(A \cup B)}{\mathrm{Sup}(A)} \tag{2}$$

For the rule A ⇒ B, the degree of interest is defined as:

$$\mathrm{Inte}(A \Rightarrow B) = \frac{P(A \cup B)}{P(A)\,P(B)} = \frac{|D|\,\mathrm{Sup}(A \cup B)}{\mathrm{Sup}(A)\,\mathrm{Sup}(B)} = \frac{\mathrm{Confidence}(A \Rightarrow B)}{\mathrm{Support}(B)} \tag{3}$$

In formula (3), an interest of 1 shows that A does not affect B. If the interest is greater than 1, the occurrence of A promotes the occurrence of B, and the rule is meaningful. If the interest is less than 1, the occurrence of A inhibits the occurrence of B.

3.1.2. BP neural networks
The learning process of the BP model is as follows: training samples are provided to the network, and the error, that is, the difference between the actual output and the expected output, is propagated back through it. Through learning, the network weights are adjusted step by step and the network gradually stabilizes. The actual output of the network tends to the ideal output and the error is kept within the prescribed limit.

A BP network is composed of an input layer, an output layer and intermediate (hidden) layers. Usually there is only one hidden layer; in special cases the number of hidden layers is increased as needed. The actual data processing is as follows.

Input layer: the input vector is usually the feature values of a sample. After the input layer receives the data, it passes them directly to the next layer, because no special processing is required. Thus, the expression of the input vector is as follows:

$$X = [x_0, x_1, \ldots, x_n] \tag{4}$$

Hidden layer: the layer that does the effective processing in the network. It receives the weighted data from the input layer and constrains them to a specific range through the activation function. The input of neuron i of the hidden layer is as follows:

$$x_i = X W_i \tag{5}$$

The connection weights between the hidden layer neuron i and the input layer are as follows:

$$W_i = [w_{i0}, w_{i1}, w_{i2}, \ldots, w_{in}]^{T} \tag{6}$$

So the input of neuron i is:

$$x_i = x_0 w_{i0} + x_1 w_{i1} + \cdots + x_n w_{in} \tag{7}$$

When neuron i receives its input, the activation function constrains it to the expected range. The activation function is the S-type (sigmoid) function shown below:

$$f(x) = \frac{1}{1 + e^{-ax}} \quad (0 < f(x) < 1) \tag{8}$$

So the output value of neuron i after the activation function is applied is:

$$o_i = f(x_i) = f(X W_i) \tag{9}$$

Hence the output vector of the whole hidden layer (where k is the number of neurons in the hidden layer) is:

$$O = [o_1, o_2, \ldots, o_k] \tag{10}$$

Output layer: the input of the output layer is the weighted sum of the outputs of the hidden layer. The received data are processed by the activation function, which gives the actual output of the whole network. The input of neuron j of the output layer is as follows:

$$x_j = O W_j \tag{11}$$
The output layer also uses an S-type activation function, which constrains the data of the output layer, and the output of neuron j is:

$$o_j = f(x_j) = f(O W_j) \tag{12}$$

So the output vector of the whole network is:

$$\overrightarrow{OO} = [o_0, o_1, \ldots, o_m] = F_2\big(F_1(X W^{1})\, W^{2}\big) \tag{13}$$

First, the mean square error function of the network is calculated. Its mathematical expression is as follows:

$$E = \frac{1}{2P} \sum_{P} \sum_{K} (t_{pk} - o_{pk})^{2} \tag{14}$$

Once the error of the network is obtained, it first reaches the output layer and adjusts the weights of the output layer. The mathematical expression of the weight matrix adjustment is as follows:

$$\begin{aligned}
\Delta w_{ij} &= lr \cdot \delta_j \cdot oh_i \\
w_{ij} &= w_{ij} + \Delta w_{ij} \\
\delta_j &= oo_j \,(1 - oo_j)\,(y_j - oo_j)
\end{aligned} \tag{15}$$

After substitution, the above formula is simplified as follows:

$$\begin{aligned}
W_{jk}(t+1) &= W_{jk}(t) + \Delta W_{jk} \\
\Delta W_{jk} &= -\eta \frac{\partial E}{\partial W_{jk}} \\
\delta_k &= (t_k - O_k)\, O_k\, (1 - O_k)
\end{aligned} \tag{16}$$

The expected output of the hidden layer is generally unknown, and the adjustment formula of the weight matrix of the hidden layer is as follows:

$$\begin{aligned}
\delta_j &= oh_j \,(1 - oh_j) \sum_{i=1}^{m} (w_{ji}\, \delta_i^{o}) \\
v_{ij} &= v_{ij} + \Delta v_{ij} \\
\Delta v_{ij} &= lr \cdot \delta_j \cdot x_i
\end{aligned} \tag{17}$$

The formula is simplified as follows:

$$\begin{aligned}
W_{ij}(t+1) &= W_{ij}(t) + \Delta W_{ij} \\
\Delta W_{ij} &= -\eta \frac{\partial E}{\partial W_{ij}} = \eta\, \delta_j\, x_i \\
\delta_j &= O_j \,(1 - O_j) \sum_{k} \delta_k W_{jk}
\end{aligned} \tag{18}$$

Learning then continues with the next set of sample data and is repeated until the network performance is optimal. See Fig. 1 for details.

Fig. 1. BP Network training process.
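The forward pass of formulas (5)-(13) and the weight updates of formulas (15)-(18) can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the author's implementation: the toy OR data set, the learning rate of 0.5, the network size, the function names and the use of a constant 1.0 input as a bias term are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(x, a=1.0):
    """S-type activation of formula (8): f(x) = 1 / (1 + exp(-a * x))."""
    return 1.0 / (1.0 + math.exp(-a * x))


def forward(x, w_hidden, w_out):
    """Forward pass: weighted sums followed by the sigmoid, layer by layer."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in w_hidden]
    output = [sigmoid(sum(h * w for h, w in zip(hidden, row))) for row in w_out]
    return hidden, output


def train_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient-descent update for a single sample; returns its squared error."""
    hidden, output = forward(x, w_hidden, w_out)
    # Output-layer deltas: (t_k - o_k) * o_k * (1 - o_k)
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Hidden-layer deltas: o_j * (1 - o_j) * sum_k(w_jk * delta_k),
    # computed before the output weights are changed.
    delta_hidden = [
        h * (1.0 - h) * sum(w_out[k][j] * delta_out[k] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += lr * delta_out[k] * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += lr * delta_hidden[j] * x[i]
    # Error of this sample measured before the update.
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))


def demo_bp(epochs=2000, seed=0):
    """Train on a hypothetical OR problem; the leading 1.0 acts as a bias input."""
    rng = random.Random(seed)
    samples = [([1.0, 0.0, 0.0], [0.0]), ([1.0, 0.0, 1.0], [1.0]),
               ([1.0, 1.0, 0.0], [1.0]), ([1.0, 1.0, 1.0], [1.0])]
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    w_out = [[rng.uniform(-1, 1) for _ in range(3)]]
    errors = []
    for _ in range(epochs):
        errors.append(sum(train_step(x, t, w_hidden, w_out) for x, t in samples))
    return errors, w_hidden, w_out
```

Calling `demo_bp()` returns the per-epoch error, so the loop "repeat until the network performance is optimal" can be observed as a falling error curve. A fixed number of epochs is used here instead of an error threshold purely to keep the sketch short.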



Fig. 2. Schematic diagram of genetic algorithm.

3.2. Basic principles of genetic algorithms

GA encodes each possible solution as a vector (chromosome); each element of the chromosome vector is called a gene. All chromosomes are evaluated according to the objective function and are assigned fitness values accordingly. Starting from a randomly generated set of chromosomes, their fitness is calculated and selection, crossover and mutation are applied, eliminating chromosomes with low fitness and retaining chromosomes with high fitness; in general, the new population is better than the one that generated it. This is repeated until the optimization goal is achieved. Fig. 2 illustrates the basic principles of genetic algorithms.

(1) Selection operator: The basic operations of genetic algorithms are selection, crossover and mutation, also known as operators. The function of selection is to determine whether an individual is eliminated or retained in the next generation, so that the best parents are selected according to their fitness. Of the selection schemes in common use, the fitness-proportional (roulette) scheme works as follows.

The population size is N, the fitness of individual i is f(i), the selection probability of individual i is P_i and its cumulative probability is Q_i. The cumulative probability is compared with a uniformly distributed random number r in [0, 1] to determine which individual is replicated in the next generation.

$$P_i = \frac{f(i)}{\sum_{i=1}^{N} f(i)}, \qquad Q_i = \sum_{j=1}^{i} P_j \quad (j = 1, 2, \cdots, i) \tag{19}$$

Therefore, the probability reflects the proportion of the individual's fitness in the total fitness of the population: the greater the fitness of an individual, the greater its probability of being selected, and conversely, the smaller the fitness, the lower the probability.

(2) Crossover operator: Crossover, also called recombination, takes place between two paired chromosomes, which exchange some of their genes in one way or another, resulting in two new chromosomes. The effectiveness of genetic algorithms comes mainly from the selection and crossover operations; crossover plays a central role and determines the global search ability of the genetic algorithm.

In single-point crossover, two chains of the parent generation are first selected at random, and a crossover point is then determined at random; if the length of the chain is L, there are L-1 possible crossover points. The result is shown in Fig. 3.
Fig. 3. Examples of single-point crossover.

(3) Mutation operator: Mutation replaces some gene values of a chromosome with other values, thus creating a new individual. Selection and crossover carry out most of the search of a genetic algorithm; mutation complements them by introducing new gene values.

3.3. Simulated annealing algorithm

The simulated annealing algorithm was originally proposed in 1953. The algorithm is mainly aimed at problems of NP complexity and is designed so that the optimization process does not stop at a partial (local) optimum. The optimization process is compared with the thermal equilibrium problem of statistical thermodynamics: the physical picture and statistical features of the solid annealing process are used as the physical background so that the algorithm avoids local optima. Solid annealing heats a solid to a sufficiently high temperature so that its molecules are arranged randomly and then cools it, so that the molecules finally settle in a low-energy state. Table 2 compares the optimization problem with solid annealing.

Table 2
Comparison of similarities
Combinatorial optimization problem  Solid annealing
Solution                            Particle state
Optimal solution                    The lowest energy state
Set initial temperature             Melting process
Metropolis sampling process         Isothermal process
Decline in control parameters       Cooling
Objective function                  Energy

The basic concepts of the simulated annealing algorithm:
(1) Objective function: the algorithm generally minimizes the objective function; when the maximum of a function is desired, the function is multiplied by -1 and the result is minimized.
(2) Temperature: Temperature is an important parameter in the simulated annealing algorithm, because the annealing process of the solid changes with it. First, it controls the distance between the new solution produced by the simulated annealing algorithm and the existing solution. Second, it determines the probability that the simulated annealing algorithm accepts a new solution whose objective function value is worse than the current one.
(3) Annealing schedule: The annealing schedule is the rule by which the algorithm reduces the temperature. The slower the temperature decreases, the slower the annealing proceeds, and the better the solution that the simulated annealing algorithm is likely to find. The schedule includes parameters such as the initial temperature and the temperature control function.
(4) Metropolis criterion: The Metropolis criterion is the rule by which the simulated annealing algorithm decides whether to accept a new solution. In this paper it is used when optimizing the objective function. With f_i the value of the current solution and f_{i+1} the value of the new solution, the probability of accepting the new solution is:

$$P = \begin{cases} 1, & f_{i+1} \ge f_i \\ \exp\left(-\dfrac{f_i - f_{i+1}}{kT}\right), & f_{i+1} < f_i \end{cases} \tag{20}$$

It can be seen from formula (20) that if the new solution is worse than the current solution, the higher the temperature, the greater the probability of accepting the worse solution. Therefore, the simulated annealing algorithm can escape local optima more easily, and the probability of accepting a worse solution decreases as the temperature decreases. The Metropolis criterion is the core of the simulated annealing algorithm. See Fig. 4 for details.

4. An improved genetic algorithm: simulated annealing genetic algorithm

4.1. Ideas for genetic algorithm improvement

The basis of the genetic algorithm is selection: the best patterns of the current individuals are passed on to the individuals of the next generation, the crossover operator adjusts the structure of the patterns, bad patterns are phased out and good patterns are kept, so that the best result is gradually obtained. However, in the practical operation of the algorithm, multiple patterns affect the efficiency of genetic algorithms. In the case of limited resources, it is necessary
to choose the best. For example, half of the less fit patterns are eliminated each time the genetic operations are applied. Genetic algorithms favor highly fit patterns, but because the population size is limited, individuals above the average fitness level reproduce more in the next generation. This continues once some individuals have gained an absolute advantage in the population: the genetic algorithm reinforces this advantage, the population begins to converge, the individuals become more and more similar, and worse individuals get no further chance to reproduce. Finally the population stagnates, which causes the genetic algorithm to converge prematurely. There are two strategies for improving genetic algorithms:

The first is to maintain as much diversity as possible, that is, to ensure that the diversity of the population is not lost before the evolution of the genetic algorithm is complete, as in niche genetic algorithms.

The second accepts that population diversity may be lost during the evolution, but provides a mechanism that generates new individuals to take part in the evolution of the population, thereby increasing its diversity. The new individuals are often produced by combining the genetic algorithm with other algorithms.

In this paper, the traditional genetic algorithm is improved: chromosomes are selected by an immune mechanism, and adaptive crossover and mutation are carried out to overcome the premature convergence of the genetic algorithm.

Fig. 4. Flowchart of simulated annealing algorithm.

4.2. Operation of simulated annealing genetic algorithm

The simulated annealing genetic algorithm is a combination of the genetic algorithm and the simulated annealing algorithm. Its main operations are selection, crossover and mutation.
(1) Selection based on immune mechanisms: Selection chooses the individuals best adapted to the environment from the population and uses them to breed the next generation.
(2) Adaptive crossover: Crossover exchanges genes at the same positions between different individuals to create new individuals, which is an important step in protecting the diversity of the population. This paper adopts an adaptive method to adjust the crossover probability Pc and the mutation probability Pm dynamically and to further reduce the probability of premature convergence.
(3) Adaptive mutation: Mutation alters specific genes of an individual; it is another important operation for maintaining diversity and an important component of the simulated annealing genetic algorithm. It is based on the Metropolis criterion, and the annealing function affects the convergence behavior of the whole algorithm. The usual annealing functions are as follows:
Fast cooling: t_k = α/(1 + k)
Exponential decline: t_k = α t_{k-1}
Linear decline (K is the number of decay steps): t_k = (1 - k/K) t_0
Logarithmic decrease: t_k = α/log(k + 1)

5. Application of simulated annealing genetic algorithm in association rule mining

The genetic algorithm is a random search method based on biological natural selection and genetic mechanisms. It operates on all individuals in the population and searches the encoded parameter space efficiently. Here the search ability of the genetic algorithm is used to find frequent itemsets.

First, the information supplied by the user is preprocessed and encoded in a finite form; each attribute is then mapped, and a temporary information table is retrieved from the database with an SQL query and then partitioned.
5.1. Application steps

Fig. 5. Flowchart of association rules of simulated annealing genetic algorithm.

5.1.1. Coding
The coding of a genetic algorithm describes the feasible solutions of the problem, that is, it transforms the feasible solutions from the problem space into a search space that the genetic algorithm can process. In this paper, decimal coding is adopted. The value of each transaction attribute is represented by one decimal digit; each digit represents a gene and corresponds to one attribute, and the digits are concatenated to form a decimal string.

5.1.2. Design of fitness function
Fitness is usually used to measure how close an individual in the population comes to the optimum of the optimization problem. It is the basis of the application of the genetic algorithm. The fitness function is used to evaluate fitness. The fitness function is defined as:

$$F(X) = \begin{cases} \alpha\,\mathrm{Supp}(X) + \beta\,\mathrm{Conf}(X), & \mathrm{Inte}(X) \ge 1 \\ \dfrac{1}{\alpha}\,\mathrm{Supp}(X) + \dfrac{1}{\beta}\,\mathrm{Conf}(X), & \mathrm{Inte}(X) < 1 \end{cases} \tag{21}$$

5.1.3. Selection operator of immune mechanism
The traditional roulette strategy often leads to premature convergence. Therefore, this paper adopts a selection strategy based on an immune mechanism. The selection probability is as follows:

$$P_d = \begin{cases} \dfrac{1}{M}(1 - d), & \text{individuals with the highest concentrations in the population} \\ \dfrac{1}{M}\left(1 + \dfrac{d^{2}}{1 - d}\right), & \text{other individuals in the population} \end{cases} \tag{22}$$

The advantages of this approach are as follows: the higher the fitness of an individual, the greater its fitness-based probability P_f and hence its selection probability P (promotion effect), which accelerates the convergence of the algorithm; the higher the concentration of an individual, the lower its selection probability (suppression effect).

5.1.4. Adaptive crossover operator
Crossover is a process in which two matched chromosomes exchange a part of their genes in some way, resulting in two new individuals. In this paper, the crossover probability Pc and
the mutation probability are adjusted dynamically. When the fitness values in the population differ widely, the population is heterogeneous and the crossover and mutation probabilities are kept small. When population diversity is low and the fitness values tend to converge or settle on a local optimum, the probabilities are allowed to change with fitness: the crossover and mutation probabilities increase, which effectively prevents "premature" convergence. The adaptive crossover probability used in this paper is:

$$
P_c =
\begin{cases}
P_{c1}, & f' < f_{avg} \\
P_{c1} - P_{c2}\,\dfrac{f' - f_{avg}}{f_{max} - f_{avg}}, & f' \ge f_{avg}
\end{cases}
\tag{23}
$$

Table 3
Attribute value mapping results
Month  Temp  RH   Wind  Rain  Area
3      4     2    2     1     4
3      4     2    2     1     4
3      3     3    2     1     4
3      2     2    1     1     4
3      2     2    1     1     4
3      2     2    2     1     4
3      4     2    2     1     4
3      3     2    2     1     4
3      2     1    2     1     4
3      3     2    2     1     4
3      2     2    1     1     4
3      4     2    2     1     3
3      4     1    2     1     3
3      4     2    2     1     3
3      2     2    2     1     3
3      2     2    2     1     3
3      3     2    2     1     3
2      3     2    2     1     3
3      4     2    2     1     3
2      2     2    4     1     3
3      3     2    2     1     3
3      3     2    1     1     3
3      3     3    2     1     3
1      2     3    1     1     3
3      4     2    2     1     3
4      2     2    2     1     3
3      3     2    2     1     3
3      2     3    3     1     3
3      4     2    2     1     3
3      3     3    1     1     3
...    ...   ...  ...   ...   ...

5.1.5. Adaptive mutation operator
Mutation simulates gene mutation in biological evolution. In this paper, the following adaptive mutation probability is used:

$$
P_m =
\begin{cases}
P_{m1} - P_{m2}\,\dfrac{f_{max} - f}{f_{max} - f_{avg}}, & f \ge f_{avg} \\
P_{m1}, & f < f_{avg}
\end{cases}
\tag{24}
$$
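
The adaptive probabilities in Eqs. (23) and (24) can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function names, the default values of Pc1, Pc2, Pm1 and Pm2, and the guard for the case f_max = f_avg (where the fractions are undefined) are all assumptions, and `f_prime` simply stands for f' in Eq. (23).

```python
def adaptive_crossover_prob(f_prime, f_avg, f_max, pc1=0.9, pc2=0.3):
    """Crossover probability following Eq. (23); pc1 and pc2 are hypothetical values."""
    # Below-average fitness, or a fully converged population (assumed guard): use Pc1.
    if f_prime < f_avg or f_max == f_avg:
        return pc1
    return pc1 - pc2 * (f_prime - f_avg) / (f_max - f_avg)


def adaptive_mutation_prob(f, f_avg, f_max, pm1=0.1, pm2=0.05):
    """Mutation probability following Eq. (24); pm1 and pm2 are hypothetical values."""
    # Below-average fitness, or a fully converged population (assumed guard): use Pm1.
    if f < f_avg or f_max == f_avg:
        return pm1
    return pm1 - pm2 * (f_max - f) / (f_max - f_avg)


# Illustrative fitness values only; they are not taken from the paper.
fitness_values = [0.2, 0.5, 0.7, 0.9]
f_avg = sum(fitness_values) / len(fitness_values)
f_max = max(fitness_values)
for f in fitness_values:
    pc = adaptive_crossover_prob(f, f_avg, f_max)
    pm = adaptive_mutation_prob(f, f_avg, f_max)
    print(f"f={f:.1f}  Pc={pc:.3f}  Pm={pm:.3f}")
```

Both functions take the population statistics as arguments, so they can be recomputed once per generation and reused for every individual.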

5.1.6. Simulated annealing crossover and mutation operation
The basic idea of the simulated annealing algorithm comes from statistical physics: as the temperature decreases, the energy of matter gradually approaches a lower state and finally reaches a certain level, that is:

$$
P =
\begin{cases}
1, & f_{i+1} \ge f_i \\
\exp\!\left(\dfrac{k\,(f_{i+1} - f_i)}{T}\right), & f_{i+1} < f_i
\end{cases}
\tag{25}
$$
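
A minimal sketch of Eq. (25), read here as a Metropolis-style acceptance rule: an offspring with fitness f_{i+1} no worse than f_i is always kept, and a worse one is kept with probability P. That reading, the function names, the default k = 1 and the geometric cooling factor in the usage loop are assumptions for illustration only.

```python
import math
import random


def acceptance_probability(f_new, f_old, temperature, k=1.0):
    """Probability P from Eq. (25); k = 1.0 is a hypothetical default."""
    if f_new >= f_old:
        return 1.0
    # f_new < f_old, so the exponent is negative and P lies in (0, 1) for k, T > 0.
    return math.exp(k * (f_new - f_old) / temperature)


def accept(f_new, f_old, temperature, rng, k=1.0):
    """Draw one accept/reject decision with the probability above."""
    return rng.random() < acceptance_probability(f_new, f_old, temperature, k)


rng = random.Random(0)   # seeded so that runs are repeatable
temperature = 1.0        # hypothetical starting temperature
for step in range(5):
    p = acceptance_probability(0.4, 0.5, temperature)
    print(f"step={step}  T={temperature:.4f}  P={p:.4f}  accepted={accept(0.4, 0.5, temperature, rng)}")
    temperature *= 0.5   # hypothetical geometric cooling schedule
```

As the temperature falls, the probability of keeping a worse offspring shrinks, so the search becomes greedier in later generations.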

Fig. 6. Attribute value mapping results.



Fig. 7. Breakdown1.

Fig. 9. Breakdown3.

Table 4
Table of correspondence between arrays and attributes
A[1] A[2] A[3] A[4] A[5] A[6]
month temp RH wind rain Area
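
The correspondence in Table 4 suggests how a six-digit rule code can be read. The sketch below assumes one digit per attribute in the order A[1] to A[6], and that a digit 0 marks an attribute that does not take part in the rule; the function name `decode_rule` is hypothetical, and the example code 320011 is one of the rule codes listed in Table 5.

```python
# Attribute order taken from Table 4 (A[1] .. A[6]).
ATTRIBUTES = ("Month", "Temp", "RH", "Wind", "Rain", "Area")


def decode_rule(code):
    """Map a rule code such as '320011' to {attribute: mapped value}.

    Assumption: a digit 0 means the attribute is not part of the rule.
    """
    if len(code) != len(ATTRIBUTES) or any(ch not in "0123456789" for ch in code):
        raise ValueError(f"expected {len(ATTRIBUTES)} decimal digits, got {code!r}")
    return {name: int(digit) for name, digit in zip(ATTRIBUTES, code) if digit != "0"}


print(decode_rule("320011"))
```

The sketch only lists which attributes appear in a rule; it does not say which of them form the antecedent and which the consequent, because that split is not stated here.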

Fig. 8. Breakdown2.

5.1.7. Extraction and evaluation of rules
When the average fitness of adjacent generations is lower than a certain level, the rule mining described above proceeds according to the flow of the improved simulated annealing genetic algorithm; see Fig. 5 for details.

According to the algorithm described above, the association rules mined are listed in Table 5.

Table 5
Selected association rules mined
Rule code  Parameters
002010     61.7% support; 100% confidence; 1.01 interest
320011     16.4% support; 51% confidence; 1.07 interest
332210     13.7% support; 51% confidence; 1.65 interest
002122     11.9% support; 54% confidence; 0.94 interest
320210     17.9% support; 56% confidence; 0.97 interest
300013     11.6% support; 98% confidence; 1.00 interest
302110     17.8% support; 84% confidence; 1.17 interest
...        ...
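
As a rough illustration of the fitness function in Eq. (21), the sketch below evaluates it on two of the rules in Table 5. The weights alpha = beta = 2.0 are hypothetical, since no values are given here, and the function name is an assumption. Support and confidence are written as fractions rather than percentages.

```python
def fitness(supp, conf, inte, alpha=2.0, beta=2.0):
    """Fitness F(X) from Eq. (21); alpha and beta are hypothetical weights."""
    if inte >= 1:
        return alpha * supp + beta * conf
    return supp / alpha + conf / beta


# (support, confidence, interest) copied from Table 5.
rules = {
    "002010": (0.617, 1.00, 1.01),
    "002122": (0.119, 0.54, 0.94),
}
for code, (supp, conf, inte) in rules.items():
    print(code, round(fitness(supp, conf, inte), 4))
```

With weights greater than 1, the second branch scales support and confidence down, so a rule with Inte(X) < 1 scores lower than it would in the first branch; with weights below 1 the effect is reversed, so the choice of weights matters.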

5.2. Empirical analysis
In the actual coding process, a "0" is added to the coding of each attribute to indicate that the attribute does not take part in the rule (so the user does not have to worry about how the genetic algorithm generates rules at random).
The attribute values were converted to numeric values and the appropriate attributes were selected as needed; the mapped database table is given in Table 3, and Figs. 6–9 show the attribute value mapping results, Breakdown1, Breakdown2 and Breakdown3.
Real-number coding makes the chromosomes easier to operate on. The correspondence between arrays and attributes is given in Table 4.

6. Conclusion
This paper systematically introduces the basic theory and development of data mining, summarizes the methods, tools and techniques used in data mining, gives a fairly complete account of association rule mining, classifies association rule mining techniques, and describes in detail the steps of the traditional top-down association rule algorithm.

Acknowledgment
Inner Mongolia Science and Technology Agency: BeefNet-Construction of cloud platform for precision breeding and breeding system of beef cattle
(NO: 2019GG350); Project Supported by Basic Foundation of Inner Mongolia Agricultural University (NO: JC2013001).