Study On Application of Apriori Algorithm in Data Mining
Keywords: data mining; information extraction; Apriori
I. INTRODUCTION
Although modern computer technology and database technology have developed rapidly and can support the storage and fast retrieval of large-scale databases and data warehouses, these techniques only gather such "massive" data; they do not effectively organize and use the knowledge hidden in it, which has led to today's phenomenon of "rich data, poor knowledge". The emergence of data mining technology met this need. Data mining draws on artificial intelligence, machine learning, statistical analysis and other technologies, and it brings decision analysis into a new stage. This paper mainly discusses the association rule mining algorithm commonly used in data mining, the Apriori algorithm.
The transaction database D used in the worked example is shown below.

Tid   Item list
T1    I1, I2, I5
T2    I2, I4
T3    I2, I3
T4    I1, I2, I4
T5    I1, I3
T6    I2, I3
T7    I1, I3
T8    I1, I2, I3, I5
T9    I1, I2, I3
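To make the example concrete, the following is a minimal Python sketch, not taken from the paper (names such as `MIN_SUPPORT_COUNT` are illustrative). It encodes the transaction database above and performs the first Apriori pass: scan D, count each item to form the candidate 1-itemsets C1, and keep the items meeting the minimum support count of 2 as the frequent 1-itemsets L1.

```python
from collections import Counter

# Transaction database D from the table above (Tid -> set of items).
D = {
    "T1": {"I1", "I2", "I5"},
    "T2": {"I2", "I4"},
    "T3": {"I2", "I3"},
    "T4": {"I1", "I2", "I4"},
    "T5": {"I1", "I3"},
    "T6": {"I2", "I3"},
    "T7": {"I1", "I3"},
    "T8": {"I1", "I2", "I3", "I5"},
    "T9": {"I1", "I2", "I3"},
}

MIN_SUPPORT_COUNT = 2  # minimum support count used throughout the example

# First pass: scan D once and count every item (candidate 1-itemsets C1).
c1 = Counter(item for items in D.values() for item in items)

# Keep the items whose count reaches the minimum support count (L1).
L1 = {frozenset([item]): count
      for item, count in c1.items() if count >= MIN_SUPPORT_COUNT}
print(L1)  # I1: 6, I2: 7, I3: 6, I4: 2, I5: 2 -- every single item is frequent
```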
[Figure 1. Generation of candidate itemsets and frequent itemsets with a minimum support count of 2: each candidate set Ck (C1, C2, C3, C4) is obtained by scanning D and calculating support counts, each frequent set Lk is obtained by comparing the counts with the minimum support count, and candidates with a subset not belonging to the previous frequent level are deleted.]
C3 = L2 ⋈ L2 = {{I1,I2,I3}, {I1,I2,I5}, {I1,I3,I5}, {I2,I3,I4}, {I2,I3,I5}, {I2,I4,I5}}.
Itemset {I1, I3, I5} has a 2-item subset {I3, I5} that is not an element of L2; thus it cannot be frequent, and it is removed from C3. For the same reason, {I2,I3,I4}, {I2,I3,I5} and {I2,I4,I5} are deleted. Thus C3 = {{I1, I2, I3}, {I1, I2, I5}}. For each itemset in C3, the transaction database is scanned and its support count calculated; the count is then compared with the minimum support count 2 to determine whether the itemset is frequent, which yields the frequent 3-itemsets L3.
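The join and prune steps just described can be sketched in Python as follows. This is a hedged illustration rather than the paper's code: `apriori_gen` is a hypothetical helper name, and it uses a simplified union-based join (the original algorithm joins (k-1)-itemsets that share their first k-2 items), which yields the same result here once pruning is applied.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = list(prev_frequent)
    # Join step (simplified): union pairs of (k-1)-itemsets whose union
    # has exactly k items.
    candidates = {prev[i] | prev[j]
                  for i in range(len(prev))
                  for j in range(i + 1, len(prev))
                  if len(prev[i] | prev[j]) == k}
    # Prune step (Apriori property): every (k-1)-subset of a frequent
    # k-itemset must itself be frequent, so drop candidates that violate it.
    return {c for c in candidates
            if all(frozenset(s) in prev_frequent
                   for s in combinations(c, k - 1))}

L2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}
print(apriori_gen(L2, 3))  # only {I1,I2,I3} and {I1,I2,I5} survive the prune
```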
5) To find the frequent 4-itemsets L4.
L3 is used to generate the set C4 of candidate 4-itemsets: C4 = L3 ⋈ L3 = {{I1, I2, I3, I5}}. The 3-subset {I2, I3, I5} of itemset {I1, I2, I3, I5} does not belong to L3, so {I1, I2, I3, I5} is removed. Therefore C4 = ∅, the algorithm terminates, and all the frequent itemsets have been found.
[Figure 1, continued: L2 = {{I1,I2}: 4, {I1,I3}: 4, {I1,I5}: 2, {I2,I3}: 4, {I2,I4}: 2, {I2,I5}: 2}; C3 = {{I1,I2,I3}: 2, {I1,I2,I5}: 2}; C4 = {{I1,I2,I3,I5}}.]
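Putting the scan, count and compare steps into the level-wise loop, a minimal sketch (assuming the `D`, `MIN_SUPPORT_COUNT` and `apriori_gen` definitions from the snippets above) iterates until the candidate set is empty, just as the worked example terminates at C4 = ∅:

```python
from collections import Counter

def apriori(transactions, min_count):
    """Return every frequent itemset with its support count."""
    # Level 1: count single items and keep the frequent ones.
    counts = Counter(frozenset([i]) for t in transactions for i in t)
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:  # stop as soon as a level yields no frequent itemsets
        # Generate Ck from L(k-1), then scan the database once to count.
        candidates = apriori_gen(set(current), k)
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent

all_frequent = apriori(list(D.values()), MIN_SUPPORT_COUNT)
# all_frequent now holds L1, L2 and L3 = {I1,I2,I3}: 2, {I1,I2,I5}: 2
```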
The confidence of an association rule X ⇒ Y is defined as

confidence(X ⇒ Y) = σ(X ∪ Y) / σ(X) × 100% = support(X ∪ Y) / support(X) × 100%,

where σ denotes the support count. For each frequent itemset l and each nonempty proper subset s of l, if

support(l) / support(s) ≥ min_confidence,

then the rule s ⇒ (l − s) is generated.
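As a sketch of this rule-generation step (again illustrative: `generate_rules` is a hypothetical helper building on the `apriori` output above, and the 70% minimum confidence is an assumed example value), each frequent itemset l is split into a nonempty proper subset s and its complement, and the rule s ⇒ (l − s) is kept when support(l)/support(s) reaches the minimum confidence:

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) association rules."""
    for l, l_count in frequent.items():
        if len(l) < 2:
            continue
        # Every nonempty proper subset s of l is a candidate antecedent;
        # its support count is already in `frequent` (Apriori property).
        for r in range(1, len(l)):
            for s in map(frozenset, combinations(l, r)):
                confidence = l_count / frequent[s]
                if confidence >= min_confidence:
                    yield s, l - s, confidence

for s, rest, conf in generate_rules(all_frequent, 0.7):
    print(f"{set(s)} => {set(rest)}  confidence {conf:.0%}")
```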
CONCLUSION