An Analysis of Web User Behavior Using Hybrid Algorithm Based On Sequential Pattern Mining
© Research India Publications. http://www.ripublication.com
International Journal of Applied Engineering Research ISSN 0973-4562 Volume 14, Number 10 (2019) pp. 2339-2346
Then, an algorithm is designed to remove repetitive patterns in software, and a software influential-nodes mining algorithm is put forward to mine influential functions in software and to rank them by the rank-index. Finally, by comparing the top ten functions obtained from PageRank with those from the Degree-Based algorithm, the approach is shown to be effective and accurate, combining the advantages of these two classic algorithms.
Minubhai Chaudhari and Chirag Mehta propose a PrefixSpan algorithm with GRC constraints, which generates sequential patterns by using the prefix-projected pattern growth approach. Besides frequency, this algorithm also uses gap, compactness and recency constraints during the sequential pattern mining process. The gap constraint limits the separation between two consecutive transactions of the discovered patterns, the recency constraint makes patterns adapt quickly to the latest behaviors, and the compactness constraint ensures reasonable time spans for the discovered patterns.
Fan Muhan, Shao Sujie, and Rui Lan propose a method for mining the frequent closed patterns in a sliding window, so that information is captured in a timely and accurate way when new stream data arrives. The data stream is divided into several basic windows. All possible frequent closed patterns are mined in each basic window and stored in a Closed Pattern-tree in the form of node compression to save space. As the data in the sliding window updates, the Closed Pattern-tree is incrementally updated and the infrequent or unclosed patterns are deleted from the tree.
Doddegowda B J, G T Raju, and Sunil Kumar S Manvi proposed a Web Personalization process that adjusts the information/services delivered by a Web site to the needs of each user or group of users, taking their behavioral patterns into account. Frequent Sequential Patterns (FSPs) that are extracted from Web Usage Data (WUD) are very important for analyzing and understanding users’ behavior to improve the quality of services offered by the World Wide Web (WWW).
Fig. 2: System Architecture
In 2013, Rahul Moriwal [25] presented a method for finding frequent sequential traversal patterns from Web logs based on a dynamic weight constraint. Various frequent sequential pattern mining algorithms have been proposed that mine the set of frequent subsequences satisfying a minimum support constraint in a particular session database. However, previous sequential pattern mining algorithms give equal weight to all sequential traversal patterns, whereas the pages in sequential patterns differ in importance and therefore in weight. Another problem with most frequent sequential pattern mining algorithms is that a large number of sequential patterns is generated when the minimum support is lowered, and they offer no way of adjusting the number of sequential patterns other than raising the minimum support. In the presented method, the pages are given different weights and traversal sequences are assigned a minimum and a maximum weight. While scanning a session database, the maximum and minimum weights in the session database are used to prune infrequent sequential subsequences, and in this way the downward closure property is maintained.
Ketki Muzumdar, Ravi Mante, and Prashant Chatur (IJRTE-2013) proposed “Neural Network Approach for Web Usage Mining”, in which Web usage mining tries to discover useful knowledge from the secondary data obtained from the interactions of the users with the Web. Web usage mining has become critical for effective Web site management, business and support services, personalization, network traffic flow analysis, etc. An earlier study on Web usage mining using a concurrent clustering, neural-based method has shown that usage trend analysis depends strongly on the performance of the clustering of the number of requests. In this paper, the Self Organizing Map (SOM), a kind of neural network, is introduced into the process of Web usage mining to detect users’ patterns, and the result of the traditional K-Means algorithm is compared with that of SOM. The process details the transformations necessary to turn the data stored in the Web server log files into an input for SOM.
C. Umapathi and M. Aramuthan proposed “Enhancing Web Services Using Predictive Caching”. With the exponential growth of World Wide Web (WWW) users and network traffic, the development of new services requires high bandwidth and also results in high perceived latency. Analyzing the behavior of a web site's users, also known as Web usage mining, is a research field which consists of adapting
the data mining methods to the records of access log files. Web usage mining provides information about the activities of the client in order to extract relationships in the recorded data. Each record in the log file contains the client's IP address, the requested object and additional information such as the protocol of the request, the size of the object, etc. The log file contains the information of all the different users, along with incomplete information, irrelevant data, noise and errors that need to be filtered out. Hence, preprocessing is required. After preprocessing, patterns are discovered and analyzed. Based on these, pre-fetching techniques are used to improve the performance of web sites and in turn reduce latency. The most essential element in web prefetching is the prediction algorithm: the effectiveness of pre-fetching depends on this algorithm, which in turn improves the performance of the web. Caching has long been used to improve the quality of service perceived in web browsing. The paper explains the technique used to perform prefetching. It also implements a prefetch-enhanced caching algorithm and provides experimental results.
In 2013, Omar Zaarour and Mohamad Nagi [24] proposed an improvement to the web log mining procedure and to the prediction of online navigational patterns. Their contribution has three components. First, they proposed a refined time-out based heuristic for session identification. Second, they suggested detecting navigational patterns with a specific density-based algorithm. Finally, they recommended a new method for efficient online prediction. The conducted experiment shows the applicability and effectiveness of the proposed method.
Qingqing Gan and Torsten Suel proposed techniques to optimize query processing performance. The authors' first contribution is the study of result caching as a weighted caching problem. Most earlier work focused on optimizing cache hit ratios; however, given that the processing costs of queries can differ very significantly, they argue that overall cost savings also need to be considered. They described and evaluated several algorithms for weighted result caching and studied the influence of Zipf-based query distributions on result caching. Their second main contribution is a new set of feature-based cache eviction strategies that achieve significant improvements over all previous techniques, substantially narrowing the existing performance gap to the theoretically optimal method. Finally, using the same approach, they also obtain performance gains for the related problem of inverted list caching.
Jatin D Parmar and Sanjay Garg proposed the modified web access pattern (mWAP) method for sequential pattern mining. A Web access pattern (WAP) is a sequence of accesses frequently followed by users; such patterns carry interesting and useful information. Sequential pattern mining is the process of applying data mining methods to a sequential database in order to discover the correlation relationships that exist among an ordered list of events. Web access pattern tree mining is a sequential pattern mining process for web log access sequences, which first stores the original web access sequence database in a prefix tree, similar to the frequent pattern tree (FP-tree) which stores non-sequential data. The Web access pattern tree (WAP-tree) method then mines the frequent sequences from the WAP-tree by recursively re-creating intermediate trees, starting with suffix sequences and ending with prefix sequences. An effort has been made to modify the WAP-tree method to improve its efficiency. mWAP completely eliminates the need to perform numerous reconstructions of intermediate WAP-trees during mining and considerably reduces execution time.
In 2014, Jerry Chun [26] adopted the prelarge concept to maintain the discovered sequential patterns under sequence deletion. An FUSP tree is first built to keep only the frequent 1-sequences from the original database. The prelarge 1-sequences are also kept in a set for later maintenance. When some sequences are deleted from the original database, the proposed algorithm divides the kept frequent 1-sequences and prelarge 1-sequences from the original database, and the mined 1-sequences from the deleted customer sequences, into three parts with nine cases. Each case is then processed by the designed algorithm to maintain and update the built FUSP tree. When the number of deleted customer sequences is smaller than the safety bound of the prelarge concept, the original customer sequences need not be rescanned, yet the sequential patterns can still be correctly maintained and updated.
In 2016, Doddegowda [30] presented an approach to personalize the information available on the Web according to user requirements. This is called the Web Personalization process, which adjusts the information/services delivered by a Web site to the needs of each user or group of users, taking their behavioral patterns into account. Frequent Sequential Patterns (FSPs) that are extracted from Web Usage Data (WUD) are very important for analyzing and understanding users’ behavior to improve the quality of services offered by the World Wide Web (WWW). User behavioral patterns are required to build a profile for each user, which is then used to personalize a website.
In 2016, Minubhai [31] proposed a PrefixSpan algorithm with GRC constraints, which generates sequential patterns by using the prefix-projected pattern growth approach. Besides frequency, this algorithm also uses gap, compactness and recency constraints during the sequential pattern mining process. The gap constraint limits the separation between two consecutive transactions of the discovered patterns, the recency constraint makes patterns adapt quickly to the latest behaviors, and the compactness constraint ensures reasonable time spans for the discovered patterns.
In 2016, Fan Muhan [32] proposed a method for mining the frequent closed patterns in a sliding window, so that information is captured in a timely and accurate way when new stream data arrives. The data stream is divided into several basic windows. All possible frequent closed patterns are mined in each basic window and stored in a Closed Pattern-tree in the form of node compression to save space. As the data in the sliding window updates, the Closed Pattern-tree is incrementally updated and the infrequent or unclosed patterns are deleted from the tree.
In 2017, Bing Zhang [33] proposed a new approach to efficiently mine influential functions based on software execution sequences. First, the authors design a
novel modeling strategy by which software execution traces are modeled as sequential patterns. Owing to loops, patterns can occur multiple times in a trace, which leads to a high time cost and makes the analysis extremely complex. Then, an algorithm is designed to remove repetitive patterns in software, and a software influential-nodes mining algorithm is put forward to mine influential functions in software and to rank them by the rank-index. Finally, by comparing the top ten functions obtained from PageRank with those from the Degree-Based algorithm, the approach is shown to be effective and accurate, combining the advantages of these two classic algorithms.
In 2017, H. Ryang [34] proposed a novel algorithm and list structure for finding high utility patterns over data streams on the basis of a sliding window mode. Unlike existing algorithms, the proposed algorithm does not consume huge computational resources for verifying candidate patterns, because it avoids the generation of candidate patterns. Therefore, the algorithm works efficiently in complex dynamic systems.
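The comparison of PageRank and degree-based rankings described for [33] can be illustrated with a small sketch. This is not the algorithm of [33]: the call graph, the function names in it, the damping factor of 0.85 and the use of in-degree plus out-degree as the degree score are all assumptions made only for illustration.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {node: [successors]}."""
    nodes = sorted(set(graph) | {m for succ in graph.values() for m in succ})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            successors = graph.get(v, [])
            for m in successors:
                new_rank[m] += damping * rank[v] / len(successors)
        rank = new_rank
    return rank


def degree_scores(graph):
    """Degree-based score (assumed here): in-degree plus out-degree."""
    nodes = set(graph) | {m for succ in graph.values() for m in succ}
    score = {v: 0 for v in nodes}
    for v, successors in graph.items():
        for m in successors:
            score[v] += 1  # outgoing call
            score[m] += 1  # incoming call
    return score


def top_k(scores, k):
    """Return the k highest-scoring nodes, ties broken by name."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical call graph: an edge f -> g means function f calls g.
call_graph = {
    "main": ["parse", "run"],
    "parse": ["log"],
    "run": ["log", "save"],
    "save": ["log"],
    "log": [],
}
pr_top = top_k(pagerank(call_graph), 3)
deg_top = top_k(degree_scores(call_graph), 3)
print("PageRank top 3:", pr_top)
print("Degree top 3:  ", deg_top)
print("Shared:", sorted(set(pr_top) & set(deg_top)))
```

Comparing the two top-k lists in this way shows which functions both measures agree on and where they diverge, which is the kind of comparison the paragraph on [33] refers to.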
4 | Doddegowda B J, G T Raju, Sunil Kumar S Manvi | Extraction of Behavioral Patterns from Preprocessed Web Usage Data for Web Personalization | It provides better services on the web to each user or group of users based on their behavioral patterns. | The local conditional probability distribution of each node, which is calculated accordingly.
IV. PROPOSED APPROACH
The proposed Profile based Closed Sequential Pattern Mining using SOM Clustering (PCSPSC) approach discovers frequent sequential patterns within a cluster. The Self Organizing Map (SOM) algorithm, a neural network method, is used to produce the clusters of the web data set. Each cluster gives access to part of the web data set rather than the whole web data set, so the tree that holds the web pages of the website for the corresponding sessions can be accessed partially. This step is applied by calling the procedure Clustering_Method.
Fig. 3 shows the process of the PCSPSC algorithm, which generates useful closed sequential patterns from web data.
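The body of Clustering_Method is not given in the text, so the following is only a minimal sketch of how a SOM could group sessions so that later mining touches one cluster instead of the whole data set. The function `clustering_method`, the binary page-visit encoding, the 1-D map with two units, and the learning-rate and radius schedules are all assumptions, not the authors' procedure.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Index of the map unit whose weight vector is closest to `vector`."""
    dists = [sum((w - x) ** 2 for w, x in zip(unit, vector)) for unit in weights]
    return dists.index(min(dists))


def train_som(vectors, n_units=2, epochs=50, lr0=0.5, seed=0):
    """Train a tiny 1-D Self Organizing Map and return its weight vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        radius = radius0 * (1.0 - frac) + 1e-9   # neighbourhood shrinks over time
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for u in range(n_units):
                # Gaussian neighbourhood around the best matching unit.
                influence = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    weights[u][d] += lr * influence * (v[d] - weights[u][d])
    return weights


def clustering_method(sessions, pages, n_units=2):
    """Hypothetical stand-in for Clustering_Method: map unit -> sessions."""
    # Encode each session as a binary vector over the known pages (assumption).
    vectors = [[1.0 if p in s else 0.0 for p in pages] for s in sessions]
    weights = train_som(vectors, n_units=n_units)
    clusters = {}
    for session, vector in zip(sessions, vectors):
        clusters.setdefault(best_matching_unit(weights, vector), []).append(session)
    return clusters


# Illustrative sessions; page names are invented for the example.
pages = ["home", "products", "cart", "blog", "about"]
sessions = [
    ["home", "products", "cart"],
    ["home", "products"],
    ["blog", "about"],
    ["blog"],
]
clusters = clustering_method(sessions, pages)
for unit, members in sorted(clusters.items()):
    print(unit, members)
```

A pattern miner would then be run on the sessions of one cluster at a time, which is the sense in which only part of the data set is scanned.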
V. CONCLUSION
In this research, only a partial database is scanned rather than the whole database, so that repeated scanning of the database is reduced and response time is improved. The approach better reflects the importance of pages by using min-max weights, and the support of every page is derived from the min-max weights of the pages, which are updated automatically by using web services.
The approach identifies the user's previous search subjects and topics so that the current search is more to the point. The information gathered could therefore be used to offer feedback to users on their use of the internet. Analyzing user behavior also enables effective tracking for the development and improvement of the user interface in software.
In future work, other data mining algorithms can be implemented in the cloud to efficiently handle the big data of many hospital websites in a distributed environment, for finding critical diseases.
REFERENCES
[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases”, in Proc. Int. Conf. Very Large Data Bases, pp. 487–499, 1994.
[2] R. Agrawal and R. Srikant, “Mining Sequential Patterns”, In Proceedings of the 1995 International Conference on Data Engineering, pp. 3-14, 1995.
[3] R. Agrawal and R. Srikant, “Mining Sequential Patterns: Generalizations and Performance Improvements”, In Proceedings of the 5th International Conference on Extending Database Technology, pp. 3-17, Avignon, France, 1996.
[4] M. N. Garofalakis, R. Rastogi, K. Shim, “SPIRIT: Sequential Pattern Mining with Regular Expression Constraints”, In Proceedings of 25th VLDB Conference, pp. 223-234, San Francisco, California, 1999.
[5] M. J. Zaki, “SPADE: An Efficient Algorithm for Mining Frequent Sequences”, Machine Learning Journal, Vol. 42, Issue (1-2), pp. 31-60, 2001.
[6] T.-P. Hong, C.-Y. Wang, and Y.-H. Tao, “A New Incremental Data Mining Algorithm using Pre-large Itemsets”, Intell. Data Anal., vol. 5, no. 2, pp. 111–129, Apr. 2001.
[7] Jian Pei, Jiawei Han and Helen Pinto, “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth”, In Proceedings of 12th International Conference on Data Engineering, pp. 215-224, Heidelberg, Germany, 2001.
[8] Freire J., Kumar B., and Lieuwen D., “WebViews: Accessing Personalized Web Content and Services”, In Proceedings of the Tenth International World Wide Web Conference, 2001.
[9] Antunes, A. L. Oliveira, “Generalization of Pattern-growth Methods for Sequential Pattern Mining with Gap Constraints”, Machine Learning and Data Mining in Pattern Recognition, Third International Conference, MLDM 2003, Leipzig, Germany, July 5-7, 2003, Proceedings 2003.
[10] J. Han, J. Pei, Y. Yin, and R. Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach”, Data Mining Knowledge Discovery, vol. 8, no. 1, pp. 53–87, 2004.
[11] Show-Jane Yen and Yue-Shi Lee, “Mining Sequential Patterns with Item Constraints”, DaWaK 2004: data warehousing and knowledge discovery: International conference on data warehousing and knowledge discovery, Zaragoza, ESPAGNE, vol. 3181, pp. 381-390, 2004.
[12] Rigou, M., Sirmakessis, S., and Tsakalidis, A. K., “A Computational Geometry Approach to Web Personalization”, In Proceedings of CEC, pp. 377-380, 2004.
[13] J. Pei et al., “Mining Sequential Patterns by Pattern-Growth: The Prefix Span Approach”, IEEE Trans. Knowledge Data Eng., vol. 16, no. 11, pp. 1424–1440, Nov. 2004.
[14] P. Berkhin, “A Survey of Clustering Data Mining Techniques”, in Grouping Multidimensional Data. Berlin, Germany: Springer-Verlag, 2006, pp. 25–71.
[15] Yen-Liang Chen, Ya-Han Hu, “The Consideration of Recency and Compactness in Sequential Pattern Mining”, In Proceedings of the second workshop on Knowledge Economy and Electronic Commerce, Vol. 42, Iss. 2, pp. 1203-1215, 2006.
[16] Jian Pei, Jiawei Han, Wei Wang, “Constraint-based Sequential Pattern Mining: The Pattern Growth Methods”, J Intell Inf Syst, Vol. 28, No. 2, pp. 133-160, 2007.
[17] T.-P. Hong, C.-W. Lin, and Y.-L. Wu, “Incrementally Fast Updated Frequent Pattern Trees”, Expert System Application, vol. 34, no. 4, pp. 2424–2435, May 2008.
[18] Krzysztof D., Wojciech K., Marcin S., “Effective Prediction of Web User Behaviour with User-Level Models”, Fundamental Informatics, IOS Press, Vol. 89, No. 2-3, pp. 189, 2008.
[19] K. R. Suneetha, Dr. K. R. Krishnamoorthy, “Identifying User Behavior by Analyzing Web Server Access Log File”, IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 4, pp. 327, 2009.
[20] T.-P. Hong, C.-W. Lin, and Y.-L. Wu, “Maintenance of Fast updated Frequent Pattern Trees for Record Deletion”, Comput. Statist. Data Analysis, vol. 53, no. 7, pp. 2485–2499, May 2009.
[21] Dhirendra Kumar Jha, Anil Rajput, Manmohan Singh and Archana Tomar, “An Efficient Model for Information Gain of Sequential Pattern from Web Logs based on Dynamic Weight Constraint”, IEEE International Conference on Computer Information Systems and Industrial Management Applications, pp. 518-523, 2010.
[22] C.-W. Lin, T.-P. Hong, and W.-H. Lu, “An Effective Tree Structure for Mining High Utility Itemsets”, Expert System Application, vol. 38, no. 6, pp. 7419–7424, Jun. 2011.
[23] C.-W. Lin and T.-P. Hong, “A New Mining Approach for Uncertain Databases using CUFP Trees”, Expert System Application, vol. 39, no. 4, pp. 4084–4093, Mar. 2012.
[24] Omar Zaarour, Mohamad Nagi, “Effective Web Log Mining and Online Navigational Pattern Prediction”, ELSEVIER, 2013.
[25] Rahul Moriwal and Vijay Prakash, “An Efficient Algorithm for Finding Frequent Sequential Traversal Patterns from Web Logs based on Dynamic Weight Constraint”, Proceedings of the Third International Conference on trends in Information, Telecommunication and Computing, Vol. 150, 2013.
[26] Jerry Chun, Wensheng Gan, Tzung Pei Hong, “Efficiently Maintaining the Fast Updated Sequential Pattern Trees With Sequence Deletion”, IEEE Access - The Journal for Rapid open access publishing, Vol. 2, pp. 1374-1383, 2014.
[27] Sahu S., Saurabh P. and Rai S., “An Enhancement in Clustering for Sequential Pattern Mining through Neural Algorithm using Web Logs”, Proceedings of International Conference on Computational Intelligence and Communication Networks, pp. 758-764, IEEE Press, 2014.
[28] Dmitriy Fradkin, Fabian Mörchen, “Mining Sequential Patterns for Classification”, Vol. 45, Issue 3, pp. 731-749, December 2015.