Week 1 - Part 2
• Illustrate the main data sources for data mining: relational databases, data warehouses, and
transactional data
• Explain the evaluation methods, major issues, and history of data mining
Data Mining
● Examples:
○ buys(X, “computer”) => buys(X, “software”)
○ https://www.stat.auckland.ac.nz/~wild/d2i/articles/4.5.Association%20and%20Correlation_ARTICLE.pdf
○ Credit card fraud detection, direct marketing, classifying web pages and other entities, …
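A rule such as buys(X, “computer”) => buys(X, “software”) is commonly judged by two measures that the slide does not define: support (the fraction of all transactions containing both items) and confidence (the fraction of computer-buying transactions that also contain software). The following is a minimal Python sketch under those standard definitions; the transactions are hypothetical toy data.

```python
# Hypothetical toy transactions: each set holds the items one customer bought.
transactions = [
    {"computer", "software", "printer"},
    {"computer", "software"},
    {"computer", "mouse"},
    {"software", "printer"},
    {"computer", "software", "mouse"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Here 3 of the 5 hypothetical baskets contain both items (support 0.60), and 3 of the 4 baskets with a computer also contain software (confidence 0.75).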
● Many, if not all, predictive problems can be formulated as classification or regression problems
● Some machine learning methods are best suited to either classification or regression, while others can perform
both tasks (e.g., ANNs (artificial neural networks), SVMs (support vector machines), decision trees, etc.).
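To illustrate a single method that handles both tasks, here is a minimal k-nearest-neighbours (k-NN) sketch in Python. k-NN is not one of the methods named on the slide; it is used here only because it is short. The house sizes, size classes, and prices are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, task="classification"):
    """Predict a target for `query` from its k nearest training points (Euclidean distance).

    task="classification": majority vote over the neighbours' labels.
    task="regression":     mean of the neighbours' numeric targets.
    """
    # Sort training indices by their distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    neighbours = [train_y[i] for i in order[:k]]
    if task == "classification":
        return Counter(neighbours).most_common(1)[0][0]
    if task == "regression":
        return sum(neighbours) / len(neighbours)
    raise ValueError(f"unknown task: {task!r}")


# Hypothetical houses: floor area in m^2, a size class, and a price in $1000s.
X = [(50,), (60,), (55,), (150,), (160,), (155,)]
size_class = ["small", "small", "small", "large", "large", "large"]
price = [100, 120, 110, 300, 320, 310]

print(knn_predict(X, size_class, (58,)))                  # classification
print(knn_predict(X, price, (58,), task="regression"))    # regression
```

The same neighbour search serves both problem types; only the final aggregation step (vote versus average) changes.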
● Group data to form categories (i.e., clusters), e.g., cluster houses to find distribution patterns
● Principle:
○ Maximizing intra-group similarity & minimizing inter-group similarity
● Avoid using “classes” when referring to “clusters”
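Clustering can be sketched with k-means, one widely used algorithm that the slide does not name: each point is assigned to its nearest centroid, then each centroid moves to the mean of its members. k-means directly minimizes within-cluster squared distance (high intra-group similarity); it does not explicitly optimize separation between groups. The following is a minimal Python sketch; the house coordinates and the function name `kmeans` are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of numeric tuples.

    Returns (centroids, assignment); assignment[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, assignment


# Hypothetical house locations (x, y) on a city grid.
houses = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.8), (4.9, 5.1)]
centroids, labels = kmeans(houses, k=2)
print(labels)
```

The returned indices (0 or 1) have no predefined meaning; they are cluster labels discovered from the data, not classes given in advance.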
● Example:
○ Outlier: A data object that does not comply with the general behavior of the data
○ Noise or exception? ― One person’s garbage could be another person’s treasure
○ Methods:
■ Statistical tests
■ Classification
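One simple statistical approach flags values whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. The following is a minimal Python sketch; the spending values, the threshold of 2.0, and the name `zscore_outliers` are hypothetical choices, not taken from the slides.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose absolute z-score exceeds `threshold`.

    z = (x - mean) / s, where s is the sample standard deviation.
    """
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(i, x) for i, x in enumerate(values) if abs(x - mean) / s > threshold]


# Hypothetical daily card spending for one customer (in dollars).
daily_spend = [42, 38, 45, 40, 39, 41, 44, 400]
print(zscore_outliers(daily_spend))
```

Whether the flagged 400 is noise (e.g., a data-entry error) or a meaningful exception (e.g., fraud) still needs domain judgment. With only eight values, a sample z-score can never exceed about 2.47, so a threshold of 3 would flag nothing here; in small samples a single extreme value also inflates the standard deviation it is measured against.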
• We discussed cluster analysis, an unsupervised learning approach that groups data to form categories (i.e.,
clusters).
• We discussed outlier analysis, which identifies anomalous observations in a dataset.
Data Mining
[Figure: fields related to data mining, including High-Performance Computing, Information Retrieval, and Applications]
● Biological and medical data analysis: classification, cluster analysis (microarray data analysis),
● From major dedicated data mining systems/tools (e.g., SAS, MS SQL-Server Analysis Manager, Oracle
Data Mining Tools) to invisible data mining (i.e., data mining built into regular functional components,
running all the time, often unbeknownst to the user)
• We discussed various applications of data mining, such as web page analysis, collaborative analysis and
recommender systems, etc.
(KDD’95-98)
○ Journal of Data Mining and Knowledge Discovery (1997)
● ACM SIGKDD conferences since 1998 and SIGKDD Explorations
● More conferences on data mining
○ PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.
● ACM Transactions on KDD starting in 2007
This file is meant for personal use by [email protected] only.
Proprietary content. ©University of Arizona. All Rights Reserved. Unauthorized use or distribution prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Data Mining: Conferences and Journals
KDD Conferences
● ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
● SIAM Data Mining Conf. (SDM)
● (IEEE) Int. Conf. on Data Mining (ICDM)
Other related conferences
● DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT, …
● Web and IR conferences: WWW, SIGIR, WSDM
● ML conferences: ICML, NIPS
● PR conferences: CVPR, …
Summarize the main data sources for data mining: relational databases, data warehouses, and
transactional data
Examine the evaluation methods, major issues, and history of data mining.