01 Intro

CSE5243 is an introductory course on data mining offered at The Ohio State University, covering fundamental concepts, techniques, and applications of data mining. The course includes lectures, homework, projects, and exams, with a focus on understanding data mining processes, types of data, and various mining functions such as classification and clustering. Recommended textbooks and additional resources are provided to support students in their learning journey.


CSE5243 INTRO. TO DATA MINING

Chapter 1. Introduction
Yu Su, CSE@The Ohio State University

Slides adapted from UIUC CS412 by Prof. Jiawei Han and OSU CSE5243 by
Prof. Huan Sun
CSE 5243. Course Page & Schedule
¨ Class Homepage: https://ysu1989.github.io/courses/sp20/cse5243/

¨ Class Schedule:
9:35-10:55 AM, Wed/Fri, Caldwell Lab 171

¨ Office hours:
¤ Instructor: Yu Su @ DL783, Fri 11:00am-12:15pm (right after class)
First week: No office hours

¤ TA: Jiaqi Xu (xu.1629), Wed 03:00pm-04:00pm, Baker 406

CSE 5243. Textbook
¨ Recommended but not required
¨ (Primary) Jiawei Han, Micheline Kamber and Jian Pei, Data Mining: Concepts and Techniques (3rd ed), 2011
¤ More resources: https://wiki.illinois.edu/wiki/display/cs412/2.+Course+Syllabus+and+Schedule
¨ (Primary) Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, 2006
¨ (Supplementary) Mohammed J. Zaki and Wagner Meira, Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, 2014
¨ (Supplementary) Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining of Massive Datasets
¤ More resources: http://www.mmds.org/
CSE 5243. Course Work and Grading

¨ Homework, Course Projects, and Exams
¤ Participation: 10% (Online discussion and/or class participation)
¤ Homework: 50% (No Late Submissions!)
¤ Midterm exam: 20%
¤ Final exam: 20%

¨ Need help and/or discussions?
¤ Carmen: https://osu.instructure.com/courses/76423/discussion_topics
n Receive credit: answer questions on Carmen and engage in class discussion.

¨ Check your homework/exam scores
¤ Carmen: https://osu.instructure.com/courses/76423/gradebook

Videos

¨ 10 TED talks on Big Data and Analytics
¤ https://www.promptcloud.com/blog/top-ted-talks-on-big-data/
¤ Shyam Sankar (Director at Palantir Technologies):
n https://www.youtube.com/watch?time_continue=19&v=ltelQ3iKybU
¨ 5 TED talks on Data analytics for business leaders
¤ https://bigdata-madesimple.com/5-best-ted-talks-on-data-analytics-for-business-leaders/
¨ Data analytics for beginners
¤ https://www.youtube.com/watch?v=66ko_cWSHBU (If you love sports, this TED Talk on data analytics is going to be an interesting watch)

Chapter 1. Introduction
¨ What is Data Mining?
¨ Why Data Mining?
¨ A Multi-Dimensional View of Data Mining
¨ What Kinds of Data Can Be Mined?
¨ What Kinds of Patterns Can Be Mined?
¨ What Kinds of Technologies Are Used?
¨ What Kinds of Applications Are Targeted?
¨ Major Issues in Data Mining
¨ A Brief History of Data Mining and Data Mining Society
¨ Summary
What is Data Mining?
¨ Data mining (knowledge discovery from data, KDD)
¤ Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data

¨ Alternative names
¤ Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

One of the best conferences to publish your research work: SIGKDD (check resources)
Knowledge Discovery (KDD) Process
¨ (Narrow view) Data mining plays an essential role in the knowledge discovery process
¨ (Broad view) Data mining is the knowledge discovery process

[Figure: KDD pipeline. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Example: A Web Mining Framework
¨ Web mining usually involves
¤ Data crawling and cleaning
¤ Data integration from multiple sources
¤ (Optional) Warehousing the data
¤ (Optional) Data cube construction
¤ Data selection for data mining
¤ Data mining
¤ Presentation of the mining results
¤ Patterns and knowledge to be used or stored into
knowledge base
KDD Process: A View from ML and Statistics

[Figure: Input Data → Data Pre-Processing → Data Mining → Post-Processing]

¤ Data pre-processing: data integration, normalization, feature selection, dimension reduction
¤ Data mining: pattern discovery, classification, clustering, outlier analysis, …
¤ Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization

¨ This is a view from typical machine learning and statistics communities
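
As a small illustration of the pre-processing stage, the sketch below applies min-max normalization, one common way to normalize a numeric attribute. The slide does not say which normalization is meant, so the choice of method, the function name `min_max_normalize`, and the income values are all assumptions made for this example.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute: annual income in dollars.
incomes = [30000, 45000, 60000, 90000]
scaled = min_max_normalize(incomes)
print(scaled)
```

After scaling, attributes measured in very different units become comparable, which matters for distance-based mining steps such as clustering.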
Data Science

Figure from: https://www.datasciencecentral.com/profiles/blogs/difference-of-data-science-machine-learning-and-data-mining
“How much data is generated each day?” – World Economic Forum
Why Data Mining?
¨ The Explosive Growth of Data: from terabytes to petabytes
¤ Data collection and data availability
n Automated data collection tools, database systems, Web,
computerized society
¤ Major sources of data
n Business: Web, e-commerce, transactions, stocks, …
n Science: Remote sensing, bioinformatics, scientific simulation, …
n Society and everyone: news, digital cameras, YouTube
¨ We are drowning in data, but starving for knowledge!
¨ “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets
Multi-Dimensional View of Data Mining
¨ Data to be mined
¤ Database data (extended-relational, object-oriented, heterogeneous), data
warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text
and web, multi-media, graphs & social and information networks
¨ Knowledge to be mined (or: Data mining functions)
¤ Characterization, discrimination, association, classification, clustering,
trend/deviation, outlier analysis, …
¤ Descriptive vs. predictive data mining
¤ Multiple/integrated functions and mining at multiple levels
¨ Techniques utilized
¤ Data warehousing (OLAP), machine learning, statistics, pattern recognition,
visualization, high-performance computing, etc.
¨ Applications adapted
¤ Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market
analysis, text mining, Web mining, etc.

Data Mining: On What Kinds of Data?
¨ Database-oriented data sets and applications
¤ Relational database, data warehouse, transactional database
¤ Object-relational databases, Heterogeneous databases and legacy
databases

¨ Advanced data sets and advanced applications
¤ Data streams and sensor data
¤ Time-series data, temporal data, sequence data (incl. bio-sequences)
¤ Structured data, graphs, social networks and information networks
¤ Spatial data and spatiotemporal data
¤ Multimedia databases
¤ Text databases
¤ The World-Wide Web


Survey

Your Name, ID, Major

Question 1: What do you think Data Mining is?

Question 2: What project have you done so far that you think is most relevant to
Data Mining?
• Not necessarily research project; can be your course project or any hackathon
event you participated in.

Question 3: What do you expect to learn from this course?

Briefly answer each question with a few sentences.

Data Mining Functions: Pattern Discovery

¨ Frequent patterns
¤ What items do you frequently purchase together on Amazon?
¨ Association and Correlation Analysis

¤ A typical association rule
n Diaper → Beer [0.5%, 75%] (support, confidence)
¤ How to mine such patterns and rules efficiently in large datasets?
¤ How to use such patterns for classification, clustering, and other applications?
¤ More: friend recommendation, motif discovery, malware detection, fraud detection, etc.
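
To make the two numbers attached to the rule concrete, here is a minimal sketch that computes support and confidence by brute-force counting. The basket contents and the helper names `support` and `confidence` are made up for illustration, so the resulting percentages differ from the slide's [0.5%, 75%]. It does not address the efficiency question raised above; it only shows what is being measured.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(transactions, both) / support(transactions, a)


# Hypothetical market baskets, one frozenset per transaction.
baskets = [
    frozenset({"diaper", "beer", "milk"}),
    frozenset({"diaper", "beer"}),
    frozenset({"diaper", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"diaper", "beer", "bread"}),
]

rule_support = support(baskets, {"diaper", "beer"})
rule_confidence = confidence(baskets, {"diaper"}, {"beer"})
print(f"Diaper -> Beer [{rule_support:.0%}, {rule_confidence:.0%}]")
```

Here three of five baskets contain both items (support 60%), and three of the four baskets with diapers also contain beer (confidence 75%).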
Data Mining Functions: Classification
¨ Classification and label prediction
¤ Construct models (functions) based on some training examples
¤ Describe and distinguish classes or concepts for future prediction
n Ex. 1. Classify countries based on (climate)
n Ex. 2. Classify cars based on (gas mileage)
¤ Predict some unknown class labels
¨ Typical methods
¤ Decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression, …
¨ Typical applications:
¤ Credit card fraud detection, direct marketing, classifying stars,
diseases, web pages, …
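
The train-then-predict workflow can be sketched with a k-nearest-neighbour classifier on the "classify cars based on gas mileage" example. Note that k-NN is not one of the methods named on the slide; it is used here only because it fits in a few lines. The mileage values, the labels, and the function name `knn_predict` are assumptions for illustration.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training examples.

    `train` is a list of (feature_value, label) pairs with one numeric feature.
    """
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training examples: (miles per gallon, class label).
cars = [(15, "low"), (18, "low"), (21, "low"), (30, "high"), (34, "high"), (40, "high")]

# Predict the unknown class labels of two unseen cars.
predictions = {mpg: knn_predict(cars, mpg) for mpg in (17, 33)}
print(predictions)
```

The labeled pairs play the role of the training examples, and the dictionary holds the predicted labels for the two previously unseen cars.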

Data Mining Functions: Cluster Analysis
¨ Unsupervised learning (i.e., Class
label is unknown)
¨ Group data to form new
categories (i.e., clusters), e.g.,
cluster houses to find distribution
patterns
¨ Principle: Maximizing intra-class similarity & minimizing inter-class similarity
¨ Many methods and applications
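
One of the many clustering methods is k-means, sketched below for the "cluster houses" example. The slide does not single out any algorithm, so picking k-means is an assumption, as are the house coordinates, the fixed iteration count, and the deterministic initialization from the first k points (used here only to keep the run reproducible).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and recomputing centroids."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic, deterministic initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Hypothetical house locations as (x, y) coordinates.
houses = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(houses, k=2)
print(clusters)
```

No class labels are supplied; the two groups emerge purely from the distances between points, which is what makes this unsupervised.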

Data Mining Functions: Outlier Analysis
¨ Outlier analysis
¤ Outlier: A data object that does not comply with the
general behavior of the data
¤ Noise or exception? One person’s garbage could be another person’s treasure
¤ Methods: by-product of clustering or regression analysis, …
¤ Useful in fraud detection, rare events analysis
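
As a rough illustration of flagging objects that do not comply with the general behavior of the data, the sketch below uses a simple z-score test. This is a different and much simpler technique than the clustering- or regression-based methods mentioned above; the threshold of 2 standard deviations, the function name `zscore_outliers`, and the transaction amounts are assumptions for this example.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every value is identical, so nothing deviates.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
flagged = zscore_outliers(amounts)
print(flagged)
```

Whether the flagged amount is noise to discard or a fraud case worth investigating depends on the application, which is the point of the "garbage or treasure" remark.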

Data Mining Functions: Time and Ordering:
Sequential Pattern, Trend and Evolution Analysis
¨ Sequence, trend and evolution analysis
¤ Trend, time-series, and deviation analysis
n e.g., regression and value prediction
¤ Sequential pattern mining
n e.g., buy digital camera, then buy large memory cards
¤ Periodicity analysis
¤ Motifs and biological sequence analysis
n Approximate and consecutive motifs
¤ Similarity-based analysis
¨ Mining data streams
¤ Ordered, time-varying, potentially infinite, data streams
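
The "regression and value prediction" item under trend analysis can be illustrated with an ordinary least-squares line fit. This is a minimal sketch under stated assumptions: the monthly sales figures are invented, `fit_line` is a hypothetical helper, and a straight line is only one of many possible trend models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical time series: sales observed in months 1 through 5.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # extrapolate the trend to month 6
print(slope, intercept, forecast)
```

A large gap between a new observation and the fitted line would be a candidate for the deviation analysis mentioned in the same bullet.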
Data Mining Functions: Structure and
Network Analysis
¨ Graph mining
¤ Finding frequent subgraphs (e.g., chemical compounds), trees (XML),
substructures (web fragments)
¨ Information network analysis
¤ Social networks: actors (objects, nodes) and relationships (edges)
n e.g., author networks in CS, terrorist networks
¤ Multiple heterogeneous networks
n A person could be in multiple information networks: friends, family, classmates, …
¤ Knowledge graphs: knowledge backbone of AI systems
¨ Web mining
¤ Web is a big information network: from PageRank to Google
¤ Analysis of Web information networks
n Web community discovery, opinion mining, usage mining, …
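
Since the slide mentions PageRank, here is a minimal power-iteration sketch of the idea on a toy link graph. The four-page graph, the damping factor of 0.85, the iteration count, and the handling of pages without outgoing links are assumptions for illustration; this is not a description of how any production search engine computes rankings.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to; every target
    must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical tiny web: page -> pages it links to.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))
```

Page C collects links from three pages and ends up with the highest score, while D, which nothing links to, keeps only the baseline share.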
Evaluation of Knowledge
¨ Is all mined knowledge interesting?
¤ One can mine a tremendous number of “patterns”
¤ Some may fit only certain dimension space (time, location, …)
¤ Some may not be representative, may be transient, …
¨ Evaluation of mined knowledge → directly mine only interesting
knowledge?
¤ Descriptive vs. predictive
¤ Coverage
¤ Typicality vs. novelty
¤ Accuracy
¤ Timeliness
¤ …
Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on Machine Learning, Pattern Recognition, Statistics, Visualization, High-Performance Computing, Database Technology, Algorithm, and Applications]
Why Confluence of Multiple Disciplines?
¨ Tremendous amount of data
¤ Algorithms must be scalable to handle big data
¨ High-dimensionality of data
¤ Micro-array may have tens of thousands of dimensions
¨ High complexity of data
¤ Data streams and sensor data
¤ Time-series data, temporal data, sequence data
¤ Structured data, graphs, social and information networks
¤ Spatial, spatiotemporal, multimedia, text and Web data
¤ Software programs, scientific simulations
¨ New and sophisticated applications
Applications of Data Mining
¨ Web page analysis: classification, clustering, ranking
¨ Collaborative analysis & recommender systems
¨ Biological and medical data analysis
¨ Data mining and software engineering
¨ Data mining and text analysis
¨ Data mining and social and information network analysis
¨ Built-in (invisible data mining) functions in Google, MS, Yahoo!, LinkedIn, Facebook, …
Major Issues in Data Mining (1)
¨ Mining Methodology
¤ Mining various and new kinds of knowledge
¤ Mining knowledge in multi-dimensional space
¤ Data mining: An interdisciplinary effort
¤ Boosting the power of discovery in a networked environment
¤ Handling noise, uncertainty, and incompleteness of data
¤ Pattern evaluation and pattern- or constraint-guided mining
¨ User Interaction & Human-Machine Collaboration
¤ Interactive mining
¤ Incorporation of background knowledge
¤ Presentation and visualization of data mining results
Major Issues in Data Mining (2)
¨ Efficiency and Scalability
¤ Efficiency and scalability of data mining algorithms
¤ Parallel, distributed, stream, and incremental mining methods
¨ Diversity of data types
¤ Handling complex types of data
¤ Mining dynamic, networked, and global data repositories
¨ Data mining and society
¤ Social impacts of data mining
¤ Privacy-preserving data mining

A Brief History of Data Mining Society
¨ 1989 IJCAI Workshop on Knowledge Discovery in Databases
¤ Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W.
Frawley, 1991)
¨ 1991-1994 Workshops on Knowledge Discovery in Databases
¤ Advances in Knowledge Discovery and Data Mining (U. Fayyad, G.
Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
¨ 1995-1998 International Conferences on Knowledge Discovery in
Databases and Data Mining (KDD’95-98)
¤ Journal of Data Mining and Knowledge Discovery (1997)
¨ ACM SIGKDD conferences since 1998 and SIGKDD Explorations
¨ More conferences on data mining
¤ PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM
(2001), WSDM (2008), etc.
¨ ACM Transactions on KDD (2007)
Conferences and Journals on Data Mining
¨ KDD Conferences
¤ ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
¤ SIAM Data Mining Conf. (SDM)
¤ (IEEE) Int. Conf. on Data Mining (ICDM)
¤ European Conf. on Machine Learning and Principles and Practices of Knowledge Discovery and Data Mining (ECML-PKDD)
¤ Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
¤ Int. Conf. on Web Search and Data Mining (WSDM)
¨ Other related conferences
¤ DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT, …
¤ Web and IR conferences: WWW, SIGIR, WSDM
¤ ML conferences: ICML, NIPS
¤ PR conferences: CVPR
¨ Journals
¤ Data Mining and Knowledge Discovery (DAMI or DMKD)
¤ IEEE Trans. on Knowledge and Data Eng. (TKDE)
¤ KDD Explorations
¤ ACM Trans. on KDD
Where to Find References? DBLP, CiteSeer, Google

¨ Data mining and KDD (SIGKDD)
¤ Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
¤ Journal: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD

¨ Database systems (SIGMOD)
¤ Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
¤ Journals: IEEE-TKDE, ACM-TODS/TOIS, JIIS, J. ACM, VLDB J., Info. Sys., etc.

¨ AI & Machine Learning
¤ Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
¤ Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.
Where to Find References? DBLP, CiteSeer, Google

¨ Web and IR
¤ Conferences: SIGIR, WWW, CIKM, etc.
¤ Journals: WWW: Internet and Web Information Systems

¨ Statistics
¤ Conferences: Joint Stat. Meeting, etc.
¤ Journals: Annals of statistics, etc.

¨ Visualization
¤ Conference proceedings: CHI, ACM-SIGGraph, etc.
¤ Journals: IEEE Trans. visualization and computer graphics, etc.

Future of Data Science

https://www.youtube.com/watch?v=hxXIJnjC_HI (Future of Data Science @ Stanford)

Related events at OSU:

DataFest
Hackathon
Conduct research in labs

Figure from: https://www.datasciencecentral.com/profiles/blogs/difference-of-data-science-machine-learning-and-data-mining
Summary
¨ Data mining: Discovering interesting patterns and knowledge from massive amounts of data
¨ A natural evolution of science and information technology, in great demand, with wide applications
¨ A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
¨ Mining can be performed on a variety of data
¨ Data mining functionalities: characterization, discrimination, association, classification, clustering, trend and outlier analysis, etc.
¨ Data mining technologies and applications
¨ Major issues in data mining
Recommended Reference Books
¨ Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015
¨ E. Alpaydin. Introduction to Machine Learning, 2nd ed., MIT Press, 2011
¨ R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000
¨ U. Fayyad, G. Grinstein, and A. Wierse, Information Visualization in Data Mining
and Knowledge Discovery, Morgan Kaufmann, 2001
¨ J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011
¨ T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning:
Data Mining, Inference, and Prediction, 2nd ed., Springer, 2009
¨ T. M. Mitchell, Machine Learning, McGraw Hill, 1997
¨ P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2005 (2nd ed. 2016)
¨ I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations, Morgan Kaufmann, 2nd ed. 2005
¨ Mohammed J. Zaki and Wagner Meira Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, 2014