01Intro

This document provides an introduction to data science and data mining, discussing their definitions, evolution, techniques, and applications across various sectors. It emphasizes the importance of data mining in transforming vast amounts of data into actionable insights and highlights the various types of data that can be mined. Additionally, it outlines the knowledge discovery process and the different functions of data mining, such as classification, clustering, and outlier analysis.

Uploaded by

mansikumar020704
Copyright
© All Rights Reserved

Data Science

— Unit 1 —

Chapter 1. Introduction
◼ Why Data Science?

◼ What Is Data Mining?

◼ A Multi-Dimensional View of Data Mining

◼ What Kind of Data Can Be Mined?

◼ What Kinds of Patterns Can Be Mined?

◼ What Technologies Are Used?

◼ What Kind of Applications Are Targeted?

◼ Major Issues in Data Mining

◼ Summary

Why Data Mining?

◼ The Explosive Growth of Data: from terabytes to petabytes


◼ Data collection and data availability
◼ Automated data collection tools, database systems, Web,
computerized society
◼ Major sources of abundant data
◼ Business: Web, e-commerce, transactions, stocks, …
◼ Science: Remote sensing, bioinformatics, scientific simulation, …
◼ Society and everyone: news, digital cameras, YouTube
◼ We are drowning in data, but starving for knowledge!
◼ “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets

Evolution of Sciences
◼ Before 1600, empirical science
◼ 1600-1950s, theoretical science
◼ Each discipline has grown a theoretical component. Theoretical models often
motivate experiments and generalize our understanding.
◼ 1950s-1990s, computational science
◼ Over the last 50 years, most disciplines have grown a third, computational branch
(e.g. empirical, theoretical, and computational ecology, or physics, or linguistics.)
◼ Computational Science traditionally meant simulation. It grew out of our inability
to find closed-form solutions for complex mathematical models.
◼ 1990-now, data science
◼ The flood of data from new scientific instruments and simulations
◼ The ability to economically store and manage petabytes of data online
◼ The Internet and computing Grid that makes all these archives universally
accessible
◼ Scientific information management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!

Evolution of Database Technology
◼ 1960s:
◼ Data collection, database creation, IMS and network DBMS
◼ 1970s:
◼ Relational data model, relational DBMS implementation
◼ 1980s:
◼ RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
◼ Application-oriented DBMS (spatial, scientific, engineering, etc.)
◼ 1990s:
◼ Data mining, data warehousing, multimedia databases, and Web
databases
◼ 2000s
◼ Stream data management and mining
◼ Data mining and its applications
◼ Web technology (XML, data integration) and global information systems

Data Science
◼ Definition: Data science is an interdisciplinary field that combines
statistics, mathematics, computer science, and domain knowledge to
extract insights from structured and unstructured data.
◼ Purpose: The primary goal is to analyze data to answer questions,
identify patterns, and support decision-making processes in various
industries.
◼ Key Components:
1. Data Collection: Gathering raw data from various sources such as
databases, sensors, or user interactions.
2. Data Cleaning: Ensuring the data is accurate and ready for analysis
by removing duplicates and filling in missing values.
3. Data Analysis: Applying statistical methods and algorithms to
identify trends and relationships within the data.
4. Data Visualization: Presenting analysis results through charts and
graphs to facilitate understanding and decision-making.
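As a rough illustration of the data cleaning step listed above, the sketch below removes duplicate rows and fills a missing value with the mean of the known values. The records, the field layout, and the function name `clean` are made up for this example; they are assumptions, not part of the course material.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age); None marks a missing value.
raw = [(1, 34), (2, None), (1, 34), (3, 50), (4, 42)]


def clean(records):
    """Drop exact duplicate rows, then fill missing ages with the mean age."""
    seen = set()
    unique = []
    for row in records:
        if row not in seen:          # keep only the first copy of each row
            seen.add(row)
            unique.append(row)
    known_ages = [age for _, age in unique if age is not None]
    fill_value = mean(known_ages)    # simple imputation choice for this sketch
    return [(cid, age if age is not None else fill_value) for cid, age in unique]


cleaned = clean(raw)
print(cleaned)
```

Mean imputation is only one of several possible choices; dropping incomplete rows or using the median may be more appropriate for other data.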
◼ Techniques Used: Incorporates machine learning, predictive
analytics, and statistical modeling to derive actionable insights.
◼ Applications: Utilized in various sectors including healthcare,
finance, e-commerce, and marketing to enhance business strategies
and operations.
◼ Importance: As organizations increasingly rely on data for strategic
decisions, data science plays a crucial role in transforming raw data
into meaningful information that drives business success.

What Is Data Mining?

◼ Data mining (knowledge discovery from data)
◼ Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
◼ Data mining: a misnomer?
◼ Alternative names
◼ Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.
◼ Watch out: Is everything “data mining”? Not the following:
◼ Simple search and query processing
◼ (Deductive) expert systems

Knowledge Discovery (KDD) Process
◼ This is a view from typical database systems and data warehousing communities
◼ Data mining plays an essential role in the knowledge discovery process
[Figure: KDD process, bottom to top: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]


Example: A Web Mining Framework

◼ Web mining usually involves


◼ Data cleaning
◼ Data integration from multiple sources
◼ Warehousing the data
◼ Data cube construction
◼ Data selection for data mining
◼ Data mining
◼ Presentation of the mining results
◼ Patterns and knowledge to be used or stored in a knowledge base

Data Mining in Business Intelligence
[Figure: pyramid of layers, with increasing potential to support business decisions toward the top; roles shown beside the layers, top to bottom: End User, Business Analyst, Data Analyst, DBA]
◼ Decision Making
◼ Data Presentation: Visualization Techniques
◼ Data Mining: Information Discovery
◼ Data Exploration: Statistical Summary, Querying, and Reporting
◼ Data Preprocessing/Integration, Data Warehouses
◼ Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems

Example: Mining vs. Data Exploration

◼ Business intelligence view
◼ Business intelligence (BI) analyzes historical and current data to support strategic decisions.
◼ Warehouses and data cubes provide structured storage and reporting, but not much mining.
◼ Business objects vs. data mining tools
◼ Business objects (reports, dashboards, OLAP cubes) focus on data exploration rather than deep pattern discovery.
◼ Data mining tools extract hidden patterns, correlations, and predictions from large datasets.
◼ Supply chain example
◼ BI tools may provide supply chain dashboards that track inventory, shipments, and demand trends.
◼ Data mining could instead predict demand patterns, detect anomalies, or optimize logistics from historical data.
◼ Data presentation and exploration
◼ Data exploration uses charts, graphs, and summaries to present insights.


KDD Process: A Typical View from ML and Statistics
[Figure: Input Data → Data Pre-Processing → Data Mining → Post-Processing]
◼ Data pre-processing: data integration, normalization, feature selection, dimension reduction
◼ Data mining: pattern discovery, association & correlation, classification, clustering, outlier analysis, …
◼ Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization
◼ This is a view from typical machine learning and statistics communities

Example: Medical Data Mining

◼ Health care & medical data mining often adopts such a view from statistics and machine learning
◼ Preprocessing of the data (including feature extraction and dimension reduction)
◼ Classification and/or clustering processes
◼ Post-processing for presentation

Multi-Dimensional View of Data Mining
◼ Data to be mined
◼ Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media, graphs & social and information networks
◼ Knowledge to be mined (or: Data mining functions)
◼ Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
◼ Descriptive vs. predictive data mining
◼ Multiple/integrated functions and mining at multiple levels
◼ Techniques utilized
◼ Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc.
◼ Applications adapted
◼ Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.


Data Mining: On What Kinds of Data?
◼ Database-oriented data sets and applications
◼ Relational database, data warehouse, transactional database
◼ Advanced data sets and advanced applications
◼ Data streams and sensor data
◼ Time-series data, temporal data, sequence data (incl. bio-sequences)
◼ Structured data, graphs, social networks and multi-linked data
◼ Object-relational databases
◼ Heterogeneous databases and legacy databases
◼ Spatial data and spatiotemporal data
◼ Multimedia databases
◼ Text databases
◼ The World-Wide Web

Different Sources of Data for Data Analysis
1. Primary Data Sources
◼ Surveys: Collecting firsthand information through questionnaires from a
targeted audience.
◼ Observations: Gathering data by observing behaviors or events in their natural
settings.
◼ Experiments: Conducting controlled tests to gather data on specific variables
and their effects.
◼ Interviews: Directly engaging with individuals to obtain detailed information.
◼ Focus Groups: Discussing topics with a group to gather diverse perspectives
and insights.

2. Secondary Data Sources
◼ Government Databases: Publicly available data such as census data, economic
reports, and health statistics.
◼ Academic Journals: Research studies and articles that provide validated data in
various fields.
◼ Corporate Reports: Financial statements and performance reports from
businesses.
◼ Online Repositories: Platforms that aggregate data sets from various sources
for public access.
◼ Historical Records: Archived data that can provide insights into past trends and
events.

3. External Data Sources
◼ Social Media: Data from platforms like Twitter, Facebook, and LinkedIn that can
reveal consumer sentiment and trends.
◼ Market Research Data: Insights from studies conducted by research firms on
consumer behavior and market conditions.
◼ Weather Data: Information about climate conditions that can be relevant for
various analyses, especially in agriculture and logistics.
◼ APIs (Application Programming Interfaces): Tools that allow access to data
from web services, enabling integration with other applications.

4. Big Data Sources
◼ Machine Data: Information generated by machines or sensors, often used in
IoT applications.
◼ File Data: Structured or unstructured data stored in files that can be shared
across platforms.
5. Open Data Sources
◼ Public Health Data: Information related to health trends, disease outbreaks,
and healthcare access.
◼ World Bank Open Data: Global statistics on development indicators across
countries.

Data Mining Function: (1) Generalization

◼ Information integration and data warehouse construction


◼ Data cleaning, transformation, integration, and
multidimensional data model
◼ Data cube technology
◼ Scalable methods for computing (i.e., materializing)
multidimensional aggregates
◼ OLAP (online analytical processing)
◼ Multidimensional concept description: Characterization
and discrimination
◼ Generalize, summarize, and contrast data
characteristics, e.g., dry vs. wet region

Data Mining Function: (2) Association and
Correlation Analysis
◼ Frequent patterns (or frequent itemsets)
◼ What items are frequently purchased together in your
Walmart?
◼ Association, correlation vs. causality
◼ A typical association rule
◼ Diaper → Beer [0.5%, 75%] (support, confidence)
◼ Are strongly associated items also strongly correlated?
◼ How to mine such patterns and rules efficiently in large
datasets?
◼ How to use such patterns for classification, clustering,
and other applications?
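A minimal sketch of how the support and confidence of a rule such as Diaper → Beer can be computed. The five transactions are invented toy data (so the support here is far higher than the 0.5% on the slide), and the helper names are hypothetical.

```python
# Toy market-basket data; each transaction is a set of purchased items.
transactions = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "beer", "bread"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also hold the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


rule_support = support({"diaper", "beer"}, transactions)
rule_confidence = confidence({"diaper"}, {"beer"}, transactions)
print(rule_support, rule_confidence)  # 0.6 0.75
```

Support measures how often the whole rule applies, while confidence measures how reliable the implication is once the left-hand side is present. A high value for both still says nothing about causality.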

Data Mining Function: (3) Classification

◼ Classification and label prediction


◼ Construct models (functions) based on some training examples
◼ Describe and distinguish classes or concepts for future prediction
◼ E.g., classify countries based on (climate), or classify cars
based on (gas mileage)
◼ Predict some unknown class labels
◼ Typical methods
◼ Decision trees, naïve Bayesian classification, support vector
machines, neural networks, rule-based classification, pattern-
based classification, logistic regression, …
◼ Typical applications:
◼ Credit card fraud detection, direct marketing, classifying stars,
diseases, web-pages, …

Data Mining Function: (4) Cluster Analysis

◼ Unsupervised learning (i.e., Class label is unknown)


◼ Group data to form new categories (i.e., clusters), e.g.,
cluster houses to find distribution patterns
◼ Principle: Maximizing intra-class similarity & minimizing
interclass similarity
◼ Many methods and applications
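The slides do not single out a clustering method. As one common example, the sketch below runs k-means on one-dimensional data. The house prices and the initial centers are assumptions chosen for illustration, and `kmeans_1d` is a hypothetical helper name.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on numbers: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical house prices (in thousands) and two initial guesses for the centers.
prices = [100, 110, 120, 400, 410, 420]
centers, clusters = kmeans_1d(prices, centers=[100, 420])
print(centers, clusters)
```

Points within a cluster end up close to their own center (high intra-class similarity) and far from the other center (low interclass similarity), which is the principle stated above.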

Data Mining Function: (5) Outlier Analysis
◼ Outlier analysis
◼ Outlier: A data object that does not comply with the general
behavior of the data
◼ Noise or exception? ― One person’s garbage could be another
person’s treasure
◼ Methods: by-product of clustering or regression analysis, …
◼ Useful in fraud detection, rare events analysis
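The slides name clustering and regression by-products as methods. The sketch below uses a simpler z-score rule instead, purely as an illustration of flagging an object that does not comply with the general behavior of the data. The transaction amounts and the threshold of 2.0 are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:               # all values identical: nothing can be an outlier
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts; one of them is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = zscore_outliers(amounts)
print(outliers)
```

Whether the flagged value is noise or a meaningful exception (for example, fraud) still has to be decided in context.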

Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Machine Learning, Pattern Recognition, Statistics, Applications, Visualization, Algorithm, Database Technology, and High-Performance Computing]

Why Confluence of Multiple Disciplines?

◼ Tremendous amount of data
◼ Algorithms must be highly scalable to handle terabytes of data
◼ High-dimensionality of data
◼ A microarray data set may have tens of thousands of dimensions
◼ High complexity of data
◼ Data streams and sensor data
◼ Time-series data, temporal data, sequence data
◼ Structured data, graphs, social networks and multi-linked data
◼ Heterogeneous databases and legacy databases
◼ Spatial, spatiotemporal, multimedia, text and Web data
◼ Software programs, scientific simulations
◼ New and sophisticated applications

Applications of Data Mining
◼ Web page analysis: from web page classification, clustering to
PageRank & HITS algorithms
◼ Collaborative analysis & recommender systems
◼ Basket data analysis to targeted marketing
◼ Biological and medical data analysis: classification, cluster analysis
(microarray data analysis), biological sequence analysis, biological
network analysis
◼ Data mining and software engineering (e.g., IEEE Computer, Aug.
2009 issue)
◼ From major dedicated data mining systems/tools (e.g., SAS, MS SQL-Server Analysis Manager, Oracle Data Mining Tools) to invisible data mining

Major Issues in Data Mining (1)

◼ Mining Methodology
◼ Mining various and new kinds of knowledge
◼ Mining knowledge in multi-dimensional space
◼ Data mining: An interdisciplinary effort
◼ Boosting the power of discovery in a networked environment
◼ Handling noise, uncertainty, and incompleteness of data
◼ Pattern evaluation and pattern- or constraint-guided mining
◼ User Interaction
◼ Interactive mining
◼ Incorporation of background knowledge
◼ Presentation and visualization of data mining results

Major Issues in Data Mining (2)

◼ Efficiency and Scalability


◼ Efficiency and scalability of data mining algorithms
◼ Parallel, distributed, stream, and incremental mining methods
◼ Diversity of data types
◼ Handling complex types of data
◼ Mining dynamic, networked, and global data repositories
◼ Data mining and society
◼ Social impacts of data mining
◼ Privacy-preserving data mining
◼ Invisible data mining

Challenges in Data Science

◼ Data Quality: Ensuring accuracy and reliability of data can be difficult due to inconsistencies or missing values.
◼ Data Privacy: Handling sensitive information requires compliance
with regulations like GDPR and HIPAA to protect user privacy.
◼ Integration: Combining data from multiple sources can lead to
compatibility issues and require significant preprocessing efforts.
◼ Scalability: Managing large volumes of data necessitates robust
infrastructure and tools to process and analyze effectively.
◼ Skill Gap: There is often a shortage of skilled professionals who can
effectively utilize data science techniques and tools.

Applications of Data Science

◼ Healthcare: Analyzing patient data for better diagnosis, treatment plans, and predicting disease outbreaks.
◼ Finance: Risk assessment, fraud detection, algorithmic trading, and
customer segmentation through predictive analytics.
◼ Marketing: Personalizing marketing strategies based on consumer
behavior analysis and improving customer engagement.
◼ E-commerce: Enhancing user experience through recommendation
systems that suggest products based on browsing history.
◼ Social Media Analysis: Understanding user sentiment and trends
by analyzing social media interactions and content.

Introduction to Data Modeling

◼ Definition: Data modeling involves creating a conceptual representation of data and its relationships to facilitate data analysis and decision-making.
◼ Data Mining: A key process in data modeling, data mining is the
extraction of meaningful patterns and insights from large datasets using
algorithms and statistical methods.
◼ Key Concepts
◼ Model Creation: The primary goal of data mining is often to develop a
model from data, which can be used for predictions and insights.
◼ Machine Learning: Many data mining techniques utilize machine learning,
allowing systems to learn from data without explicit programming.
◼ Algorithms vs. Models: While some approaches focus on creating models (like predicting phishing emails), others involve algorithms that do not require a model, such as locality-sensitive hashing.
◼ Example: Phishing Email Detection
◼ Model Development: To detect phishing emails, a model can be built by
analyzing reported phishing emails to identify common words or phrases
(e.g., "verify account").
◼ Weight Assignment: The model assigns weights to words based on their
frequency in phishing emails—positive weights for common words and
negative weights for uncommon ones.
◼ Detection Algorithm: The algorithm applies the model by summing the
weights of words in each email. An email is flagged as phishing if the
total weight is positive.
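The detection step described above can be sketched as follows. The word weights are invented for illustration (in practice they would be learned from reported phishing emails), and the function names are hypothetical.

```python
import re

# Hypothetical weights: positive for words common in phishing emails,
# negative for words that are uncommon in them.
WEIGHTS = {
    "verify": 2.0,
    "account": 1.5,
    "urgent": 1.0,
    "meeting": -1.5,
    "agenda": -1.0,
    "thanks": -0.5,
}


def phishing_score(text, weights=WEIGHTS):
    """Sum the weights of all words in the email; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(weights.get(w, 0.0) for w in words)


def is_phishing(text, weights=WEIGHTS):
    """Flag the email when its total weight is positive."""
    return phishing_score(text, weights) > 0


print(is_phishing("Urgent: verify your account now"))
print(is_phishing("Thanks for the meeting agenda"))
```

Here the model is the table of weights, and the algorithm that applies it is just a sum and a comparison. The hard part, as noted in the text, is finding good weights.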

◼ Challenges in Data Modeling
◼ Model Complexity: Developing an effective model can be challenging,
requiring careful consideration of data features and relationships.
◼ Algorithm Implementation: Once a model is created, applying it through
algorithms is generally straightforward, but finding the optimal model
parameters can be difficult.

Statistical Modeling
◼ Definition: Statistical data modeling is the process of applying statistical
analysis techniques to datasets to understand relationships, make
predictions, and derive insights.
◼ Purpose: The goal is to create mathematical representations (models)
that describe the underlying structure of the data and can be used for
forecasting future outcomes.
◼ In this view, the model is an underlying distribution from which the visible data is drawn.

Computational Approaches to Modeling
◼ Definition: Computational modeling involves using algorithms and
computer programs to analyze and simulate complex systems, often
contrasting with traditional statistical approaches.
◼ Algorithmic Perspective:
◼ In computational modeling, a model is seen as the solution to a complex
query about the data rather than a statistical representation.
◼ For example, calculating the average and standard deviation of a
dataset provides insights without necessarily fitting a Gaussian
distribution.
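A small sketch of the algorithmic perspective above: the average and standard deviation are computed directly as a summary of the data, with no distribution being fitted. The sample values are made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical sample values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# The "model" here is simply the answer to a query about the data.
summary = {"mean": mean(data), "std": pstdev(data)}
print(summary)
```

A statistician might read the same two numbers as the parameters of a Gaussian. The computation is identical; only the interpretation differs.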

◼ Modeling Approaches
◼ Summarization:
◼ This approach focuses on succinctly representing data while capturing
essential features, allowing for easier interpretation and analysis.
◼ Feature Extraction:
◼ Prominent features of the data are identified and retained, while less
significant information is ignored, simplifying the dataset for further
analysis.
◼ Examples of Computational Modeling Techniques
◼ Random Processes: Constructing models based on random processes that
simulate how data could have been generated.
◼ Machine Learning Algorithms: Utilizing algorithms that learn from historical data to make predictions or identify patterns.

Statistical Limits on Data Mining

◼ A common sort of data-mining problem involves discovering unusual events hidden within massive amounts of data. This section is a discussion of the problem, including “Bonferroni’s Principle,” a warning against overzealous use of data mining.
◼ Bonferroni’s Principle
◼ Definition: Bonferroni's Principle is a statistical concept that helps avoid
false positives when searching for significant events in data. It suggests
that if the expected number of occurrences of an event is high, many of
those occurrences are likely to be random artifacts rather than genuine
signals.


◼ Random Data and Bogus Events:


◼ When analyzing data, random occurrences can mimic significant events.
◼ As the size of the dataset increases, the likelihood of observing these
random occurrences also increases.
◼ Calculating Expected Occurrences:
◼ To apply Bonferroni’s Principle, calculate the expected number of
occurrences of the events of interest under the assumption that the data
is random.
◼ If this expected number is much larger than the actual number of real
instances you anticipate, most findings should be considered bogus.

Bonferroni’s Principle: Example
◼ Even without any actual evil-doers, there would be approximately
250,000 pairs appearing suspicious.
◼ This highlights the challenge in distinguishing genuine signals from noise
in large datasets.
◼ Implications
◼ False Positives: High numbers of expected occurrences can lead to
significant resources being wasted investigating innocent individuals.
◼ Application in Security: In contexts like terrorism detection, it emphasizes
the need to look for rare events that are less likely to occur randomly to
effectively identify genuine threats.
◼ Bonferroni’s Principle serves as a critical reminder in data analysis and mining, urging caution against overinterpreting results from large datasets and highlighting the importance of rigorous statistical methods to validate findings.
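The slide that sets up this example is missing from the extracted text. Assuming it follows the widely used textbook version of the example (10^9 people tracked over 1,000 days, each person in a hotel on 1% of days, 10^5 hotels, and a pair counted as suspicious if it shares a hotel on two different days), the figure of roughly 250,000 pairs can be reproduced as below. All of these numbers are assumptions, not taken from the surviving slides.

```python
from math import comb

# Assumed setup (textbook version of the example, not stated on the slides).
people = 10**9      # people being tracked
days = 1000         # days of hotel records examined
hotels = 10**5      # number of hotels
p_visit = 0.01      # chance that a given person is in some hotel on a given day

# Chance that two given people are in the same hotel on one given day:
# both visit a hotel (p_visit squared) and pick the same one (1 / hotels).
p_same_hotel_one_day = p_visit * p_visit / hotels

# Chance that this happens on each of two given days.
p_two_days = p_same_hotel_one_day ** 2

# Expected number of (pair of people, pair of days) coincidences in purely random data.
expected_suspicious = comb(people, 2) * comb(days, 2) * p_two_days
print(round(expected_suspicious))
```

The result is on the order of 250,000 coincidences that arise from random behavior alone, which is why a search for this pattern would mostly surface innocent pairs.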

Summary
◼ Data mining: Discovering interesting patterns and knowledge from massive amounts of data
◼ A natural evolution of database technology, in great demand, with wide applications
◼ A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
◼ Mining can be performed on a variety of data
◼ Data mining functionalities: characterization, discrimination,
association, classification, clustering, outlier and trend analysis, etc.
◼ Data mining technologies and applications
◼ Major issues in data mining

Recommended Reference Books
◼ S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002
◼ R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2ed., Wiley-Interscience, 2000
◼ T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003
◼ U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and
Data Mining. AAAI/MIT Press, 1996
◼ U. Fayyad, G. Grinstein, and A. Wierse, Information Visualization in Data Mining and Knowledge
Discovery, Morgan Kaufmann, 2001
◼ J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 3rd ed., 2011
◼ D. J. Hand, H. Mannila, and P. Smyth, Principles of Data Mining, MIT Press, 2001
◼ T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference,
and Prediction, 2nd ed., Springer-Verlag, 2009
◼ B. Liu, Web Data Mining, Springer 2006.
◼ T. M. Mitchell, Machine Learning, McGraw Hill, 1997
◼ G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991
◼ P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Wiley, 2005
◼ S. M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan Kaufmann, 1998
◼ I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java
Implementations, Morgan Kaufmann, 2nd ed. 2005
