Week1-2

The document discusses major issues, algorithms, and venues in data mining, emphasizing the importance of identifying interesting patterns and the challenges of handling large, complex datasets. It outlines various data types and mining functionalities, including classification, clustering, and outlier analysis. Additionally, it highlights the need for scalable algorithms and user interaction in data mining applications.

Uploaded by

sidramughal1011
DATA MINING: LECTURE 2

Agenda
Major Issues in Data Mining

Major Algorithms in Data Mining

Major Venues of Data Mining


MAJOR ISSUES IN DATA MINING
Are All the “Discovered” Patterns Interesting?

• Data mining may generate thousands of patterns: not all of them are interesting
• Suggested approach: human-centered, query-based, focused mining

• Interestingness measures
• A pattern is interesting if it is easily understood by humans,
valid on new or test data with some degree of certainty,
potentially useful, novel, or validates some hypothesis that a
user seeks to confirm

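• Objective vs. subjective interestingness measures
• Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
• Subjective: based on the user's beliefs about the data, e.g., unexpectedness, novelty, actionability, etc.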
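
The two objective measures named above can be illustrated with a minimal sketch. The transactions and the function names `support` and `confidence` are assumptions made up for this example; they are not taken from the lecture.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base > 0 else 0.0


# Hypothetical toy data: each transaction is a set of purchased items.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule under test: bread -> milk
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here 2 of the 4 transactions contain both items, so the support is 0.5, and 2 of the 3 transactions containing bread also contain milk, so the confidence is 2/3.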
Find All and Only Interesting Patterns?

• Find all the interesting patterns: Completeness
• Can a data mining system find all the interesting patterns? Do we need to find all of the interesting patterns?
• Heuristic vs. exhaustive search
• Association vs. classification vs. clustering
• Search for only interesting patterns: An optimization problem
• Can a data mining system find only the interesting
patterns?
• Approaches
• First generate all the patterns and then filter out the
uninteresting ones
• Generate only the interesting patterns—mining query
optimization
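
The two approaches listed above can be contrasted in a small sketch. It assumes that "interesting" simply means "appears in at least `min_count` transactions" and uses Apriori-style level-wise pruning for the second approach; both choices are illustrative assumptions, not something the slides specify.

```python
from itertools import combinations


def count_support(transactions, itemset):
    """Number of transactions that contain the whole itemset."""
    return sum(1 for t in transactions if itemset <= t)


def generate_then_filter(transactions, min_count):
    """Approach 1: enumerate every possible itemset, then filter."""
    items = sorted(set().union(*transactions))
    frequent = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if count_support(transactions, candidate) >= min_count:
                frequent.add(candidate)
    return frequent


def generate_only_frequent(transactions, min_count):
    """Approach 2: level-wise search that never builds a candidate
    with an infrequent subset (Apriori-style pruning)."""
    items = sorted(set().union(*transactions))
    level = {
        frozenset([i])
        for i in items
        if count_support(transactions, frozenset([i])) >= min_count
    }
    frequent = set(level)
    while level:
        size = len(next(iter(level))) + 1
        # Join frequent itemsets of the previous level into larger candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune candidates that have any infrequent (size - 1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
        level = {
            c for c in candidates
            if count_support(transactions, c) >= min_count
        }
        frequent |= level
    return frequent


# Hypothetical toy data.
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bd"),
]
all_then_filtered = generate_then_filter(transactions, min_count=2)
pruned = generate_only_frequent(transactions, min_count=2)
print(sorted(sorted(s) for s in pruned))
```

Both functions return the same frequent itemsets on this data; the second one simply avoids counting candidates such as {a, b, c} whose subset {b, c} is already known to be infrequent.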
Other Pattern Mining Issues
• Precise patterns vs. approximate patterns
• Association and correlation mining: find sets of precise
patterns
• But approximate patterns can be more compact and sufficient
• How to find high-quality approximate patterns?
• Gene sequence mining: approximate patterns are inherent
• How to devise efficient approximate pattern mining algorithms?

• Constrained vs. non-constrained patterns
• Why constraint-based mining?
• What are the possible constraints? How to push constraints
into the mining process?
Why Not Traditional Data Analysis?
• Tremendous amount of data
• Algorithms must be highly scalable to handle terabytes of data
• High-dimensionality of data
• Micro-array may have tens of thousands of dimensions
• High complexity of data
• Data streams and sensor data
• Time-series data, temporal data, sequence data
• Structure data, graphs, social networks and multi-linked data
• Heterogeneous databases and legacy databases
• Spatial, spatiotemporal, multimedia, text and Web data
• Software programs, scientific simulations
• New and sophisticated applications
Multi-Dimensional View of Data Mining

• Data to be mined
• Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW
• Knowledge to be mined
• Characterization, discrimination, association, classification,
clustering, trend/deviation, outlier analysis, etc.
• Multiple/integrated functions and mining at multiple levels
• Techniques utilized
• Database-oriented, data warehouse (OLAP), machine learning,
statistics, visualization, etc.
• Applications adapted
• Retail, telecommunication, banking, fraud analysis, bio-data
mining, stock market analysis, text mining, Web mining, etc.
Data Mining: Classification Scheme

• General functionality
• Descriptive data mining
• Predictive data mining

• Different views lead to different classifications
• Data view: Kinds of data to be mined
• Knowledge view: Kinds of knowledge to be discovered
• Method view: Kinds of techniques utilized
• Application view: Kinds of applications adapted
Data Mining: On What Kinds of Data?

• Database-oriented data sets and applications

• Relational database, data warehouse, transactional database

• Advanced data sets and advanced applications

• Data streams and sensor data
• Time-series data, temporal data, sequence data (incl. bio-sequences)
• Structure data, graphs, social networks and multi-linked data
• Object-relational databases
• Heterogeneous databases and legacy databases
• Spatial data and spatiotemporal data
• Multimedia database
• Text databases
• The World-Wide Web
Relational Database

• Interrelated data
• Software programs to manage and access the data
• Data access: concurrent, shared, distributed
• Based on the ER model
• Contains a unique key
• SQL vs. data mining
• SQL: look for customers or sales in a month
• Data mining: determine credit
Transactional Database

• File where each record represents a transaction
• Normal queries
• Items bought by Zafar Iqbal
• Transactions for a certain item, such as cigarettes
• Data mining can
• Find which items are sold together (market basket)
• Find which items are sold more frequently
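
The two mining tasks above (items sold together, and items sold most often) can be sketched with simple counting. The baskets below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional records: one list of items per transaction.
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "butter"],
]

# How often each item is sold (counted once per transaction).
item_counts = Counter(item for basket in baskets for item in set(basket))

# How often each pair of items appears in the same transaction.
pair_counts = Counter(
    pair
    for basket in baskets
    for pair in combinations(sorted(set(basket)), 2)
)

most_frequent_item = item_counts.most_common(1)[0]
most_frequent_pair = pair_counts.most_common(1)[0]
print(most_frequent_item)
print(most_frequent_pair)
```

Sorting each basket before pairing makes ("bread", "milk") and ("milk", "bread") count as the same pair.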
Data Warehouse
• Summary of data organized around major subjects
• Involves data cleaning, integration, transformation,
loading and periodic refreshing
• Multi-dimensional database structure
• Each dimension corresponds to an attribute

Data mart vs. data warehouse: department-wide vs. enterprise-wide


Data-cubes Level 1

Let's “drill down” on countries!


Data-cubes Level 2

Let's “drill down” on time!


Advanced Data and Information Systems

• Object-Relational Databases
• Temporal, Sequence and Time-series Databases
• Examples: data from stock exchange, inventory control and
observation of natural phenomena
• Data mining to unravel the change in trends
• Spatial and Spatiotemporal Databases
• Uncover patterns pertaining to fields, gardens or houses
• Text and Multimedia Databases
• Heterogeneous and Legacy Databases
• Information exchange is the main issue, which may be resolved by using data mining to generalize data into higher conceptual levels
• Data Streams
• Scientific and engineering data
• Data mining: constantly evaluate incoming streams for patterns and dynamic changes
• The World Wide Web
Other Major Issues in Data Mining
• Mining methodology
• Mining different kinds of knowledge from diverse data types, e.g., bio,
stream, Web
• Performance: efficiency, effectiveness, and scalability
• Pattern evaluation: the interestingness problem
• Incorporation of background knowledge
• Handling noise and incomplete data
• Parallel, distributed and incremental mining methods
• Integration of the discovered knowledge with existing knowledge: knowledge fusion

• User interaction
• Data mining query languages and ad-hoc mining
• Expression and visualization of data mining results
• Interactive mining of knowledge at multiple levels of abstraction
• Applications and social impacts
• Domain-specific data mining & invisible data mining
• Protection of data security, integrity, and privacy
Data Mining Functionalities
• Data mining tasks can be classified into two categories:
• Descriptive: Characterize the general properties
of data
• Predictive: Perform inference on current data in order to make predictions
• A measure of certainty may also be associated with
each pattern
Data Mining Functionalities
• Multidimensional concept description: Characterization and
discrimination
• Characterization: generalize or summarize the target class (the class under study) based on its features, and contrast data characteristics, e.g., dry vs. wet regions
• Discrimination: compare a target class with a set of contrasting classes

• Frequent patterns, association, correlation vs. causality
• Extremist Literature → Bombs [support = 0.5%, confidence = 75%] (Correlation or causality?)

• Classification and prediction
• Construct models (functions) that describe and distinguish classes or concepts for future prediction
• E.g., classify countries based on (climate), or classify cars based on (gas mileage)
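
As a rough sketch of "constructing a model for future prediction", the code below learns a single mileage threshold (a one-level decision rule) from labeled cars. The training data, the labels "low" and "high", and the function names are assumptions for illustration; the lecture does not prescribe a particular classifier.

```python
def fit_threshold(samples):
    """Pick the mileage cut-off with the fewest training errors.

    `samples` is a list of (mpg, label) pairs, label in {"low", "high"}.
    A car with mpg >= threshold is predicted "high".
    """
    values = sorted({mpg for mpg, _ in samples})
    # Candidate cut-offs: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct mileage values")

    def errors(threshold):
        return sum(
            1 for mpg, label in samples
            if predict(threshold, mpg) != label
        )

    return min(candidates, key=errors)


def predict(threshold, mpg):
    """Apply the learned rule to a new car."""
    return "high" if mpg >= threshold else "low"


# Hypothetical labeled training data: (miles per gallon, mileage class).
training = [
    (15, "low"), (18, "low"), (22, "low"),
    (30, "high"), (35, "high"), (40, "high"),
]
threshold = fit_threshold(training)
print(threshold, predict(threshold, 28), predict(threshold, 20))
```

The class labels are known in advance here, which is what distinguishes classification from the clustering task discussed later.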
[Figures: Classification; Clustering]
Data Mining Functionalities
• Cluster analysis
• Class label is unknown: Group data to form new classes, e.g.,
cluster houses to find distribution patterns
• Maximizing intra-class similarity & minimizing interclass
similarity
• Outlier analysis
• Outlier: Data object that does not comply with the general
behavior of the data
• Noise or exception? Useful in fraud detection, rare events
analysis
• Trend and evolution analysis
• Trend and deviation: e.g., regression analysis
• Sequential pattern mining: e.g., digital camera → large SD memory
• Periodicity analysis
• Similarity-based analysis
• Other pattern-directed or statistical analyses
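
Outlier analysis, as described above, can be sketched with a simple z-score rule: flag values that lie far from the mean relative to the standard deviation. The data and the cut-off of 2.0 are assumptions chosen for illustration; real fraud detection would need a more robust method.

```python
from statistics import mean, pstdev


def find_outliers(values, z_cutoff=2.0):
    """Return the values whose |z-score| exceeds `z_cutoff`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > z_cutoff]


# Hypothetical transaction amounts; 95 does not follow the general behavior.
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(amounts)
print(outliers)
```

Whether a flagged value is noise to discard or an exception worth investigating is the analyst's decision, not something this rule can determine.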
Data Mining and Business Intelligence

Layers, from top to bottom (potential to support business decisions increases toward the top; the associated role is given in parentheses where the slide shows one):
• Decision Making (End User)
• Data Presentation: Visualization Techniques (Business Analyst)
• Data Mining: Information Discovery (Data Analyst)
• Data Exploration: Statistical Summary, Querying, and Reporting
• Data Preprocessing/Integration, Data Warehouses (DBA)
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Architecture: Typical Data Mining System

Components, from top to bottom:
• Graphical User Interface
• Pattern Evaluation
• Data Mining Engine
• Knowledge-Base
• Database or Data Warehouse Server
• Data cleaning, integration, and selection
• Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories