0% found this document useful (0 votes)

33 views100 pages

4 Data Mining & Preprocessing L 11,12,13,14,15,16

Uploaded by

MANOJ KUMAWAT

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

33 views100 pages

4 Data Mining & Preprocessing L 11,12,13,14,15,16

Uploaded by

MANOJ KUMAWAT

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 100

Data Mining and

Pre-processing
Lecture-11,12,13,14,15,16
Dr. Sumit Dhariwal
School of Computing Information Technology
Manipal University Jaipur
India
Outline

➢1.1 Motivation: Why data mining?

➢1.2 What is data mining?

➢1.3 Data Mining: On what kind of data?

➢1.4 Data mining functionality: What Kinds of Patterns Can Be Mined?

➢1.5 Are all the patterns interesting?

➢1.6 Classification of data mining systems

➢1.7 Data Mining Task Primitives

➢1.8 Integration of data mining system with a DB and DW System

➢1.9 Major issues in data mining

Data Mining: Concepts and Techniques

1.1 Why Data Mining?

• The Explosive Growth of Data: from terabytes(10004) to yottabytes(10008)

• Data collection and data availability
• Automated data collection tools, database systems, web
• Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: bioinformatics, scientific simulation, medical research …
• Society and everyone: news, digital cameras, …
• Data rich but information poor!
• What does those data mean?
• How to analyze data?

• Data mining — Automated analysis of massive data sets

Data Mining: Concepts and Techniques
Evolution of Database Technology
1.2 What Is Data Mining?
• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously unknown and
potentially useful) patterns or knowledge from huge amount of data
• Data mining: a misnomer?
• Alternative names
• Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.

Data Mining: Concepts and Techniques 5

Potential Applications
• Data analysis and decision support
• Market analysis and management
• Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
• Risk analysis and management
• Forecasting, customer retention, improved underwriting, quality
control, competitive analysis
• Fraud detection and detection of unusual patterns (outliers)
• Other Applications
• Text mining (news group, email, documents) and Web mining
• Stream data mining
• Bioinformatics and bio-data analysis
Data Mining: Concepts and Techniques
Ex.: Market Analysis and Management

• Where does the data come from?—Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, surveys …
• Target marketing
• Find clusters of “model” customers who share the same characteristics: interest,
income level, spending habits, etc.,
• E.g. Most customers with income level 60k – 80k with food expenses $600 - $800 a month live in that area
• Determine customer purchasing patterns over time
• E.g. Customers who are between 20 and 29 years old, with income of 20k – 29k usually buy this type of CD player

• Cross-market analysis—Find associations/co-relations between product sales, & predict

based on such association
• E.g. Customers who buy computer A usually buy software B

Data Mining: Concepts and Techniques 7

Ex.: Market Analysis and Management (2)

• Customer requirement analysis

• Identify the best products for different customers
• Predict what factors will attract new customers
• Provision of summary information
• Multidimensional summary reports
• E.g. Summarize all transactions of the first quarter from three different branches
Summarize all transactions of last year from a particular branch
Summarize all transactions of a particular product
• Statistical summary information
• E.g. What is the average age for customers who buy product A?

• Fraud detection
• Find outliers of unusual transactions
• Financial planning
• Summarize and compare the resources and spending

Data Mining: Concepts and Techniques 8

Knowledge Discovery (KDD) Process

Data Mining: Concepts and Techniques

KDD Process: Several Key Steps
• Learning the application domain
• relevant prior knowledge and goals of application
• Identifying a target data set: data selection
• Data processing
• Data cleaning (remove noise and inconsistent data)
• Data integration (multiple data sources maybe combined)
• Data selection (data relevant to the analysis task are retrieved from database)
• Data transformation (data transformed or consolidated into forms appropriate for mining)
(Done with data preprocessing)
• Data mining (an essential process where intelligent methods are applied to extract
data patterns)
• Pattern evaluation (indentify the truly interesting patterns)
• Knowledge presentation (mined knowledge is presented to the user with
visualization or representation techniques)

• Use of discovered knowledge

Data Mining: Concepts and Techniques 10
Data Mining and Business Intelligence

Increasing potential
to support
business decisions End User
Decision
Making

Data Presentation Business

Analyst
Visualization Techniques
Data Mining Data
Information Discovery Analyst

Data Exploration
Statistical Summary, Querying, and Reporting

Data Preprocessing/Integration, Data Warehouses

DBA
Data Sources
Paper, Files, Web documents, Scientific experiments, Database Systems
Data Mining: Concepts and Techniques 11
A typical DM System Architecture

• Database, data warehouse, WWW or other information

repository (store data)
• Database or data warehouse server (fetch and
combine data)
• Knowledge base (turn data into meaningful groups
according to domain knowledge)
• Data mining engine (perform mining tasks)
• Pattern evaluation module (find interesting patterns)
• User interface (interact with the user)

Data Mining: Concepts and Techniques

A typical DM System Architecture (2)
Confluence of Multiple Disciplines

Database
Technology Statistics

Information Machine
Science Data Mining Learning

Visualization Other
Disciplines

• Not all “Data Mining System” performs true data mining

➢ machine learning system, statistical analysis (small amount of data)
➢ Database system (information retrieval, deductive querying…)

Data Mining: Concepts and Techniques 14

1.3 On What Kinds of Data?

• Database-oriented data sets and applications

• Relational database, data warehouse, transactional database

• Advanced data sets and advanced applications

• Object-Relational Databases
• Temporal Databases, Sequence Databases, Time-Series databases
• Spatial Databases and Spatiotemporal Databases
• Text databases and Multimedia databases
• Heterogeneous Databases and Legacy Databases
• Data Streams
• The World-Wide Web

Data Mining: Concepts and Techniques 15

Relational Databases
• DBMS – database management system, contains a collection of
interrelated databases
e.g. Faculty database, student database, publications database
• Each database contains a collection of tables and functions to
manage and access the data.
e.g. student_bio, student_graduation, student_parking
• Each table contains columns and rows, with columns as attributes of data and rows as
records.
• Tables can be used to represent the relationships between or among multiple tables.

Data Mining: Concepts and Techniques

Relational Databases (2) – AllElectronics store

Data Mining: Concepts and Techniques

Relational Databases (3)
• With a relational query language, e.g. SQL, we will be able to find
answers to questions such as:
• How many items were sold last year?
• Who has earned commissions higher than 10%?
• What is the total sales of last month for Dell laptops?
• When data mining is applied to relational databases, we can search for
trends or data patterns.
• Relational databases are one of the most commonly available and
rich information repositories, and thus are a major data form in our study.

Data Mining: Concepts and Techniques

Data Warehouses

• A repository of information collected from multiple sources, stored

under a unified schema, and that usually resides at a single site.
• Constructed via a process of data cleaning, data integration, data
transformation, data loading and periodic data refreshing.

Data Mining: Concepts and Techniques

Data Warehouses (2)

• Data are organized around major subjects, e.g. customer, item, supplier and
activity.
• Provide information from a historical perspective (e.g. from the past 5 – 10
years)
• Typically summarized to a higher level (e.g. a summary of the
transactions per item type for each store)
• User can perform drill-down or roll-up operation to view the data at
different degrees of summarization

Data Mining: Concepts and Techniques

Data Warehouses (3)
Transactional Databases

• Consists of a file where each record represents a transaction

• A transaction typically includes a unique transaction ID and a list of the
items making up the transaction.

• Either stored in a flat file or unfolded into relational tables

• Easy to identify items that are frequently sold together

Data Mining: Concepts and Techniques

1.4 Data Mining Functionalities
- What kinds of patterns can be mined?
• Concept/Class Description: Characterization and Discrimination
• Data can be associated with classes or concepts.
• E.g. classes of items – computers, printers, …
concepts of customers – bigSpenders, budgetSpenders, …
• How to describe these items or concepts?
• Descriptions can be derived via
• Data characterization – summarizing the general characteristics of a
target class of data.
• E.g. summarizing the characteristics of customers who spend more than $1,000 a year
at AllElectronics. Result can be a general profile of the customers, such as 40 – 50 years old,
employed, have excellent credit ratings.

Data Mining: Concepts and Techniques 23

1.4 Data Mining Functionalities
- What kinds of patterns can be mined?
• Data discrimination – comparing the target class with one or a set of
comparative classes
• E.g. Compare the general features of software products whole sales increase by 10% in the last year
with those whose sales decrease by 30% during the same period

• Or both of the above

• Mining Frequent Patterns, Associations and

Correlations
• Frequent itemset: a set of items that frequently appear
together in a transactional data set (e.g. milk and bread)
• Frequent subsequence: a pattern that customers tend to purchase product A,
followed by a purchase of product B

Data Mining: Concepts and Techniques 24

1.4 Data Mining Functionalities
- What kinds of patterns can be mined?
• Association Analysis: find frequent patterns
• E.g. a sample analysis result – an association rule:
buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%]
(if a customer buys a computer, there is a 50% chance that she will buy software. 1% of all of
the transactions under analysis showed that computer and software
are purchased together. )
• Associations rules are discarded as uninteresting if they do not satisfy both a minimum
support threshold and a minimum confidence threshold.
• Correlation Analysis: additional analysis to find statistical correlations
between associated pairs

Data Mining: Concepts and Techniques 25

1.4 Data Mining Functionalities
- What kinds of patterns can be mined?
• Classification and Prediction
• Classification
• The process of finding a model that describes and distinguishes the data classes or concepts,
for the purpose of being able to use the model to predict the class of
objects whose class label is unknown.
• The derived model is based on the analysis of a set of training data (data objects whose class
label is known).
• The model can be represented in classification (IF-THEN) rules, decision trees,
neural networks, etc.
• Prediction
• Predict missing or unavailable numerical data values

Data Mining: Concepts and Techniques 26

1.4 Data Mining Functionalities
- What kinds of patterns can be mined?

Data Mining: Concepts and Techniques 27

Data Mining Functionalities (2)

• Cluster Analysis
• Class label is unknown: group data to form new classes
• Clusters of objects are formed based on the principle of maximizing intra-
class similarity & minimizing interclass similarity
• E.g. Identify homogeneous subpopulations of customers. These clusters may
represent individual target groups for marketing.

Data Mining: Concepts and Techniques 28

Data Mining Functionalities (2)

• Outlier Analysis
• Data that do no comply with the general behavior or model.
• Outliers are usually discarded as noise or exceptions.
• Useful for fraud detection.
• E.g. Detect purchases of extremely large amounts
• Evolution Analysis
• Describes and models regularities or trends for objects whose
behavior changes over time.
• E.g. Identify stock evolution regularities for overall stocks and for the stocks of
particular companies.

Data Mining: Concepts and Techniques 29

1.5 Are All of the Patterns Interesting?
• Data mining may generate thousands of patterns: Not all of them
are interesting
• A pattern is interesting if it is
• easily understood by humans
• valid on new or test data with some degree of certainty,
• potentially useful
• novel
• validates some hypothesis that a user seeks to confirm

• An interesting measure represents knowledge !

Data Mining: Concepts and Techniques 30

1.5 Are All of the Patterns Interesting?
• Objective measures
• Based on statistics and structures of patterns, e.g., support, confidence, etc. (Rules
that do not satisfy a threshold are considered uninteresting.)

• Subjective measures
• Reflect the needs and interests of a particular user.
• E.g. A marketing manager is only interested in characteristics of customers who shop
frequently.

• Based on user’s belief in the data.

• e.g., Patterns are interesting if they are unexpected, or can be used for strategic planning, etc

• Objective and subjective measures need to be combined.

Data Mining: Concepts and Techniques 31

1.5 Are All of the Patterns Interesting?
• Find all the interesting patterns: Completeness
• Unrealistic and inefficient
• User-provided constraints and interestingness measures should be used

• Search for only interesting patterns: An optimization problem

• Highly desirable
• No need to search through the generated patterns to identify truly
interesting ones.
• Measures can be used to rank the discovered patterns according their
interestingness.

Data Mining: Concepts and Techniques 32

1.6 Classification of data mining systems

Database
Technology Statistics

Information Machine
Science Data Mining Learning

Visualization Other
Disciplines
1.6 Classification of data mining systems

• Database
• Relational, data warehouse, transactional, stream, object-oriented/relational, active,
spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
• Knowledge
• Characterization, discrimination, association, classification, clustering, trend/deviation,
outlier analysis, etc.
• Multiple/integrated functions and mining at multiple levels
• Techniques utilized
• Database-oriented, data warehouse (OLAP), machine learning, statistics,
visualization, etc.
• Applications adapted
• Retail, telecommunication, banking, fraud analysis, bio-data mining, stock
market analysis, text mining, Web mining, etc.
1.7 Data Mining Task Primitives

• How to construct a data mining query?

• The primitives allow the user to interactively communicate with

the data mining system during discovery to direct the mining

process, or examine the findings

Data Mining: Concepts and Techniques 35

1.7 Data Mining Task Primitives

• The primitives specify:

(1) The set of task-relevant data – which portion of the database to be used

• Database or data warehouse name

• Database tables or data warehouse cubes

• Condition for data selection

• Relevant attributes or dimensions

• Data grouping criteria

Data Mining: Concepts and Techniques 36

1.7 Data Mining Task Primitives

• The primitives specify:

(2) The kind of knowledge to be mined – what DB functions to be performed

• Characterization
• Discrimination
• Association
• Classification/prediction
• Clustering
• Outlier analysis
• Other data mining tasks

Data Mining: Concepts and Techniques 37

1.7 Data Mining Task Primitives

(3) The background knowledge to be used – what domain knowledge,

concept hierarchies, etc.

(4) Interestingness measures and thresholds – support, confidence, etc.

(5) Visualization methods – what form to display the result, e.g. rules,

tables, charts, graphs, …

Data Mining: Concepts and Techniques 38

• DMQL – Data Mining Query Language
• Designed to incorporate these primitives
• Allow user to interact with DM systems
• Providing a standardized language like SQL

Data Mining: Concepts and Techniques 39

An Example Query in DMQL

(1)
(3)
(2)
(1)
(1)

(1)

(2)
(1)

(5)
Data Mining: Concepts and Techniques 40
Why Data Mining Query Language?

• Automated vs. query-driven?

• Finding all the patterns autonomously in a database?—unrealistic because
the patterns could be too many but uninteresting
• Data mining should be an interactive process
• User directs what to be mined
• Users must be provided with a set of primitives to be used to
communicate with the data mining system
• Incorporating these primitives in a data mining query language
• More flexible user interaction
• Foundation for design of graphical user interface
• Standardization of data mining industry and practice

Data Mining: Concepts and Techniques 41

1.8 Integration of Data Mining and Data
Warehousing
• No coupling
– Flat file processing, no utilization of any functions of a DB/DW
system
– Not recommended
• Loose coupling
– Fetching data from DB/DW
– Does not explore data structures and query optimization methods pro
vided by DB/DW system
– Difficult to achieve high scalability and good performance with
large data sets

Data Mining: Concepts and Techniques 42

1.8 Integration of Data Mining and Data
Warehousing

• Semi-tight
– Efficient implementations of a few essential data mining primitives in a DB/
DW system are provided, e.g., sorting, indexing, aggregation,
histogram analysis, multiway join, precomputation of some stat
functions
– Enhanced DM performance
• Tight
– DM is smoothly integrated into a DB/DW system, mining query is
optimized based on mining query analysis, data structures, indexing, query
processing methods of a DB/DW system
– A uniform information processing environment, highly desirable
Data Mining: Concepts and Techniques 43
1.9 Major Issues in Data Mining
• Mining methodology and User interaction
• Mining different kinds of knowledge
• DM should cover a wide spectrum of data analysis and knowledge discovery tasks
• Enable to use the database in different ways
• Require the development of numerous data mining techniques
• Interactive mining of knowledge at multiple levels of abstraction
• Difficult to know exactly what will be discovered
• Allow users to focus the search, refine data mining requests
• Incorporation of background knowledge
• Guide the discovery process
• Allow discovered patterns to be expressed in concise terms and different levels of abstraction
• Data mining query languages and ad hoc data mining
• High-level query languages need to be developed
• Should be integrated with a DB/DW query language

Data Mining: Concepts and Techniques 44

1.9 Major Issues in Data Mining
• Presentation and visualization of results
• Knowledge should be easily understood and directly usable
• High level languages, visual representations or other expressive forms
• Require the DM system to adopt the above techniques
• Handling noisy or incomplete data
• Require data cleaning methods and data analysis methods that can handle noise
• Pattern evaluation – the interestingness problem
• How to develop techniques to access the interestingness of discovered patterns, especially
with subjective measures bases on user beliefs or expectations

Data Mining: Concepts and Techniques 45

1.9 Major Issues in Data Mining
• Performance Issues
• Efficiency and scalability
• Huge amount of data
• Running time must be predictable and acceptable
• Parallel, distributed and incremental mining algorithms
• Divide the data into partitions and processed in parallel
• Incorporate database updates without having to mine the entire data again from
scratch

• Diversity of Database Types

• Other database that contain complex data objects, multimedia data,
spatial data, etc.
• Expect to have different DM systems for different kinds of data
• Heterogeneous databases and global information systems
• Web mining becomes a very Data
challenging and fast-evolving field in data mining
Mining: Concepts and Techniques 46
Preprocessing in Data Mining

Data preprocessing is a data mining technique that is used to transform

the raw data in a useful and efficient format.
Steps Involved in Data Preprocessing

• 1. Data Cleaning:
The data can have many irrelevant and missing parts. To handle this
part, data cleaning is done. It involves handling of missing data, noisy
data etc.
• (a). Missing Data:
This situation arises when some data is missing in the data. It can be
handled in various ways.
Some of them are:
• Ignore the tuples:
This approach is suitable only when the dataset we have is quite large and
multiple values are missing within a tuple.

• Fill the Missing values:

There are various ways to do this task. You can choose to fill the missing
values manually, by attribute mean or the most probable value.
• (b). Noisy Data:
Noisy data is meaningless data that can’t be interpreted by machines.It can
be generated due to faulty data collection, data entry errors etc. It can be
handled in following ways :
• Binning Method:
This method works on sorted data in order to smooth it. The whole data is
divided into segments of equal size and then various methods are performed
to complete the task. Each segmented is handled separately. One can replace
all data in a segment by its mean or boundary values can be used to complete
the task.

• Regression:
Here data can be made smooth by fitting it to a regression function.The
regression used may be linear (having one independent variable) or multiple
(having multiple independent variables).

• Clustering:
This approach groups the similar data in a cluster. The outliers may be
undetected or it will fall outside the clusters.
2. Data Transformation

• This step is taken in order to transform the data in appropriate forms

suitable for mining process. This involves following ways:
1.Normalization:
It is done in order to scale the data values in a specified range (-1.0 to 1.0 or
0.0 to 1.0)

2.Attribute Selection:
In this strategy, new attributes are constructed from the given set of
attributes to help the mining process.

3.Discretization:
This is done to replace the raw values of numeric attribute by interval levels
or conceptual levels.

4.Concept Hierarchy Generation:

Here attributes are converted from lower level to higher level in hierarchy.
For Example-The attribute “city” can be converted to “country”.
3. Data Reduction

• Since data mining is a technique that is used to handle huge amount of data. While working with
huge volume of data, analysis became harder in such cases. In order to get rid of this, we uses
data reduction technique. It aims to increase the storage efficiency and reduce data storage and
analysis costs.
• The various steps to data reduction are:
1. Data Cube Aggregation:
Aggregation operation is applied to data for the construction of the data cube.

2. Attribute Subset Selection:

The highly relevant attributes should be used, rest all can be discarded. For performing attribute
selection, one can use level of significance and p- value of the attribute.the attribute having p-
value greater than significance level can be discarded.

3. Numerosity Reduction:
This enable to store the model of data instead of whole data, for example: Regression Models.

4. Dimensionality Reduction:
This reduce the size of data by encoding mechanisms.It can be lossy or lossless. If after
reconstruction from compressed data, original data can be retrieved, such reduction are called
lossless reduction else it is called lossy reduction. The two effective methods of dimensionality
reduction are:Wavelet transforms and PCA (Principal Component Analysis).
WHAT IS FREQUENT PATTERN MINING?

• Frequent Pattern Mining is also known as the Association Rule

Mining. Finding frequent patterns, causal structures and
associations in data sets and is an inquisitive process called
pattern mining. When a series of transactions are given, pattern
mining’s main motive is to find the rules that enable us to
speculate a certain item based on the happening of other items
in the transaction.
• For instance, a set of items, such as pen and ink, often appears
together in a set of data transactions, is called a recurrent item
set. Purchasing a personal computer, later a digital camera, and
then a hard disk, if all these events repeatedly occur in the
history of shopping database, it is a (frequent) sequential
pattern. If the occurrence of a substructure is regular in a graph
database, it is called a (frequent) structural pattern.
HOW DOES FREQUENT PATTERN MINING
SUPPORT BUSINESS ANALYSIS?
• Pattern mining is applicable in assessing data for varied business
operations and industries.
• Basket Data Analysis: It helps in scrutinizing the association between the
items purchased in a single purchase.
• Selling and Cross Marketing: To identify and operate with businesses that
complement our own business and disregard competitors. For instance,
manufacturers and vehicle dealerships get into cross-marketing campaigns
with gas and oil companies for apparent reasons.
• Catalog Design: Catalogs are designed in such a way that the items of
selection act as mutual complements, which results in buying of one item
will eventually lead to purchasing another, therefore act as complements or
are closely related.
• Medical Treatments: The listed and diagnosed set of illnesses of every
patient is depicted as a transaction, from which the diseases that are
probable to occur sequentially/ simultaneously can be anticipated.
TO UNDERSTAND THE VALUE OF THIS
APPLIED TECHNIQUE.
• Let’s look at two business scenarios:
• Case One
• Problem: A retail manager of a store to prepare a better strategy of
product bundling and product placement wants to administer a Market
Basket Analysis.
• Business Solution: Elicited from the rules of pattern mining, the income of
the store can be increased by strategically placing the complementary
products together or in series, which leads to an upswing in sales. Offers
like “Buy this and enjoy % off ”, “Buy this and get this free” or “Buy one
and get three” can be developed based on the rules designed.
• Case Two
• Problem: The fulfil the purpose of analyzing the products which are
sequentially and frequently purchased together beneficial to a bank-
marketing manager.
• Business solution: Based off of the rules of pattern mining, cross-selling of
bank products to every prospective or existing customer to increase and
manifold bank revenue and sales. For example, if personal loan, savings
and credit cards are sequentially bought, then along with a credit card and
personal loan, a new saving account can be cross-sold to a customer.
Mining Frequent Patterns, Association, and
Correlations

• Basic concepts and a road map

• Efficient and scalable frequent itemset mining methods
• Mining various kinds of association rules
• From association mining to correlation analysis
• Constraint-based association mining
• Summary
What Is Frequent Pattern Analysis?
• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.)
that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of
frequent itemsets and association rule mining
• Motivation: Finding inherent regularities in data
• What products were often purchased together?— Beer and diapers?!
• What are the subsequent purchases after buying a PC?
• What kinds of DNA are sensitive to this new drug?
• Can we automatically classify web documents?
• Applications
• Basket data analysis, cross-marketing, catalog design, sale campaign analysis,
Web log (click stream) analysis, and DNA sequence analysis.
Why Is Freq. Pattern Mining Important?

• Discloses an intrinsic and important property of data sets

• Forms the foundation for many essential data mining tasks
• Association, correlation, and causality analysis
• Sequential, structural (e.g., sub-graph) patterns
• Pattern analysis in spatiotemporal, multimedia, time-series,
and stream data
• Classification: associative classification
• Cluster analysis: frequent pattern-based clustering
• Data warehousing: iceberg cube and cube-gradient
• Semantic data compression: fascicles
• Broad applications
Basic Concepts: Frequent Patterns and Association
Rules

Transaction-id Items bought • Itemset X = {x1, …, xk}

10 A, B, D • Find all the rules X → Y with minimum
20 A, C, D support and confidence
30 A, D, E • support, s, probability that a
40 B, E, F transaction contains X  Y
50 B, C, D, E, F • confidence, c, conditional
probability that a transaction
Customer
buys both
Customer having X also contains Y
buys diaper

Let supmin = 50%, confmin = 50%

Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
Customer A → D (60%, 100%)
buys beer
D → A (60%, 75%)
Closed Patterns and Max-Patterns

• A long pattern contains a combinatorial number of sub-patterns,

e.g., {a1, …, a100} contains (1001) + (1002) + … + (110000) = 2100 – 1 =
1.27*1030 sub-patterns!
• Solution: Mine closed patterns and max-patterns instead
• An itemset X is closed if X is frequent and there exists no super-
pattern Y ‫ כ‬X, with the same support as X (proposed by Pasquier,
et al. @ ICDT’99)
• An itemset X is a max-pattern if X is frequent and there exists no
frequent super-pattern Y ‫ כ‬X (proposed by Bayardo @
SIGMOD’98)
• Closed pattern is a lossless compression of freq. patterns
• Reducing the # of patterns and rules
Closed Patterns and Max-Patterns

• Exercise. DB = {<a1, …, a100>, < a1, …, a50>}

• Min_sup = 1.
• What is the set of closed itemset?
• <a1, …, a100>: 1
• < a1, …, a50>: 2
• What is the set of max-pattern?
• <a1, …, a100>: 1
• What is the set of all patterns?
• !!
Scalable Methods for Mining Frequent Patterns

• The downward closure property of frequent patterns

• Any subset of a frequent itemset must be frequent
• If {beer, diaper, nuts} is frequent, so is {beer, diaper}
• i.e., every transaction having {beer, diaper, nuts} also contains
{beer, diaper}
• Scalable mining methods: Three major approaches
• Apriori (Agrawal & Srikant@VLDB’94)
• Freq. pattern growth (FPgrowth—Han, Pei & Yin @SIGMOD’00)
• Vertical data format approach (Charm—Zaki & Hsiao @SDM’02)
Apriori: A Candidate Generation-and-Test Approach

• Apriori pruning principle: If there is any item set which is

infrequent, its superset should not be generated/tested!
• Method:
• Initially, scan DB once to get frequent 1-itemset
• Generate length (k+1) candidate itemsets from length k
frequent itemsets
• Test the candidates against DB
• Terminate when no frequent or candidate set can be generated
The Apriori Algorithm—An Example
Supmin = 2 Itemset sup
Database TDB Itemset sup
{A} 2
L1 {A} 2
Tid Items C1 {B} 3
{B} 3
10 A, C, D {C} 3
1st scan {C} 3
20 B, C, E {D} 1
{E} 3
30 A, B, C, E {E} 3
40 B, E
C2 Itemset sup C2 Itemset
{A, B} 1
L2 Itemset sup 2nd scan {A, B}
{A, C} 2
{A, C} 2 {A, C}
{A, E} 1
{B, C} 2
{B, C} 2 {A, E}
{B, E} 3
{B, E} 3 {B, C}
{C, E} 2
{C, E} 2 {B, E}
{C, E}

C3 Itemset L3 Itemset sup

3rd scan
{B, C, E} {B, C, E} 2
The Apriori Algorithm

• Pseudo-code:
Ck: Candidate itemset of size k
Lk : frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk !=; k++) do begin
Ck+1 = candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in Ck+1 that are
contained in t
Lk+1 = candidates in Ck+1 with min_support
end
return k Lk;
Important Details of Apriori
• How to generate candidates?
• Step 1: self-joining Lk
• Step 2: pruning
• How to count supports of candidates?
• Example of Candidate-generation
• L3={abc, abd, acd, ace, bcd}
• Self-joining: L3*L3
• abcd from abc and abd
• acde from acd and ace
• Pruning:
• acde is removed because ade is not in L3
• C4={abcd}
How to Count Supports of Candidates?

• Why counting supports of candidates a problem?

• The total number of candidates can be very huge
• One transaction may contain many candidates
• Method:
• Candidate itemsets are stored in a hash-tree
• Leaf node of hash-tree contains a list of itemsets and counts
• Interior node contains a hash table
• Subset function: finds all the candidates contained in a
transaction
Example: Counting Supports of Candidates

Subset function
Transaction: 1 2 3 5 6
3,6,9
1,4,7
2,5,8

1+2356

13+56 234
567
145 345 356 367
136 368
357
12+356
689
124
457 125 159
458
Challenges of Frequent Pattern Mining

• Challenges
• Multiple scans of transaction database
• Huge number of candidates
• Tedious workload of support counting for candidates
• Improving Apriori: general ideas
• Reduce passes of transaction database scans
• Shrink number of candidates
• Facilitate support counting of candidates
Reduce the Number of Candidates
• A k-itemset whose corresponding hashing bucket count is below
the threshold cannot be frequent
• Candidates: a, b, c, d, e
• Hash entries: {ab, ad, ae} {bd, be, de} …
• Frequent 1-itemset: a, b, d, e
• ab is not a candidate 2-itemset if the sum of the count of {ab,
ad, ae} is below the support threshold
Sampling for Frequent Patterns

• Select a sample of original database, mine frequent patterns

within sample using Apriori
• Scan database once to verify frequent itemsets found in
sample, only borders of closure of frequent patterns are
checked
• Example: check abcd instead of ab, ac, …, etc.
• Scan database again to find missed frequent patterns.
MaxMiner: Mining Max-patterns

• 1st scan: find frequent items Tid Items

• A, B, C, D, E 10 A,B,C,D,E
20 B,C,D,E,
• 2nd scan: find support for 30 A,C,D,F
• AB, AC, AD, AE, ABCDE
• BC, BD, BE, BCDE
Potential max-
• CD, CE, CDE, DE, patterns
• Since BCDE is a max-pattern, no need to check BCD, BDE, CDE in
later scan
• R. Bayardo. Efficiently mining long patterns from databases. In
SIGMOD’98
Mining Frequent Closed Patterns: CLOSET

• Flist: list of all frequent items in support ascending order

• Flist: d-a-f-e-c Min_sup=2

• Divide search space TID Items

10 a, c, d, e, f
• Patterns having d 20 a, b, e
30 c, e, f
• Patterns having d but no a, etc. 40 a, c, d, f
50 c, e, f
• Find frequent closed pattern recursively
• Every transaction having d also has cfa → cfad is a frequent
closed pattern
Mining Various Kinds of Association Rules

• Mining multilevel association

• Miming multidimensional association

• Mining quantitative association

• Mining interesting correlation patterns

Mining Multiple-Level Association Rules

• Items often form hierarchies

• Flexible support settings
• Items at the lower level are expected to have lower support
• Exploration of shared multi-level mining

uniform support reduced support

Level 1
Milk Level 1
min_sup = 5%
[support = 10%] min_sup = 5%

Level 2 2% Milk Skim Milk Level 2

min_sup = 5% [support = 6%] [support = 4%] min_sup = 3%
Multi-level Association: Redundancy Filtering

• Some rules may be redundant due to “ancestor” relationships

between items.
• Example
• milk  wheat bread [support = 8%, confidence = 70%]
• 2% milk  wheat bread [support = 2%, confidence = 72%]
• We say the first rule is an ancestor of the second rule.
• A rule is redundant if its support is close to the “expected” value,
based on the rule’s ancestor.
Mining Multi-Dimensional Association

• Single-dimensional rules:
buys(X, “milk”)  buys(X, “bread”)
• Multi-dimensional rules:  2 dimensions or predicates
• Inter-dimension assoc. rules (no repeated predicates)
age(X,”19-25”)  occupation(X,“student”)  buys(X, “coke”)
• hybrid-dimension assoc. rules (repeated predicates)
age(X,”19-25”)  buys(X, “popcorn”)  buys(X, “coke”)
• Categorical Attributes: finite number of possible values, no
ordering among values—data cube approach
• Quantitative Attributes: numeric, implicit ordering among values—
discretization, clustering, and gradient approaches
Mining Quantitative Associations

• Techniques can be categorized by how numerical attributes,

such as age or salary are treated
1. Static discretization based on predefined concept hierarchies
(data cube methods)
2. Dynamic discretization based on data distribution
3. Clustering: Distance-based association
• one dimensional clustering then association
4. Deviation: Sex = female => Wage: mean=$7/hr (overall mean = $9)
Static Discretization of Quantitative Attributes

◼ Discretized prior to mining using concept hierarchy.

◼ Numeric values are replaced by ranges.
◼ In relational database, finding all frequent k-predicate sets
will require k or k+1 table scans.
◼ Data cube is well suited for mining. ()

◼ The cells of an n-dimensional

(age) (income) (buys)
cuboid correspond to the
predicate sets.
(age, income) (age,buys) (income,buys)
◼ Mining from data cubes
can be much faster.
(age,income,buys)
Constraint-based (Query-Directed) Mining

• Finding all the patterns in a database autonomously? —

unrealistic!
• The patterns could be too many but not focused!
• Data mining should be an interactive process
• User directs what to be mined using a data mining query
language (or a graphical user interface)
• Constraint-based mining
• User flexibility: provides constraints on what to be mined
• System optimization: explores such constraints for efficient
mining—constraint-based mining
Constraints in Data Mining

• Knowledge type constraint:

• classification, association, etc.
• Data constraint — using SQL-like queries
• find product pairs sold together in stores in Chicago in Dec.’02
• Dimension/level constraint
• in relevance to region, price, brand, customer category
• Rule (or pattern) constraint
• small sales (price < $10) triggers big sales (sum > $200)
• Interestingness constraint
• strong rules: min_support  3%, min_confidence  60%
Constrained Mining vs. Constraint-Based Search

• Constrained mining vs. constraint-based search/reasoning

• Both are aimed at reducing search space
• Finding all patterns satisfying constraints vs. finding some (or
one) answer in constraint-based search in AI
• Constraint-pushing vs. heuristic search
• It is an interesting research problem on how to integrate them
• Constrained mining vs. query processing in DBMS
• Database query processing requires to find all
• Constrained pattern mining shares a similar philosophy as
pushing selections deeply in query processing
Anti-Monotonicity in Constraint Pushing
TDB (min_sup=2)
• Anti-monotonicity TID Transaction

• When an intemset S violates the constraint, 10 a, b, c, d, f

so does any of its superset 20 b, c, d, f, g, h

30 a, c, d, e, f
• sum(S.Price)  v is anti-monotone 40 c, e, f, g
• sum(S.Price)  v is not anti-monotone
Item Profit
• Example. C: range(S.profit)  15 is anti- a 40
monotone b 0
c -20
• Itemset ab violates C
d 10
• So does every superset of ab e -30
f 30
g 20
h -10
Monotonicity for Constraint Pushing
TDB (min_sup=2)
TID Transaction
• Monotonicity
10 a, b, c, d, f
• When an intemset S satisfies the 20 b, c, d, f, g, h
constraint, so does any of its superset 30 a, c, d, e, f
40 c, e, f, g
• sum(S.Price)  v is monotone
• min(S.Price)  v is monotone Item Profit
a 40
• Example. C: range(S.profit)  15 b 0

• Itemset ab satisfies C c -20

d 10
• So does every superset of ab e -30
f 30
g 20
h -10
Succinctness

• Succinctness:
• Given A1, the set of items satisfying a succinctness constraint
C, then any set S satisfying C is based on A1 , i.e., S contains
a subset belonging to A1
• Idea: Without looking at the transaction database, whether
an itemset S satisfies constraint C can be determined based
on the selection of items
• min(S.Price)  v is succinct
• sum(S.Price)  v is not succinct
• Optimization: If C is succinct, C is pre-counting pushable
The Apriori Algorithm — Example
Database D itemset sup.
L1 itemset sup.
TID Items C1 {1} 2 {1} 2
100 134 {2} 3 {2} 3
Scan D
200 235 {3} 3 {3} 3
300 1235 {4} 1 {5} 3
400 25 {5} 3
C2 itemset sup C2 itemset
L2 itemset sup {1 2} 1 Scan D {1 2}
{1 3} 2 {1 3} 2 {1 3}
{2 3} 2 {1 5} 1 {1 5}
{2 3} 2 {2 3}
{2 5} 3
{2 5} 3 {2 5}
{3 5} 2
{3 5} 2 {3 5}
C3 itemset Scan D L3 itemset sup
{2 3 5} {2 3 5} 2
Naïve Algorithm: Apriori + Constraint

Database D itemset sup.

L1 itemset sup.
TID Items C1 {1} 2 {1} 2
100 134 {2} 3 {2} 3
Scan D
200 235 {3} 3 {3} 3
300 1235 {4} 1 {5} 3
400 25 {5} 3
C2 itemset sup C2 itemset
L2 itemset sup {1 2} 1 Scan D {1 2}
{1 3} 2 {1 3} 2 {1 3}
{2 3} 2 {1 5} 1 {1 5}
{2 3} 2 {2 3}
{2 5} 3
{2 5} 3 {2 5}
{3 5} 2
{3 5} 2 {3 5}
C3 itemset Scan D L3 itemset sup Constraint:
{2 3 5} {2 3 5} 2 Sum{S.price} < 5
The Constrained Apriori Algorithm: Push an
Anti-monotone Constraint Deep
Database D itemset sup.
L1 itemset sup.
TID Items C1 {1} 2 {1} 2
100 134 {2} 3 {2} 3
Scan D
200 235 {3} 3 {3} 3
300 1235 {4} 1 {5} 3
400 25 {5} 3
C2 itemset sup C2 itemset
L2 itemset sup {1 2} 1 Scan D {1 2}
{1 3} 2 {1 3} 2 {1 3}
{2 3} 2 {1 5} 1 {1 5}
{2 3} 2 {2 3}
{2 5} 3
{2 5} 3 {2 5}
{3 5} 2
{3 5} 2 {3 5}
C3 itemset Scan D L3 itemset sup Constraint:
{2 3 5} {2 3 5} 2 Sum{S.price} < 5
The Constrained Apriori Algorithm: Push a
Succinct Constraint Deep
Database D itemset sup.
L1 itemset sup.
TID Items C1 {1} 2 {1} 2
100 134 {2} 3 {2} 3
Scan D
200 235 {3} 3 {3} 3
300 1235 {4} 1 {5} 3
400 25 {5} 3
C2 itemset sup C2 itemset
L2 itemset sup {1 2} 1 Scan D {1 2}
{1 3} 2 {1 3} 2 {1 3}
not immediately
{1 5} 1 {1 5} to be used
{2 3} 2
{2 3} 2 {2 3}
{2 5} 3
{2 5} 3 {2 5}
{3 5} 2 {3 5}
{3 5} 2
C3 itemset Scan D L3 itemset sup Constraint:
{2 3 5} {2 3 5} 2 min{S.price } <= 1
Converting “Tough” Constraints

TDB (min_sup=2)
TID Transaction
• Convert tough constraints into anti-
10 a, b, c, d, f
monotone or monotone by properly ordering
20 b, c, d, f, g, h
items
30 a, c, d, e, f
• Examine C: avg(S.profit)  25 40 c, e, f, g

• Order items in value-descending order Item Profit

• <a, f, g, d, b, h, c, e> a 40
• If an itemset afb violates C b 0
• So does afbh, afb* c -20
• It becomes anti-monotone! d 10
e -30
f 30
g 20
h -10
Strongly Convertible Constraints

• avg(X)  25 is convertible anti-monotone w.r.t. item

value descending order R: <a, f, g, d, b, h, c, e>
• If an itemset af violates a constraint C, so does Item Profit
every itemset with af as prefix, such as afd a 40
• avg(X)  25 is convertible monotone w.r.t. item value b 0
ascending order R-1: <e, c, h, b, d, g, f, a> c -20
• If an itemset d satisfies a constraint C, so does d 10
itemsets df and dfa, which having d as a prefix e -30
f 30
• Thus, avg(X)  25 is strongly convertible
g 20
h -10
Can Apriori Handle Convertible Constraint?

• A convertible, not monotone nor anti-monotone nor

succinct constraint cannot be pushed deep into the an
Apriori mining algorithm
• Within the level wise framework, no direct pruning
based on the constraint can be made Item Value

• Itemset df violates constraint C: avg(X)>=25 a 40

b 0
• Since adf satisfies C, Apriori needs df to assemble
c -20
adf, df cannot be pruned
d 10
• But it can be pushed into frequent-pattern growth e -30
framework! f 30
g 20
h -10
Mining With Convertible Constraints
Item Value
• C: avg(X) >= 25, min_sup=2 a 40
f 30
• List items in every transaction in value descending order R:
g 20
<a, f, g, d, b, h, c, e>
d 10
• C is convertible anti-monotone w.r.t. R b 0

• Scan TDB once h -10

c -20
• remove infrequent items
e -30
• Item h is dropped
• Itemsets a and f are good, …
TDB (min_sup=2)
• Projection-based mining TID Transaction
• Imposing an appropriate order on item projection 10 a, f, d, b, c

• Many tough constraints can be converted into (anti)- 20 f, g, d, b, c

30 a, f, d, c, e
monotone
40 f, g, h, c, e
Handling Multiple Constraints

• Different constraints may require different or even conflicting

item-ordering
• If there exists an order R s.t. both C1 and C2 are convertible
w.r.t. R, then there is no conflict between the two convertible
constraints
• If there exists conflict on order of items
• Try to satisfy one constraint first
• Then using the order for the other constraint to mine
frequent itemsets in the corresponding projected database
What Constraints Are Convertible?

Convertible anti- Convertible Strongly

Constraint monotone monotone convertible

avg(S)  ,  v Yes Yes Yes

median(S)  ,  v Yes Yes Yes

sum(S)  v (items could be of any value,

Yes No No
v  0)
sum(S)  v (items could be of any value,
No Yes No
v  0)
sum(S)  v (items could be of any value,
No Yes No
v  0)
sum(S)  v (items could be of any value,
Yes No No
v  0)
……
Constraint-Based Mining—A General Picture

Constraint Antimonotone Monotone Succinct

vS no yes yes
SV no yes yes

SV yes no yes

min(S)  v no yes yes

min(S)  v yes no yes

max(S)  v yes no yes

max(S)  v no yes yes

count(S)  v yes no weakly

count(S)  v no yes weakly

sum(S)  v ( a  S, a  0 ) yes no no
sum(S)  v ( a  S, a  0 ) no yes no

range(S)  v yes no no
range(S)  v no yes no

avg(S)  v,   { =, ,  } convertible convertible no

support(S)   yes no no

support(S)   no yes no
A Classification of Constraints

Monotone
Antimonotone

Strongly
convertible
Succinct

Convertible Convertible
anti-monotone monotone

Inconvertible
Frequent-Pattern Mining: Summary

• Frequent pattern mining—an important task in data mining

• Scalable frequent pattern mining methods
• Apriori (Candidate generation & test)
• Projection-based (FPgrowth, CLOSET+, ...)
• Vertical format approach (CHARM, ...)
▪ Mining a variety of rules and interesting patterns
▪ Constraint-based mining
▪ Mining sequential and structured patterns
▪ Extensions and applications
Frequent-Pattern Mining: Research Problems

• Mining fault-tolerant frequent, sequential and structured

patterns
• Patterns allows limited faults (insertion, deletion, mutation)
• Mining truly interesting patterns
• Surprising, novel, concise, …
• Application exploration
• E.g., DNA sequence analysis and bio-pattern classification
• “Invisible” data mining
Thank You!

Telkom Claim Form - PDF - Adobe Acrobat Pro
67% (12)
Telkom Claim Form - PDF - Adobe Acrobat Pro
1 page
IDS Unit 5 Visualization
No ratings yet
IDS Unit 5 Visualization
24 pages
Nptel Swayam DWDM Slides
No ratings yet
Nptel Swayam DWDM Slides
406 pages
Trackpad Pro Ver. 5.0 Class 6
From Everand
Trackpad Pro Ver. 5.0 Class 6
Nidhi Arora
No ratings yet
Data Mining Unit 1
No ratings yet
Data Mining Unit 1
91 pages
Software Asset Management: What Is It and Why Do I Need It?: A Textbook on the Fundamentals in Software License Compliance, Audit Risks, Optimizing Software License ROI, Business Practices and Life Cycle Management
From Everand
Software Asset Management: What Is It and Why Do I Need It?: A Textbook on the Fundamentals in Software License Compliance, Audit Risks, Optimizing Software License ROI, Business Practices and Life Cycle Management
Carl A. Bolton
No ratings yet
Chapter 06-Statistical Methods in Quality Management: True/False
No ratings yet
Chapter 06-Statistical Methods in Quality Management: True/False
20 pages
Sony X90J Final
No ratings yet
Sony X90J Final
3 pages
Data Mining: Concepts and Techniques: - Introduction
No ratings yet
Data Mining: Concepts and Techniques: - Introduction
44 pages
Data Mining: Concepts and Techniques: Jiawei Han and Micheline Kamber
No ratings yet
Data Mining: Concepts and Techniques: Jiawei Han and Micheline Kamber
46 pages
A Survey On Data Mining
No ratings yet
A Survey On Data Mining
4 pages
Lecture 3 Data Mining
No ratings yet
Lecture 3 Data Mining
30 pages
DWDM R13 Unit 1 PDF
No ratings yet
DWDM R13 Unit 1 PDF
10 pages
Outline: Problem Statement Definitions & Examples Strategies
No ratings yet
Outline: Problem Statement Definitions & Examples Strategies
7 pages
Data Mining Unit-1 Notes
No ratings yet
Data Mining Unit-1 Notes
18 pages
Data Mining Handout
No ratings yet
Data Mining Handout
4 pages
CH 6
No ratings yet
CH 6
72 pages
Data Mining and Ware Housing
No ratings yet
Data Mining and Ware Housing
130 pages
Unit I - Data Science
No ratings yet
Unit I - Data Science
161 pages
DMDW-Unit II
No ratings yet
DMDW-Unit II
19 pages
Unit 4 Data Science
No ratings yet
Unit 4 Data Science
21 pages
DataMining S
No ratings yet
DataMining S
103 pages
4.7.1 - Data Warehousing Mining & Business Intelligence
No ratings yet
4.7.1 - Data Warehousing Mining & Business Intelligence
3 pages
Lecture 4 - Introduction To UML
No ratings yet
Lecture 4 - Introduction To UML
29 pages
Lesson Plan: Data Warehousing and Data Mining
No ratings yet
Lesson Plan: Data Warehousing and Data Mining
1 page
Unit 1 Bda Complete Notes
No ratings yet
Unit 1 Bda Complete Notes
15 pages
3.1 What Is Data Warehouse?: Unit Iii
No ratings yet
3.1 What Is Data Warehouse?: Unit Iii
33 pages
Data Science M-1 Notes
No ratings yet
Data Science M-1 Notes
34 pages
Data Mining: Books
No ratings yet
Data Mining: Books
14 pages
Data Warehousing and Mining
No ratings yet
Data Warehousing and Mining
2 pages
TE7265 - Introduction To Data Science
No ratings yet
TE7265 - Introduction To Data Science
4 pages
Bda - 2 Unit
No ratings yet
Bda - 2 Unit
12 pages
5.1 Mining Data Streams
No ratings yet
5.1 Mining Data Streams
16 pages
Data Mining Techniques and Applications
No ratings yet
Data Mining Techniques and Applications
16 pages
DataMining Lecture 1
No ratings yet
DataMining Lecture 1
35 pages
Facets of Data
No ratings yet
Facets of Data
6 pages
Notes DATA MINING MBA III
No ratings yet
Notes DATA MINING MBA III
8 pages
DATA MINING Chapter 1 and 2 Lect Slide
No ratings yet
DATA MINING Chapter 1 and 2 Lect Slide
47 pages
Unit 4 - Data Cube Technology
No ratings yet
Unit 4 - Data Cube Technology
27 pages
Unit-3: Non-Linear Data Structure
No ratings yet
Unit-3: Non-Linear Data Structure
23 pages
Data Mining - IMT Nagpur-Manish
No ratings yet
Data Mining - IMT Nagpur-Manish
82 pages
Data Mining
No ratings yet
Data Mining
27 pages
Unit V Big Data Analytics
No ratings yet
Unit V Big Data Analytics
47 pages
Data Mining Unit-IV
No ratings yet
Data Mining Unit-IV
37 pages
Data Mining - Tasks: Data Characterization Data Discrimination
No ratings yet
Data Mining - Tasks: Data Characterization Data Discrimination
4 pages
Attribute Oriented Induction
100% (1)
Attribute Oriented Induction
6 pages
Bigdata Unit II
No ratings yet
Bigdata Unit II
19 pages
Big Data Analytics - Unit 4
No ratings yet
Big Data Analytics - Unit 4
32 pages
Introduction To Data Mining: Saeed Salem Department of Computer Science North Dakota State University Cs - Ndsu.edu/ Salem
No ratings yet
Introduction To Data Mining: Saeed Salem Department of Computer Science North Dakota State University Cs - Ndsu.edu/ Salem
30 pages
Exploratory Data Analysis
No ratings yet
Exploratory Data Analysis
62 pages
Data Mining Notes
No ratings yet
Data Mining Notes
82 pages
MCA - BigData Notes
No ratings yet
MCA - BigData Notes
136 pages
Data Mining: Business Intelligence
No ratings yet
Data Mining: Business Intelligence
68 pages
Data Warehousing and Data Mining Syllabus
No ratings yet
Data Warehousing and Data Mining Syllabus
2 pages
FDS Unit 5
No ratings yet
FDS Unit 5
22 pages
Unit-3 DMDW
No ratings yet
Unit-3 DMDW
36 pages
DSV Module-3
No ratings yet
DSV Module-3
24 pages
Future Skills - An Introduction, General Overview of The Future Skills Sub-Sector-1
No ratings yet
Future Skills - An Introduction, General Overview of The Future Skills Sub-Sector-1
15 pages
Unit 4 Part A
No ratings yet
Unit 4 Part A
51 pages
Bahria University, Islamabad Campus: Department of Computer Science
No ratings yet
Bahria University, Islamabad Campus: Department of Computer Science
3 pages
Case Study On Dbms & Rdbms
No ratings yet
Case Study On Dbms & Rdbms
36 pages
Gujarat Technological University
No ratings yet
Gujarat Technological University
4 pages
Optimizing Hadoop for MapReduce
From Everand
Optimizing Hadoop for MapReduce
Khaled Tannir
No ratings yet
3 Business Analysis in Data Mining L6 7 8-9-10
No ratings yet
3 Business Analysis in Data Mining L6 7 8-9-10
39 pages
2 Data Warehousing Components L3 L4 L5
No ratings yet
2 Data Warehousing Components L3 L4 L5
26 pages
MTE 1 Syllabus
No ratings yet
MTE 1 Syllabus
3 pages
Csa
No ratings yet
Csa
8 pages
Erro
No ratings yet
Erro
9 pages
Apple Product Information Sheet 20 Watt Hours
No ratings yet
Apple Product Information Sheet 20 Watt Hours
8 pages
Final Project Report Mobile Phone Jammer
No ratings yet
Final Project Report Mobile Phone Jammer
19 pages
220644-665-666-DLD Lab 09 F.. (2) .Docxooooooooooo
No ratings yet
220644-665-666-DLD Lab 09 F.. (2) .Docxooooooooooo
8 pages
Seven Direct SD50 PH Ion Meter Manual
No ratings yet
Seven Direct SD50 PH Ion Meter Manual
70 pages
2021 TST First Cycle Teaching Guide
No ratings yet
2021 TST First Cycle Teaching Guide
281 pages
Student Attendance Tracking Information System by Maureen
No ratings yet
Student Attendance Tracking Information System by Maureen
27 pages
Arduino Based Digital Thermometer
67% (3)
Arduino Based Digital Thermometer
3 pages
Presentation On Tally - ERP 9
No ratings yet
Presentation On Tally - ERP 9
10 pages
SuccessFactors With Microsoft 365
No ratings yet
SuccessFactors With Microsoft 365
41 pages
Idap 2019 8875953
No ratings yet
Idap 2019 8875953
6 pages
Office Manager - Admin Assistant Job Description1
No ratings yet
Office Manager - Admin Assistant Job Description1
2 pages
Curriculum Vitae
No ratings yet
Curriculum Vitae
3 pages
COMMUNICATION Skills 1
No ratings yet
COMMUNICATION Skills 1
232 pages
ZedBoard HW UG v1 1 PDF
No ratings yet
ZedBoard HW UG v1 1 PDF
38 pages
Wireless Stick Lite Pinout Diagram: Notes
100% (1)
Wireless Stick Lite Pinout Diagram: Notes
1 page
PHP Programming Unit III
No ratings yet
PHP Programming Unit III
23 pages
Structured COBOL Programming: Nassau Community College
No ratings yet
Structured COBOL Programming: Nassau Community College
62 pages
Readme
No ratings yet
Readme
67 pages
Disaster Management and Dam Monitoring System
No ratings yet
Disaster Management and Dam Monitoring System
55 pages
Breast Cancer Prediction Using Machine Learning
No ratings yet
Breast Cancer Prediction Using Machine Learning
11 pages
DBMS Final Print
No ratings yet
DBMS Final Print
10 pages
Fingerprint Lock System
No ratings yet
Fingerprint Lock System
9 pages
Font Type WP Hebrew David (TrueType)
No ratings yet
Font Type WP Hebrew David (TrueType)
1 page
DocuCentre-VI C3370
No ratings yet
DocuCentre-VI C3370
16 pages
Simplex - Installation Instructions TFX Addressable Loop Interface
No ratings yet
Simplex - Installation Instructions TFX Addressable Loop Interface
16 pages
Heba Compiler Design Book - 2025
No ratings yet
Heba Compiler Design Book - 2025
133 pages