Recommender System - Module 2 - Data Mining Techniques in Recommender System

This document provides an overview of data mining techniques used in recommender systems. It discusses the syllabus, which includes an introduction to data mining, data preprocessing, similarity measures, dimensionality reduction, clustering, classification algorithms and association rule mining. The objectives are to understand how these techniques can be applied in recommender systems and to evaluate their use. The outcomes include being able to implement various data mining techniques and evaluate their use for recommender systems. It then provides details on data preprocessing and popular data mining techniques such as clustering, classification and association rule mining.

Subject: Recommender System

Module 02: Data Mining Techniques in Recommender System
Data Mining Techniques in Recommender System

Aim:
To facilitate students' understanding of the data mining techniques used in recommender systems and the implementation
of the different rules used in data mining.

2
Data Mining Techniques in Recommender System

Syllabus:
• Introduction to data mining techniques
• Data pre-processing
• Data mining techniques used in recommender systems:
• Similarity measures
• Sampling
• Dimensionality reduction techniques
• Denoising
• K-means clustering
• Support vector machines
• Ensemble methods
• Rule-based classifiers
• ANN
• Bayesian classifiers
• Association rule mining 3
Data Mining Techniques in Recommender System

Table of Contents
• Introduction to data mining techniques
• Data pre-processing
• Data mining techniques used in recommender systems
• Similarity measures
• Sampling
• Dimensionality reduction techniques
• Denoising
• K-means clustering
• Support vector machines
• Ensemble methods
• Rule-based classifiers
• ANN
• Bayesian classifiers
• Association rule mining 4
Data Mining Techniques in Recommender System

Objective:
The objectives of this unit are to:
 Learn the basic concepts of data mining in recommender systems.
 Understand the different data mining techniques.
 Analyse the different data mining techniques as required in a recommender system.
 Evaluate different data mining methods for recommender systems as a multi-disciplinary field.
 Understand how data mining concepts apply to emerging topics and challenges in recommender
systems.

5
Data Mining Techniques in Recommender System

Outcome:
After completion of this unit, students will be able to:
 Understand the basic concepts of data mining.
 Implement different kinds of data mining techniques.
 Evaluate different types of data mining techniques in recommender systems.
 Evaluate the different data mining association rules used in recommender systems as a multi-
disciplinary field.
 Understand the basic requirements for implementing data mining techniques on emerging topics and
challenges in recommender systems.

6
Data Mining Techniques in Recommender System

Introduction to Data mining techniques:-


Data mining techniques are a set of algorithms intended to find hidden knowledge in data. Which data mining
technique to use depends entirely on the problem we are trying to solve. Some of the popular data
mining techniques are classification algorithms, prediction analysis algorithms and clustering techniques.

Source: https://dataaspirant.com/data-mining/ 7
Introduction to Data mining techniques

8
Source https://data-flair.training/blogs/data-mining-and-data-science/
Introduction to Data mining techniques

• Mining, in casual terms, refers to the extraction of valuable minerals. In the 21st century, data is the
most valuable "mineral". To extract usable data from a given set of raw data, we use Data Mining.
• Through Data Mining, we analyse a given dataset to extract patterns and identify
relationships.
• Data mining is a complex process that involves intensive data warehousing as well as
powerful computational technologies.
• Furthermore, data mining is not limited to the extraction of data but is also used for
transformation, cleaning, data integration, and pattern analysis. Another term for Data Mining
is Knowledge Discovery.

Source: https://data-flair.training/blogs/data-mining-and-data-science/

9
Introduction to Data mining techniques

There are various important parameters in Data Mining, such as association rules, classification,
clustering, and forecasting. Some of the key features of Data Mining are –
• Prediction of patterns based on trends in the data.
• Calculating predictions for outcomes.
• Creating information in response to the analysis.
• Focusing on larger databases.
• Clustering of visual data.

Source: https://data-flair.training/blogs/data-mining-and-data-science/

10
Introduction to Data mining techniques

Data Mining Steps:-


• Step 1: Data Cleaning – In this step, data is cleaned such that there is no noise or irregularity present
within the data.
• Step 2: Data Integration – In the process of Data Integration, we combine multiple data sources into
one.
• Step 3: Data Selection – In this step, we extract our data from the database.
• Step 4: Data Transformation – In this step, we transform the data to support summary analysis as
well as aggregation operations.
• Step 5: Data Mining – In this step, we extract useful data from the pool of existing data.
• Step 6: Pattern Evaluation – We analyze several patterns that are present in the data.
• Step 7: Knowledge Representation – In the final step, we represent the knowledge to the user in the
form of trees, tables, graphs, and matrices.

Source: https://data-flair.training/blogs/data-mining-and-data-science/

11
Introduction to Data mining techniques

Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html 12
Data pre-processing
When we talk about data, we usually think of large datasets with a huge number of rows and columns.
While that is a likely scenario, it is not always the case — data can come in many different forms: structured
tables, images, audio files, videos, etc.
Machines don't understand free text, image or video data as it is; they understand 1s and 0s. So it probably
won't be good enough to put on a slideshow of all our images and expect our machine learning model to get
trained just by that!

Source: https://data-flair.training/blogs/data-mining-and-data-science/

13
Data pre-processing

Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining/ 14
Data pre-processing

1. Data Cleaning:
The data can have many irrelevant and missing parts. Data cleaning is done to handle this. It
involves handling missing data, noisy data, etc.
(a) Missing Data:
This situation arises when some values are missing in the data. It can be handled in various ways.
Some of them are:
• Ignore the tuples:
This approach is suitable only when the dataset we have is quite large and multiple values are missing
within a tuple.

• Fill the missing values:
There are various ways to do this task. You can choose to fill the missing values manually, by the attribute
mean or by the most probable value.

Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining

  15
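A minimal pandas sketch of the two strategies above (dropping incomplete tuples vs. filling with the attribute mean); the column names and values are made up purely for illustration:

import pandas as pd

# Hypothetical user/rating table with missing values
df = pd.DataFrame({"user_id": [1, 2, 3, 4],
                   "age": [23, None, 31, None],
                   "rating": [4.0, 3.5, None, 5.0]})

# Strategy 1: ignore (drop) the tuples that contain missing values
dropped = df.dropna()

# Strategy 2: fill missing values with the attribute (column) mean
filled = df.fillna(df.mean(numeric_only=True))

print(dropped)
print(filled)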
Data pre-processing

(b) Noisy Data:


Noisy data is meaningless data that can't be interpreted by machines. It can be generated due to faulty data
collection, data entry errors, etc. It can be handled in the following ways (a small sketch of the binning method
follows this slide):
• Binning Method:
This method works on sorted data in order to smooth it. The whole data is divided into segments of equal
size and then various methods are performed to complete the task. Each segment is handled separately.
One can replace all data in a segment by its mean, or boundary values can be used to complete the task.
• Regression:
Here data can be made smooth by fitting it to a regression function. The regression used may be linear
(having one independent variable) or multiple (having multiple independent variables).
• Clustering:
This approach groups similar data into clusters. Outliers may go undetected, or they will fall outside the
clusters.

Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining

  16
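A minimal numpy sketch of smoothing by bin means, assuming already-sorted data and equal-size bins (the values are illustrative):

import numpy as np

data = np.array([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34], dtype=float)

bin_size = 4
smoothed = data.copy()
for start in range(0, len(data), bin_size):
    segment = data[start:start + bin_size]
    # smoothing by bin means: every value in the segment is replaced by the segment mean
    smoothed[start:start + bin_size] = segment.mean()

print(smoothed)   # first bin 4, 8, 9, 15 becomes 9, 9, 9, 9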
Data pre-processing

Data Transformation:
This step is taken in order to transform the data into forms suitable for the mining process. It involves the
following approaches:
1. Normalization:
It is done in order to scale the data values into a specified range (-1.0 to 1.0 or 0.0 to 1.0).
2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the mining process.
3. Discretization:
This is done to replace the raw values of a numeric attribute by interval levels or conceptual levels.
4. Concept Hierarchy Generation:
Here attributes are converted from a lower level to a higher level in a hierarchy. For example, the attribute "city"
can be converted to "country".

Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining

17
Data pre-processing

Data Reduction:
Data mining is a technique used to handle huge amounts of data. When working with a huge volume of
data, analysis becomes harder. To address this, we use data reduction techniques, which aim to
increase storage efficiency and reduce data storage and analysis costs.
• The various approaches to data reduction are:
1. Data Cube Aggregation:
Aggregation operations are applied to the data to construct the data cube.
2. Attribute Subset Selection:
Only the highly relevant attributes should be used; the rest can be discarded. For performing attribute selection,
one can use the level of significance and the p-value of the attribute: an attribute with a p-value greater than the
significance level can be discarded.
3. Numerosity Reduction:
This enables storing a model of the data instead of the whole data, for example: regression models.

Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining

18
Data pre-processing

4. Dimensionality Reduction:
This reduces the size of the data using encoding mechanisms. It can be lossy or lossless. If the original data
can be retrieved after reconstruction from the compressed data, the reduction is called lossless; otherwise it is
called lossy. Two effective methods of dimensionality reduction are wavelet transforms and PCA
(Principal Component Analysis).

Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining

19
Data mining techniques used in recommender system:

(I) Similarity Measures


• Similarity is measured using the distance metric. Nearest points are the most similar and farthest points are the
least relevant. The similarity is subjective and is highly dependent on the domain and application. For example,
two movies are similar because of genre or length or cast. Care should be taken when calculating distance
across dimensions/features that are unrelated. The relative values of each element must be normalized, or one
feature could end up dominating the distance calculation.

Source: https://builtin.com/data-science/recommender-systems

20
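A small numpy sketch of two common similarity/distance measures between item (or user) rating vectors; the vectors are made up for illustration:

import numpy as np

# Hypothetical rating vectors for two movies over the same five users
a = np.array([5.0, 3.0, 0.0, 4.0, 1.0])
b = np.array([4.0, 2.0, 1.0, 5.0, 0.0])

# Euclidean distance: smaller means more similar
euclidean = np.linalg.norm(a - b)

# Cosine similarity: 1 means identical direction, 0 means unrelated
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean, cosine)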
Data mining techniques used in recommender system:

(II) Sampling
We refer to a subset of observations from a larger population as a sample. Sampling, however, is used in
reference to the group of observations that data will be collected from for our research. Therefore, Data Sampling
may be defined as a technique leveraged by practitioners to select a subset of observations that is representative
of the larger population.
• There are 2 main data sampling strategies:
• Probability sampling involves random selection. All of the observations within the data have a chance to be
selected, therefore strong statistical inferences could be made about the entire group.
• Non-probability sampling does not involve random selection. Instead, it’s based on convenience or other
criteria. As a result, some observations do not have a chance of being selected regardless of the number of
samples that are built.

Source: https://builtin.com/data-science/recommender-systems

21
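A minimal pandas/numpy sketch contrasting the two strategies; the DataFrame and sample sizes are hypothetical:

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
population = pd.DataFrame({"user_id": range(1000),
                           "clicks": rng.integers(0, 50, size=1000)})

# Probability (simple random) sampling: every row has an equal chance of selection
random_sample = population.sample(n=100, random_state=42)

# Non-probability (convenience) sampling: e.g. just take the first 100 rows
convenience_sample = population.head(100)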
Dimensionality reduction techniques

We are generating a tremendous amount of data daily. In fact, 90% of the data in the world has been generated in
the last 3-4 years! Below are just some examples of the kind of data being collected:
• Facebook collects data of what you like, share, post, places you visit, restaurants you like, etc.
• Your smartphone apps collect a lot of personal information about you
• Amazon collects data of what you buy, view, click, etc. on their site
• Casinos keep a track of every move each customer makes

Source: https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/
https://builtin.com/data-science/recommender-systems

22
Dimensionality reduction techniques

• Why is Dimensionality Reduction required?


• Here are some of the benefits of applying dimensionality reduction to a dataset:
• The space required to store the data is reduced as the number of dimensions comes down
• Fewer dimensions lead to less computation/training time
• Some algorithms do not perform well when we have a large number of dimensions, so reducing the
dimensions is needed for those algorithms to be useful
• It helps in visualizing data. As discussed earlier, it is very difficult to visualize data in higher dimensions, so
reducing our space to 2D or 3D may allow us to plot and observe patterns more clearly

Source: https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/

23
Dimensionality reduction techniques

Common Dimensionality Reduction Techniques


• Dimensionality reduction can be done in two different ways:
• By only keeping the most relevant variables from the original dataset (this technique is called feature selection)
• By finding a smaller set of new variables, each being a combination of the input variables, containing basically
the same information as the input variables (this technique is called dimensionality reduction)

(1) Missing Value Ratio


• Suppose you’re given a dataset. What would be your first step? You would naturally want to explore the data
first before building a model. While exploring the data, you find that your dataset has some missing values. Now
what? You will try to find out the reason for these missing values and then impute them, or drop the variables
that have too many missing values entirely (using appropriate methods).

Source: https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/

24
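A sketch of the missing value ratio idea in pandas on synthetic data; the ~40% missingness injected into one column and the 20% drop threshold are arbitrary illustrative choices:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("ABCDE"))
df.loc[rng.random(100) < 0.4, "C"] = np.nan   # make column C ~40% missing

missing_ratio = df.isnull().mean()            # fraction of missing values per column
threshold = 0.2
kept_columns = missing_ratio[missing_ratio <= threshold].index
reduced = df[kept_columns]                    # drops column C

print(missing_ratio)
print(reduced.columns.tolist())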
Dimensionality reduction techniques

(2) Low Variance Filter


• Consider a variable in our dataset where all the observations have the same value, say 1. If we use this variable,
do you think it can improve the model we will build? The answer is no, because this variable will have zero
variance.
• So, we need to calculate the variance of each variable we are given. Then drop the variables having low
variance as compared to other variables in our dataset. The reason for doing this, as I mentioned above, is that
variables with a low variance will not affect the target variable.

Source: https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/

25
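A minimal scikit-learn sketch of a low variance filter using VarianceThreshold on the iris data; note that variance is scale-dependent, and the 0.2 threshold is an arbitrary illustrative value:

from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

X, y = load_iris(return_X_y=True)

# Drop features whose variance falls below the chosen threshold
selector = VarianceThreshold(threshold=0.2)
X_reduced = selector.fit_transform(X)

print(X.shape, X_reduced.shape)   # e.g. (150, 4) -> (150, k) with the low-variance feature removed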
Dimensionality reduction techniques

(3)High Correlation filter


High correlation between two variables means they have similar trends and are likely to carry similar
information. This can bring down the performance of some models drastically (linear and logistic regression
models, for instance). We can calculate the correlation between the independent numerical variables. If the
correlation coefficient crosses a certain threshold value, we can drop one of the
variables (dropping a variable is highly subjective and should always be done keeping the domain in mind).

(4) Random Forest


Random Forest is one of the most widely used algorithms for feature selection. It comes packaged with in-built
feature importance so you don’t need to program that separately. This helps us select a smaller subset of
features.

26
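The sketch below combines the two ideas above on the scikit-learn breast cancer dataset: a high-correlation filter (the 0.95 cut-off is an arbitrary example) followed by random forest feature importances:

import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# High correlation filter: drop one of each pair of features correlated above 0.95
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_filtered = X.drop(columns=to_drop)

# Random forest feature importances for further selection
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_filtered, y)
importances = pd.Series(model.feature_importances_, index=X_filtered.columns)
print(importances.sort_values(ascending=False).head(10))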
Dimensionality reduction techniques

(5) Backward Feature Elimination


Follow the steps below to understand and use the ‘Backward Feature Elimination’ technique:
1. We first take all the n variables present in our dataset and train the model using them
2. We then calculate the performance of the model
3. Now, we compute the performance of the model after eliminating each variable (n times), i.e., we drop one
variable every time and train the model on the remaining n-1 variables
4. We identify the variable whose removal has produced the smallest (or no) change in the performance of the
model, and then drop that variable
5. Repeat this process until no variable can be dropped

Source: https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/ 27
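scikit-learn's RFE implements a closely related backward elimination idea (recursively dropping the weakest feature according to the model); a sketch, with an arbitrary target of 10 features:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Recursively drop the least important feature until 10 remain
estimator = LogisticRegression(max_iter=5000)
selector = RFE(estimator, n_features_to_select=10, step=1).fit(X, y)

print(selector.support_)   # boolean mask of retained features
print(selector.ranking_)   # 1 = retained; larger values were eliminated earlier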
Dimensionality reduction techniques

(6) Forward Feature Selection


• This is the opposite process of the Backward Feature Elimination we saw above. Instead of eliminating
features, we try to find the best features which improve the performance of the model. This technique works as
follows:
• We start with a single feature. Essentially, we train the model n number of times using each feature separately
• The variable giving the best performance is selected as the starting variable
• Then we repeat this process and add one variable at a time. The variable that produces the highest increase in
performance is retained
• We repeat this process until no significant improvement is seen in the model’s performance

Source: https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/
28
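A sketch of forward selection using scikit-learn's SequentialFeatureSelector (available in scikit-learn 0.24+); the choice of a KNN base model and of 5 features is arbitrary:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Start from zero features and greedily add the one that improves the CV score most
selector = SequentialFeatureSelector(KNeighborsClassifier(),
                                     n_features_to_select=5,
                                     direction="forward").fit(X, y)

print(selector.get_support())   # mask of the selected features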
Dimensionality reduction techniques

(7) Factor Analysis


Suppose we have two variables: Income and Education. These variables will potentially have a high correlation,
as people with a higher education level tend to have significantly higher income, and vice versa. Factor analysis
groups such correlated variables together; each group, representing an underlying latent variable, is called a factor.
(8) Principal Component Analysis (PCA)
• PCA is a technique which helps us in extracting a new set of variables from an existing large set of variables.
These newly extracted variables are called Principal Components. 
• A principal component is a linear combination of the original variables
• Principal components are extracted in such a way that the first principal component explains maximum
variance in the dataset
• Second principal component tries to explain the remaining variance in the dataset and is uncorrelated to the
first principal component
• Third principal component tries to explain the variance which is not explained by the first two principal
components and so on

Source: https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/ 29
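A minimal scikit-learn PCA sketch on the iris data; standardising the features first is a common (assumed) preprocessing choice because PCA is variance-based:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardise, then project onto the first two principal components
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(pca.explained_variance_ratio_)   # variance explained by each component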
Denoising

Denoising is any signal processing method which reconstructs a signal from a noisy one. Its goal is to remove
noise and preserve useful information.
Image enhancement is an important research topic in image processing and computer vision.
Noise model
• There are many sources of noise in images; noise arises at various stages such as image
acquisition, transmission, and compression. The types of noise also differ, for example salt-and-pepper noise,
Gaussian noise, etc. There are different processing algorithms for different noises.

Source: https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/

31
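A small numpy sketch of denoising a 1-D signal corrupted by additive Gaussian noise with a moving-average filter; this is illustrative only (other noise types, e.g. salt-and-pepper, are usually handled with median filtering instead):

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + rng.normal(scale=0.3, size=t.size)   # additive Gaussian noise

# Moving-average denoising: each sample is replaced by the mean of its window
window = 5
kernel = np.ones(window) / window
denoised = np.convolve(noisy, kernel, mode="same")

# Mean squared error against the clean signal drops after filtering
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))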
k – means clustering

• A cluster refers to a collection of data points aggregated together because of certain similarities.
• You’ll define a target number k, which refers to the number of centroids you need in the dataset. A
centroid is the imaginary or real location representing the center of the cluster.
• Every data point is allocated to each of the clusters through reducing the in-cluster sum of squares.
• In other words, the K-means algorithm identifies k number of centroids, and then allocates every data
point to the nearest cluster, while keeping the centroids as small as possible.
• The ‘means’ in the K-means refers to averaging of the data; that is, finding the centroid.
• K-means algorithm in data mining starts with a first group of randomly selected centroids, which are
used as the beginning points for every cluster, and then performs iterative (repetitive) calculations to
optimize the positions of the centroids
• It halts creating and optimizing clusters when either:
• The centroids have stabilized — there is no change in their values because the clustering has been
successful.
• The defined number of iterations has been achieved.

32
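A minimal scikit-learn K-means sketch on synthetic 2-D points with two obvious groups (the data and the choice of k=2 are assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated clouds of 2-D points
X = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
               rng.normal(loc=5.0, scale=0.5, size=(50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)   # final centroid positions
print(kmeans.labels_[:5])        # cluster assignments of the first points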
support vector machine

“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used for both
classification or regression challenges. However, it is mostly used in classification problems. In the SVM
algorithm, we plot each data item as a point in n-dimensional space (where n is number of features you
have) with the value of each feature being the value of a particular coordinate. Then, we perform
classification by finding the hyper-plane that differentiates the two classes very well.


Support vectors are simply the co-ordinates of individual observations. The SVM classifier is a frontier
(hyper-plane/line) which best segregates the two classes.

33
support vector machine

• A simple linear SVM classifier works by making a straight line between two classes. That means all of
the data points on one side of the line will represent a category and the data points on the other side of
the line will be put into a different category. This means there can be an infinite number of lines to
choose from.
• What makes the linear SVM algorithm better than some of the other algorithms, like k-nearest
neighbors, is that it chooses the best line to classify your data points. It chooses the line that separates
the data and is as far away from the closest data points as possible.
• A 2-D example helps to make sense of all the machine learning jargon. Basically you have some data
points on a grid. You're trying to separate these data points by the category they should fit in, but you
don't want to have any data in the wrong category. That means you're trying to find the line between the
two closest points that keeps the other data points separated.
• So the two closest data points give you the support vectors you'll use to find that line. That line is called
the decision boundary.

34
support vector machine

Types of SVMs
There are two different types of SVMs, each used for different things:
 Simple SVM: Typically used for linear regression and classification problems.
 Kernel SVM: Has more flexibility for non-linear data because you can add more features to fit a hyperplane instead of a two-
dimensional space.
Pros:-
• Effective on datasets with multiple features, like financial or medical data.
• Effective in cases where number of features is greater than the number of data points.
• Uses a subset of training points in the decision function called support vectors which makes it memory efficient.
• Different kernel functions can be specified for the decision function. You can use common kernels, but it's also possible to
specify custom kernels.
Cons:-
• If the number of features is a lot bigger than the number of data points, avoiding over-fitting when choosing kernel functions
and regularization term is crucial.
• SVMs don't directly provide probability estimates. Those are calculated using an expensive five-fold cross-validation.
• Works best on small sample sets because of its high training time.

35
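A minimal scikit-learn SVM classification sketch; the linear kernel and C=1.0 are illustrative defaults (swap kernel="rbf" for non-linear data):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale the features, then fit a linear-kernel SVM
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
model.fit(X_train, y_train)

print(model.score(X_test, y_test))   # classification accuracy on held-out data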
Ensemble methods
The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in
order to improve generalizability / robustness over a single estimator.

Two families of ensemble methods are usually distinguished:

In averaging methods, the driving principle is to build several estimators independently and then to average their predictions.
On average, the combined estimator is usually better than any of the single base estimator because its variance is reduced.

Examples: Bagging methods, Forests of randomized trees, …

By contrast, in boosting methods, base estimators are built sequentially and one tries to reduce the bias of the combined
estimator. The motivation is to combine several weak models to produce a powerful ensemble.

Examples: AdaBoost, Gradient Tree Boosting, …

Source: https://scikit-learn.org/stable/modules/ensemble.html

36
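A short scikit-learn sketch contrasting the two families described above, bagging (averaging) and AdaBoost (boosting); the dataset and n_estimators=50 are arbitrary illustrative choices:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Averaging family: bagging of independently trained decision trees
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting family: estimators built sequentially, each focusing on the previous errors
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())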
Rule-based classifiers
Rule-based classifiers are just another type of classifier which makes the class decision by using various "if...else"
rules. These rules are easily interpretable, and thus these classifiers are generally used to generate descriptive models. The
condition used with "if" is called the antecedent, and the predicted class of each rule is called the consequent.

Properties of rule-based classifiers:


Coverage: the percentage of records which satisfy the antecedent conditions of a particular rule.
The rules generated by rule-based classifiers are generally not mutually exclusive, i.e. many rules can cover the same
record.
The rules generated by rule-based classifiers may not be exhaustive, i.e. there may be some records which are not covered
by any of the rules.
The decision boundaries created by them are linear, but they can be much more complex than those of a decision tree,
because many rules can be triggered for the same record.

Source: https://www.geeksforgeeks.org/rule-based-classifier-machine-learning/

37
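A toy rule-based classifier sketch; the attributes, thresholds and class labels are entirely hypothetical and only illustrate antecedents ("if" conditions), consequents (predicted classes), and a default rule for records no other rule covers:

def classify_loan(record):
    """Toy rule-based classifier: the if-conditions are antecedents, the returned labels are consequents."""
    if record["income"] > 50000 and not record["has_defaulted"]:
        return "approve"
    if record["income"] <= 50000 and not record["employed"]:
        return "reject"
    return "manual_review"   # default rule for uncovered records

applicants = [
    {"income": 72000, "has_defaulted": False, "employed": True},
    {"income": 30000, "has_defaulted": False, "employed": False},
    {"income": 45000, "has_defaulted": True,  "employed": True},
]
print([classify_loan(a) for a in applicants])   # ['approve', 'reject', 'manual_review']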
ANN

A neural network is a machine learning algorithm based on the model of a human neuron. The human brain consists
of billions of neurons. It sends and processes signals in the form of electrical and chemical signals. These neurons are
connected by special structures known as synapses. Synapses allow neurons to pass signals. Neural networks are formed
from large numbers of simulated neurons.
An Artificial Neural Network is an information processing technique. It works the way the human brain processes
information. An ANN includes a large number of connected processing units that work together to process information.
They also generate meaningful results from it.
We can apply neural networks not only to classification; they can also be applied to the regression of continuous target
attributes.
Neural networks find great application in data mining across sectors, for example economics, forensics, etc., and
for pattern recognition. They can also be used for data classification on large amounts of data after careful training.
A neural network may contain the following 3 layers:
Input layer – The activity of the input units represents the raw information that is fed into the network.
Hidden layer – The activity of each hidden unit is determined by the activities of the input units and the weights on
the connections between the input and the hidden units. There may be one or more hidden layers.
Output layer – The behavior of the output units depends on the activity of the hidden units and the weights
between the hidden and output units.
38
Source: https://www.geeksforgeeks.org/rule-based-classifier-machine-learning/
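A minimal scikit-learn sketch of a feed-forward ANN with the three layers described above (input = 64 pixel features of the digits dataset, one hidden layer of 32 units, output = 10 classes; the layer sizes are illustrative choices):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 64 inputs -> one hidden layer of 32 units -> 10 output classes
model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))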
ANN

Advantages and Disadvantages of Neural Networks


Let us see a few advantages and disadvantages of neural networks:
Neural networks perform well with linear and nonlinear data, but a common criticism of neural networks,
particularly in robotics, is that they require a large diversity of training data for real-world operation. This is because
any learning machine needs sufficient representative examples in order to capture the underlying structure that
allows it to generalize to new cases.
Neural networks work even if one or a few units fail to respond, but to implement large and effective
software neural networks, much processing and storage resource needs to be committed. While the brain has
hardware tailored to the task of processing signals through a graph of neurons, simulating even a most simplified
form on Von Neumann technology may compel a neural network designer to fill millions of database rows for its
connections – which can consume vast amounts of computer memory and hard disk space.
A neural network learns from the analyzed data and does not require reprogramming, but neural networks are referred
to as "black box" models and provide very little insight into what these models really do. The user just needs to feed in
input, watch the network train and await the output.

Source: https://www.datasciencecentral.com/profiles/blogs/artificial-neural-network-ann-in-machine-learning

39
Bayesian classification
Bayesian classification uses Bayes' theorem to predict the occurrence of any event. Bayesian classifiers are
statistical classifiers based on Bayesian probability. The theorem expresses how a level of belief, expressed as a
probability, should change to account for new evidence.
Bayes' theorem is named after Thomas Bayes, who first utilized conditional probability to provide an
algorithm that uses evidence to calculate limits on an unknown parameter.
Bayes' theorem is expressed mathematically by the following equation:

P(A|B) = P(B|A) × P(A) / P(B)
Where:
P(A|B) – the probability of event A occurring, given event B has occurred
P(B|A) – the probability of event B occurring, given event A has occurred
P(A) – the probability of event A
P(B) – the probability of event B
Source: https://corporatefinanceinstitute.com/resources/knowledge/other/bayes-theorem/ 40
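Below is a small sketch: first the theorem applied directly with assumed (purely illustrative) probabilities, then a naive Bayes classifier from scikit-learn, which applies the same rule feature by feature:

# Direct use of Bayes' theorem with assumed probabilities:
# A = "user likes action movies", B = "user watched the new thriller"
p_a = 0.30          # P(A)
p_b_given_a = 0.80  # P(B|A)
p_b = 0.40          # P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.6

# A naive Bayes classifier applies the same rule feature by feature
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)
print(model.predict(X[:3]), model.predict_proba(X[:3]).round(3))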
Association rule mining
Association rule mining finds interesting associations and relationships among large sets of data items. Such a rule
shows how frequently an itemset occurs in a transaction. A typical example is Market Basket Analysis.
Market Basket Analysis is one of the key techniques used by large retailers to show associations between items. It
allows retailers to identify relationships between the items that people buy together frequently.
Given a set of transactions, we can find rules that will predict the occurrence of an item based on the occurrences of
other items in the transaction.
Association Rule Mining is one of the ways to find patterns in data. It finds:
• Features (dimensions) which occur together
• Features (dimensions) which are "correlated"
What does the value of one feature tell us about the value of another feature? For example, people who buy diapers
are likely to buy baby powder. Or we can rephrase the statement by saying: if (people buy diapers), then (they buy
baby powder). Note the if-then rule. This does not necessarily mean that if people buy baby powder, they buy
diapers. In general, if condition A implies B, it does not necessarily mean that B implies A. Watch
the directionality!

Source: https://towardsdatascience.com/association-rule-mining-be4122fc1793

41
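A minimal pure-Python sketch of computing support and confidence for the diapers/baby-powder rule above, on a made-up set of transactions (note how confidence differs by direction):

transactions = [
    {"diapers", "baby powder", "milk"},
    {"diapers", "baby powder"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"diapers", "baby powder", "bread"},
]

def support(itemset):
    # fraction of transactions containing the whole itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # of the transactions containing the antecedent, how many also contain the consequent
    return support(antecedent | consequent) / support(antecedent)

print(support({"diapers", "baby powder"}))        # 0.6
print(confidence({"diapers"}, {"baby powder"}))   # 0.75
print(confidence({"baby powder"}, {"diapers"}))   # 1.0 -- the rule is not symmetric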
Data Mining Techniques in Recommender System

1. Developing an understanding of
a) the application domain
b) the relevant prior knowledge
c) the goals of the end-user
2. Creating a target data set: selecting a data set, or focusing on a subset of variables, or data samples,
on which discovery is to be performed.
3. Data cleaning and preprocessing.
a) Removal of noise or outliers.
b) Collecting necessary information to model or account for noise.
c) Strategies for handling missing data fields.
d) Accounting for time sequence information and known changes.

Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html/

42
Introduction to Data mining techniques

4. Data reduction and projection.


a) Finding useful features to represent the data depending on the goal of the task.
b) Using dimensionality reduction or transformation methods to reduce the effective number of
variables under consideration or to find invariant representations for the data.
5. Choosing the data mining task.
Deciding whether the goal of the KDD process is classification, regression, clustering, etc.
6. Choosing the data mining algorithm(s).
a) Selecting method(s) to be used for searching for patterns in the data.
b) Deciding which models and parameters may be appropriate.
c) Matching a particular data mining method with the overall criteria of the KDD process.

Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html/

43
Introduction to Data mining techniques

7. Data mining.
• Searching for patterns of interest in a particular representational form or a set of
such representations as classification rules or trees, regression, clustering, and
so forth.
8. Interpreting mined patterns.
9. Consolidating discovered knowledge.

Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html/

44
MCQ:

Q.1 Which of the following refers to the problem of finding abstracted patterns (or structures) in the
unlabeled data?
a) Supervised learning
b) Unsupervised learning
c) Hybrid learning
d) Reinforcement learning

Answer: b
Explanation: Unsupervised learning is a type of machine learning algorithm that is generally used to find
the hidden structures and patterns in the given unlabeled data.

45
Source: https://www.microsoft.com/en-us/research/lab/microsoft-research-asia/articles/personalized-recommendation-systems/
MCQ:

Q. 2 Which one of the following refers to querying the unstructured textual data?
a) Information access
b) Information update
c) Information retrieval
d) Information manipulation

Answer: c
Explanation: Information retrieval refers to querying the unstructured textual data. We can also understand
information retrieval as an activity (or process) of obtaining, from system resources, the information that is
relevant to an information need, out of a huge source of information.

46
Source: https://www.microsoft.com/en-us/research/lab/microsoft-research-asia/articles/personalized-recommendation-systems/
MCQ:

Q.3 Which of the following is an essential process in which the intelligent methods are applied to extract
data patterns?
a) Warehousing
b) Data Mining
c) Text Mining
d) Data Selection

Answer: b
Explanation: Data mining is a type of process in which several intelligent methods are used to extract
meaningful data from the huge collection ( or set) of data.

47
Source: https://www.microsoft.com/en-us/research/lab/microsoft-research-asia/articles/personalized-recommendation-systems/
MCQ:

Q.4 What is KDD in data mining?


a) Knowledge Discovery Database
b) Knowledge Discovery Data
c) Knowledge Data definition
d) Knowledge data house

Answer: a
Explanation: The term KDD, or Knowledge Discovery Database, refers to a broad process of discovering
knowledge in data and emphasizes the high-level applications of specific Data Mining techniques as
well.

48
Source: https://www.microsoft.com/en-us/research/lab/microsoft-research-asia/articles/personalized-recommendation-systems/
MCQ:

Q.5 For what purpose, the analysis tools pre-compute the summaries of the huge amount of data?
a) In order to maintain consistency
b) For authentication
c) For data access
d) To obtain the queries response

Answer: d
Explanation:
Whenever a query is fired, its response should be returned very quickly. So, for a fast query response,
the analysis tools pre-compute the summaries of the huge amount of data. To understand it in more detail,
consider the following example:
Suppose that to get some information about something, you write a keyword in Google search. Google's
analytical tools will then use pre-computed summaries of large amounts of data to provide a quick output related
to the keywords you have written.

49
MCQ:

Q.6 What are the functions of Data Mining?


a) Association and correlation analysis, classification
b) Prediction and characterization
c) Cluster analysis and Evolution analysis
d) All of the above

Answer: d
Explanation: In data mining, there are several functionalities used for performing different types of
tasks. The common functionalities used in data mining are cluster analysis, prediction, characterization, and
evolution analysis. Association and correlation analysis and classification are also among the important
functionalities of data mining.

50
MCQ:

Q.7 Which one of the following statements about the K-means clustering is incorrect?
a) The goal of the k-means clustering is to partition (n) observation into (k) clusters
b) K-means clustering can be defined as the method of quantization
c) The nearest neighbor is the same as the K-means
d) All of the above

Answer: c
Explanation: K-means clustering and the k-nearest neighbor algorithm are unrelated techniques; the nearest neighbor is not the same as K-means.

51
MCQ:

Q.8 The self-organizing maps can also be considered as the instance of _________ type of learning.
a) Supervised learning
b) Unsupervised learning
c) Missing data imputation
d) Both A & C

Answer: b
Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial
Neural Network which is trained through unsupervised learning

52
Assignment

1. What is the role of data mining in recommender systems?


2. Explain the different data mining techniques.
3. How do similarity measures play an important role in the recommendation process?
4. Write a short note on: (I) K-means clustering (II) Support vector machines
5. What do you mean by ANN? How is it used in the data mining process?
6. When are Bayesian classifiers used in the data mining process of a recommender system?

53
Summary
 Data pre-processing is a data mining technique that involves transforming raw data into an
understandable format.
 A Bayesian classifier is based on the idea that the role of a (natural) class is to predict the values of
features for members of that class.
 Rule-based classification algorithms extract knowledge from the classification model in the form of rules,
which are easy to understand and very expressive.
 Association rules are created by searching data for frequent if-then patterns and using the measures
support and confidence to identify the most important
relationships.
Document Links
Document | Link

1. Lecture Notes on Data Mining & Data Warehousing | https://www.vssut.ac.in/lecture_notes/lecture1428550844.pdf

2. Data Mining | https://www.tutorialspoint.com/data_mining/data_mining_tutorial.pdf

3. Data Mining: Practical Machine Learning Tools and Techniques | http://myweb.sabanciuniv.edu/rdehkharghani/files/2016/02/The-Morgan-Kaufmann-Series-in-Data-Management-Systems-Jiawei-Han-Micheline-Kamber-Jian-Pei-Data-Mining.-Concepts-and-Techniques-3rd-Edition-Morgan-Kaufmann-2011.pdf
Data Mining Techniques in Recommender System

• Video Links
1. 3 Approaches To Building A Recommendation System | https://towardsdatascience.com/3-approaches-to-build-a-recommendation-system-ce6a7a404576

2. Recommender systems explained | https://medium.com/recombee-blog/recommender-systems-explained-d98e8221f468

3. Recommender Systems: Content-Based Filtering - Part 1 | https://www.coursera.org/lecture/text-retrieval/lesson-6-5-recommender-systems-content-based-filtering-part-1-QORNe
56
Data Mining Techniques in Recommender System

• E-Resources
Topic | URL

Recommender Systems Handbook, Francesco Ricci, Lior Rokach, Bracha Shapira, Paul B. Kantor (Editors) | https://www.cse.iitk.ac.in/users/nsrivast/HCC/Recommender_systems_handbook.pdf

Recommendation Systems | http://infolab.stanford.edu/~ullman/mmds/ch9.pdf

Recommendation systems: Principles, methods and evaluation | https://www.researchgate.net/publication/283180981_Recommendation_systems_Principles_methods_and_evaluation
57
58
