Recommender System - Module 2 - Data Mining Techniques in Recommender System
Aim:
To enable students to understand the data mining techniques used in recommender systems and the
implementation of the different rules used in data mining.
Data Mining Techniques in Recommender System
Syllabus:
• Introduction to data mining techniques
• Data pre-processing
• Data mining techniques used in recommender systems:
• Similarity measures
• Sampling
• Dimensionality reduction techniques
• Denoising
• k-means clustering
• Support vector machines
• Ensemble methods
• Rule-based classifiers
• ANN
• Bayesian classifiers
• Association rule mining
Data Mining Techniques in Recommender System
Table of Contents
• Introduction to data mining techniques
• Data pre-processing
• Data mining techniques used in recommender systems
• Similarity measures
• Sampling
• Dimensionality reduction techniques
• Denoising
• k-means clustering
• Support vector machines
• Ensemble methods
• Rule-based classifiers
• ANN
• Bayesian classifiers
• Association rule mining
Data Mining Techniques in Recommender System
Objective:
The objective of this unit is to:
• Learn the basic concepts of data mining in recommender systems.
• Understand the different data mining techniques.
• Analyze the different data mining techniques according to the requirements of a recommender system.
• Evaluate different data mining techniques for recommender systems as a multi-disciplinary field.
• Understand how data mining concepts are implemented in emerging topics and challenges in recommender
systems.
Data Mining Techniques in Recommender System
Outcome:
After completing this unit, students will be able to:
• Understand the basic concepts of data mining.
• Implement different kinds of data mining techniques.
• Evaluate different types of data mining techniques in recommender systems.
• Evaluate the different data mining association rules used in recommender systems as a multi-disciplinary
field.
• Understand the basic requirements for implementing data mining techniques in emerging topics and
challenges in recommender systems.
Data Mining Techniques in Recommender System
Source: https://dataaspirant.com/data-mining/
Introduction to Data mining techniques
Source https://data-flair.training/blogs/data-mining-and-data-science/
Introduction to Data mining techniques
• Mining, in casual terms, refers to the extraction of valuable minerals. In the 21st century, data is the
most valuable "mineral". To extract usable information from a given set of raw data, we use Data Mining.
• Through Data Mining, we analyze a given dataset to extract useful patterns and identify
relationships.
• The process of data mining is a complex process that involves intensive data warehousing as well as
powerful computational technologies.
• Furthermore, data mining is not only limited to the extraction of data but is also used for
transformation, cleaning, data integration, and pattern analysis. Another terminology for Data Mining
is Knowledge Discovery.
Source: https://data-flair.training/blogs/data-mining-and-data-science/
Introduction to Data mining techniques
There are various important techniques in Data Mining, such as association rules, classification,
clustering, and forecasting. Some of the key features of Data Mining are:
• Prediction of patterns based on trends in the data.
• Calculation of predictions for outcomes.
• Creation of information in response to analysis.
• A focus on larger databases.
• Clustering of visual data.
Source: https://data-flair.training/blogs/data-mining-and-data-science/
Introduction to Data mining techniques
Source: https://data-flair.training/blogs/data-mining-and-data-science/
Introduction to Data mining techniques
Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html
Data pre-processing
When we talk about data, we usually think of large datasets with a huge number of rows and columns.
While that is a likely scenario, it is not always the case: data could come in many different forms, such as structured
tables, images, audio files, and videos.
Machines don't understand free text, image, or video data as it is; they understand 1s and 0s. So it probably
won't be good enough to put on a slideshow of all our images and expect our machine learning model to be
trained just by that!
Source: https://data-flair.training/blogs/data-mining-and-data-science/
Data pre-processing
Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining/
Data pre-processing
1. Data Cleaning:
Real-world data can have many irrelevant and missing parts. Data cleaning is done to handle these issues; it
involves the handling of missing data, noisy data, etc.
(a) Missing Data:
This situation arises when some values are missing from the data. It can be handled in various ways,
for example:
• Ignore the tuples:
This approach is suitable only when the dataset we have is quite large and multiple values are missing
within a tuple.
• Fill in the missing values:
There are various ways to do this. You can choose to fill in the missing values manually, by the attribute
mean, or with the most probable value (a minimal sketch follows the source note below).
Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining
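A minimal sketch of both strategies with pandas; the DataFrame and its column names are illustrative assumptions, not from the slides:

import pandas as pd

# Toy ratings table with missing values (illustrative)
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "rating": [4.0, None, 5.0, None],
})

# Strategy 1: ignore (drop) tuples with missing values
dropped = df.dropna()

# Strategy 2: fill missing values with the attribute mean
df["rating"] = df["rating"].fillna(df["rating"].mean())

print(dropped)
print(df)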
Data pre-processing
Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining
Data pre-processing
Data Transformation:
This step is taken in order to transform the data into forms suitable for the mining process. It involves the
following (a short sketch of normalization and discretization follows the source note):
1. Normalization:
This is done in order to scale the data values into a specified range (-1.0 to 1.0 or 0.0 to 1.0).
2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the mining process.
3. Discretization:
This is done to replace the raw values of a numeric attribute by interval levels or conceptual levels.
4. Concept Hierarchy Generation:
Here attributes are converted from a lower level to a higher level in the hierarchy. For example, the attribute "city"
can be converted to "country".
Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining
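A minimal sketch of normalization and discretization with scikit-learn and pandas; the attribute values, bin edges, and level names are invented for illustration:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy numeric attribute (illustrative values)
ages = np.array([[18], [25], [40], [63], [75]])

# Normalization: scale the values into the range 0.0 to 1.0
scaled = MinMaxScaler(feature_range=(0.0, 1.0)).fit_transform(ages)

# Discretization: replace raw values by interval/conceptual levels
levels = pd.cut(ages.ravel(), bins=[0, 30, 60, 100],
                labels=["young", "middle-aged", "senior"])

print(scaled.ravel())
print(list(levels))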
Data pre-processing
Data Reduction:
Data mining is used to handle huge amounts of data, and analysis becomes harder when working with such
volumes. To deal with this, we use data reduction techniques. They aim to increase storage efficiency and
reduce data storage and analysis costs.
• The various approaches to data reduction are:
1. Data Cube Aggregation:
An aggregation operation is applied to the data to construct the data cube.
2. Attribute Subset Selection:
Only the highly relevant attributes should be used; the rest can be discarded. For performing attribute selection,
one can use the level of significance and the p-value of the attribute: an attribute whose p-value is greater than the
significance level can be discarded.
3. Numerosity Reduction:
This enables us to store a model of the data instead of the whole data, for example: regression models.
Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining
Data pre-processing
4. Dimensionality Reduction:
This reduces the size of the data using encoding mechanisms. It can be lossy or lossless. If the original data
can be retrieved after reconstruction from the compressed data, the reduction is called lossless; otherwise it is
called lossy. Two effective methods of dimensionality reduction are wavelet transforms and PCA
(Principal Component Analysis).
Source: https://www.geeksforgeeks.org/data-preprocessing-in-data-mining
Data mining techniques used in recommender system:
Source: https://builtin.com/data-science/recommender-systems
Data mining techniques used in recommender system:
(II) Sampling
We refer to a subset of observations from a larger population as a sample; sampling refers to the process of
selecting the group of observations that data will be collected from for our research. Data sampling may
therefore be defined as a technique used by practitioners to select a subset of observations that is representative
of the larger population.
• There are 2 main data sampling strategies:
• Probability sampling involves random selection. All of the observations within the data have a chance of being
selected, so strong statistical inferences can be made about the entire group.
• Non-probability sampling does not involve random selection. Instead, it is based on convenience or other
criteria. As a result, some observations have no chance of being selected, regardless of the number of
samples that are drawn.
Source: https://builtin.com/data-science/recommender-systems
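A minimal sketch of both sampling strategies with pandas; the population table and sample sizes are illustrative assumptions:

import pandas as pd

# Toy population of user ratings (illustrative)
ratings = pd.DataFrame({
    "user_id": range(1, 1001),
    "rating": [(i % 5) + 1 for i in range(1000)],
})

# Probability sampling: every observation has an equal chance of
# selection, so the sample supports inference about the population
random_sample = ratings.sample(frac=0.1, random_state=42)

# Non-probability (convenience) sampling: just take the first rows;
# later observations have no chance of being selected
convenience_sample = ratings.head(100)

print(len(random_sample), len(convenience_sample))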
Dimensionality reduction techniques
We are generating a tremendous amount of data daily. In fact, 90% of the data in the world has been generated in
the last 3-4 years! Below are just some examples of the kinds of data being collected:
• Facebook collects data of what you like, share, post, places you visit, restaurants you like, etc.
• Your smartphone apps collect a lot of personal information about you
• Amazon collects data of what you buy, view, click, etc. on their site
• Casinos keep a track of every move each customer makes
Source: https://www.analyticsvidhya.com/blog/2018/08/dimensionality-reduction-techniques-python/
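A minimal sketch of one widely used dimensionality reduction technique, PCA, with scikit-learn; the toy data and the number of components are illustrative choices:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # toy data: 100 samples, 5 features

# Lossy reduction: keep only the 2 strongest principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (100, 2)
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept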
Denoising
Denoising is any signal processing method that reconstructs a signal from a noisy one. Its goal is to remove
noise while preserving useful information.
Image enhancement is an important research topic in image processing and computer vision.
Noise model
• There are many sources of noise in images; noise arises at various stages such as image
acquisition, transmission, and compression. The types of noise also differ, for example salt-and-pepper noise,
Gaussian noise, etc., and there are different processing algorithms for the different kinds of noise (a minimal sketch follows below).
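A minimal sketch of removing salt-and-pepper noise with a median filter from SciPy; the synthetic image and the noise fraction are invented for illustration:

import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
img = np.full((64, 64), 128, dtype=np.uint8)  # flat grey toy image

# Corrupt about 5% of the pixels with salt (255) and pepper (0) noise
mask = rng.random(img.shape) < 0.05
img[mask] = rng.choice(np.array([0, 255], dtype=np.uint8), size=int(mask.sum()))

# Median filtering removes isolated outlier pixels while preserving edges
denoised = median_filter(img, size=3)

print(abs(denoised.astype(int) - 128).mean())  # residual noise after filtering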
k-means clustering
• A cluster refers to a collection of data points aggregated together because of certain similarities.
• You'll define a target number k, which refers to the number of centroids you need in the dataset. A
centroid is the imaginary or real location representing the center of a cluster.
• Every data point is allocated to one of the clusters by minimizing the in-cluster sum of squares.
• In other words, the k-means algorithm identifies k centroids, and then allocates every data
point to the nearest cluster, while keeping the clusters as compact as possible.
• The 'means' in k-means refers to averaging the data; that is, finding the centroid.
• The k-means algorithm in data mining starts with a first group of randomly selected centroids, which are
used as the starting points for every cluster, and then performs iterative (repetitive) calculations to
optimize the positions of the centroids.
• It stops creating and optimizing clusters when either:
• The centroids have stabilized: there is no change in their values because the clustering has been
successful.
• The defined number of iterations has been reached.
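A minimal sketch of k-means with scikit-learn; the 2-D points and the choice k = 2 are illustrative:

import numpy as np
from sklearn.cluster import KMeans

# Two clearly separated toy groups of 2-D points
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# k = 2 centroids; iteration stops when the centroids stabilize
# or the maximum number of iterations is reached
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster assignment for every point
print(kmeans.cluster_centers_)  # final centroid positions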
support vector machine
“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used for both
classification and regression challenges. However, it is mostly used in classification problems. In the SVM
algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you
have), with the value of each feature being the value of a particular coordinate. Then, we perform
classification by finding the hyper-plane that best differentiates the two classes.
Support vectors are simply the coordinates of individual observations. The SVM classifier is a frontier
(hyper-plane/line) which best segregates the two classes.
support vector machine
• A simple linear SVM classifier works by making a straight line between two classes. That means all of
the data points on one side of the line will represent a category and the data points on the other side of
the line will be put into a different category. This means there can be an infinite number of lines to
choose from.
• What makes the linear SVM algorithm better than some of the other algorithms, like k-nearest
neighbors, is that it chooses the best line to classify your data points. It chooses the line that separates
the data and is as far away from the closest data points as possible.
• A 2-D example helps to make sense of all the machine learning jargon. Basically you have some data
points on a grid. You're trying to separate these data points by the category they should fit in, but you
don't want to have any data in the wrong category. That means you're trying to find the line between the
two closest points that keeps the other data points separated.
• So the two closest data points give you the support vectors you'll use to find that line. That line is called
the decision boundary.
support vector machine
Types of SVMs
There are two different types of SVMs, each used for different things:
• Simple SVM: typically used for linear regression and classification problems.
• Kernel SVM: has more flexibility for non-linear data, because you can add more features to fit a hyperplane instead of a two-
dimensional space.
Pros:
• Effective on datasets with multiple features, like financial or medical data.
• Effective in cases where the number of features is greater than the number of data points.
• Uses a subset of training points in the decision function, called support vectors, which makes it memory efficient.
• Different kernel functions can be specified for the decision function. You can use common kernels, but it's also possible to
specify custom kernels.
Cons:
• If the number of features is a lot bigger than the number of data points, avoiding over-fitting when choosing the kernel function
and regularization term is crucial.
• SVMs don't directly provide probability estimates; those are calculated using an expensive five-fold cross-validation.
• Works best on small sample sets because of its high training time.
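A minimal sketch of a simple (linear) SVM with scikit-learn; the toy points and labels are invented for illustration:

import numpy as np
from sklearn.svm import SVC

# Two linearly separable toy classes in 2-D space
X = np.array([[1, 1], [2, 1], [1, 2],
              [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear-kernel SVM finds the maximum-margin decision boundary;
# swapping kernel="rbf" would give a kernel SVM for non-linear data
clf = SVC(kernel="linear").fit(X, y)

print(clf.support_vectors_)           # the observations defining the margin
print(clf.predict([[2, 2], [6, 5]]))  # classify two new points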
Ensemble methods
The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in
order to improve generalizability / robustness over a single estimator.
In averaging methods, the driving principle is to build several estimators independently and then to average their predictions.
On average, the combined estimator is usually better than any of the single base estimators, because its variance is reduced.
By contrast, in boosting methods, base estimators are built sequentially and one tries to reduce the bias of the combined
estimator. The motivation is to combine several weak models to produce a powerful ensemble.
Source: https://scikit-learn.org/stable/modules/ensemble.html
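A minimal sketch contrasting an averaging method (bagging) with a boosting method in scikit-learn; the synthetic dataset and estimator counts are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Averaging: estimators built independently, predictions averaged
# (reduces variance)
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: estimators built sequentially, each correcting the last
# (reduces bias)
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())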
Rule-based classifiers
Rule-based classifiers are just another type of classifier which makes the class decision by using various “if...else”
rules. These rules are easily interpretable, and thus these classifiers are generally used to generate descriptive models. The
condition used with “if” is called the antecedent, and the predicted class of each rule is called the consequent.
Source: https://www.geeksforgeeks.org/rule-based-classifier-machine-learning/
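A minimal sketch of a hand-written rule-based classifier in plain Python; the antecedents and class labels are invented for illustration:

def classify(record):
    # Each rule reads: if <antecedent> then <consequent>
    if record["income"] < 20000:                                  # antecedent
        return "rejected"                                         # consequent
    if record["income"] >= 20000 and record["credit"] == "good":
        return "approved"
    return "review"  # default rule when no antecedent matches

print(classify({"income": 15000, "credit": "good"}))  # rejected
print(classify({"income": 50000, "credit": "poor"}))  # review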
ANN
A neural network is a machine learning algorithm based on the model of a human neuron. The human brain consists
of millions of neurons. It sends and processes signals in the form of electrical and chemical signals. These neurons are
connected by special structures known as synapses, which allow neurons to pass signals. Neural networks are formed
from large numbers of simulated neurons.
An Artificial Neural Network (ANN) is an information processing technique that works the way the human brain processes
information. An ANN includes a large number of connected processing units that work together to process information
and generate meaningful results from it.
Neural networks can be applied not only to classification but also to the regression of continuous target
attributes.
Neural networks find great application in data mining across many sectors, for example economics, forensics, etc., and
in pattern recognition. They can also be used for data classification over large amounts of data after careful training.
A neural network may contain the following 3 layers (a minimal sketch follows the list):
Input layer – the activity of the input units represents the raw information that is fed into the network.
Hidden layer – the activity of each hidden unit is determined by the activities of the input units and the weights on
the connections between the input and the hidden units. There may be one or more hidden layers.
Output layer – the behavior of the output units depends on the activity of the hidden units and the weights
between the hidden and output units.
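A minimal sketch of a three-layer network (input, one hidden layer, output) using scikit-learn's MLPClassifier; the toy XOR data and hidden-layer size are illustrative choices:

from sklearn.neural_network import MLPClassifier

# Toy XOR data: not linearly separable, so a hidden layer is required
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# 2 input units -> 8 hidden units -> output; the weights on the
# connections are adjusted iteratively during training
net = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    max_iter=5000, random_state=1)
net.fit(X, y)

print(net.predict([[0, 1], [1, 1]]))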
ANN
Source: https://www.datasciencecentral.com/profiles/blogs/artificial-neural-network-ann-in-machine-learning
Bayesian classification
Bayesian classification uses Bayes' theorem to predict the occurrence of an event. Bayesian classifiers are
statistical classifiers built on the Bayesian understanding of probability: the theorem expresses how a level of belief,
expressed as a probability, should be updated in the light of new evidence.
Bayes' theorem is named after Thomas Bayes, who first utilized conditional probability to provide an
algorithm that uses evidence to calculate limits on an unknown parameter.
Bayes' theorem is expressed mathematically by the following equation:
P(A|B) = P(B|A) × P(A) / P(B)
Where:
P(A|B) – the probability of event A occurring, given event B has occurred
P(B|A) – the probability of event B occurring, given event A has occurred
P(A) – the probability of event A
P(B) – the probability of event B
Source: https://corporatefinanceinstitute.com/resources/knowledge/other/bayes-theorem/
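A minimal worked example of the equation above in plain Python; the probabilities are invented for illustration:

# A = "user likes sci-fi", B = "user rated a sci-fi movie highly"
p_a = 0.30          # P(A): prior probability of liking sci-fi
p_b_given_a = 0.80  # P(B|A): high rating, given the user likes sci-fi
p_b = 0.40          # P(B): overall probability of a high rating

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.6: the belief rises from 0.30 to 0.60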
Association rule mining
Association rule mining finds interesting associations and relationships among large sets of data items. An association
rule shows how frequently an itemset occurs in a transaction. A typical example is Market Basket Analysis.
Market Basket Analysis is one of the key techniques used by large retailers to show associations between items. It
allows retailers to identify relationships between the items that people frequently buy together.
Given a set of transactions, we can find rules that will predict the occurrence of an item based on the occurrences of
other items in the transaction.
Association rule mining is one of the ways to find patterns in data. It finds:
• Features (dimensions) which occur together
• Features (dimensions) which are “correlated”
What does the value of one feature tell us about the value of another feature? For example, people who buy diapers
are likely to buy baby powder. Or we can rephrase the statement by saying: if (people buy diapers), then (they buy
baby powder). Note the if-then rule. This does not necessarily mean that if people buy baby powder, they buy
diapers. In general, if condition A implies B, it does not necessarily mean that B implies A. Watch
the directionality! (A minimal sketch of support and confidence follows the source note.)
Source: https://towardsdatascience.com/association-rule-mining-be4122fc1793
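A minimal sketch computing the support and confidence of the rule "diapers -> baby powder" over invented toy transactions:

# Toy market baskets (illustrative)
transactions = [
    {"diapers", "baby powder"},
    {"diapers", "baby powder", "milk"},
    {"diapers", "bread"},
    {"baby powder"},
    {"milk", "bread"},
]

n = len(transactions)
both = sum(1 for t in transactions if {"diapers", "baby powder"} <= t)
diapers = sum(1 for t in transactions if "diapers" in t)

# support(X -> Y) = P(X and Y); confidence(X -> Y) = P(Y | X)
support = both / n           # 2/5 = 0.40
confidence = both / diapers  # 2/3 = 0.67

print(f"support={support:.2f}, confidence={confidence:.2f}")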
Data Mining Techniques in Recommender System
1. Developing an understanding of
a) the application domain
b) the relevant prior knowledge
c) the goals of the end-user
2. Creating a target data set: selecting a data set, or focusing on a subset of variables, or data samples,
on which discovery is to be performed.
3. Data cleaning and preprocessing.
a) Removal of noise or outliers.
b) Collecting necessary information to model or account for noise.
c) Strategies for handling missing data fields.
d) Accounting for time sequence information and known changes.
Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html/
Introduction to Data mining techniques
Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html/
Introduction to Data mining techniques
4. Data reduction and projection: finding useful features to represent the data, depending on the
goal of the task.
5. Choosing the data mining task: deciding whether the goal of the KDD process is classification,
regression, clustering, etc.
6. Choosing the data mining algorithm(s).
7. Data mining.
• Searching for patterns of interest in a particular representational form or a set of
such representations as classification rules or trees, regression, clustering, and
so forth.
8. Interpreting mined patterns.
9. Consolidating discovered knowledge.
Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html/
MCQ:
Q.1 Which of the following refers to the problem of finding abstracted patterns (or structures) in the
unlabeled data?
a) Supervised learning
b) Unsupervised learning
c) Hybrid learning
d) Reinforcement learning
Answer: b
Explanation: Unsupervised learning is a type of machine learning algorithm that is generally used to find
the hidden structures and patterns in the given unlabeled data.
MCQ:
Q. 2 Which one of the following refers to querying the unstructured textual data?
a) Information access
b) Information update
c) Information retrieval
d) Information manipulation
Answer: c
Explanation: Information retrieval refers to querying unstructured textual data. We can also understand
information retrieval as the activity (or process) of obtaining, from a huge source of information, the system
resources that are relevant to an information need.
MCQ:
Q.3 Which of the following is an essential process in which the intelligent methods are applied to extract
data patterns?
a) Warehousing
b) Data Mining
c) Text Mining
d) Data Selection
Answer: b
Explanation: Data mining is a type of process in which several intelligent methods are used to extract
meaningful data from a huge collection (or set) of data.
MCQ:
Q.4 What is KDD in data mining?
a) Knowledge Discovery Database
b) Knowledge Discovery Data
c) Knowledge Data definition
d) Knowledge data house
Answer: a
Explanation: The term KDD, or Knowledge Discovery Database, refers to the broad process of discovering
knowledge in data, and emphasizes the high-level applications of specific Data Mining techniques as
well.
MCQ:
Q.5 For what purpose, the analysis tools pre-compute the summaries of the huge amount of data?
a) In order to maintain consistency
b) For authentication
c) For data access
d) To obtain the query response
Answer: d
Explanation:
Whenever a query is fired, its response should be returned as quickly as possible. So, for fast query
responses, the analysis tools pre-compute summaries of the huge amount of data. To understand this in more
detail, consider the following example:
Suppose that, to get some information about something, you write a keyword in Google search. Google's
analytical tools will then use pre-computed summaries of large amounts of data to provide a quick output
related to the keywords you have written.
MCQ:
Answer: d
Explanation: In data mining, there are several functionalities used for performing different types of
tasks. The common functionalities used in data mining are cluster analysis, prediction, characterization, and
evolution; association and correlation analysis, and classification, are also important
functionalities of data mining.
MCQ:
Q.7 Which one of the following statements about the K-means clustering is incorrect?
a) The goal of the k-means clustering is to partition (n) observation into (k) clusters
b) K-means clustering can be defined as the method of quantization
c) The nearest neighbor is the same as the K-means
d) All of the above
Answer: c
Explanation: K-means clustering and the k-nearest neighbor algorithm are unrelated: k-means is an
unsupervised clustering method, while k-nearest neighbors is a supervised classification method.
MCQ:
Q.8 The self-organizing maps can also be considered as the instance of _________ type of learning.
a) Supervised learning
b) Unsupervised learning
c) Missing data imputation
d) Both A & C
Answer: b
Explanation: The Self Organizing Map (SOM), or Self Organizing Feature Map, is a kind of Artificial
Neural Network which is trained through unsupervised learning.
Assignment
Summary
Data pre-processing is a data mining technique that involves transforming raw data into an
understandable format.
A Bayesian classifier is based on the idea that the role of a (natural) class is to predict the values of
features for members of that class.
Rule-based classification algorithms extract knowledge from the classification model as rules, which
are easy to understand and very expressive.
Association rule mining rules are created by searching data for frequent if-then patterns and using the
measures of support and confidence to identify the most important relationships.
Document Links
• Video Links
1. 3 Approaches To Building A Recommendation System: https://towardsdatascience.com/3-approaches-to-build-a-recommendation-system-ce6a7a404576
Data Mining Techniques in Recommender System
• E-Resource
Recommender Systems Handbook, Francesco Ricci, Lior Rokach, Bracha Shapira, Paul B. Kantor (Editors): https://www.cse.iitk.ac.in/users/nsrivast/HCC/Recommender_systems_handbook.pdf