A Collaborative Filtering Recommendation Algorithm Based on Item Genre and Rating Similarity

The document presents a collaborative filtering recommendation algorithm that improves upon traditional user-based and item-based methods by incorporating item genre and rating similarity to address issues of data sparsity. Through experiments, the proposed algorithm demonstrates reduced mean absolute error (MAE) and enhanced recommendation quality. The study utilizes a dataset from MovieLens to validate the effectiveness of the improved algorithm compared to standard methods.


2009 International Conference on Computational Intelligence and Natural Computing

A Collaborative Filtering Recommendation Algorithm Based on Item Genre and Rating Similarity

Ye Zhang Wei Song


School of Business School of Business
Bohai University Bohai University
Jinzhou, China Jinzhou, China
[email protected] [email protected]

Abstract: User-based and item-based collaborative filtering algorithms both perform poorly when user rating data are extremely sparse. To address this, we introduce an improved item similarity that combines item genre similarity with rating similarity. When the genre and rating similarities are computed, high ratings given by a group of users are also allowed to influence the similarity. Experiments show that the improved algorithm reduces the mean absolute error (MAE) and improves the quality of recommendation.

Keywords: collaborative filtering; recommendation systems; MAE; E-commerce

I. INTRODUCTION

The Internet has become an indispensable tool for work, daily life and entertainment. According to available statistics, the total number of website pages had exceeded 80 million by January 2008. It is hard for a user to find the information he or she wants among so many network resources. Today people mainly use search engines to find information. This is a passive approach: the results may not be what the user wants, or the user may have to browse a large number of pages before finding the right information. Search alone is therefore inefficient.

Generally there are three kinds of recommendation: personalized recommendation, which is based on the user's own past behavior; social recommendation, which is based on the past behavior of similar users; and item recommendation, which is based on the item itself. A personalized recommendation system is an active information service. It compensates for the passive way in which search engines deliver information. Today almost all E-commerce websites use recommendation systems, e.g., Amazon, CDNow, eBay, DangDang and douban. The main recommendation methods are content-based recommendation and collaborative filtering recommendation [1]. The latter has been a successful technology [2], but it suffers from problems such as sparsity, scalability and cold start.

No user can rate every item, because the numbers of users and items are large. The ratings are therefore sparse, which introduces errors into the similarity computation and in turn degrades the quality of recommendation.

II. RELATED WORK

A. Basic knowledge

Collaborative filtering recommendation obtains the similarity between users or items by analyzing users' ratings of items, and then predicts each user's ratings of unrated items. Items with high predicted ratings can be recommended to the user. The process has three steps: first, users rate items; second, the nearest neighbors are found; third, items are recommended. The process is shown in Figure 1.

        I1    I2    …    In
U1      R11   R12   …    R1n
U2      R21   R22   …    R2n
…       …     …     …    …
Um      Rm1   Rm2   …    Rmn

Ux      I1x   I2x   I3x  …    Imx

Figure 1: The process of recommendation

There is a list of m users $U = \{u_1, u_2, \ldots, u_m\}$ and a list of n items $I = \{i_1, i_2, \ldots, i_n\}$. They can be represented as an m×n ratings matrix, shown in the first part of Figure 1. Each user $u_i$ has a list of items $I_{u_i}$ about which the user has expressed an opinion. User $u_i$ can give item $i_j$ a rating $R_{ij}$. Opinions can be given explicitly by the user as a rating score, generally within a certain numerical scale.

B. Similarity Computation

The methods of similarity computation are almost the same for the item-based and user-based approaches. There are three

978-0-7695-3645-3/09 $25.00 © 2009 IEEE
DOI 10.1109/CINC.2009.219
basic methods to compute the similarity [6]. They are as follows.

(1) Correlation-based similarity
In this case, the similarity between two items i and j is measured by computing the Pearson-r correlation. To make the correlation computation accurate, we must first isolate the co-rated cases. Let the set of users who rated both i and j be denoted by $U_{ij}$. The correlation similarity between items i and j, denoted by sim(i,j), is then given by

$$\mathrm{sim}(i,j) = \frac{\sum_{a \in U_{ij}} (R_{a,i} - \bar{R}_i)(R_{a,j} - \bar{R}_j)}{\sqrt{\sum_{a \in U_{ij}} (R_{a,i} - \bar{R}_i)^2}\,\sqrt{\sum_{a \in U_{ij}} (R_{a,j} - \bar{R}_j)^2}} \tag{1}$$

Here $R_{a,i}$ denotes the rating of user a on item i, and $\bar{R}_i$ and $\bar{R}_j$ are the average ratings of the i-th and j-th items.

(2) Cosine-based similarity
In this case, two items are thought of as two vectors in the m-dimensional user space. The similarity between them is measured by computing the cosine of the angle between these two vectors. Formally, the similarity between items i and j, denoted by sim(i,j), is given by

$$\mathrm{sim}(i,j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\|_2 \, \|\vec{j}\|_2} \tag{2}$$

Here · denotes the dot product of the two vectors.

(3) Adjusted-cosine similarity
Computing similarity with the basic cosine measure in the item-based case has one important drawback: differences in rating scale between different users are not taken into account. The adjusted cosine similarity offsets this drawback by subtracting the corresponding user average from each co-rated pair. Formally, the similarity between items i and j using this scheme is given by

$$\mathrm{sim}(i,j) = \frac{\sum_{a \in U_{ij}} (R_{a,i} - \bar{R}_a)(R_{a,j} - \bar{R}_a)}{\sqrt{\sum_{a \in U_{ij}} (R_{a,i} - \bar{R}_a)^2}\,\sqrt{\sum_{a \in U_{ij}} (R_{a,j} - \bar{R}_a)^2}} \tag{3}$$

Here $\bar{R}_a$ is the average of the ratings given by user a.

III. IMPROVED ALGORITHM

A. Algorithm analysis

Generally, the traditional collaborative filtering recommendation algorithm is user-based and computes the similarity between users. The number of users and items increases day after day, but the number of ratings may not increase with it, so two users have few items that both have rated. As a result, the similarity computed between two users by the user-based algorithm may be 0, even though their true similarity may not be 0.

Item-based collaborative filtering was proposed in response. Its basic idea is to choose the K items most similar to the target item, based on the similarity between rated items and the target item, and then to predict the rating from the target user's ratings of these nearest neighbors and their similarities.

Rating similarity alone can be misleading, however. For example, consider three items i, j and k. The similarity computation may find that i and j are more similar than i and k, although i and k in fact belong to the same genre and i and j do not. Predicting a user's rating from rating similarity alone can therefore be wrong, which motivates taking item genre into account.

Suppose a single user rates the target item highly, and rates one of two other items highly and the other low. Both items have the same genre as the target item, so from this one user we cannot judge which of them is more similar to the target item. We therefore use a group of users instead of a single user: an item that the group rates highly is considered more similar to the target item.

From the analysis above, we can improve the item-based collaborative filtering algorithm that uses genre similarity. In that algorithm, candidate neighbor items are first selected according to item genres; the similarities between the target item and the candidate neighbors are then computed from the ratings matrix, and the set of nearest neighbors is obtained. When selecting candidate neighbor items we can consider users as well as genres. If a group of users rate the target item highly, and also rate highly other items of the same genres as the target item, then the target item and those other items can be considered highly similar.

B. Algorithm

The similarity between the target item and its neighbor items combines the improved genre similarity with the rating similarity.

a) Select a user at random and obtain the set $I_{unrat}$ of items that the user has not rated. Then select a target item $I_{aim}$ from $I_{unrat}$.
b) From the training set, select the group of users who rated the target item highly and who also rated other items highly (we take r as the rating threshold, with r >= 4). Let $I_{other}$ denote an item that the target user has rated in the training set.
c) Count the genres of every $I_{other}$ and compute the genre similarity $\mathrm{sim}_{attri}(i,j)$ between $I_{other}$ and the target item.
d) Compute the rating similarity $\mathrm{sim}_{rat}(i,j)$ between the target item and $I_{other}$ using the three methods of similarity computation.

e) Compute the combined similarity $\mathrm{sim}_{inte}(i,j)$ and select the N most similar items as the set NI of nearest neighbor items for the target item. The formula is

$$\mathrm{sim}_{inte}(i,j) = (1-\alpha)\,\mathrm{sim}_{attri}(i,j) + \alpha\,\mathrm{sim}_{rat}(i,j)$$

where α is a weighting coefficient between 0 and 1.
f) Compute the prediction $P(user, I_{aim})$ from $\mathrm{sim}_{inte}(i,j)$ and the target user's ratings of the items in NI. We consider two methods for predicting the target user's rating of an unrated item.

Traditional:

$$P(user, I_{aim}) = \bar{R}_{I_{aim}} + \frac{\sum_{j \in NI} \mathrm{sim}_{inte}(I_{aim}, j)\,(R_{user,j} - \bar{R}_j)}{\sum_{j \in NI} \mathrm{sim}_{inte}(I_{aim}, j)} \tag{4}$$

Weighted Sum:

$$P(user, I_{aim}) = \frac{\sum_{j \in NI} \mathrm{sim}_{inte}(I_{aim}, j)\,R_{user,j}}{\sum_{j \in NI} \mathrm{sim}_{inte}(I_{aim}, j)} \tag{5}$$

Here $\bar{R}_{I_{aim}}$ and $\bar{R}_j$ are the average ratings of item $I_{aim}$ and item j, NI is the set of most similar items, $\mathrm{sim}_{inte}(I_{aim}, j)$ is the similarity between item $I_{aim}$ and item j, and $R_{user,j}$ is the rating of the target user on item j.
g) Repeat b) to f) to compute predictions for all items that the user has not rated. Sort the predictions $P(user, I_{aim})$ and recommend the top N items to the user.

IV. EXPERIMENTATION

A. Data set

In this paper we use experimental data from the MovieLens research website (http://MovieLens.umn.edu/) to evaluate the different algorithms. The data set was loaded into an Access database for the experiments. MovieLens is a web-based research recommender system on which users can rate movies; in return, the system recommends a set of movies to them. At the time of writing, the site has over 45000 users who have expressed opinions on 6600+ different movies. The data set contains 100,000 ratings given by 943 users to 1682 movies. The movies are labeled with 19 genres (unknown | Action | Adventure | Animation | Children's | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western). We converted the genre information into a movie-genre matrix with 1682 rows (one per movie) and 19 columns (one per genre). Each entry is 0 or 1: 0 means the movie does not belong to the corresponding genre and 1 means it does. We only considered users who had rated 20 or more movies. Ratings are integers from 1 to 5; a higher rating means the user likes the movie more.

We randomly selected 20891 ratings from the database for the experiments, given by 200 users to 368 movies. The sparsity level of this data set is therefore $1 - \frac{20891}{200 \times 368} \approx 0.716$. Every user has rated 50 or more movies, and every movie has been rated by 30 or more users. The data were split into a training set and a test set according to a ratio x; a value of x = 0.8 indicates that 80% of the data was used as the training set and 20% as the test set.

B. Evaluation metrics

In this paper we use Mean Absolute Error (MAE) [1] to evaluate the quality of a recommendation system. MAE measures the deviation of recommendations from their true user-specified values. The lower the MAE, the more accurately the recommendation engine predicts user ratings. For each ratings-prediction pair <p_i, q_i>, this metric treats the absolute error between them, i.e., |p_i - q_i|, equally. The MAE is computed by first summing the absolute errors of the N corresponding ratings-prediction pairs and then taking the average. Formally,

$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} |p_i - q_i|}{N} \tag{6}$$

C. Experimental results

To validate the collaborative filtering recommendation algorithm based on improved item genre and rating similarity, we carried out two groups of experiments and compared the results.

Experimentation Ⅰ

Because the weighting coefficient α used in computing the combined similarity between items can vary, its value can affect the quality of recommendation. In this experiment we used the collaborative filtering recommendation algorithm based on improved item genre and rating similarity to predict the ratings of unrated items. We computed the similarity with correlation-based, cosine-based and adjusted-cosine similarity, and computed the predictions with both the traditional and the weighted sum methods. The number of neighbors for the target item was 20. We increased the weighting coefficient α from 0 to 1 in increments of 0.1; the sensitivity of MAE is shown in Figure 2.
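As a rough illustration of steps e) and f) and of the α sweep in Experimentation Ⅰ, the following Python sketch combines a genre similarity with the correlation-based rating similarity of Eq. (1), predicts held-out ratings with the traditional formula of Eq. (4), and reports the MAE of Eq. (6) for each α. The toy ratings, the genre vectors, the function names, the use of cosine similarity between binary genre vectors for sim_attri, and the choice to keep only neighbors with positive similarity are all assumptions made for this example; the paper does not specify them.

```python
import math

# Hypothetical toy data: ratings[user][item] on a 1-5 scale.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2, "D": 4},
    "u3": {"A": 1, "B": 2, "C": 5, "D": 2},
    "u4": {"A": 5, "C": 2, "D": 5},
}
# Hypothetical binary genre vectors (three genres per item).
genres = {"A": [1, 0, 1], "B": [1, 0, 0], "C": [0, 1, 0], "D": [1, 0, 1]}

def item_mean(item, data):
    vals = [r[item] for r in data.values() if item in r]
    return sum(vals) / len(vals)

def sim_rat(i, j, data):
    """Correlation-based similarity (Eq. 1) over users who rated both items."""
    co = [u for u, r in data.items() if i in r and j in r]
    mi, mj = item_mean(i, data), item_mean(j, data)
    num = sum((data[u][i] - mi) * (data[u][j] - mj) for u in co)
    den = (math.sqrt(sum((data[u][i] - mi) ** 2 for u in co))
           * math.sqrt(sum((data[u][j] - mj) ** 2 for u in co)))
    return num / den if den else 0.0

def sim_attri(i, j):
    """Genre similarity; cosine of binary genre vectors is an assumption."""
    gi, gj = genres[i], genres[j]
    dot = sum(a * b for a, b in zip(gi, gj))
    den = math.sqrt(sum(gi)) * math.sqrt(sum(gj))
    return dot / den if den else 0.0

def sim_inte(i, j, data, alpha):
    """Combined similarity from step e)."""
    return (1 - alpha) * sim_attri(i, j) + alpha * sim_rat(i, j, data)

def predict(user, target, data, alpha, n_neighbors=2):
    """Traditional prediction (Eq. 4) from the user's most similar rated items."""
    sims = sorted(((sim_inte(target, j, data, alpha), j)
                   for j in data[user] if j != target), reverse=True)
    # Assumption: keep only the top-N neighbors with positive similarity.
    sims = [(s, j) for s, j in sims[:n_neighbors] if s > 0]
    base = item_mean(target, data)
    if not sims:
        return base  # fall back to the item average
    num = sum(s * (data[user][j] - item_mean(j, data)) for s, j in sims)
    return base + num / sum(s for s, _ in sims)

def mae(pairs):
    """Mean absolute error (Eq. 6) over (prediction, true rating) pairs."""
    return sum(abs(p - q) for p, q in pairs) / len(pairs)

# Hold out two ratings as a tiny test set; the rest is the training set.
test = [("u1", "B"), ("u4", "D")]
train = {u: {i: v for i, v in r.items() if (u, i) not in test}
         for u, r in ratings.items()}

# Sweep alpha from 0 to 1 in steps of 0.1, as in Experimentation I.
results = {}
for k in range(11):
    alpha = k / 10
    pairs = [(predict(u, i, train, alpha), ratings[u][i]) for u, i in test]
    results[alpha] = mae(pairs)
    print(f"alpha={alpha:.1f}  MAE={results[alpha]:.3f}")
```

With α = 0 the combined similarity reduces to the genre similarity alone, and with α = 1 to the rating similarity alone, so the sweep shows how the balance between the two affects the error on the held-out ratings. The numbers from this toy data are not comparable to those reported in the paper.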

[Figure 2, first panel: Sensitivity of different weighting coefficient in different similarity (traditional). Series: Correlation-based, Cosine-based, Adjusted-cosine. Horizontal axis ticks from 0 to 1 in steps of 0.1; vertical axis: MAE.]

[Figure 2, second panel: Sensitivity of different weighting coefficient in different similarity (weighted sum). Series: Correlation-based, Cosine-based, Adjusted-cosine. Horizontal axis ticks from 0 to 1 in steps of 0.1; vertical axis: MAE.]

Figure 2: Comparison of prediction quality of different weighting coefficients in different similarity with two methods of prediction computation.

From Figure 2 we can see that, across the similarity methods compared, the MAE is lowest and the quality of recommendation is best when α = 0.5 and the traditional ratings-prediction method is used with correlation-based similarity. We therefore set α = 0.5 and used the traditional ratings-prediction method with correlation-based similarity in the next experiment.

Experimentation Ⅱ

We then ran experiments with the user-based algorithm, the item-based algorithm, the general item-based collaborative filtering algorithm combined with item genre, and the collaborative filtering recommendation algorithm based on improved item genre and rating similarity. We increased the neighborhood size from 5 to 50 in increments of 5. The results are shown in Figure 3.

[Figure 3 plot: Sensitivity of the neighborhood size in different algorithms. Series: User-based, Item-based, Item-based using genre, improved item genre and rating similarity. Horizontal axis: No. of Neighbors (5 to 50); vertical axis: MAE.]

Figure 3: Comparison of prediction quality of different algorithms.

From the above we can see that the MAE of the collaborative filtering recommendation algorithm based on improved item genre and rating similarity is lower than that of the item-based and user-based algorithms. It is also slightly lower than that of the general item-based collaborative filtering algorithm combined with item genre. For every algorithm, the chart shows that the MAE changes rapidly as the number of neighbors goes from 5 to 25.

V. CONCLUSION

Recommendation systems are used in many industries, mainly in E-commerce. A good recommendation system can promote rapid economic growth. However, current recommendation systems are not ideal. This paper improves the item-based algorithm and runs experiments on different algorithms. The improved algorithm provides better prediction quality. As the number of items and users grows, it may take somewhat more time to compute, and better algorithms remain to be developed.

REFERENCES

[1] Breese J, Heckerman D, Kadie C. Empirical Analysis of Predictive Algorithms for Collaborative Filtering [A]. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence [C]. Madison: Morgan Kaufmann, pp. 43-52, 1998.
[2] Konstan J, Miller B, Maltz D, Herlocker J, et al. GroupLens: Applying Collaborative Filtering to Usenet News [J]. Communications of the ACM, 40(3):77-87, 1997.
[3] Li Yu, Lu Liu, Xuefeng Li. A Hybrid Collaborative Filtering Method for Multiple-Interests and Multiple-Content Recommendation in E-Commerce [J]. Expert Systems with Applications, 28(1):67-77, 2005.
[4] Byeong Man Kim, Qing Li, Jong-Wan Kim, Jinsoo Kim. A New Collaborative Recommender System Addressing Three Problems [C]. Proceedings of PRICAI 2004, pp. 495-504, 2004.
[5] Goldberg D, Nichols D, Oki B M, et al. Using Collaborative Filtering to Weave an Information Tapestry [J]. Communications of the ACM, 35(12):61-70, 1992.
[6] Sarwar B, Karypis G, Konstan J, et al. Application of Dimensionality Reduction in Recommender Systems: A Case Study [C]. Proc. of the WebKDD Workshop at the ACM SIGKDD. New York: ACM, 2006.
