Using Singular Value Decomposition Approximation For Collaborative Filtering
Traditional centralized recommendation systems have problems such as users losing their privacy, retail monopolies being favored, and diffusion of innovations being hampered [3]. Distributed collaborative filtering systems, where users keep their rating profiles to themselves, have the potential to correct these problems. However, in the distributed scenario there are two new problems that need to be dealt with. The first problem is how to ensure that users' data are not revealed to the server and other users. The second problem is how to ensure that users can get predictions as accurate as those they would get in the centralized scenario. This paper is mainly focused on the second problem; consequently, we rely on the mechanisms shown in [3, 7] to address the first problem.

Since the server cannot directly see users' rating profiles, it needs to compute an aggregate (a learning result based on user information) for making predictions. Figure 1 shows our framework for collaborative filtering in distributed recommendation systems. At a certain time point t, the server securely computes the aggregate (denoted as Gt) from those users who are online at that time point (denoted as Ut); "securely" here means that users' rating profiles are not disclosed to the server or to other users. Between time points t and t+1, when a certain user (no matter whether she is in Ut or not) needs predictions, the server computes predictions based on this user's rating profile and the aggregate Gt.

The reason for computing aggregates periodically is that users' rating profiles are dynamic. For any given user, the probability that he is in Ut is independent of the probability that he is in Ut+1, so Ut and Ut+1 would be expected to have few users in common (given sufficiently many users). Therefore, it is hard to find a way to combine aggregates computed at different time points for predictions. A more minor concern in this framework is how the server picks time points for aggregate computations. Of course, time

4.1 Algorithms and Theoretical Analysis

We first present algorithms for computing aggregates and generating predictions, and then a theoretical analysis of their performance. Assume that there are c online users at time point t, and that their rating profiles are denoted A(1) to A(c). Algorithm 2 shows how to generate the aggregate.

Algorithm 2 Computing the aggregate Gt
1: for each user i, for each unknown entry Aij do
2:   If Aij has been predicted before, replace Aij with the latest prediction.
3:   Else replace Aij with the average of user i's ratings.
4: end for
5: The server securely performs SVD on the matrix C (c-by-n) formed by the filled-in rating profiles.
6: Aggregate Gt is the matrix (n-by-k) formed by the top k right singular vectors of C.

When a user i asks for predictions, the server generates predictions as follows using the aggregate Gt.

Algorithm 3 Generating predictions for user i
1: For each unknown entry Aij, if Aij has been predicted before, replace Aij with the latest prediction.
2: Else replace Aij with the average of user i's ratings.
3: Multiply the filled-in rating profile vector (1-by-n) by Gt Gt^T to generate predictions.

For analysis, we make the following two assumptions.

Assumption 1: there exists a constant β such that for any m and for any user i, the filled-in rating profile vector (denoted as A∗(i)) satisfies ∑_{j=1}^{m} ‖A∗(j)‖² / (m · ‖A∗(i)‖²) ≥ β. Recall that m is the total number of users. The soundness of this assumption is shown in Appendix A.
Assumption 2: there is a uniform, constant probability that any user is online at any time point.

Assuming that c users are online (sampled) at a time point according to Assumption 2, such a sampling method is similar to another sampling method that also picks c samples in the following way: to choose a sample, the probability that any user is picked is uniform (i.e., 1/m).

[Figure: two panels plotting against the ratio of new rating cases and the ratio of new users]
4.3 Preserving Privacy

This work is not focused on improvements in preserving privacy in the distributed scenario, so to address the privacy issue in Algorithms 2 and 3, we apply the security schemes proposed in [3, 7]. In Algorithm 2, a distributed secure SVD computation is needed to ensure that users' rating profiles are not revealed to other users or the server. Canny's paper [3] proposed a scheme to achieve this objective. The idea is to reduce the SVD computation to an iterative calculation requiring only the addition of vectors of user data, and to use homomorphic encryption to allow the sums of encrypted vectors to be computed and decrypted without exposing individual data.

[Figure 3(a): NMAE on the Jester data set versus iterations of the EM procedure, for approximation ratios of 2%, 5%, and 10%, and for no approximation]
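Canny's scheme decrypts only the sum of the users' vectors, never an individual vector. As a simplified stand-in for that idea (this is additive secret sharing, used here purely for illustration; it is not the homomorphic-encryption protocol of [3]), the sketch below shows the "only the sum is revealed" property that the iterative SVD calculation needs:

```python
import random

M = 2**61 - 1  # public modulus; an illustrative choice

def share_vector(vec, n_shares):
    """Split an integer vector into n_shares additive shares mod M.
    Any n_shares - 1 of them are jointly uniformly random, so they
    reveal nothing about vec on their own. Requires n_shares >= 2."""
    assert n_shares >= 2
    rand_shares = [[random.randrange(M) for _ in vec]
                   for _ in range(n_shares - 1)]
    last = [(v - sum(col)) % M
            for v, col in zip(vec, zip(*rand_shares))]
    return rand_shares + [last]

def secure_sum(all_user_shares):
    """Add every share received; only the total of all user vectors
    (the quantity the iterative SVD calculation needs) is recovered."""
    length = len(all_user_shares[0][0])
    total = [0] * length
    for user_shares in all_user_shares:
        for share in user_shares:
            total = [(t + s) % M for t, s in zip(total, share)]
    return total

# Two users' rating vectors; the aggregator learns only [4, 6].
u1 = share_vector([1, 2], n_shares=3)
u2 = share_vector([3, 4], n_shares=3)
assert secure_sum([u1, u2]) == [4, 6]
```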
In Algorithm 3, the multiplication of a user's rating profile by Gt Gt^T should be securely computed, both so that the server cannot learn the rating profile and so that the user cannot learn Gt Gt^T. Moreover, the multiplication result

...served ratings in the EM procedure with SVD approximation is generally increasing, although it is not monotonically increasing. Thus, Figure 3 is a verification that the prediction accuracy of Algorithm 1 generally increases when more iterations are used in the EM procedure. In both data sets, the NMAE of Algorithm 1 with an approximation ratio of 5% is less than 2% higher than the NMAE of the standard algorithm. Moreover, the convergence rates of the two algorithms are nearly identical. Algorithm 1 takes only about one tenth the time per iteration compared with the standard algorithm. All these points support the conclusion that our algorithm is practical for real-world systems.

5.2 Experiment 2

In experiment 2, the performance of Algorithm 2 and

[Figure (a) Jester: NMAE versus the number of aggregate computations, for aggregates computed from 2%, 5%, 10%, and all of the users]
References

[1] Y. Azar, A. Fiat, A. Karlin, F. McSherry, and J. Saia. Spectral analysis of data. In Proceedings of the 33rd ACM Symposium on Theory of Computing, 2001.
[2] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 1998.
[3] J. Canny. Collaborative filtering with privacy. In Proceedings of the IEEE Symposium on Security and Privacy, 2002.
[4] J. Canny. Collaborative filtering with privacy via factor analysis. In Proceedings of the 25th ACM SIGIR Conference, 2002.
[5] P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering large graphs via the singular value decomposition. Machine Learning, 56(1-3):9–33, 2004.
[6] P. Drineas, I. Kerenidis, and P. Raghavan. Competitive recommendation systems. In Proceedings of the 34th ACM Symposium on Theory of Computing, 2002.
[7] W. Du and M. Atallah. Privacy-preserving cooperative statistical analysis. In Proceedings of the 17th Annual Computer Security Applications Conference, 2001.
[8] Z. Ghahramani and M. I. Jordan. Learning from incomplete data. Technical report, MIT, 1994.
[9] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151, 2001.
[10] G. Golub and C. F. Van Loan. Matrix Computations (3rd edition). Johns Hopkins University Press, 1996.
[11] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd ACM SIGIR Conference, 1999.

Appendix A

To verify the soundness of Assumption 1, an experiment was performed on a 5000-by-1427 rating matrix from EachMovie. Missing entries are filled in using the average of that user's available ratings. Let β∗ = min_i ∑_{j=1}^{m} ‖A∗(j)‖² / (m · ‖A∗(i)‖²). Table 1 displays the mean value and the standard deviation of β∗ (from 20 trials) when m increases from 1000 to 5000. It shows that β∗ is very stable as m increases.

Table 1. The mean value ("mean") and the standard deviation ("std") of β∗ from 20 trials when m increases.

m      1000   2000   3000   4000   5000
mean   0.406  0.407  0.407  0.407  0.407
std    0.006  0.003  0.002  0.001  0.000
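The β∗ statistic is straightforward to compute directly from the definition; a minimal sketch, with a toy matrix standing in for the EachMovie data and beta_star as a hypothetical helper name:

```python
import numpy as np

def beta_star(profiles):
    """beta* = min_i sum_j ||A*(j)||^2 / (m * ||A*(i)||^2),
    where row i of `profiles` is the filled-in profile A*(i)."""
    sq_norms = np.sum(profiles ** 2, axis=1)  # ||A*(i)||^2 for each user
    m = profiles.shape[0]
    # The minimum over i is attained at the largest squared norm.
    return np.sum(sq_norms) / (m * np.max(sq_norms))

# Toy filled-in rating matrix: m = 3 users, n = 2 items.
A = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [2.0, 2.0]])
print(beta_star(A))  # (5 + 5 + 8) / (3 * 8) = 0.75
```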