Final Version on IEEE
Final Version on IEEE
Abstract—Blind source separation algorithms such as indepen- often involve a modest number of individuals and privacy
dent component analysis (ICA) are widely used in the analysis concerns can preclude sharing “raw” data with collaborators.
of neuroimaging data. To leverage larger sample sizes, different Performing a new joint analysis across the individual data
data holders/sites may wish to collaboratively learn feature rep-
resentations. However, such datasets are often privacy-sensitive, points requires access to individuals’ data. Therefore, research
precluding centralized analyses that pool the data at one site. In this groups often collaborate by performing meta-analyses limited
work, we propose a differentially private algorithm for performing to already-published aggregates/summaries of the data. For
ICA in a decentralized data setting. Due to the high dimension machine learning (ML) applications, each party/site may lack
and small sample size, conventional approaches to decentralized a sufficient number of samples to robustly estimate features
differentially private algorithms suffer in terms of utility. When
centralizing the data is not possible, we investigate the benefit of on their own, but the aggregate number of samples across
enabling limited collaboration in the form of generating jointly all sites can yield novel discoveries such as biomarkers for
distributed random noise. We show that such (anti) correlated noise diseases. Some noteworthy examples where individual research
improves the privacy-utility trade-off, and can reach the same level groups/sites may wish to collaborate, include:
of utility as the corresponding non-private algorithm for certain r a medical research consortium of several healthcare cen-
parameter choices. We validate this benefit using synthetic and real
ters/research labs for neuroimaging analysis [2]–[5]
neuroimaging datasets. We conclude that it is possible to achieve
meaningful utility while preserving privacy, even in complex signal
r a decentralized speech processing system to learn model
processing systems. parameters for speaker recognition
Index Terms—Differential privacy, decentralized computation, r a multi-party cyber-physical system for performing global
independent component analysis, correlated noise, fMRI. state estimation from sensor signals.
I. INTRODUCTION Although sending the data samples to a central repository or
aggregator can enable efficient feature learning, privacy con-
HARING data is a major challenge for researchers in a
S number of domains. In particular, human health studies
cerns and large communication overhead are often prohibitive
when sharing the data. Several previous works demonstrated
how modern signal processing and ML algorithms can po-
Manuscript received February 22, 2021; revised June 14, 2021, September
13, 2021, and October 28, 2021; accepted October 29, 2021. Date of publication tentially reveal information about individuals present in the
November 11, 2021; date of current version December 3, 2021. The associate dataset [6]–[8]. A mathematically rigorous framework for pro-
editor coordinating the review of this manuscript and approving it for publication tection against such information leaks is differential privacy [9].
was Dr. Alexander Bertrand. This work was supported in part by the US NIH un-
der Award 1R01DA040487, in part by the US NSF under Award CCF-1453432, Differentially private (DP) algorithms offer a quantifiable plau-
and in part by DARPA and SSC Pacific under Contract N66001-15-C-4070. sible deniability to the data owners regarding their participation.
This work significantly improves upon the preliminary work presented at Under differential privacy, the algorithm outputs are randomized
IEEE Annual Conference on Information Science and Systems, 2016. [DOI:
http://dx.doi.org/10.1109/CISS.2016.7460488] (Corresponding author: Hafiz in such a way that the presence or absence of any individual in
Imtiaz.) the dataset does not significantly affect the computation output.
Hafiz Imtiaz is with the Department of Electrical and Electronic Engineering, This randomization often takes the form of noise introduced
Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
(e-mail: [email protected]). somewhere in the computation, resulting in a loss in perfor-
Jafar Mohammadi is with the Nokia Bell Labs, 70435 Stuttgart, Germany mance or utility of the algorithm. Privacy risk is quantified by
(e-mail: [email protected]). a parameter or parameters, leading to a privacy-utility trade-off
Rogers Silva, Bradley Baker, Sergey M. Plis, and Vince D. Calhoun are with
the Tri-institutional Center for Translational Research in Neuroimaging and Data in DP algorithm design.
Science, Georgia State University, Georgia Institute of Technology and Emory In this paper, we consider blind source separation (BSS)
University, GA 30303 USA (e-mail: [email protected]; [email protected]; for neuroimaging, where several individual research groups
[email protected]; [email protected]).
Anand D. Sarwate is with the Department of Electrical and or sites wish to collaborate. The joint goal is to learn global
Computer Engineering, Rutgers University, NJ 08854 USA (e-mail: statistics/features utilizing data samples from all sites and en-
[email protected]). sure formal privacy guarantees. Unfortunately, conventional
This article has supplementary downloadable material available at
https://doi.org/10.1109/TSP.2021.3126546, provided by the authors. approaches to using differential privacy in decentralized settings
Digital Object Identifier 10.1109/TSP.2021.3126546 require introducing too much noise, leading to a poor trade-off.
1053-587X © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6356 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021
The high dimensional nature of neuroimaging data also poses requirements or sample sizes at the sites. We show how
a challenge, yielding a large gap between the centralized and CAPE can be applied to convex optimization problems,
decentralized cases. We develop a decentralized computation such as empirical risk minimization (ERM) and computa-
framework for differentially private signal processing and ML tion of loss functions that are separable across sites.
applications, which can partially close this gap. To do this, we r We use CAPE to design a novel algorithm capeDJICA
need an additional resource: if sites can generate (anti) correlated for (, δ)-DP decentralized joint ICA. capeDJICA signif-
random noise independent of their data, we see significant icantly improves upon our earlier work [1] by taking ad-
improvements in performance. Enabling this resource could be vantage of the CAPE scheme. To address the multi-round
provided using a trusted third party or cryptographic methods, nature of the capeDJICA algorithm and to provide a tighter
which may incur different privacy/security costs that we do not characterization of privacy under composition, we provide
address here. We employ the scheme in our BSS application for an analysis using Rényi Differential Privacy (RDP) [16]
neuroimaging data that guarantees differential privacy with an and the moments accountant [17].
improved privacy-utility trade-off. r We demonstrate the effectiveness of CAPE and
The particular BSS algorithm we are considering is the in- capeDJICA on real and synthetic data with varying pri-
dependent component analysis (ICA), one of the most popular vacy levels, number of samples and other key parameters.
BSS techniques for neuroimaging studies [10]. ICA assumes We show that the capeDJICA can provide utility very close
that the observed signals are mixtures of statistically indepen- to that of a non-private algorithm [13] for some parameter
dent sources and aims to decompose the mixed signals into choices. In the regime of meaningful utility, capeDJICA
those sources. ICA has been widely used to estimate intrinsic outperforms the existing privacy-preserving algorithm [1].
connectivity networks from brain imaging data (e.g., functional The assumed availability of correlated noise, together with
magnetic resonance imaging or fMRI) [11]. Successful ap- our improved accounting of privacy via the Rényi Differ-
plication of ICA on fMRI can be attributed to both sparsity ential Privacy, enable us to achieve such performance even
and spatial or temporal independence among the underlying for strict privacy requirements.
sources [10]. The goal of temporal ICA is to identify temporally Note that, we showed a preliminary version of the CAPE
independent components that represent activation of different protocol in [18]. The protocol in this paper is more robust against
brain regions over time [12]. However, it requires the aggregate site dropouts and does not require a trusted third-party.
temporal dimension (of all subjects) to be at least similar to Related Work: There is a vast literature [19]–[24] on solving
the voxel dimension [13]. In most cases, the data from a single optimization problems in decentralized settings, both with and
medical center may not suffice for such analysis. We focus on the without privacy constraints. In the signal processing/ML con-
recently proposed decentralized joint ICA (djICA) algorithm, text, the most relevant ones to our current work are those using
which can perform temporal ICA of fMRI data [13] by allowing ERM and stochastic gradient descent (SGD) [17], [25]–[32].
research groups to jointly learn the underlying sources in a non Additionally, several works studied decentralized DP learning
privacy-preserving way. for locally trained classifiers [33], [34]. One of the most common
Our Contribution: Conventional approaches to decentralized approaches for ensuring differential privacy in optimization
DP algorithms result in too much noise (see Appendix A for problems is to employ randomized gradient computations [27],
an illustration). We propose a novel framework for decentral- [30]. Other common approaches include employing output per-
ized DP computations – correlation assisted private estimation turbation [26] and objective perturbation [23], [26]. A newly
(CAPE), to mitigate the noise. We show that if the sites can proposed take on output perturbation [35] injects noise after
sample their noise from an (anti) correlated distribution, we model convergence, which imposes some additional constraints.
can achieve significantly better privacy-utility trade-offs. Using In addition to optimization problems, Smith [36] proposed a
a trusted “noise generator,” or multiparty computation for the general approach for computing summary statistics using the
noise generation, involves assumptions on the trust model for sample-and-aggregate framework and both the Laplace and
the system and may incur additional costs depending on the Exponential mechanisms [37].
choice of implementation. However, we show that the effort One approach to handling decentralized learning with privacy
to implement this sampling capability is very beneficial for constraints is federated learning [38] and in particular, cross-silo
applications with high-dimensional data in the moderate sample learning [39]. Many approaches use multiparty computation
regime. We summarize our contributions here: (MPC), such as Heikkilä et al. [40], who also studied the
r We propose our new CAPE framework for decentralized relationship of additive noise and sample size in a decentralized
DP computations and provide theoretical guarantees on its setting. In their model, S data holders communicate their data to
privacy and utility properties. In CAPE, we first choose the M computation nodes to compute a function. Differential pri-
noise variances (e.g., to meet a utility constraint) and then vacy provides different guarantees (see [41], [42] for thorough
analyze the resulting mechanism to show that it provides comparisons between Secure Multi-party Computation (SMC)
an (, δ) differential privacy guarantee, where and δ and differential privacy) although we can use MPC protocols
satisfy a specific relation. We actually prove that CAPE to implement part of our algorithm [43]. Other approaches to
satisfies (, δ) probabilistic differential privacy [14], [15], using DP in federated learning operate in different regimes, such
which in-turn implies (, δ) differential privacy. We ex- as learning from a large number of individual data holders, or
tend the CAPE scheme to include asymmetric privacy learning from silos with a large number of data points at each
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
IMTIAZ et al.: CORRELATED NOISE-ASSISTED DECENTRALIZED DIFFERENTIALLY PRIVATE ESTIMATION PROTOCOL 6357
site. This allows for privacy amplification by subsampling [17], related to WX. More specifically, the objective of Infomax
[44]–[46] or using a trusted shuffler [47], [48]. In our application ICA is: W∗ = argmaxW H(G(WX)). Here, G(·) is the sigmoid
1
involving neuroimaging analysis, such techniques do not scale function and is given by: G(z) = 1+exp(−z) . Additionally, H(z)
as well, since sites often have a small number of samples. Our is the (differential) entropy
of a random vector z with joint
work is inspired by the seminal work of Dwork et al. [49] that density q: H(z) = − q(z) log q(z)dz. Here, we evaluate it on
proposed distributed noise generation for preserving privacy. We a matrix, implying sample averaging over its columns. Note
employ a similar principle as Anandan and Clifton [50] to reduce that the function G(·) is applied element-wise for matrix-valued
the noise added for differential privacy. Our approach seeks to arguments. That is, G(Z) is a matrix with the same size as Z and
leverage properties of conditional Gaussian distributions to gain [G(Z)]ij = G([Z]ij )).
some privacy amplification when learning from decentralized The Decentralized Data Problem: We consider a
data, and is complementary to these other techniques. decentralized-data model with S sites. There is a central node
In addition to generalized optimization methods, a number of that acts as an aggregator. We assume an “honest but curious”
modified ICA algorithms exist for joining various data sets [51] threat model: all parties follow the protocol but a subset are
together and performing simultaneous decomposition of data “curious” and can collude (maybe with an external adversary)
from a number of subjects and modalities [52]. Note that ICA can to learn other sites’ data/function outputs. Now, for the
be performed by considering voxels as variables or time points as decentralized ICA problem, suppose each site s has a collection
variables, leading to temporal and spatial ICA, respectively [53], of data matrices {Xs,m ∈ RD×Nm : m = 1, . . . , Ms } each
[54]. For instance, group spatial ICA (GICA) is noteworthy consisting of a time course of length Nm time points over D
for performing multi-subject analysis of task- and resting-state voxels for each of Ms individuals. We assume the data samples
fMRI data [11], [54], [55]. It assumes that the spatial map in the local sites are disjoint and come from different individuals.
components are similar across subjects (i.e., the overall spatial Sites concatenate their local data matrices temporally to form a
networks are stable across subjects for the experiment duration). D × Nm M s data matrix Xs ∈ R
D×Ns
, where Ns = Nm Ms .
The joint ICA (jICA) [56] algorithm for multi-modal data fusion Let N = Ss=1 Ns be the total number of samples and
assumes that the mixing process is similar over a group of M = Ss=1 Ms be the total number of individuals (across
subjects. Group temporal ICA also assumes common spatial all sites). We assume a global mixing matrix A ∈ RD×R
maps but pursues statistical independence of timecourses (acti- generates the time courses in Xs from underlying sources
vation of certain neurological regions) [13]. Consequently, like Ss ∈ RR×Ns at each site. This yields the following model:
jICA, the common spatial maps from temporal ICA describe a X = [AS1 . . . ASS ] = [X1 . . . XS ] ∈ RD×N . We want to
common mixing process among subjects. While very interesting, compute the global unmixing matrix W ∈ RR×D in the
temporal ICA of fMRI is typically not investigated because of decentralized setting. Because sharing the raw data between
the small number of time points in each data set, which leads sites is often impossible due to privacy constraints, we
to unreliable estimates [13]. The decentralized jICA overcomes develop methods that guarantee differential privacy [9]. More
that limitation by leveraging datasets from multiple sites. specifically, our goal is to use DP estimates of the local gradients
to compute the DP global unmixing matrix W such that it
closely approximates the true global unmixing matrix.
II. DATA AND PRIVACY MODEL
Notation: We denote vectors, matrices and scalars with bold
A. Definitions
lower case letters (x), bold upper-case letters (X) and unbolded
letters (M ), respectively. We denote indices with lower-case In differential privacy we consider a domain D of databases
letters and they typically run from 1 to their upper-case versions consisting of N records and define D and D to be neighbors if
(m ∈ {1, 2, . . . , M } [M ]). The n-th column of the matrix X they differ in a single record.
is denoted as xn . We denote the Euclidean (or L2 ) norm of a vec- Definition 1 ((, δ)-Differential Privacy [9]): An algorithm
tor and the spectral norm of a matrix with · 2 and the Frobenius A : D → T provides (, δ)-differential privacy ((, δ)-DP) if
norm with · F . Finally, the density of √ the standard normal Pr[A(D) ∈ S] ≤ exp() Pr[A(D ) ∈ S] + δ, for all measur-
random variable is given by φ(x) = (1/ 2π) exp(−x2 /2). able S ⊆ T and all neighboring data sets D, D ∈ D.
The ICA Model: In this paper, we consider the generative One way to interpret this is that the probability of the out-
ICA model as in [1], [13]. In the centralized scenario, the put of an algorithm is not changed significantly if the input
independent sources S ∈ RR×N are composed of N observa- database is changed by one entry. This definition is also known
tions from R statistically independent components. We have a as the Bounded Differential Privacy (as opposed to unbounded
linear mixing process defined by a mixing matrix A ∈ RD×R differential privacy [58]). Here, (, δ) are privacy parameters:
with D ≥ R, which forms the observed data X ∈ RD×N as lower (, δ) ensure more privacy. The parameter δ can be in-
a product X = AS. Many ICA algorithms propose recover- terpreted as the probability that the algorithm fails to provide
ing the unmixing matrix W ∈ RR×D , corresponding to the privacy risk . Note that (, δ)-differential privacy is known as
Moore-Penrose pseudo-inverse of A, denoted A+ , by trying the approximate differential privacy and -differential privacy
to maximize independence between rows of the product WX. (-DP) is known as pure differential privacy. In general, we
The maximal information transfer (infomax) [57] is a popular denote approximate (bounded) differentially private algorithms
heuristic for estimating W that maximizes an entropy functional with DP. There are several mechanisms for formulating a DP
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6358 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
IMTIAZ et al.: CORRELATED NOISE-ASSISTED DECENTRALIZED DIFFERENTIALLY PRIVATE ESTIMATION PROTOCOL 6359
aggregator broadcasts Ss=1 ês to all the sites. Each site then sets simplicity, we assume that all sites have equal number of samples
(i.e., Ns = N 2 2
es = ês − S1 Ss =1 ês to achieve Ss=1 es = 0. We show the S ) and τs = τ .
complete noise generation procedure in Algorithm 1. Although To infer the private data of the sites s ∈ SH , the adversary
it is shown for scalars, it can be readily extended for array-valued can observe â = [â1 , . . . , âSH ] ∈ RSH and ê = s∈SH ês .
zero-sum noise terms. Note that the adversary can learn the partial sum ê because
Details of CAPE Protocol: We they can get the sum s ês from the aggregator and the noise
observe that the variance of terms {êSH +1 , . . . , êS } from the colluding sites. Therefore, the
es is given by τe2 = E[(ês − S1 Ss =1 ês )2 ] = (1 − S1 )τs2 . Ad-
2 adversary observes the vector y = [â , ê] ∈ RSH +1 to make
ditionally, we choose τg2 = τSs . Each site then generates the noise
inference about the non-colluding sites. To prove differential
gs ∼ N (0, τg2 ) independently and sends âs = f (xs ) + es + gs g(y|a)
privacy guarantee, we must show that | log g(y|a ) | ≤ holds
to the aggregator. Note that neither of the terms es or gs has
with probability (over the randomness of the mechanism) at
large enough variance to provide (, δ)-DP guarantee to f (xs ).
least 1 − δ. Here, a = [f (x1 ), . . . , f (xSH )] , and g(·|a) and
However, we chose the variances of es and gs to ensure that the
g(·|a ) are the probability density functions of y under a and a ,
es + gs is sufficient to ensure a DP guarantee to f (xs ) at site
respectively. The vectors a and a differ in only one coordinate
s. The chosen variance of gs also ensures that the output from
(neighboring). Without loss of generality, we assume that a
the aggregator would have the same noise variance as the DP
and a differ in the first coordinate. We note that the maximum
pooled-data scenario – observe that we compute the following
at
difference is N1s as the sensitivity of the function f (xs ) is N1s .
the aggregator (in Step 2 of Algorithm 2): acape = S1 Ss=1 âs =
1
S 1
S Recall that we release âs = f (xs ) + es + gs from each site.
S s=1 f (xs ) + S s=1 gs , where we used s es = 0. The We observe ∀s ∈ [S]: E(âs ) = f (xs ), var(âs ) = τ 2 . Addition-
2 τ2 2
variance of the estimator acape is τcape = S · Sg2 = τpool , which ally, ∀s1 = s2 ∈ [S], we have: E(âs1 âs2 ) = f (xs1 )f (xs2 ) −
τ2
S . That is, the random variable â is N (a, Σâ ), where Σâ =
is exactly the same as if all the data were pooled at the aggregator.
2
This claim is formalized in Lemma 1. We show the complete (1 + S1 )τ 2 I − 11 τS ∈ RSH ×SH and 1 is a vector of all ones.
algorithm in Algorithm 2. The privacy of Algorithm 2 is given Without loss of generality, we can assume [59] that a = 0
by Theorem 1. The communication cost of the scheme is shown and a = a − v, where v = [ N1s , 0, . . . , 0] . Additionally, the
in Appendix K in the Supplement.
random variable ê is N (0, τê2 ), where τê2 = SH τ2 . Therefore,
Theorem 1 (Privacy of CAPE Algorithm (Algorithm 2)): Σâ Σâê
Consider Algorithm 2 in the decentralized data setting of Sec- g(y|a) is the density of N (0, Σ), where Σ = Σ τê2
∈
âê
2 2
tion I with Ns = N S and τs = τ for all sites s ∈ [S]. Suppose R (SH +1)×(SH +1)
. With some simple algebra, we can find the
that at most SC sites can collude after execution. Then Algorithm expression for Σâê : Σâê = (1 − SSH )τ 2 1 ∈ RSH . If we denote
2 guarantees (, δ)-differential privacy for each site, where (, δ) ṽ = [v , 0] ∈ RSH +1 then we observe
satisfy the relation δ = 2 −μ σz
σz ), ∈ (0, 1) and (μz , σz )
φ( −μ z
z
g(y|a) 1
are given by log
= − y Σ−1 y − (y + ṽ) Σ−1 (y + ṽ)
g(y|a ) 2
9
2
S3 S − SC + 2 S−SC SC 1
μz = 2 2 + 2 , = 2y Σ−1 ṽ + ṽ Σ−1 ṽ
2τ N (1 + S) S − SC S(1 + S) − 3SC 2
(1) 1
= y Σ−1 ṽ + ṽ Σ−1 ṽ = |z|,
σz2 = 2μz . (2) 2
where z = y Σ−1 ṽ + 12 ṽ Σ−1 ṽ. Using the matrix inversion
Remark 1: As mentioned in the Introduction, CAPE takes the
lemma for block matrices [61, Section 0.7.3] and some algebra,
target noise variances as inputs rather than the privacy parame-
we have
ters (, δ). We can think of this as setting a constraint on the utility
at the input and then using Theorem 1 to specify the privacy −1 Σ−1 1 −1 −1
â + K Σâ Σâê Σâê Σâ −K1 −1
Σâ Σâê
Σ = 1 −1 1
,
guarantee. This approach allows us to leverage composition −K Σâê Σâ K
results using Rényi-DP for iterative algorithms to optimize
for a target δ. where Σ−1 S 2 2
â = (1+S)τ 2 (I + SH 11 ) and K = τê − Σâê Σâ
−1
Remark 2: Theorem 1 is stated for the symmetric setting: Σâê . Note that z is a Gaussian random variable N (μz , σz2 ) with
2 2
Ns = N S and τs = τ ∀s ∈ [S]. Additionally, as with many parameters μz = 12 ṽ Σ−1 ṽ and σz2 = ṽ Σ−1 ṽ given by (1) and
algorithms using the approximate differential privacy, the guar- (2), respectively. Now, we observe
antee holds for a range of (, δ) pairs subject to a trade-off
g(y|a)
constraint between and δ, as in the simple case in Definition 3. Pr log ≤ = Pr [|z| ≤ ] = 1 − 2 Pr [z > ]
Proof: As mentioned before, we identify the SH non- g(y|a )
colluding sites with s ∈ {1, . . . , SH } SH and the SC collud- − μz
ing sites with s ∈ {SH + 1, . . . , S} SC . The adversary can = 1 − 2Q
σz
observe the outputs from each site (including the aggregator).
Additionally, the colluding sites can share their private data and σz − μz
>1−2 φ ,
the noise terms, ês and gs for s ∈ SC , with the adversary. For − μz σz
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6360 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021
where Q(·) is the Q-function [62] and φ(·) is the density for with δconv and δpool in Appendix H in the Supplement. Here,
standard Normal random variable. The last inequality follows δconv and δpool are the smallest δ guarantees we can afford in
from the bound Q(x) < φ(x) x [62]. Therefore, the proposed the conventional decentralized DP scheme and the pooled-data
CAPE ensures (, δ)-DP with δ = 2 −μ σz
z σz ) for each site,
φ( −μ z scenario to achieve the same noise variance as the pooled-data
assuming that the number of colluding sites is at-most SC . As scenario for a given . Additionally, we empirically compare δ,
the local datasets are disjoint and differential privacy is invariant δconv and δpool for weaker collusion assumptions in Appendix H
under post processing, the release of acape also satisfies (, δ) in the Supplement. In both cases, we observe that δ is always
g(y|a)
differential privacy. Note that, we proved that | log g(y|a ) | ≤
smaller than δconv and smaller than δpool for some τ values.
holds with probability (over the randomness of the mechanism) That is, for achieving the same noise level at the aggregator
at least 1 − δ. This actually proves probabilistic differential as the pooled-data scenario, we are ensuring a much better
privacy [14], [15], which in-turn implies approximate or (, δ)- privacy guarantee by employing the CAPE scheme over the
differential privacy [63], [64]. Furthermore, for meaningful pri- conventional approach.
vacy guarantee, we require μz ≤ in addition to ∈ (0, 1), as we Lemma 1: Consider the symmetric setting: Ns = N S and
ensured in our experiments. The scheme fails to provide formal τs = τ 2 for all sites s ∈ [S]. Let the variances of the noise
2
privacy if the number of colluding sites exceeds SC , which is terms es and gs (Step 2 of Algorithm 2) be τe2 = (1 − S1 )τ 2 and
2
determined by the choice of the scheme used for computing τg2 = τS , respectively. If we denote the variance of the additive
S
ê
s=1 s . noise (for preserving privacy) in the pooled data scenario by
2
Note that, traditional privacy mechanisms specify the mini- τpool and the variance of the estimator acape (Step 2 of Algorithm
2
mum noise needed for a given (, δ) pair. As mentioned before, 2) by τcape then Algorithm 2 achieves the same noise variance
2 2
there are infinitely many (, δ) pairs that yield the same noise as the pooled-data scenario (i.e., τpool = τcape ).
variance τ 2 . Additionally, there are several real-world applica- Proof: The proof is given in Appendix B.
tions, especially in medical and human health research, where Proposition 1: (Performance improvement) If the local noise
we need to ensure a given utility for the privacy-preserving variances are {τs2 } for s ∈ [S] then the CAPE scheme provides
2
technique. For example, ICA is widely used in neuroimaging a reduction G = ττconv
2 = S in noise variance over conventional
cape
applications. One performance index for successful ICA is the
decentralized DP scheme in the symmetric setting (Ns = N S and
normalized gain index q NGI [13], [65] that quantizes the quality
τs2 = τ 2 ∀s ∈ [S]), where τconv
2 2
and τcape are the noise variances
of the unmixing matrix. For practical usability of the recov-
of the final estimate at the aggregator in the conventional scheme
ered mixing matrix, we need to achieve q NGI ≤ 0.1 [13]. As
and the CAPE scheme, respectively.
mentioned before, our CAPE algorithm is motivated by such
Proof: The proof is given in Appendix C.
scenarios that are common in human health research among
Remark 4 (Unequal Sample Sizes at Sites): The CAPE al-
scientific research collaborators. Therefore, we designed the
gorithm achieves the same noise variance as the pooled-data
CAPE scheme with τ 2 as the input parameter and computed 2 2
scenario (i.e., τcape = τpool ) in the symmetric setting: Ns = N
the best we can achieve for a given 0 < δ < 1 (or vice-versa). 2
τcape
S
We present an empirical analysis of asymptotic growth of (, δ) and τs2 = τ 2 ∀ s ∈ [S]. In general, the ratio H(n) = 2
τpool
, where
parameters for the CAPE mechanism in Appendix G in the n [N1 , N2 , . . . , NS ], is a function of the sample sizes in the
Supplement. 2 S
1
sites. We observe: H(n) = N S3 s=1 Ns2 . As H(n) is a Schur-
Remark 3: We presented Theorem 1 for the scalar case for
convex function, it can be shown using majorization theory [66]
simplicity of presentation and understanding. However, it can 2
1
that 1 ≤ H(n) ≤ N S 3 ( (N −S+1)2 + S − 1), where the minimum
be readily extended to high-dimensional settings (e.g., f (x) ∈
Rd ). One can use the fact that the distribution of a spherically is achieved for the symmetric setting (i.e., Ns = N S ). That is,
symmetric normal is independent of the orthogonal basis from CAPE achieves the smallest noise variance at the aggregator in
which its constituent normals are drawn [59]. Therefore, one the symmetric setting.
can work in a basis that is aligned to v. Considering such a Remark 5 (Site Dropouts): If one chooses to use the
basis {b1 , . . . , bd }, and assuming that b1 is parallel to v, the SecureAgg
in Algorithm 1, the CAPE scheme achieves
analysis can be shown [59] to be reduced to the scalar case. We e
s s = 0 even in the case of site drop-out, as long as the
refrain from including the details into the current manuscript number of active sites is above some threshold (see Bonawitz
because of space constraints and to improve the coherence of et al. [43] for details). Therefore, the performance improvement
the presentation. of CAPE (Proposition 1) remains the same irrespective of the
number of dropped-out sites, as long as the number of colluding
sites does not exceed SC = /ceil ∗ S3 − 1.
A. Utility Analysis
The goal is to ensure (, δ) differential privacy for each site
2 2 B. Applicability of CAPE
and achieve τcape = τpool at the aggregator (see Lemma 1).
The CAPE protocol guarantees (, δ) differential privacy with As mentioned before, joint learning across datasets can yield
δ = 2 −μ
σz
z σz ). We claim that this δ guarantee is much
φ( −μ z
discoveries that are impossible to obtain from a single site.
better than the δ guarantee in the conventional decentralized However, privacy regulations prevent sites from sharing local
DP scheme. We empirically validate this claim by comparing δ raw data. Our CAPE algorithm is motivated by such scenarios,
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
IMTIAZ et al.: CORRELATED NOISE-ASSISTED DECENTRALIZED DIFFERENTIALLY PRIVATE ESTIMATION PROTOCOL 6361
2 2
which are common in human health research among scientific conventional approach, we need τes + τgs = τs2 , where τes
2
is
research collaborators. It can benefit computations with sensitiv- 2
the variance of es and is a function of σs . With these constraints,
ities satisfying some conditions (see Proposition 2). In addition we can formulate a feasibility problem to solve for the unknown
to simple averages, many functions of interest have sensitivities noise variances {σs2 , τgs
2
} as
that satisfy such conditions. Examples include the empirical
average loss functions used in ML and deep neural networks.
S
2 2
minimize 0 subject to τes + τgs = τs2 ; μ2s τgs
2 2
= τpool
Moreover, we can use the Stone-Weierstrass theorem [67] to
s=1
approximate a loss function in decentralized setting applying
CAPE and then use off-the-shelf optimizers. Additional appli- for all s ∈ [S], where {μs }, τpool and {τs } are known to the
cations include optimization algorithms, k-means clustering and aggregator. For this problem, multiple solutions are possible.
estimating probability distributions. We present one solution here along with the privacy analysis.
2
Proposition 2: Consider a decentralized setting with S > Solution: We observe that the variance τes of the zero-
1
S
1 sites in which site s ∈ [S] has a dataset Ds of Ns sam- mean random variable es = ês − μs S i=1 μi êi can be com-
S S
ples and Ss=1 Ns = N . Suppose the sites are employing the puted as τes 2
= Var[ês − i=1
μ ê
i i
] = (1 − S2 )σs2 + i=1 i iμ2 σ 2
.
μ2s S 2
CAPE scheme to compute a function f (D) with L2 sensi- S μs2S 2 2
tivity Δ(N ). Denote n= [N1 , N2 , . . . , NS ] and observe the
Note that we need s=1 μs τgs = τpool . One solution is to
2
τcape S 2
s=1 Δ (Ns )
set τgs 2
= μ21S τpool
2
. Using the constraint τes 2 2
+ τgs = τs2 and
ratio H(n) = 2 =S 3 Δ2 (N ) . Then the CAPE protocol
s
τpool
the expressions for τes 2
and τgs 2
, we have (1 − S1 )2 σs2 +
achieves H(n) = 1, if i) Δ( N
S ) = SΔ(N ) for convex Δ(N ); 1 2 2 2 1 2
i=s μi σi = τs − μ2s S τpool . We can write this expres-
μ2s S 2
and ii) S 3 Δ2 (N ) = Ss=1 Δ2 (Ns ) for general Δ(N ). sion for all s ∈ [S] in matrix form and solve for [σ12 σ22 . . . σS2 ]
Proof: The proof is given in Appendix E as
⎡ 2 μ22 μ2S ⎤−1 ⎡ 2 τpool 2 ⎤
1 − S1 μ21 S 2
··· μ21 S 2
τ1 − μ2 S
C. Extension of CAPE: Unequal Sample Sizes/Privacy ⎢ ⎥ ⎢ 1 ⎥
⎥ ⎢ ⎥
2
⎢ μ21 1 2 μ2S 2 τpool
Requirements at Sites ⎢ μ2 S 2 2 1 − · · · 2 2 ⎥ ⎢ ⎢ τ 2 − μ2 S ⎥
2
⎢
S μ2 S
⎥ ⎢ ⎥
⎢ .. .. .. .. ⎥ .. ⎥
Recall that CAPE achieves the smallest noise variance at the ⎣ . . . . ⎦ ⎣ ⎢ . ⎥
aggregator in the symmetric setting (see Remark 4). However, 2
⎦
μ21 μ22 1 2
τpool
2
μ S 2 2
μ S 2 · · · 1 − S τ 2
− 2
in practice, there would be scenarios where different sites have S S S μS S
different privacy requirements and/or sample sizes. Addition- Privacy Analysis in Asymmetric Setting: We present an
ally, sites may want the aggregator to use different weights analysis of privacy for the aforementioned scheme in asym-
for different sites (possibly according to the quality of the metric setting. Recall that theadversary can observe â =
output from a site). A simple scheme for doing so is shown [â1 , . . . , âSH ] ∈ RSH and ê = s∈SH ês . In other words, the
in [18]. In this work, we propose a generalization of the CAPE adversary observes the vector y = [â , ê] ∈ RSH +1 to make
scheme that can be applied in asymmetric settings. Additionally, inference about the non-colluding sites. As before, we must
the proposed scheme in this paper is more robust against site g(y|a)
show that | log g(y|a ) | ≤ holds with probability (over the
dropouts and does not require a trusted third-party. Note that the
challenge of this analysis is due to the correlated noise terms randomness of the mechanism) at least 1 − δ for guaranteeing
with different variances (or sample sizes). differential privacy. Recall that we release âs = f (xs ) + es +
Let us assume that site s requires local noise standard devi- gs from each site. We observe E(âs ) = f (xs ), Var(âs ) =
μs σ 2
ation τs . To initiate the CAPE protocol, each site will generate τs2 , ∀s ∈ [S] and E(âs1 âs2 ) = f (xs1 )f (xs2 ) − μ1s Ss1 −
2
ês ∼ N (0, σs2 ) and gs ∼ N (0, τgs2
). The aggregator intends to 2
μ s 2 σs 2 1
S 2 2
compute a weighted average of each site’s data/output with μ s1 S + μ s1 μ s2 S 2 i=1 μi σi , ∀s1 = s2 ∈ [S]. Without loss
weights selected according to some quality measure. For ex- of generality, we can assume [59] that a = 0 and a = a − v,
ample, if the aggregator knows that a particular site is suf- where v = [ N1s , 0, . . . , 0] . That is, the random variable â is
fering from more noisy observations than other sites, it can N (0, Σâ ), where
choose to give the output from that site less weight while ⎡ ⎤
τ12 Ψ(1, 2) · · · Ψ(1, S)
combining the site results. Let us denote the weights by {μs } ⎢ ⎥
⎢ Ψ(2, 1) τ22 · · · Ψ(2, S)⎥
such that Ss=1 μs = 1 and μs ≥ 0. First, the aggregator com- ⎢
Σâ = ⎢ . .. ⎥
.. .. ⎥,
putes Ss=1 μs ês using Algorithm 1 and broadcasts it to all ⎣ .. . . . ⎦
sites. Each site then sets es = ês − μs1S Si=1 μi êi , to achieve Ψ(S, 1) Ψ(S, 2) · · · τS2H
S
s=1 μs es = 0 and releases â s = f (xs ) + es + gs . Now, the
S
μi σi2 μj σj2 2 2
and Ψ(i, j) = − S1 ( + s=1 μs σs
aggregator computes acape = Ss=1 μs âs = Ss=1 μs f (xs ) + μj μi ) + μi μj S 2 .
Additionally,
S S H 2
the random variable ê is N (0, τê2 ),
where = Ss=1 σs . τê2
s=1 μs gs , where we used s=1 μs es = 0. In order to achieve
2
the same utility as the pooled data scenario (i.e. τpool 2
= τcape ), Therefore,
g(y|a)
is the density of N (0, Σ), where Σ =
S 2
S 2 2 2 Σâ Σâê
we need Var[ s=1 μs gs ] = τpool ⇒ s=1 μs τgs = τpool . Ad- ∈ R(SH +1)×(SH +1) . With some simple algebra,
ditionally, for guaranteeing the same local noise variance as Σ
âê τ 2
ê
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6362 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021
IV. IMPROVED DIFFERENTIALLY PRIVATE DJICA sitivity of the function f (Xs ) = Gs is ΔsG = 2B G
Ms . In addition
In this section, we propose an algorithm that improves to the unmixing matrix W, we update a bias term b using a
upon our previous decentralized DP djICA algorithm [1] and gradient descent [13]. The gradient of the empirical average
achieves the same noise variance as the DP pooled-data scenario loss function
with respect to the bias at site s is given [13] by
in certain regimes. Recall that we are considering the joint ICA hs = N1s N n=1 s,n . Similar to the case of Gs , we can find the
s
ŷ
(jICA) [56] of decentralized fMRI data, which assumes a global L2 sensitivity of the function f (Xs ) = hs as Δsh = 2B h
Ms , where
mixing process (common spatial maps). More specifically, the ŷs,n 2 ≤ Bh . Note that for other neighborhood definitions
global mixing matrix A ∈ RD×R is assumed to generate the (time point level instead of subject level), one should consider the
time courses in Xs from underlying sources Ss ∈ RR×Ns at temporal correlation in the data [70]. According to the Gaussian
each site s ∈ [S]. Each site has data from Ms individuals, which mechanism [59], computing (, δ) DP approximates of Gs and
are concatenated temporally to form the local data matrix Xs ∈ hs requires noise standard deviations τG s
and τhs satisfy
RD×Ns . That is: X = [AS1 . . . ASS ] ∈ RD×N . We estimate
the DP global unmixing matrix W ∈ RR×D ≈ A+ by solving ΔsG 1.25 s Δs 1.25
s
τG = 2 log , τh = h 2 log . (3)
the Infomax ICA problem (see Section II) in the decentralized δ δ
setting with a multi-round gradient descent that employs CAPE.
Neuroimaging data is generally very high dimensional. As mentioned before, we employ the CAPE protocol to combine
We therefore use the DP decentralized PCA algorithm the gradients from the sites at the aggregator to achieve the same
(capePCA) [18] as an efficient and privacy-preserving utility level as that of the pooled data scenario. More specif-
dimension-reduction step of our proposed capeDJICA algo- ically, each site generates two noise terms: EG s ∈R
R×R
and
rithm. For simplicity, we assume that the observed samples es ∈ R , collectively among all sites (element-wise, according
h R
are mean-centered. We present a slightly modified version of to Algorithm 1) at each iteration round. Additionally, each site s
generates the following two noise terms locally at each iteration:
the original capePCA algorithm in Algorithm 3 (Appendix L r KG s ∈R
R×R
; [KG s ]ij i.i.d. ∼ N (0, τ
2 2
); τGk s2
= S1 τG
in the Supplement) to match the robust CAPE scheme from r khs ∈ RR ; [khs ]i i.i.d. ∼ N (0, τ 2 ); τ Gk
2 1 s2
Section III. Note that the scheme proposed in [68] was limited hk hk = S τh .
by the larger variance of the additive noise at the sites due to At each iteration round, the sites compute the noisy esti-
the smaller sample size. The capePCA alleviates this problem mates of the gradients of W and b: Ĝs = Gs + EG s + Ks ,
G
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
IMTIAZ et al.: CORRELATED NOISE-ASSISTED DECENTRALIZED DIFFERENTIALLY PRIVATE ESTIMATION PROTOCOL 6363
A. Privacy Analysis Using Rényi Differential Privacy offered a “pure” DP djICA procedure, there are a few short-
comings. The cost of achieving pure differential-privacy (i.e.,
We now analyze the capeDJICA algorithm with Rényi Dif-
ferential Privacy [16]. Analyzing the total privacy loss of a employing the Laplace mechanism [9]) was that the neighboring
dataset condition was met by restricting the L2 -norm of the
multi-shot algorithm, each stage of which is DP, is a chal-
samples to satisfy xn 2 ≤ 2√1D , which can be too limiting for
lenging task. It has been shown [16], [17] that the advanced
composition theorem [59] for (, δ)-differential privacy can be datasets with large ambient dimensions. The effect of this is
apparent from the experiments. Last but not the least, the DP
loose. The main reason is that one can formulate infinitely
PCA preprocessing step was less fault tolerant because of the
many (, δ)-DP algorithms for a given noise variance τ 2 . RDP
offers a much simpler composition rule that is shown to be pass the parcel or cyclic style message passing among the sites,
where site dropouts are more drastic than the one employed in
tight [16]. We review some necessary properties of RDP in
Appendix D. Recall that at each iteration j of capeDJICA, this paper (a certain number of site dropouts is permitted [43]).
we compute the noisy estimates of the gradients: ΔW (j) and By employing the CAPE protocol in the preprocessing stage and
also in the optimization process, we expect to gain a significant
Δb (j). As we employed the CAPE scheme in the symmetric
setting, the variances of noise at the aggregator for ΔW (j) performance boost. We validate the performance gain in the
pool
ρ2 τ G
2
ρ2 τhpool
2 Experimental Results (Section V).
2
and Δb (j) are: σW = ΔG and σb2 = Δh , respectively, Convergence of capeDJICA Algorithm: We note that the
ΔsG Δs
where ΔG = Sand Δh = Sh . From Proposition 5, we have gradient estimate at the aggregator
(Step 14 in Algorithm 4)
that the computation of ΔW (j) is (α, α/(2σW2
))-RDP. Sim- essentially contains the noise Sρ Ss=1 KG s , which is zero mean.
ilarly, the computation of Δb (j) is (α, α/(2σb2 ))-RDP. By Although this does not provide guarantees on the excess error,
Proposition 4, we have that each iteration step of capeDJICA the estimate of the gradient converges in expectation to the
is (α, α2 ( σ12 + σ12 ))-RDP. Denoting the number of required true gradient [71]. However, if the batch size is too small,
W b
iterations for convergence by J ∗ then, under J ∗ -fold compo- the noise can be too high for the algorithm to converge [27].
∗
sition of RDP, the overall capeDJICA algorithm is (α, 2σαJ )- Since the total additive noise variance is smaller for capeDJICA
2
1
RDP than the conventional case by a factor of S, the convergence rate
RDP, where 2
σRDP
= ( σ12 + 1
σb2 ). From Proposition 3, we can is faster. Note that a theoretical analysis of intricate relation
W
∗
conclude that the capeDJICA algorithm satisfies ( 2σαJ
2 + between the excess error and the privacy parameters is beyond
RDP
1 the scope of the current paper. We refer the reader Bassily et
logδr
, δr )-differential privacy for any 0 < δr < 1. For a given
α−1 al. [30] for further details.
δr , we find the optimal αopt as: αopt = 1 + J2∗ σRDP 2 log δ1r . Communication Cost of capeDJICA: We analyze the total
α J∗ log δ1r communication cost associated with the proposed capeDJICA
Therefore, capeDJICA algorithm is ( 2σopt
2 + αopt −1 , δr )-DP algorithm. At each iteration round, we need to generate two
RDP
for any 0 < δr < 1. zero-sum noise terms, which entails O(S + R2 ) communica-
tion complexity of the sites and O(S 2 + SR2 ) communication
B. Privacy Accounting Using Moments Accountant complexity of the aggregator [43]. Each site computes the noisy
In this section, we use the moments accountant [17] frame- gradient and sends one R × R matrix and one R dimensional
work to compute the overall privacy loss of our capeDJICA vector to the aggregator. And finally, the aggregator sends the
algorithm. Moments accountant can be used to achieve a much R × R updated weight matrix and R dimensional bias estimate
smaller overall than the strong composition theorem [59]. As to the sites. The total communication cost is O(S + R2 ) for the
mentioned before, naïvely employing the additive nature of the sites and O(S 2 + SR2 ) for the central node. This is expected as
privacy loss results in the worst case analysis, i.e., assumes that we are estimating an R × R matrix in a decentralized setting.
each iteration step exposes the worst privacy risk and this exag-
gerates the total privacy loss. However, in practice, the privacy V. EXPERIMENTAL RESULTS
loss is a random variable that depends on the dataset and is
typically well-behaved (concentrated around its expected value). In this section, we empirically show the effectiveness of our
Due to space constraints, we presented the detailed analysis of proposed CAPE scheme through the capeDJICA algorithm,
capeDJICA in Appendix I in the Supplement. Briefly, we can demonstrating the benefit of enabling correlated noise gener-
formulate a quadratic equation in terms of and then find the best ation. We note the intricate relationship between and δ (see
2 ∗ 2
for a given δtarget : 2 Jσ∗ Δ2 2 − + J8σΔ2 + log δtarget = 0. Theorem 2) due to the correlated noise scheme and the challenge
2
Here, the noise variance σ consists of two parts: σW 2
and σb2 of characterizing the overall privacy loss in our multi-round
for capeDJICA. capeDJICA algorithm. We designed the experiments to better
demonstrate the trade-off between performance and several pa-
rameters: , δ and M . We show the simulation results to compare
C. Performance Improvement With Correlated Noise
the performance of our capeDJICA algorithm with the existing
The existing DP djICA algorithm [1] achieved J ∗ - DP djICA algorithm [1] (DP-djICA), the non-private djICA
differential privacy (where J ∗ is the total number of iterations algorithm [13] and a DP ICA algorithm operating on only
required for convergence) by adding a noise term to the local local data (local DP-ICA). We modified the base non-private
estimate of the source (i.e., Zs (j)). Although the algorithm djICA algorithm to incorporate the gradient bounds BG and
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6364 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021
Bh . Although we are proposing an algorithm for decentralized (with solid lines on the right y-axis) along with q NGI (with
setting, we included the performance indices for the local setting dashed lines on the left y-axis) as a means for visualizing how
to show the effect of smaller sample sizes on the performance. the privacy-utility trade-off varies with different parameters. For
We note that the DP-djICA algorithm [1] offers -differential a given privacy budget (performance requirement), the user can
privacy as opposed to (, δ)-differential privacy offered by use the overall plot on the right y-axis, shown with solid
capeDJICA. For both synthetic and real datasets, we consider lines, (q NGI plot on the left y-axis, shown with dashed lines)
the symmetric setting (i.e., Ns = N S , τG = τG and τh = τh ).
s s
to find the required i or M on the x-axis and thereby, find the
We limited the maximum number of iterations J to be 1000 corresponding performance (overall ). Although we ensured
(however, the number of iterations varies with the algorithm i ∈ (0, 1) [59], we are more interested in the overall spent ,
and amount
√ of noise). We chose the norm bounds BG = 30, which we computed using the RDP technique (Section IV-A)
Bh = BG , number of sites S = 4, SC = /ceil ∗ S3 − 1 [43], for the capeDJICA and the local DP-ICA algorithms. For the
0.015
the target δ = 10−5 and the learning rate ρ = log(R) . We show DP-djICA algorithm, we used the composition theorem [59].
the average performance over 10 independent runs. Note that, the Note that, the primary reason for using the RDP/MA techniques
choice of hyper-parameters is non-trivial [72] and corresponding is the inherent ambiguity of the approximate DP algorithms
end-to-end privacy analysis is still an open problem. (existence of infinitely many (, δ) pairs that yield the same
Synthetic Data: We generated the synthetic data from the noise variance τ 2 ). For pure (, 0)-DP, this is not an issue. If
same model as [13]. The source signals S were simulated using the end goal is to have a pure (, 0)-DP algorithm, we need to
the generalized autoregressive (AR) conditional heteroscedastic stick to the basic composition [59]. We are reporting the privacy
(GARCH) model [73], [74]. We used M = 1024 simulated sub- spent during the course of the gradient descent. The total spent
jects in our experiments. For each subject, we generated R = 20 including PCA would be slightly higher.
time courses with 250 time points. The data samples are equally Performance Variation with i : First, we explore how the
divided into S = 4 sites. For each subject, the fMRI images are privacy-utility trade-off between q NGI and the overall “privacy
30 × 30 dimensional. We employ the capePCA algorithm [18] risk” varies with i . As mentioned before, we compare the
(Appendix L in the Supplement) as a preprocessing stage to performance of capeDJICA with those of the djICA, the
reduce the sample dimension from D = 900 to R = 20. The DP-djICA and local DP-ICA. In Figs. 1(a)–(d), we show the
capeDJICA is carried out upon the R-dimensional samples. variation of q NGI and overall for different algorithms with
Real Data: We use the same data and preprocessing as Baker i on synthetic and real data. For both datasets, we show the
et al. [13]: the data were collected using a 3-T Siemens Trio performance indices for two different M values: M = 256 and
scanner with a 12-channel radio frequency coil, according to the M = 1024. We observe from the figures that the proposed
protocol in Allen et al. [55]. In the dataset, the resting-state scan capeDJICA outperforms the existing DP-djICA by a large
durations range from 2 min 8 sec to 10 min 2 sec, with an average margin for the range of i values that results in q NGI ≤ 0.1.
of 5 min 16 sec [13]. We used a total of M = 1548 subjects This is expected as DP-djICA suffers from too much noise (see
from the dataset and estimated R = 50 independent compo- Section IV-C for the explanation). capeDJICA also guaran-
nents using the algorithms under consideration. For details on tees the smallest overall among the privacy-preserving meth-
the preprocessing, please see [13]. We also projected the data ods. capeDJICA can reach the utility level of the non-private
onto a 50-dimensional PCA subspace estimated using pooled djICA for some parameter choices and naturally outperforms
non-private PCA. As we do not have the ground truth for the real local DP-ICA as estimation of the sources is much accurate
data, we computed a pseudo ground truth [13] by performing when more samples are available. For the same privacy loss (i.e.,
a pooled non-private analysis on the data and estimating the for a fixed ), one can achieve better performance by increasing
unmixing matrix. The performance of capeDJICA, djICA, M . For both synthetic and real data, we note that assigning
DP-djICA and local DP-ICA algorithms are evaluated against a higher i may provide a good q NGI but does not guarantee
this pseudo ground truth. a small overall . This is because the overall is an implicit
Δs 1.25 2
Performance Index: We set τG s
= iG 2 log 10 −2 and τh =
s function of the added noise variance at each iteration σRDP and
the total number of iterations required for convergence J ∗ (see
Δsh 1.25
i 2 log 10 −2 for our experiments, where i is the privacy Section IV-A for details). The user needs to choose i based on
parameter per iteration, ΔsG and Δsh are the L2 sensitivities of privacy budget and required performance.
Gs and hs , respectively. To evaluate the performance of the Performance Variation with M : Next, in Figs. 2(a)–(d), we
algorithms, we consider the quality of the estimated unmixing show the variation of q NGI and the overall with the total
matrix W. More specifically, we utilize the normalized gain number of subjects M for two different i values on synthetic
index q NGI [13], [65] that quantizes the quality of W. The and real data. We observe similar trends in performance as in
normalized gain index q NGI varies from 0 to 1, with lower the case of varying i . The capeDJICA algorithm outperforms
values indicating a better estimation of a set of ground-truth the DP-djICA and the local DP-ICA: with respect to both q NGI
components (i.e. the unmixing matrix times the mixing matrix and the overall . For the q NGI , the capeDJICA performs very
is closer to an identity matrix [65]). For practical usability of the closely to the non-private djICA, even for moderate M values,
recovered A, we need to achieve q NGI ≤ 0.1 [13]. We consider while guaranteeing the smallest overall . The performance
the overall as a performance index. We plotted the overall gain over DP-djICA is particularly noteworthy. For a fixed M ,
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
IMTIAZ et al.: CORRELATED NOISE-ASSISTED DECENTRALIZED DIFFERENTIALLY PRIVATE ESTIMATION PROTOCOL 6365
Fig. 1. Variation of q NGI and overall with privacy parameter i for: (a)–(b) synthetic fMRI data, (c)–(d) real fMRI data. For capeDJICA, higher i results
a smaller q NGI , but not necessarily a small overall , i.e., an optimal i can be chosen based on q NGI or overall requirement.
Fig. 2. Variation of q NGI and overall with total number of subjects M for: (a)–(b) synthetic fMRI data, (c)–(d) real fMRI data. For capeDJICA, higher M
results a smaller q NGI and a smaller overall .
Fig. 3. Recovered spatial maps from synthetic data: the ground truth and the ones resulting from djICA, local DP-ICA, DP-djICA and capeDJICA.
Fig. 4. Spatial maps (synthetic data) resulting from capeDJICA for different parameters. capeDJICA estimates spatial maps that closely resemble the true ones,
even for strict privacy guarantee (small overall ).
Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6366 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021
For a fixed $M$, increasing $\epsilon$ results in slightly better utility, albeit at the cost of greater privacy loss. We show the performance variation with $\delta$ in Appendix J in the Supplement.

Reconstructed Spatial Maps: Finally, we demonstrate what the estimated spatial maps (the estimated global mixing matrix $\mathbf{A}$, see Section II) actually look like, as interpretability is one of the most important concerns for fMRI applications. In Fig. 3, we show the true spatial map and the ones estimated by the non-private djICA [13], local DP-ICA, DP-djICA, and capeDJICA algorithms. It is evident from the figure that the spatial map recovered by the proposed capeDJICA is very close to the ground truth, while the overall $\epsilon$ is also very small. The local DP-ICA, although it can achieve a small $\epsilon$, cannot recover the spatial maps well enough for practical purposes; however, increasing $\epsilon_i$ and/or increasing the number of subjects would certainly improve the quality of its spatial maps. Finally, DP-djICA fails to converge to anything meaningful due to the excessive amount of noise. In Fig. 4, we show the estimated spatial maps resulting from the proposed capeDJICA algorithm, along with the overall $\epsilon$, for a variety of combinations of $\epsilon_i$ and $M$. We observe that when a sufficiently large number of subjects is available, the estimated spatial maps closely resemble the true ones, even for a strict privacy guarantee (small overall $\epsilon$). For a smaller number of samples, we may need to compensate by allowing larger $\epsilon$ values to achieve good utility. In general, we observe that capeDJICA can achieve a very good approximation of the true spatial map, almost indistinguishable from the non-private one. This emphasizes the effectiveness of the proposed capeDJICA in the sense that very meaningful utility can be achieved even with a strict privacy guarantee.
VI. CONCLUSION

We proposed a novel decentralized DP computation scheme, CAPE, which uses correlated randomness at the sites to improve the performance of decentralized signal processing and ML applications involving locally held private data. Example scenarios include health care research with legal and ethical limitations on the degree of sharing the "raw" data. CAPE can greatly improve the privacy-utility trade-off when (a) all parties follow the protocol and (b) the number of colluding sites is not more than $S/3 - 1$ [43]. Our proposed CAPE protocol is based on an estimation-theoretic analysis of the noise addition process for differential privacy and, therefore, provides different guarantees than cryptographic approaches such as SMC. CAPE guarantees $(\epsilon, \delta)$ probabilistic differential privacy and hence $(\epsilon, \delta)$ differential privacy, where $\epsilon$ and $\delta$ satisfy a specific relation. It can achieve the same level of additive noise variance as the pooled data scenario in certain regimes. We further extended the CAPE scheme to asymmetric network/privacy settings. These benefits of CAPE come in part from assuming that sites can generate zero-sum correlated noise; this can be accomplished by various methods, each of which comes with additional costs or assumptions on trust. We applied CAPE to DP decentralized joint independent component analysis (capeDJICA) for collaborative source separation. To handle the privacy composition for multi-round algorithms, we analyzed capeDJICA using Rényi differential privacy and the moments accountant. We empirically compared the performance of the proposed algorithms with those of conventional, non-private, and local algorithms on synthetic and real datasets. We varied privacy parameters and relevant dataset parameters to show that the proposed algorithms outperformed the conventional and local algorithms comfortably and matched the performance of the non-private algorithms for some parameter choices. In general, the proposed algorithm offered very good utility even for strong privacy guarantees, indicating that we may be able to achieve meaningful privacy without losing much utility. An interesting direction for future work is to extend the CAPE framework to $(\epsilon, 0)$-differential privacy, perhaps using the Staircase Mechanism [75]. Another possible direction is to extend CAPE to arbitrary tree-structured networks.
APPENDIX A
CONVENTIONAL DECENTRALIZED DP COMPUTATIONS

As mentioned before, DP algorithms often introduce noise into the computation pipeline to induce randomness. For additive noise mechanisms, the standard deviation of the noise is scaled to the sensitivity of the computation [59]. To illustrate, consider estimating the mean $f(\mathbf{x}) = \frac{1}{N}\sum_{n=1}^N x_n$ of $N$ scalars $\mathbf{x} = [x_1, \ldots, x_{N-1}, x_N]$ with each $x_n \in [0, 1]$. The sensitivity of the function $f(\mathbf{x})$ is $\frac{1}{N}$. Therefore, for computing the $(\epsilon, \delta)$-DP estimate of the average $a = f(\mathbf{x})$, we can follow the Gaussian mechanism [9] to release $\hat{a}_{\mathrm{pool}} = a + e_{\mathrm{pool}}$, where $e_{\mathrm{pool}} \sim \mathcal{N}(0, \tau_{\mathrm{pool}}^2)$ and $\tau_{\mathrm{pool}} = \frac{1}{N\epsilon}\sqrt{2\log\frac{1.25}{\delta}}$.

Suppose now that the $N$ samples are equally distributed among $S$ sites. That is, each site $s \in \{1, \ldots, S\}$ holds a disjoint dataset $\mathbf{x}_s$ of $N_s = N/S$ samples. An aggregator wishes to estimate and publish the mean of all the samples. For preserving privacy, the conventional DP approach is for each site to release (or send to the aggregator node) an $(\epsilon, \delta)$-DP estimate of the function $f(\mathbf{x}_s)$ as $\hat{a}_s = f(\mathbf{x}_s) + e_s$, where $e_s \sim \mathcal{N}(0, \tau_s^2)$ and $\tau_s = \frac{1}{N_s\epsilon}\sqrt{2\log\frac{1.25}{\delta}} = \frac{S}{N\epsilon}\sqrt{2\log\frac{1.25}{\delta}}$. The aggregator can then compute the $(\epsilon, \delta)$-DP approximate average as $\hat{a}_{\mathrm{conv}} = \frac{1}{S}\sum_{s=1}^S \hat{a}_s = \frac{1}{S}\sum_{s=1}^S a_s + \frac{1}{S}\sum_{s=1}^S e_s$. The variance of the estimator $\hat{a}_{\mathrm{conv}}$ is $S \cdot \frac{\tau_s^2}{S^2} = \frac{\tau_s^2}{S} \triangleq \tau_{\mathrm{conv}}^2$. We observe the ratio $\frac{\tau_{\mathrm{pool}}^2}{\tau_{\mathrm{conv}}^2} = \frac{\tau_s^2/S^2}{\tau_s^2/S} = \frac{1}{S}$. That is, the decentralized DP averaging scheme will always result in poorer performance than the pooled data case.
δ [59]. Next, consider the decentralized data set-
for multi-round algorithms, we analyzed capeDJICA using the ting (as in Section
II) with local noise standard deviation given
1 1.25 1.25
Rényi differential privacy and the moments accountant. We by τs = Ns 2 log δ = S
N 2 log δ = τ . We observe
that $\tau_{\mathrm{pool}} = \frac{\tau_s}{S}$, i.e., $\tau_{\mathrm{pool}}^2 = \frac{\tau^2}{S^2}$. We will now show that the CAPE algorithm yields the same noise variance of the estimator at the aggregator. Recall that at the aggregator we compute $a_{\mathrm{cape}} = \frac{1}{S}\sum_{s=1}^S \hat{a}_s = \frac{1}{N}\sum_{n=1}^N x_n + \frac{1}{S}\sum_{s=1}^S g_s$, where the local noise terms $g_s$ have variance $\tau_g^2 = \frac{\tau^2}{S}$. The variance of the estimator $a_{\mathrm{cape}}$ is $\tau_{\mathrm{cape}}^2 = S \cdot \frac{\tau_g^2}{S^2} = \frac{\tau_g^2}{S} = \frac{\tau^2}{S^2}$, which is exactly the same as in the pooled data scenario. Therefore, the CAPE algorithm allows us to achieve the same additive noise variance as the pooled data scenario in the symmetric setting ($N_s = \frac{N}{S}$ and $\tau_s^2 = \tau^2\ \forall s \in [S]$), while satisfying $(\epsilon, \delta)$ differential privacy at the sites and for the final output from the aggregator, where $(\epsilon, \delta)$ satisfy $\delta = 2\frac{\sigma_z}{\epsilon - \mu_z}\,\phi\!\left(\frac{\epsilon - \mu_z}{\sigma_z}\right)$.
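A minimal simulation sketch of this equivalence follows. It assumes the zero-sum noise $e_s$ is realized by centering i.i.d. Gaussian draws, which is only one of the generation methods discussed for CAPE; all variable names and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N, S, eps, delta = 10_000, 10, 1.0, 1e-2
x = rng.uniform(0, 1, N)
local_means = np.array([xs.mean() for xs in np.split(x, S)])

sigma = np.sqrt(2 * np.log(1.25 / delta))
tau_pool = sigma / (N * eps)   # pooled-data noise std
tau = S * tau_pool             # required per-site noise std (tau_s = tau)

trials = 100_000
# Zero-sum correlated noise e_s: center i.i.d. draws so that sum_s e_s = 0.
e_raw = tau * rng.standard_normal((trials, S))
e = e_raw - e_raw.mean(axis=1, keepdims=True)    # Var(e_s) = (1 - 1/S) tau^2
# Independent local noise g_s with variance tau^2 / S.
g = (tau / np.sqrt(S)) * rng.standard_normal((trials, S))
# Each site's released noise e_s + g_s still has total variance tau^2.
a_cape = (local_means + e + g).mean(axis=1)

print((e + g).var())               # ~ tau^2 per site, as required for local DP
print(a_cape.var() / tau_pool**2)  # ~ 1: matches the pooled-data variance
```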
APPENDIX C
PERFORMANCE IMPROVEMENT OF CAPE

Proof of Proposition 1: The local noise variances are $\{\tau_s^2\}$ for $s \in [S]$. In the conventional decentralized DP scheme, we compute the following at the aggregator: $a_{\mathrm{conv}} = \frac{1}{S}\sum_{s=1}^S a_s + \frac{1}{S}\sum_{s=1}^S e_s$. The variance of the estimator is $\tau_{\mathrm{conv}}^2 = \sum_{s=1}^S \frac{\tau_s^2}{S^2} = \frac{1}{S^2}\sum_{s=1}^S \tau_s^2$. In the CAPE scheme, we compute the following quantity at the aggregator: $a_{\mathrm{cape}} = \frac{1}{S}\sum_{s=1}^S a_s + \frac{1}{S}\sum_{s=1}^S e_s + \frac{1}{S}\sum_{s=1}^S g_s$. The variance of the estimator is $\tau_{\mathrm{cape}}^2 = \sum_{s=1}^S \frac{\tau_{g_s}^2}{S^2} = \frac{1}{S^3}\sum_{s=1}^S \tau_s^2$. Therefore, the CAPE scheme provides a reduction $G = \frac{\tau_{\mathrm{conv}}^2}{\tau_{\mathrm{cape}}^2} = S$ in noise variance over the conventional decentralized DP approach in the symmetric setting ($N_s = \frac{N}{S}$ and $\tau_s^2 = \tau^2\ \forall s \in [S]$).
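A short numerical sanity check of this $S$-fold reduction, using arbitrary heterogeneous local variances for generality:

```python
import numpy as np

rng = np.random.default_rng(3)
S = 10
tau_sq = rng.uniform(0.5, 2.0, S)    # arbitrary local noise variances tau_s^2
tau_conv_sq = tau_sq.sum() / S**2    # conventional decentralized variance
tau_cape_sq = tau_sq.sum() / S**3    # CAPE variance (local g_s variance tau_s^2 / S)
print(tau_conv_sq / tau_cape_sq)     # = S
```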
s=1 Δ ( S ) =
2 N
APPENDIX D S · Δ ( S ). Therefore, for convex Δ(N ), we have H(n) = 1 if
ADDITIONAL BACKGROUND CONCEPTS Δ( NS ) = SΔ(N ).
Rényi Differential Privacy: Analyzing the total privacy loss
of a multi-shot algorithm, each stage of which is DP, is a REFERENCES
challenging task. It has been shown [16], [17] that the advanced
[1] H. Imtiaz et al., “Privacy-preserving source separation for distributed data
composition theorem [59] for (, δ)-differential privacy can be using independent component analysis,” in Proc. Annu. Conf. Inf. Sci.
loose. The main reason is that one can formulate infinitely many Syst., 2016, pp. 123–127. [Online]. Available: http://dx.doi.org/10.1109/
(, δ)-DP algorithms for a given noise variance τ 2 . RDP offers CISS.2016.7460488
[2] P. M. Thompson et al., “ENIGMA and the individual: Predicting fac-
a much simpler composition rule that is shown to be tight [16]. tors that affect the brain in 35 countries worldwide,” Neuroimage,
We review some properties of RDP [16]. vol. 145, pp. 389–408, 2017. [Online]. Available: https://doi.org/10.1016/
Proposition 3 (From RDP to differential privacy [16]): If j.neuroimage.2015.11.057
[3] S. M. Plis et al., “COINSTAC: A privacy enabled model and prototype
A is an (α, r )-RDP mechanism, then it also satisfies (r + for leveraging and processing decentralized brain imaging data,” Front.
log δ1r Neurosci., vol. 10, 2016, Art. no. 365. [Online]. Available: https://doi.org/
α−1 , δr )-differential privacy for any 0 < δr < 1. 10.3389/fnins.2016.00365
Proposition 4 (Composition of RDP [16]): Let A : D → T1 [4] K. W. Carter et al., “ViPAR: A software platform for the virtual pooling and
be (α, r1 )-RDP and B : T1 × D → T2 be (α, r2 )-RDP, then analysis of research data,” Int. J. Epidemiol., vol. 45, no. 2, pp. 408–416,
2015. [Online]. Available: https://doi.org/10.1093/ije/dyv193
the mechanism defined as (X, Y ), where X ∼ A(D) and Y ∼ [5] A. Gaye et al., “DataSHIELD: Taking the analysis to the data. not the data
B(X, D), satisfies (α, r1 + r2 )-RDP. to the analysis,” Int. J. Epidemiol., vol. 43, no. 6, pp. 1929–1944, 2014.
Proposition 5 (RDP and Gauss. Mech. [16]): If A has L2 sen- [Online]. Available: https://doi.org/10.1093/ije/dyu188
[6] A.Narayanan and V. Shmatikov, “How to break anonymity of the Netflix
sitivity 1, then the Gaussian mechanism Gσ A(D) = A(D) + E prize dataset,” 2006. [Online]. Available: http://arxiv.org/abs/cs/0610105
satisfies (α, 2σα2 )-RDP, and a composition of J Gaussian mech- [7] L.Sweeney, “Only you, your doctor, and many others may know,” Technol.
2
anisms satisfies (α, 2σ αJ
2 )-RDP, where E ∼ N (0, σ ).
Sci., vol. 2015092903, no. 9, p. 29, 2015. [Online]. Available: https://
techscience.org/a/2015092903
The proofs of the Propositions 3, 4 and 5 are provided in [16]. [8] J. L. Ny and G. J. Pappas, “Differentially private Kalman filter-
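Combining Propositions 3 and 5 yields a simple privacy accountant for $J$ rounds of the Gaussian mechanism: the composition is $\left(\alpha, \frac{\alpha J}{2\sigma^2}\right)$-RDP for every order $\alpha > 1$, and one converts to $(\epsilon, \delta)$-DP at the best $\alpha$. A minimal sketch follows; the grid search over $\alpha$ is an implementation choice, not part of [16]:

```python
import numpy as np

def eps_from_rdp(J, sigma, delta):
    """Total (eps, delta)-DP guarantee for J compositions of the Gaussian
    mechanism with noise std sigma (L2 sensitivity 1).

    Uses Proposition 5 (the composition is (alpha, alpha*J / (2 sigma^2))-RDP)
    and Proposition 3 (conversion to (eps, delta)-DP), minimized over a grid
    of Renyi orders alpha > 1.
    """
    alphas = np.linspace(1.01, 200.0, 10_000)
    eps = alphas * J / (2 * sigma**2) + np.log(1 / delta) / (alphas - 1)
    return eps.min()

# Example: 50 iterations with sigma = 20 at delta = 1e-5.
print(eps_from_rdp(J=50, sigma=20.0, delta=1e-5))
```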
ing,” in Proc. 50th Annu. Allerton Conf. Commun., Control, Com-
put., 2012, pp. 1618–1625. [Online]. Available: https://doi.org/10.1109/
APPENDIX E Allerton.2012.6483414
APPLICABILITY OF CAPE IN MACHINE LEARNING [9] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to
sensitivity in private data analysis,” in Proc. 3rd Conf. Theory Cryptog-
First, we review some definitions and lemmas [66, Proposition raphy, 2006, pp. 265–284. [Online]. Available: http://dx.doi.org/10.1007/
C.2] necessary for the proof. 11681878_14
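As a concrete check of the proof's conclusion for the mean function, where $\Delta(N) = 1/N$ (so $\Delta^2$ is convex and $\Delta(\frac{N}{S}) = S\Delta(N)$), the symmetric allocation attains $H(\mathbf{n}) = 1$ while unbalanced allocations do worse. A minimal sketch with illustrative values:

```python
import numpy as np

def H(n):
    """Ratio tau_cape^2 / tau_pool^2 for the mean function, Delta(N) = 1/N."""
    n = np.asarray(n, dtype=float)
    S, N = len(n), n.sum()
    delta_sq = 1.0 / n**2                  # Delta^2(N_s) at each site
    return delta_sq.sum() / (S**3 * (1.0 / N**2))

print(H([250, 250, 250, 250]))             # symmetric allocation: exactly 1.0
print(H([700, 100, 100, 100]))             # unbalanced allocation: > 1.0
```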
REFERENCES

[1] H. Imtiaz et al., "Privacy-preserving source separation for distributed data using independent component analysis," in Proc. Annu. Conf. Inf. Sci. Syst., 2016, pp. 123–127. [Online]. Available: http://dx.doi.org/10.1109/CISS.2016.7460488
[2] P. M. Thompson et al., "ENIGMA and the individual: Predicting factors that affect the brain in 35 countries worldwide," NeuroImage, vol. 145, pp. 389–408, 2017. [Online]. Available: https://doi.org/10.1016/j.neuroimage.2015.11.057
[3] S. M. Plis et al., "COINSTAC: A privacy enabled model and prototype for leveraging and processing decentralized brain imaging data," Front. Neurosci., vol. 10, 2016, Art. no. 365. [Online]. Available: https://doi.org/10.3389/fnins.2016.00365
[4] K. W. Carter et al., "ViPAR: A software platform for the virtual pooling and analysis of research data," Int. J. Epidemiol., vol. 45, no. 2, pp. 408–416, 2015. [Online]. Available: https://doi.org/10.1093/ije/dyv193
[5] A. Gaye et al., "DataSHIELD: Taking the analysis to the data, not the data to the analysis," Int. J. Epidemiol., vol. 43, no. 6, pp. 1929–1944, 2014. [Online]. Available: https://doi.org/10.1093/ije/dyu188
[6] A. Narayanan and V. Shmatikov, "How to break anonymity of the Netflix prize dataset," 2006. [Online]. Available: http://arxiv.org/abs/cs/0610105
[7] L. Sweeney, "Only you, your doctor, and many others may know," Technol. Sci., vol. 2015092903, no. 9, p. 29, 2015. [Online]. Available: https://techscience.org/a/2015092903
[8] J. L. Ny and G. J. Pappas, "Differentially private Kalman filtering," in Proc. 50th Annu. Allerton Conf. Commun., Control, Comput., 2012, pp. 1618–1625. [Online]. Available: https://doi.org/10.1109/Allerton.2012.6483414
[9] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Proc. 3rd Conf. Theory Cryptography, 2006, pp. 265–284. [Online]. Available: http://dx.doi.org/10.1007/11681878_14
[10] V. D. Calhoun et al., "Independent component analysis for brain fMRI does indeed select for maximal independence," PLoS One, vol. 8, 2013, Art. no. e73309. [Online]. Available: http://dx.doi.org/10.1371/journal.pone.0073309
[11] V. D. Calhoun and T. Adali, "Multisubject independent component analysis of fMRI: A decade of intrinsic networks, default mode, and neurodiagnostic discovery," IEEE Rev. Biomed. Eng., vol. 5, pp. 60–73, Aug. 2012. [Online]. Available: https://doi.org/10.1109/RBME.2012.2211076
[12] P. Comon, "Independent component analysis, a new concept?," Signal Process., vol. 36, no. 3, pp. 287–314, 1994.
[13] B. T. Baker et al., "Decentralized temporal independent component analysis: Leveraging fMRI data in collaborative settings," NeuroImage, vol. 186, pp. 557–569, 2019.
[14] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber, "Privacy: Theory meets practice on the map," in Proc. IEEE 24th Int. Conf. Data Eng., 2008, pp. 277–286.
[15] S. Meiser, "Approximate and probabilistic differential privacy definitions," IACR Cryptol. ePrint Arch., vol. 2018, 2018, Art. no. 277.
[16] I. Mironov, "Rényi differential privacy," in Proc. 30th Comput. Secur. Found. Symp. (CSF), 2017, pp. 263–275.
[17] M. Abadi et al., "Deep learning with differential privacy," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, 2016, pp. 308–318. [Online]. Available: http://doi.acm.org/10.1145/2976749.2978318
[18] H. Imtiaz and A. D. Sarwate, "Distributed differentially-private algorithms for matrix and tensor factorization," IEEE J. Sel. Topics Signal Process., vol. 12, no. 6, pp. 1449–1464, Dec. 2018. [Online]. Available: https://doi.org/10.1109/JSTSP.2018.2877842
[19] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, Jan. 2011. [Online]. Available: http://dx.doi.org/10.1561/2200000016
[20] D. K. Molzahn et al., "A survey of distributed optimization and control algorithms for electric power systems," IEEE Trans. Smart Grid, vol. 8, no. 6, pp. 2941–2962, Nov. 2017.
[21] C. A. Uribe, S. Lee, A. Gasnikov, and A. Nedić, "Optimal algorithms for distributed optimization," 2017, arXiv:1712.00232.
[22] S. Han, U. Topcu, and G. J. Pappas, "Differentially private distributed constrained optimization," IEEE Trans. Autom. Control, vol. 62, no. 1, pp. 50–64, Jan. 2017.
[23] E. Nozari, P. Tallapragada, and J. Cortés, "Differentially private distributed convex optimization via objective perturbation," in Proc. Amer. Control Conf., 2016, pp. 2061–2066.
[24] J. Zhu, C. Xu, J. Guan, and D. O. Wu, "Differentially private distributed online algorithms over time-varying directed networks," IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 1, pp. 4–17, Mar. 2018.
[25] K. Chaudhuri and C. Monteleoni, "Privacy-preserving logistic regression," in Proc. 21st Int. Conf. Neural Inf. Process. Syst., Vancouver, BC, Canada: Curran Associates, Inc., 2008, pp. 289–296.
[26] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, "Differentially private empirical risk minimization," J. Mach. Learn. Res., vol. 12, pp. 1069–1109, Jul. 2011. [Online]. Available: http://dl.acm.org/citation.cfm?id=1953048.2021036
[27] S. Song, K. Chaudhuri, and A. D. Sarwate, "Stochastic gradient descent with differentially private updates," in Proc. IEEE Glob. Conf. Signal Inf. Process., 2013, pp. 245–248.
[28] Z. Ji, Z. C. Lipton, and C. Elkan, "Differential privacy and machine learning: A survey and review," 2014, arXiv:1412.7584.
[29] C. Li, P. Zhou, L. Xiong, Q. Wang, and T. Wang, "Differentially private distributed online learning," IEEE Trans. Knowl. Data Eng., vol. 30, no. 8, pp. 1440–1453, Aug. 2018, doi: 10.1109/TKDE.2018.2794384.
[30] R. Bassily, A. Smith, and A. Thakurta, "Private empirical risk minimization: Efficient algorithms and tight error bounds," in Proc. IEEE 55th Annu. Symp. Found. Comput. Sci., Washington, DC, USA, 2014, pp. 464–473. [Online]. Available: http://dx.doi.org/10.1109/FOCS.2014.56
[31] K. Ligett, S. Neel, A. Roth, B. Waggoner, and S. Z. Wu, "Accuracy first: Selecting a differential privacy level for accuracy constrained ERM," in Adv. Neural Inf. Process. Syst., I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 2563–2573.
[32] D. Wang, M. Ye, and J. Xu, "Differentially private empirical risk minimization revisited: Faster and more general," in Proc. Adv. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 2719–2728.
[33] A. Rajkumar and S. Agarwal, "A differentially private stochastic gradient descent algorithm for multiparty classification," in Artif. Intell. Statist., 2012, pp. 933–941.
[34] R. Bassily et al., "Practical locally private heavy hitters," in Adv. Neural Inf. Process. Syst., I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 2288–2296. [Online]. Available: http://papers.nips.cc/paper/6823-practical-locally-private-heavy-hitters.pdf
[35] X. Wu et al., "Bolt-on differential privacy for scalable stochastic gradient descent-based analytics," in Proc. ACM Int. Conf. Manage. Data, New York, NY, USA, 2017, pp. 1307–1322. [Online]. Available: http://doi.acm.org/10.1145/3035918.3064047
[36] A. Smith, "Privacy-preserving statistical estimation with optimal convergence rates," in Proc. 43rd Annu. ACM Symp. Theory Comput., New York, NY, USA, 2011, pp. 813–822. [Online]. Available: http://doi.acm.org/10.1145/1993636.1993743
[37] F. McSherry and K. Talwar, "Mechanism design via differential privacy," in Proc. 48th Annu. IEEE Symp. Found. Comput. Sci., 2007, pp. 94–103. [Online]. Available: http://dx.doi.org/10.1109/FOCS.2007.41
[38] B. McMahan et al., "Communication-efficient learning of deep networks from decentralized data," in Proc. 20th Int. Conf. Artif. Intell. Statist., ser. Proc. Mach. Learn. Res., vol. 54, A. Singh and J. Zhu, Eds. Fort Lauderdale, FL, USA, Apr. 2017, pp. 1273–1282. [Online]. Available: http://proceedings.mlr.press/v54/mcmahan17a.html
[39] P. Kairouz et al., "Advances and open problems in federated learning," Dec. 2019. [Online]. Available: https://arxiv.org/abs/1912.04977
[40] M. Heikkilä, E. Lagerspetz, S. Kaski, K. Shimizu, S. Tarkoma, and A. Honkela, "Differentially private Bayesian learning on distributed data," in Adv. Neural Inf. Process. Syst., Curran Associates, Inc., 2017, pp. 3229–3238.
[41] S. Goryczka, L. Xiong, and V. Sunderam, "Secure multiparty aggregation with differential privacy: A comparative study," in Proc. Joint EDBT/ICDT Workshops, New York, NY, USA, 2013, pp. 155–163. [Online]. Available: http://doi.acm.org/10.1145/2457317.2457343
[42] P. Kairouz, S. Oh, and P. Viswanath, "Secure multi-party differential privacy," in Adv. Neural Inf. Process. Syst., C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc., 2015, pp. 2008–2016.
[43] K. Bonawitz et al., "Practical secure aggregation for privacy-preserving machine learning," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, 2017, pp. 1175–1191. [Online]. Available: http://doi.acm.org/10.1145/3133956.3133982
[44] S. A. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith, "What can we learn privately?," in Proc. IEEE 49th Annu. Symp. Found. Comput. Sci., 2008, pp. 531–540. [Online]. Available: http://dx.doi.org/10.1109/FOCS.2008.27
[45] A. Beimel, K. Nissim, and U. Stemmer, "Characterizing the sample complexity of private learners," in Proc. 4th Conf. Innov. Theor. Comput. Sci., New York, NY, USA: Association for Computing Machinery, 2013, pp. 97–110. [Online]. Available: https://doi.org/10.1145/2422436.2422450
[46] B. Balle, G. Barthe, and M. Gaboardi, "Privacy amplification by subsampling: Tight analyses via couplings and divergences," in Adv. Neural Inf. Process. Syst., S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds. Curran Associates, Inc., 2018, pp. 6277–6287.
[47] B. Balle, J. Bell, A. Gascón, and K. Nissim, "The privacy blanket of the shuffle model," in Adv. Cryptology – CRYPTO 2019, ser. Lecture Notes Comput. Sci., vol. 11693, A. Boldyreva and D. Micciancio, Eds. Cham: Springer, 2019.
[48] Ú. Erlingsson et al., "Encode, shuffle, analyze privacy revisited: Formalizations and empirical evaluation," Jan. 2020, arXiv:2001.03618.
[49] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, "Our data, ourselves: Privacy via distributed noise generation," in Adv. Cryptology, vol. 4004, Saint Petersburg, Russia: Springer Verlag, May 2006, pp. 486–503.
[50] B. Anandan and C. Clifton, "Laplace noise generation for two-party computational differential privacy," in Proc. 13th Annu. Conf. Privacy, Secur. Trust, 2015, pp. 54–61.
[51] J. Sui, T. Adalı, G. D. Pearlson, and V. D. Calhoun, "An ICA-based method for the identification of optimal fMRI features and components using combined group-discriminative techniques," NeuroImage, vol. 46, no. 1, pp. 73–86, 2009. [Online]. Available: http://dx.doi.org/10.1016/j.neuroimage.2009.01.026
[52] J. Liu and V. Calhoun, "Parallel independent component analysis for multimodal analysis: Application to fMRI and EEG data," in Proc. 4th IEEE Int. Symp. Biomed. Imaging: From Nano to Macro, 2007, pp. 1028–1031. [Online]. Available: http://dx.doi.org/10.1109/ISBI.2007.357030
[53] C. Bordier, M. Dojat, and P. L. de Micheaux, "Temporal and spatial independent component analysis for fMRI data sets embedded in a R package," 2010, arXiv:1012.0269.
[54] V. Calhoun, T. Adali, G. Pearlson, and J. Pekar, "A method for making group inferences from functional MRI data using independent component analysis," Hum. Brain Mapping, vol. 14, no. 3, pp. 140–151, 2001. [Online]. Available: https://doi.org/10.1002/hbm.1048
[55] E. A. Allen et al., "A baseline for the multivariate comparison of resting state networks," Front. Syst. Neurosci., vol. 5, no. 2, 2011. [Online]. Available: http://dx.doi.org/10.3389/fnsys.2011.00002
[56] V. Calhoun et al., "Method for multimodal analysis of independent source differences in schizophrenia: Combining gray matter structural and auditory oddball functional data," Hum. Brain Mapping, vol. 27, no. 1, pp. 47–62, 2006. [Online]. Available: http://dx.doi.org/10.1002/hbm.20166
[57] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Comput., vol. 7, no. 6, pp. 1129–1159, Nov. 1995. [Online]. Available: http://dx.doi.org/10.1162/neco.1995.7.6.1129
[58] C. Dwork, "Differential privacy," in Automata, Lang. Program., Berlin, Heidelberg: Springer, 2006, pp. 1–12.
[59] C. Dwork and A. Roth, "The algorithmic foundations of differential privacy," Found. Trends Theor. Comput. Sci., vol. 9, no. 3–4, pp. 211–407, 2013. [Online]. Available: http://dx.doi.org/10.1561/0400000042
[60] G. R. Kurri, V. M. Prabhakaran, and A. D. Sarwate, "Coordination through shared randomness," IEEE Trans. Inf. Theory, vol. 67, no. 8, pp. 4948–4974, Aug. 2021. [Online]. Available: https://doi.org/10.1109/TIT.2021.3091604
[61] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. New York, NY, USA: Cambridge Univ. Press, 2012.
[62] S. Malluri and V. K. Pamula, "Gaussian Q-function and its approximations," in Proc. Int. Conf. Commun. Syst. Netw. Technol., 2013, pp. 74–77.
[63] D. Desfontaines and B. Pejó, "SoK: Differential privacies," Proc. Privacy Enhancing Technol., vol. 2020, no. 2, pp. 288–313, 2020. [Online]. Available: https://doi.org/10.2478/popets-2020-0028
[64] J. Zhao et al., "Reviewing and improving the Gaussian mechanism for differential privacy," 2019. [Online]. Available: http://arxiv.org/abs/1911.12060
[65] K. Nordhausen, E. Ollila, and H. Oja, "On the performance indices of ICA and blind source separation," in Proc. 12th IEEE Int. Workshop Signal Process. Adv. Wireless Commun., 2011, pp. 486–490. [Online]. Available: http://dx.doi.org/10.1109/SPAWC.2011.5990458
[66] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications. New York, NY, USA: Springer-Verlag, 1979. [Online]. Available: https://link.springer.com/book/10.1007/978-0-387-68276-1
[67] W. Rudin, Principles of Mathematical Analysis. McGraw-Hill Higher Education, 1976. [Online]. Available: https://www.mheducation.com/highered/product/principles-mathematical-analysis-rudin/M007054235X.html
[68] H. Imtiaz and A. D. Sarwate, "Differentially private distributed principal component analysis," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2018, pp. 2206–2210. [Online]. Available: https://doi.org/10.1109/ICASSP.2018.8462519
[69] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation," in Proc. 8th Int. Conf. Neural Inf. Process. Syst., Cambridge, MA, USA: MIT Press, 1995, pp. 757–763. [Online]. Available: http://dl.acm.org/citation.cfm?id=2998828.2998935
[70] Y. Cao, M. Yoshikawa, Y. Xiao, and L. Xiong, "Quantifying differential privacy under temporal correlations," in Proc. IEEE 33rd Int. Conf. Data Eng., 2017, pp. 821–832.
[71] L. Bottou, "On-line learning and stochastic approximations," in On-Line Learning in Neural Networks, D. Saad, Ed. New York, NY, USA: Cambridge Univ. Press, 1998, pp. 9–42. [Online]. Available: http://dl.acm.org/citation.cfm?id=304710.304720
[72] K. Chaudhuri and S. A. Vinterbo, "A stability-based validation procedure for differentially private machine learning," in Adv. Neural Inf. Process. Syst., C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2013, pp. 2652–2660.
[73] R. Engle, "Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation," Econometrica, vol. 50, no. 4, pp. 987–1007, 1982. [Online]. Available: http://dx.doi.org/10.2307/1912773
[74] T. Bollerslev, "Generalized autoregressive conditional heteroskedasticity," J. Econometrics, vol. 31, pp. 307–327, 1986.
[75] Q. Geng, P. Kairouz, S. Oh, and P. Viswanath, "The staircase mechanism in differential privacy," IEEE J. Sel. Topics Signal Process., vol. 9, no. 7, pp. 1176–1184, Oct. 2015.
[76] A. Shamir, "How to share a secret," Commun. ACM, vol. 22, no. 11, pp. 612–613, Nov. 1979. [Online]. Available: http://doi.acm.org/10.1145/359168.359176
[77] W. Diffie and M. Hellman, "New directions in cryptography," IEEE Trans. Inf. Theory, vol. 22, no. 6, pp. 644–654, Nov. 1976.
[78] J. So, B. Guler, and A. S. Avestimehr, "CodedPrivateML: A fast and privacy-preserving framework for distributed machine learning," IEEE J. Sel. Areas Inf. Theory, vol. 2, no. 1, pp. 441–451, 2021.
[79] I. Mironov, "On significance of the least significant bits for differential privacy," in Proc. ACM Conf. Comput. Commun. Secur., New York, NY, USA, 2012, pp. 650–661. [Online]. Available: http://doi.acm.org/10.1145/2382196.2382264
[80] V. Balcer and S. Vadhan, "Differential privacy on finite computers," in Proc. 9th Innov. Theor. Comput. Sci. Conf., ser. Leibniz Int. Proc. Inform., vol. 94, Dagstuhl, Germany: Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018, pp. 43:1–43:21.
[81] C. Dwork, G. N. Rothblum, and S. Vadhan, "Boosting and differential privacy," in Proc. IEEE 51st Annu. Symp. Found. Comput. Sci., 2010, pp. 51–60.
[82] H. Imtiaz, J. Mohammadi, and A. D. Sarwate, "Distributed differentially private computation of functions with correlated noise," Apr. 2019. [Online]. Available: https://arxiv.org/abs/1904.10059
[83] S. Song, K. Chaudhuri, and A. D. Sarwate, "Learning from data with heterogeneous noise using SGD," in Proc. 18th Int. Conf. Artif. Intell. Statist., 2015, vol. 38, pp. 894–902. [Online]. Available: http://jmlr.org/proceedings/papers/v38/song15.html

Hafiz Imtiaz received the B.Sc. and first M.Sc. degrees from the Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh, in 2009 and 2011, respectively, and the second M.Sc. and the Ph.D. degrees from Rutgers University, New Brunswick, NJ, USA, in 2017 and 2020, respectively. He is currently an Assistant Professor with the Department of Electrical and Electronic Engineering, BUET. Previously, he worked as an Intern with Qualcomm and Intel Labs, focusing on activity/image analysis and adversarial attacks on neural networks, respectively. His primary research interest is developing privacy-preserving machine learning algorithms for decentralized data settings. More specifically, he focuses on matrix and tensor factorization and optimization problems, which are core components of many modern machine learning algorithms.

Jafar Mohammadi received the Doctoral degree in electrical engineering from the Technical University of Berlin, Berlin, Germany, in 2016. He has been with the Fraunhofer Heinrich Hertz Institute since 2011, first as a Doctoral Candidate and later as a Researcher. In 2017, he joined Rutgers University, NJ, USA, as a Postdoctoral Researcher working on differential privacy for distributed machine learning. He is currently a Researcher with Nokia Bell Labs, working at the intersection of machine learning and wireless communications. During his career, he contributed to flagship European funded projects such as mmMAGIC and Hexa-X. His main areas of interest can be summarized as using mathematical and machine learning tools to optimize wireless communication systems.
Rogers Silva (Member, IEEE) received the B.Sc. degree in electrical engineering from the Pontifical Catholic University, Porto Alegre, Brazil, in 2003, and both the M.S. degree in computer engineering (with minors in statistics and in mathematics) in 2011 and the Ph.D. degree in computer engineering (with distinction) in 2017 from The University of New Mexico, Albuquerque, NM, USA. He is currently a Research Scientist with the Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University. He is also the Leader of the #BSIsubspace Section of the Brain Space Initiative. Previously, he was a Postdoctoral Fellow with The Mind Research Network, a Data Scientist with Datalytic Solutions, and worked as an Engineer, Lecturer, and Consultant. As a multidisciplinary scientist, he develops algorithms for statistical and machine learning, image analysis, numerical optimization, memory-efficient large-scale data reduction, and distributed analyses, focusing on multimodal, multi-subject neuroimaging data from thousands of subjects.

Anand D. Sarwate (Senior Member, IEEE) received the B.S. degrees in electrical engineering and computer science and in mathematics from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 2002, and the M.S. and Ph.D. degrees in electrical engineering from the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley (U.C. Berkeley), Berkeley, CA, USA. He is currently an Associate Professor with the Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA, where he has been since January 2014. He was previously a Research Assistant Professor from 2011 to 2013 with the Toyota Technological Institute at Chicago, and prior to this, a Postdoctoral Researcher from 2008 to 2011 with the University of California, San Diego, CA, USA. His research interests include information theory, machine learning, signal processing, optimization, and privacy and security. Dr. Sarwate was the recipient of the Rutgers Board of Governors Research Fellowship for Scholarly Excellence in 2020 and the NSF CAREER Award in 2015. He is a member of Phi Beta Kappa and Eta Kappa Nu.