0% found this document useful (0 votes)

5 views

Final Version on IEEE

Uploaded by

billahmotasim2605

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views

Final Version on IEEE

Uploaded by

billahmotasim2605

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL.

69, 2021 6355

A Correlated Noise-Assisted Decentralized

Differentially Private Estimation Protocol,
and its Application to fMRI Source Separation
Hafiz Imtiaz , Jafar Mohammadi, Rogers Silva, Member, IEEE, Bradley Baker, Sergey M. Plis ,
Anand D. Sarwate , Senior Member, IEEE, and Vince D. Calhoun , Fellow, IEEE

Abstract—Blind source separation algorithms such as indepen- often involve a modest number of individuals and privacy
dent component analysis (ICA) are widely used in the analysis concerns can preclude sharing “raw” data with collaborators.
of neuroimaging data. To leverage larger sample sizes, different Performing a new joint analysis across the individual data
data holders/sites may wish to collaboratively learn feature rep-
resentations. However, such datasets are often privacy-sensitive, points requires access to individuals’ data. Therefore, research
precluding centralized analyses that pool the data at one site. In this groups often collaborate by performing meta-analyses limited
work, we propose a differentially private algorithm for performing to already-published aggregates/summaries of the data. For
ICA in a decentralized data setting. Due to the high dimension machine learning (ML) applications, each party/site may lack
and small sample size, conventional approaches to decentralized a sufficient number of samples to robustly estimate features
differentially private algorithms suffer in terms of utility. When
centralizing the data is not possible, we investigate the benefit of on their own, but the aggregate number of samples across
enabling limited collaboration in the form of generating jointly all sites can yield novel discoveries such as biomarkers for
distributed random noise. We show that such (anti) correlated noise diseases. Some noteworthy examples where individual research
improves the privacy-utility trade-off, and can reach the same level groups/sites may wish to collaborate, include:
of utility as the corresponding non-private algorithm for certain r a medical research consortium of several healthcare cen-
parameter choices. We validate this benefit using synthetic and real
ters/research labs for neuroimaging analysis [2]–[5]
neuroimaging datasets. We conclude that it is possible to achieve
meaningful utility while preserving privacy, even in complex signal
r a decentralized speech processing system to learn model
processing systems. parameters for speaker recognition
Index Terms—Differential privacy, decentralized computation, r a multi-party cyber-physical system for performing global
independent component analysis, correlated noise, fMRI. state estimation from sensor signals.
I. INTRODUCTION Although sending the data samples to a central repository or
aggregator can enable efficient feature learning, privacy con-
HARING data is a major challenge for researchers in a
S number of domains. In particular, human health studies
cerns and large communication overhead are often prohibitive
when sharing the data. Several previous works demonstrated
how modern signal processing and ML algorithms can po-
Manuscript received February 22, 2021; revised June 14, 2021, September
13, 2021, and October 28, 2021; accepted October 29, 2021. Date of publication tentially reveal information about individuals present in the
November 11, 2021; date of current version December 3, 2021. The associate dataset [6]–[8]. A mathematically rigorous framework for pro-
editor coordinating the review of this manuscript and approving it for publication tection against such information leaks is differential privacy [9].
was Dr. Alexander Bertrand. This work was supported in part by the US NIH un-
der Award 1R01DA040487, in part by the US NSF under Award CCF-1453432, Differentially private (DP) algorithms offer a quantifiable plau-
and in part by DARPA and SSC Pacific under Contract N66001-15-C-4070. sible deniability to the data owners regarding their participation.
This work significantly improves upon the preliminary work presented at Under differential privacy, the algorithm outputs are randomized
IEEE Annual Conference on Information Science and Systems, 2016. [DOI:
http://dx.doi.org/10.1109/CISS.2016.7460488] (Corresponding author: Hafiz in such a way that the presence or absence of any individual in
Imtiaz.) the dataset does not significantly affect the computation output.
Hafiz Imtiaz is with the Department of Electrical and Electronic Engineering, This randomization often takes the form of noise introduced
Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
(e-mail: [email protected]). somewhere in the computation, resulting in a loss in perfor-
Jafar Mohammadi is with the Nokia Bell Labs, 70435 Stuttgart, Germany mance or utility of the algorithm. Privacy risk is quantified by
(e-mail: [email protected]). a parameter or parameters, leading to a privacy-utility trade-off
Rogers Silva, Bradley Baker, Sergey M. Plis, and Vince D. Calhoun are with
the Tri-institutional Center for Translational Research in Neuroimaging and Data in DP algorithm design.
Science, Georgia State University, Georgia Institute of Technology and Emory In this paper, we consider blind source separation (BSS)
University, GA 30303 USA (e-mail: [email protected]; [email protected]; for neuroimaging, where several individual research groups
[email protected]; [email protected]).
Anand D. Sarwate is with the Department of Electrical and or sites wish to collaborate. The joint goal is to learn global
Computer Engineering, Rutgers University, NJ 08854 USA (e-mail: statistics/features utilizing data samples from all sites and en-
[email protected]). sure formal privacy guarantees. Unfortunately, conventional
This article has supplementary downloadable material available at
https://doi.org/10.1109/TSP.2021.3126546, provided by the authors. approaches to using differential privacy in decentralized settings
Digital Object Identifier 10.1109/TSP.2021.3126546 require introducing too much noise, leading to a poor trade-off.
1053-587X © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6356 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021

The high dimensional nature of neuroimaging data also poses requirements or sample sizes at the sites. We show how
a challenge, yielding a large gap between the centralized and CAPE can be applied to convex optimization problems,
decentralized cases. We develop a decentralized computation such as empirical risk minimization (ERM) and computa-
framework for differentially private signal processing and ML tion of loss functions that are separable across sites.
applications, which can partially close this gap. To do this, we r We use CAPE to design a novel algorithm capeDJICA
need an additional resource: if sites can generate (anti) correlated for (, δ)-DP decentralized joint ICA. capeDJICA signif-
random noise independent of their data, we see significant icantly improves upon our earlier work [1] by taking ad-
improvements in performance. Enabling this resource could be vantage of the CAPE scheme. To address the multi-round
provided using a trusted third party or cryptographic methods, nature of the capeDJICA algorithm and to provide a tighter
which may incur different privacy/security costs that we do not characterization of privacy under composition, we provide
address here. We employ the scheme in our BSS application for an analysis using Rényi Differential Privacy (RDP) [16]
neuroimaging data that guarantees differential privacy with an and the moments accountant [17].
improved privacy-utility trade-off. r We demonstrate the effectiveness of CAPE and
The particular BSS algorithm we are considering is the in- capeDJICA on real and synthetic data with varying pri-
dependent component analysis (ICA), one of the most popular vacy levels, number of samples and other key parameters.
BSS techniques for neuroimaging studies [10]. ICA assumes We show that the capeDJICA can provide utility very close
that the observed signals are mixtures of statistically indepen- to that of a non-private algorithm [13] for some parameter
dent sources and aims to decompose the mixed signals into choices. In the regime of meaningful utility, capeDJICA
those sources. ICA has been widely used to estimate intrinsic outperforms the existing privacy-preserving algorithm [1].
connectivity networks from brain imaging data (e.g., functional The assumed availability of correlated noise, together with
magnetic resonance imaging or fMRI) [11]. Successful ap- our improved accounting of privacy via the Rényi Differ-
plication of ICA on fMRI can be attributed to both sparsity ential Privacy, enable us to achieve such performance even
and spatial or temporal independence among the underlying for strict privacy requirements.
sources [10]. The goal of temporal ICA is to identify temporally Note that, we showed a preliminary version of the CAPE
independent components that represent activation of different protocol in [18]. The protocol in this paper is more robust against
brain regions over time [12]. However, it requires the aggregate site dropouts and does not require a trusted third-party.
temporal dimension (of all subjects) to be at least similar to Related Work: There is a vast literature [19]–[24] on solving
the voxel dimension [13]. In most cases, the data from a single optimization problems in decentralized settings, both with and
medical center may not suffice for such analysis. We focus on the without privacy constraints. In the signal processing/ML con-
recently proposed decentralized joint ICA (djICA) algorithm, text, the most relevant ones to our current work are those using
which can perform temporal ICA of fMRI data [13] by allowing ERM and stochastic gradient descent (SGD) [17], [25]–[32].
research groups to jointly learn the underlying sources in a non Additionally, several works studied decentralized DP learning
privacy-preserving way. for locally trained classifiers [33], [34]. One of the most common
Our Contribution: Conventional approaches to decentralized approaches for ensuring differential privacy in optimization
DP algorithms result in too much noise (see Appendix A for problems is to employ randomized gradient computations [27],
an illustration). We propose a novel framework for decentral- [30]. Other common approaches include employing output per-
ized DP computations – correlation assisted private estimation turbation [26] and objective perturbation [23], [26]. A newly
(CAPE), to mitigate the noise. We show that if the sites can proposed take on output perturbation [35] injects noise after
sample their noise from an (anti) correlated distribution, we model convergence, which imposes some additional constraints.
can achieve significantly better privacy-utility trade-offs. Using In addition to optimization problems, Smith [36] proposed a
a trusted “noise generator,” or multiparty computation for the general approach for computing summary statistics using the
noise generation, involves assumptions on the trust model for sample-and-aggregate framework and both the Laplace and
the system and may incur additional costs depending on the Exponential mechanisms [37].
choice of implementation. However, we show that the effort One approach to handling decentralized learning with privacy
to implement this sampling capability is very beneficial for constraints is federated learning [38] and in particular, cross-silo
applications with high-dimensional data in the moderate sample learning [39]. Many approaches use multiparty computation
regime. We summarize our contributions here: (MPC), such as Heikkilä et al. [40], who also studied the
r We propose our new CAPE framework for decentralized relationship of additive noise and sample size in a decentralized
DP computations and provide theoretical guarantees on its setting. In their model, S data holders communicate their data to
privacy and utility properties. In CAPE, we first choose the M computation nodes to compute a function. Differential pri-
noise variances (e.g., to meet a utility constraint) and then vacy provides different guarantees (see [41], [42] for thorough
analyze the resulting mechanism to show that it provides comparisons between Secure Multi-party Computation (SMC)
an (, δ) differential privacy guarantee, where and δ and differential privacy) although we can use MPC protocols
satisfy a specific relation. We actually prove that CAPE to implement part of our algorithm [43]. Other approaches to
satisfies (, δ) probabilistic differential privacy [14], [15], using DP in federated learning operate in different regimes, such
which in-turn implies (, δ) differential privacy. We ex- as learning from a large number of individual data holders, or
tend the CAPE scheme to include asymmetric privacy learning from silos with a large number of data points at each

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
IMTIAZ et al.: CORRELATED NOISE-ASSISTED DECENTRALIZED DIFFERENTIALLY PRIVATE ESTIMATION PROTOCOL 6357

site. This allows for privacy amplification by subsampling [17], related to WX. More specifically, the objective of Infomax
[44]–[46] or using a trusted shuffler [47], [48]. In our application ICA is: W∗ = argmaxW H(G(WX)). Here, G(·) is the sigmoid
1
involving neuroimaging analysis, such techniques do not scale function and is given by: G(z) = 1+exp(−z) . Additionally, H(z)
as well, since sites often have a small number of samples. Our is the (differential) entropy
of a random vector z with joint
work is inspired by the seminal work of Dwork et al. [49] that density q: H(z) = − q(z) log q(z)dz. Here, we evaluate it on
proposed distributed noise generation for preserving privacy. We a matrix, implying sample averaging over its columns. Note
employ a similar principle as Anandan and Clifton [50] to reduce that the function G(·) is applied element-wise for matrix-valued
the noise added for differential privacy. Our approach seeks to arguments. That is, G(Z) is a matrix with the same size as Z and
leverage properties of conditional Gaussian distributions to gain [G(Z)]ij = G([Z]ij )).
some privacy amplification when learning from decentralized The Decentralized Data Problem: We consider a
data, and is complementary to these other techniques. decentralized-data model with S sites. There is a central node
In addition to generalized optimization methods, a number of that acts as an aggregator. We assume an “honest but curious”
modified ICA algorithms exist for joining various data sets [51] threat model: all parties follow the protocol but a subset are
together and performing simultaneous decomposition of data “curious” and can collude (maybe with an external adversary)
from a number of subjects and modalities [52]. Note that ICA can to learn other sites’ data/function outputs. Now, for the
be performed by considering voxels as variables or time points as decentralized ICA problem, suppose each site s has a collection
variables, leading to temporal and spatial ICA, respectively [53], of data matrices {Xs,m ∈ RD×Nm : m = 1, . . . , Ms } each
[54]. For instance, group spatial ICA (GICA) is noteworthy consisting of a time course of length Nm time points over D
for performing multi-subject analysis of task- and resting-state voxels for each of Ms individuals. We assume the data samples
fMRI data [11], [54], [55]. It assumes that the spatial map in the local sites are disjoint and come from different individuals.
components are similar across subjects (i.e., the overall spatial Sites concatenate their local data matrices temporally to form a
networks are stable across subjects for the experiment duration). D × Nm M s data matrix Xs ∈ R
D×Ns
, where Ns = Nm Ms .
The joint ICA (jICA) [56] algorithm for multi-modal data fusion Let N = Ss=1 Ns be the total number of samples and

assumes that the mixing process is similar over a group of M = Ss=1 Ms be the total number of individuals (across
subjects. Group temporal ICA also assumes common spatial all sites). We assume a global mixing matrix A ∈ RD×R
maps but pursues statistical independence of timecourses (acti- generates the time courses in Xs from underlying sources
vation of certain neurological regions) [13]. Consequently, like Ss ∈ RR×Ns at each site. This yields the following model:
jICA, the common spatial maps from temporal ICA describe a X = [AS1 . . . ASS ] = [X1 . . . XS ] ∈ RD×N . We want to
common mixing process among subjects. While very interesting, compute the global unmixing matrix W ∈ RR×D in the
temporal ICA of fMRI is typically not investigated because of decentralized setting. Because sharing the raw data between
the small number of time points in each data set, which leads sites is often impossible due to privacy constraints, we
to unreliable estimates [13]. The decentralized jICA overcomes develop methods that guarantee differential privacy [9]. More
that limitation by leveraging datasets from multiple sites. specifically, our goal is to use DP estimates of the local gradients
to compute the DP global unmixing matrix W such that it
closely approximates the true global unmixing matrix.
II. DATA AND PRIVACY MODEL
Notation: We denote vectors, matrices and scalars with bold
A. Definitions
lower case letters (x), bold upper-case letters (X) and unbolded
letters (M ), respectively. We denote indices with lower-case In differential privacy we consider a domain D of databases
letters and they typically run from 1 to their upper-case versions consisting of N records and define D and D to be neighbors if
(m ∈ {1, 2, . . . , M } [M ]). The n-th column of the matrix X they differ in a single record.
is denoted as xn . We denote the Euclidean (or L2 ) norm of a vec- Definition 1 ((, δ)-Differential Privacy [9]): An algorithm
tor and the spectral norm of a matrix with · 2 and the Frobenius A : D → T provides (, δ)-differential privacy ((, δ)-DP) if
norm with · F . Finally, the density of √ the standard normal Pr[A(D) ∈ S] ≤ exp() Pr[A(D ) ∈ S] + δ, for all measur-
random variable is given by φ(x) = (1/ 2π) exp(−x2 /2). able S ⊆ T and all neighboring data sets D, D ∈ D.
The ICA Model: In this paper, we consider the generative One way to interpret this is that the probability of the out-
ICA model as in [1], [13]. In the centralized scenario, the put of an algorithm is not changed significantly if the input
independent sources S ∈ RR×N are composed of N observa- database is changed by one entry. This definition is also known
tions from R statistically independent components. We have a as the Bounded Differential Privacy (as opposed to unbounded
linear mixing process defined by a mixing matrix A ∈ RD×R differential privacy [58]). Here, (, δ) are privacy parameters:
with D ≥ R, which forms the observed data X ∈ RD×N as lower (, δ) ensure more privacy. The parameter δ can be in-
a product X = AS. Many ICA algorithms propose recover- terpreted as the probability that the algorithm fails to provide
ing the unmixing matrix W ∈ RR×D , corresponding to the privacy risk . Note that (, δ)-differential privacy is known as
Moore-Penrose pseudo-inverse of A, denoted A+ , by trying the approximate differential privacy and -differential privacy
to maximize independence between rows of the product WX. (-DP) is known as pure differential privacy. In general, we
The maximal information transfer (infomax) [57] is a popular denote approximate (bounded) differentially private algorithms
heuristic for estimating W that maximizes an entropy functional with DP. There are several mechanisms for formulating a DP

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6358 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021

algorithm. Additive noise mechanisms such as the Gaussian or

Algorithm 1. Generate Zero-Sum Noise.
Laplace mechanisms [9], [59] and random sampling using the
exponential mechanism [37] are among the most common ones. Require: Local noise variances {τs2 }
1: Each site generates ês ∼ N (0, τs2 )
For additive noise mechanisms, the standard deviation of the
noise is scaled to the sensitivity of the computation. 2: Aggregator computes Ss=1 ês using a non-colluding
Definition 2 (L2 -sensitivity [9]): The L2 -sensitivity of party (c.f. [18, Section III]) or using a secure MPC
a vector-valued function f (D) is Δ := maxD,D f (D) − protocol (such as SecureAgg
[43])
f (D )2 , where D, D ∈ D are neighboring datasets. 3: Aggregator broadcasts Ss=1 ês to all sites s ∈ [S]

Definition 3 (Gaussian Mechanism [59]): Let f : D → RD 4: Each site computes es = ês − S1 Ss =1 ês
be an arbitrary function with L2 -sensitivity Δ. The Gaussian 5: return es
Mechanism with parameter τ adds noise scaled to N (0, τ 2 ) to
and satisfies (, δ) differential
each of the D entries of the output Algorithm 2: Correlation Assisted Private Estimation
privacy for ∈ (0, 1) if τ ≥ Δ 2 log 1.25 (CAPE).
δ .
Note that, for any given (, δ) pair, we can calculate a noise Require: Data samples {xs }, local noise variances {τs2 }
variance τ 2 such that addition of a noise term drawn from 1: for s = 1, . . . , S do at each site
N (0, τ 2 ) guarantees (, δ)-differential privacy. There are in- 2: Generate es according to Algorithm 1
2
finitely many (, δ) pairs that yield the same τ 2 . Therefore, we 3: Generate gs ∼ N (0, τg2 ) with τg2 = τSs
parameterize our methods using τ 2 [18] in this paper. 4: Compute and send âs ← f (xs ) + es + gs
Definition 4 (Rényi Differential Privacy [16]): A ran- 5: end for
domized mechanism A : D → T is (α, r )-Rényi differen- 6: Compute acape ← S1 Ss=1 âs at the aggregator
tially private if, for any adjacent D, D ∈ D, the follow- 7: return acape
ing holds: Dα (A(D)A(D )) ≤ r . Here, Dα (P (x)Q(x)) =
1 P (x) α
α−1 log Ex∼Q ( Q(x) ) , α > 1 and P (x) and Q(x) are probabil- outputs from each site, as well as the output from the aggregator.
ity density functions defined on T . Additionally, the adversary can know everything about the col-
We use RDP to perform an analysis of the capeDJICA and luding sites (including their private data). We denote the number
convert those guarantees into an (, δ)-DP guarantee. Conven- of non-colluding sites with SH such that S = SC + SH . Without
tional privacy analysis of multi-shot algorithms tend to exagger- loss of generality, we designate the non-colluding sites with
ate the total privacy loss [16], [17]. RDP offers a much simpler {1, . . . , SH }.

composition rule that is shown to be tight [16]. Consider estimating the mean f (x) = N1 N n=1 xn of N
scalars x = [x1 , . . . , xN −1 , xN ] with each xi ∈ [0, 1]. The
sensitivity of the function f (x) is N1 . Assume that we have
III. CORRELATION ASSISTED PRIVATE ESTIMATION (CAPE) S sites with each site s ∈ [S] holding a disjoint dataset xs
In this section, we describe our CAPE scheme that can benefit of Ns = N/S samples. An aggregator wishes to estimate and
a broad class of function computations, including empirical publish the mean of all the samples. In a conventional decen-
average loss functions used in ML. We first discuss the CAPE tralized DP approach, sites add independent noise, resulting in
scheme and analyze the privacy and utility in the symmetric a large noise variance at the output (see Appendix A). In this
setting (equal sample sizes and privacy requirements at the sites). work, we ask: what additional resources are needed to achieve
Recognizing practical applications involving unequal sample better privacy-utility trade-offs? We show that if sites can sample
size/privacy requirements at sites, we then extend the CAPE noise from a joint distribution, then we can achieve significant
scheme to incorporate such scenarios. To use CAPE, the sites improvements, suggesting that correlated randomness can be
need to jointly generate a noise term which sum to zero. Enabling an important resource in decentralized settings and worth the
this functionality (which is not present in the conventional additional cost of implementing such a resource. Each site s
scheme) is what allows us to obtain improvements. To do this sends âs = f (xs ) + es + gs to the aggregator, where es and gs
in practice we can run the SecureAgg scheme [43] (details are two noise terms whose variances are chosen to ensure that the
in Appendix F in the Supplement), which securely computes noise es + gs is sufficient to guarantee (, δ)-differential privacy
sums of vectors in an honest-but-curious setup using a constant to f (xs ). Each site generates the noise gs ∼ N (0, τg2 ) locally
number of rounds with low communication overhead. Using this and the noise es ∼ N (0, τe2 ) jointly with all other sites such that
S
or any other scheme may have (non-differential) privacy costs: s=1 es = 0 (see Algorithm 1).
we focus on the differential privacy guarantees here, and our Generating Zero-sum Noise: In our proposed scheme, each
privacy guarantees should be read in this context. site s ∈ [S] generates a noise term ês ∼ N (0, τs2 ) indepen-

Trust/Collusion Model: In our proposed CAPE scheme, we dently. The aggregator then computes Ss=1 ês . There are sev-
assume that all of the S sites and the central node follow the eral options for this: i) using a non-colluding party acting as
protocol honestly. However, up to SC sites can collude with a noise generator (see Section III of Imtiaz and Sarwate [18])
an adversary to learn about some site’s data/function output. or ii) using a secure MPC protocol (see the SecureAgg proto-
The central node is also honest-but-curious (and therefore, col [43]) or iii) potentially using sampling from existing shared
can collude with an adversary). An adversary can observe the randomness (see Kurri et al. [60] and references therein). The

aggregator broadcasts Ss=1 ês to all the sites. Each site then sets simplicity, we assume that all sites have equal number of samples
(i.e., Ns = N 2 2
es = ês − S1 Ss =1 ês to achieve Ss=1 es = 0. We show the S ) and τs = τ .
complete noise generation procedure in Algorithm 1. Although To infer the private data of the sites s ∈ SH , the adversary
it is shown for scalars, it can be readily extended for array-valued can observe â = [â1 , . . . , âSH ] ∈ RSH and ê = s∈SH ês .
zero-sum noise terms. Note that the adversary can learn the partial sum ê because
Details of CAPE Protocol: We they can get the sum s ês from the aggregator and the noise
observe that the variance of terms {êSH +1 , . . . , êS } from the colluding sites. Therefore, the
es is given by τe2 = E[(ês − S1 Ss =1 ês )2 ] = (1 − S1 )τs2 . Ad-
2 adversary observes the vector y = [â , ê] ∈ RSH +1 to make
ditionally, we choose τg2 = τSs . Each site then generates the noise
inference about the non-colluding sites. To prove differential
gs ∼ N (0, τg2 ) independently and sends âs = f (xs ) + es + gs g(y|a)
privacy guarantee, we must show that | log g(y|a ) | ≤ holds
to the aggregator. Note that neither of the terms es or gs has
with probability (over the randomness of the mechanism) at
large enough variance to provide (, δ)-DP guarantee to f (xs ).
least 1 − δ. Here, a = [f (x1 ), . . . , f (xSH )] , and g(·|a) and
However, we chose the variances of es and gs to ensure that the
g(·|a ) are the probability density functions of y under a and a ,
es + gs is sufficient to ensure a DP guarantee to f (xs ) at site
respectively. The vectors a and a differ in only one coordinate
s. The chosen variance of gs also ensures that the output from
(neighboring). Without loss of generality, we assume that a
the aggregator would have the same noise variance as the DP
and a differ in the first coordinate. We note that the maximum
pooled-data scenario – observe that we compute the following
at
difference is N1s as the sensitivity of the function f (xs ) is N1s .
the aggregator (in Step 2 of Algorithm 2): acape = S1 Ss=1 âs =
1
S 1
S Recall that we release âs = f (xs ) + es + gs from each site.
S s=1 f (xs ) + S s=1 gs , where we used s es = 0. The We observe ∀s ∈ [S]: E(âs ) = f (xs ), var(âs ) = τ 2 . Addition-
2 τ2 2
variance of the estimator acape is τcape = S · Sg2 = τpool , which ally, ∀s1 = s2 ∈ [S], we have: E(âs1 âs2 ) = f (xs1 )f (xs2 ) −
τ2
S . That is, the random variable â is N (a, Σâ ), where Σâ =
is exactly the same as if all the data were pooled at the aggregator.
2
This claim is formalized in Lemma 1. We show the complete (1 + S1 )τ 2 I − 11 τS ∈ RSH ×SH and 1 is a vector of all ones.
algorithm in Algorithm 2. The privacy of Algorithm 2 is given Without loss of generality, we can assume [59] that a = 0
by Theorem 1. The communication cost of the scheme is shown and a = a − v, where v = [ N1s , 0, . . . , 0] . Additionally, the
in Appendix K in the Supplement.
random variable ê is N (0, τê2 ), where τê2 = SH τ2 . Therefore,
Theorem 1 (Privacy of CAPE Algorithm (Algorithm 2)): Σâ Σâê
Consider Algorithm 2 in the decentralized data setting of Sec- g(y|a) is the density of N (0, Σ), where Σ = Σ τê2
∈
âê
2 2
tion I with Ns = N S and τs = τ for all sites s ∈ [S]. Suppose R (SH +1)×(SH +1)
. With some simple algebra, we can find the
that at most SC sites can collude after execution. Then Algorithm expression for Σâê : Σâê = (1 − SSH )τ 2 1 ∈ RSH . If we denote
2 guarantees (, δ)-differential privacy for each site, where (, δ) ṽ = [v , 0] ∈ RSH +1 then we observe
satisfy the relation δ = 2 −μ σz
σz ), ∈ (0, 1) and (μz , σz )
φ( −μ z
z
g(y|a) 1
are given by log
= − y Σ−1 y − (y + ṽ) Σ−1 (y + ṽ)
g(y|a ) 2
9

2
S3 S − SC + 2 S−SC SC 1
μz = 2 2 + 2 , = 2y Σ−1 ṽ + ṽ Σ−1 ṽ
2τ N (1 + S) S − SC S(1 + S) − 3SC 2
(1) 1
= y Σ−1 ṽ + ṽ Σ−1 ṽ = |z|,
σz2 = 2μz . (2) 2
where z = y Σ−1 ṽ + 12 ṽ Σ−1 ṽ. Using the matrix inversion
Remark 1: As mentioned in the Introduction, CAPE takes the
lemma for block matrices [61, Section 0.7.3] and some algebra,
target noise variances as inputs rather than the privacy parame-
we have
ters (, δ). We can think of this as setting a constraint on the utility
at the input and then using Theorem 1 to specify the privacy −1 Σ−1 1 −1 −1
â + K Σâ Σâê Σâê Σâ −K1 −1
Σâ Σâê
Σ = 1 −1 1
,
guarantee. This approach allows us to leverage composition −K Σâê Σâ K
results using Rényi-DP for iterative algorithms to optimize
for a target δ. where Σ−1 S 2 2
â = (1+S)τ 2 (I + SH 11 ) and K = τê − Σâê Σâ
−1

Remark 2: Theorem 1 is stated for the symmetric setting: Σâê . Note that z is a Gaussian random variable N (μz , σz2 ) with
2 2
Ns = N S and τs = τ ∀s ∈ [S]. Additionally, as with many parameters μz = 12 ṽ Σ−1 ṽ and σz2 = ṽ Σ−1 ṽ given by (1) and
algorithms using the approximate differential privacy, the guar- (2), respectively. Now, we observe
antee holds for a range of (, δ) pairs subject to a trade-off
g(y|a)
constraint between and δ, as in the simple case in Definition 3. Pr log ≤ = Pr [|z| ≤ ] = 1 − 2 Pr [z > ]
Proof: As mentioned before, we identify the SH non- g(y|a )

colluding sites with s ∈ {1, . . . , SH } SH and the SC collud- − μz
ing sites with s ∈ {SH + 1, . . . , S} SC . The adversary can = 1 − 2Q
σz
observe the outputs from each site (including the aggregator).
Additionally, the colluding sites can share their private data and σz − μz
>1−2 φ ,
the noise terms, ês and gs for s ∈ SC , with the adversary. For − μz σz

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6360 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021

where Q(·) is the Q-function [62] and φ(·) is the density for with δconv and δpool in Appendix H in the Supplement. Here,
standard Normal random variable. The last inequality follows δconv and δpool are the smallest δ guarantees we can afford in
from the bound Q(x) < φ(x) x [62]. Therefore, the proposed the conventional decentralized DP scheme and the pooled-data
CAPE ensures (, δ)-DP with δ = 2 −μ σz
z σz ) for each site,
φ( −μ z scenario to achieve the same noise variance as the pooled-data
assuming that the number of colluding sites is at-most SC . As scenario for a given . Additionally, we empirically compare δ,
the local datasets are disjoint and differential privacy is invariant δconv and δpool for weaker collusion assumptions in Appendix H
under post processing, the release of acape also satisfies (, δ) in the Supplement. In both cases, we observe that δ is always
g(y|a)
differential privacy. Note that, we proved that | log g(y|a ) | ≤
smaller than δconv and smaller than δpool for some τ values.
holds with probability (over the randomness of the mechanism) That is, for achieving the same noise level at the aggregator
at least 1 − δ. This actually proves probabilistic differential as the pooled-data scenario, we are ensuring a much better
privacy [14], [15], which in-turn implies approximate or (, δ)- privacy guarantee by employing the CAPE scheme over the
differential privacy [63], [64]. Furthermore, for meaningful pri- conventional approach.
vacy guarantee, we require μz ≤ in addition to ∈ (0, 1), as we Lemma 1: Consider the symmetric setting: Ns = N S and
ensured in our experiments. The scheme fails to provide formal τs = τ 2 for all sites s ∈ [S]. Let the variances of the noise
2

privacy if the number of colluding sites exceeds SC , which is terms es and gs (Step 2 of Algorithm 2) be τe2 = (1 − S1 )τ 2 and
2
determined by the choice of the scheme used for computing τg2 = τS , respectively. If we denote the variance of the additive
S
ê
s=1 s . noise (for preserving privacy) in the pooled data scenario by
2
Note that, traditional privacy mechanisms specify the mini- τpool and the variance of the estimator acape (Step 2 of Algorithm
2
mum noise needed for a given (, δ) pair. As mentioned before, 2) by τcape then Algorithm 2 achieves the same noise variance
2 2
there are infinitely many (, δ) pairs that yield the same noise as the pooled-data scenario (i.e., τpool = τcape ).
variance τ 2 . Additionally, there are several real-world applica- Proof: The proof is given in Appendix B.
tions, especially in medical and human health research, where Proposition 1: (Performance improvement) If the local noise
we need to ensure a given utility for the privacy-preserving variances are {τs2 } for s ∈ [S] then the CAPE scheme provides
2
technique. For example, ICA is widely used in neuroimaging a reduction G = ττconv
2 = S in noise variance over conventional
cape
applications. One performance index for successful ICA is the
decentralized DP scheme in the symmetric setting (Ns = N S and
normalized gain index q NGI [13], [65] that quantizes the quality
τs2 = τ 2 ∀s ∈ [S]), where τconv
2 2
and τcape are the noise variances
of the unmixing matrix. For practical usability of the recov-
of the final estimate at the aggregator in the conventional scheme
ered mixing matrix, we need to achieve q NGI ≤ 0.1 [13]. As
and the CAPE scheme, respectively.
mentioned before, our CAPE algorithm is motivated by such
Proof: The proof is given in Appendix C.
scenarios that are common in human health research among
Remark 4 (Unequal Sample Sizes at Sites): The CAPE al-
scientific research collaborators. Therefore, we designed the
gorithm achieves the same noise variance as the pooled-data
CAPE scheme with τ 2 as the input parameter and computed 2 2
scenario (i.e., τcape = τpool ) in the symmetric setting: Ns = N
the best we can achieve for a given 0 < δ < 1 (or vice-versa). 2
τcape
S

We present an empirical analysis of asymptotic growth of (, δ) and τs2 = τ 2 ∀ s ∈ [S]. In general, the ratio H(n) = 2
τpool
, where
parameters for the CAPE mechanism in Appendix G in the n [N1 , N2 , . . . , NS ], is a function of the sample sizes in the
Supplement. 2 S
1
sites. We observe: H(n) = N S3 s=1 Ns2 . As H(n) is a Schur-
Remark 3: We presented Theorem 1 for the scalar case for
convex function, it can be shown using majorization theory [66]
simplicity of presentation and understanding. However, it can 2
1
that 1 ≤ H(n) ≤ N S 3 ( (N −S+1)2 + S − 1), where the minimum
be readily extended to high-dimensional settings (e.g., f (x) ∈
Rd ). One can use the fact that the distribution of a spherically is achieved for the symmetric setting (i.e., Ns = N S ). That is,
symmetric normal is independent of the orthogonal basis from CAPE achieves the smallest noise variance at the aggregator in
which its constituent normals are drawn [59]. Therefore, one the symmetric setting.
can work in a basis that is aligned to v. Considering such a Remark 5 (Site Dropouts): If one chooses to use the
basis {b1 , . . . , bd }, and assuming that b1 is parallel to v, the SecureAgg
in Algorithm 1, the CAPE scheme achieves
analysis can be shown [59] to be reduced to the scalar case. We e
s s = 0 even in the case of site drop-out, as long as the
refrain from including the details into the current manuscript number of active sites is above some threshold (see Bonawitz
because of space constraints and to improve the coherence of et al. [43] for details). Therefore, the performance improvement
the presentation. of CAPE (Proposition 1) remains the same irrespective of the
number of dropped-out sites, as long as the number of colluding
sites does not exceed SC = /ceil ∗ S3 − 1.
A. Utility Analysis
The goal is to ensure (, δ) differential privacy for each site
2 2 B. Applicability of CAPE
and achieve τcape = τpool at the aggregator (see Lemma 1).
The CAPE protocol guarantees (, δ) differential privacy with As mentioned before, joint learning across datasets can yield
δ = 2 −μ
σz
z σz ). We claim that this δ guarantee is much
φ( −μ z
discoveries that are impossible to obtain from a single site.
better than the δ guarantee in the conventional decentralized However, privacy regulations prevent sites from sharing local
DP scheme. We empirically validate this claim by comparing δ raw data. Our CAPE algorithm is motivated by such scenarios,

2 2
which are common in human health research among scientific conventional approach, we need τes + τgs = τs2 , where τes
2
is
research collaborators. It can benefit computations with sensitiv- 2
the variance of es and is a function of σs . With these constraints,
ities satisfying some conditions (see Proposition 2). In addition we can formulate a feasibility problem to solve for the unknown
to simple averages, many functions of interest have sensitivities noise variances {σs2 , τgs
2
} as
that satisfy such conditions. Examples include the empirical
average loss functions used in ML and deep neural networks.
S
2 2
minimize 0 subject to τes + τgs = τs2 ; μ2s τgs
2 2
= τpool
Moreover, we can use the Stone-Weierstrass theorem [67] to
s=1
approximate a loss function in decentralized setting applying
CAPE and then use off-the-shelf optimizers. Additional appli- for all s ∈ [S], where {μs }, τpool and {τs } are known to the
cations include optimization algorithms, k-means clustering and aggregator. For this problem, multiple solutions are possible.
estimating probability distributions. We present one solution here along with the privacy analysis.
2
Proposition 2: Consider a decentralized setting with S > Solution: We observe that the variance τes of the zero-
1
S
1 sites in which site s ∈ [S] has a dataset Ds of Ns sam- mean random variable es = ês − μs S i=1 μi êi can be com-
S S
ples and Ss=1 Ns = N . Suppose the sites are employing the puted as τes 2
= Var[ês − i=1
μ ê
i i
] = (1 − S2 )σs2 + i=1 i iμ2 σ 2
.
μ2s S 2
CAPE scheme to compute a function f (D) with L2 sensi- S μs2S 2 2
tivity Δ(N ). Denote n= [N1 , N2 , . . . , NS ] and observe the
Note that we need s=1 μs τgs = τpool . One solution is to
2
τcape S 2
s=1 Δ (Ns )
set τgs 2
= μ21S τpool
2
. Using the constraint τes 2 2
+ τgs = τs2 and
ratio H(n) = 2 =S 3 Δ2 (N ) . Then the CAPE protocol
s
τpool
the expressions for τes 2
and τgs 2
, we have (1 − S1 )2 σs2 +
achieves H(n) = 1, if i) Δ( N
S ) = SΔ(N ) for convex Δ(N ); 1 2 2 2 1 2
i=s μi σi = τs − μ2s S τpool . We can write this expres-
μ2s S 2
and ii) S 3 Δ2 (N ) = Ss=1 Δ2 (Ns ) for general Δ(N ). sion for all s ∈ [S] in matrix form and solve for [σ12 σ22 . . . σS2 ]
Proof: The proof is given in Appendix E as
⎡ 2 μ22 μ2S ⎤−1 ⎡ 2 τpool 2 ⎤
1 − S1 μ21 S 2
··· μ21 S 2
τ1 − μ2 S
C. Extension of CAPE: Unequal Sample Sizes/Privacy ⎢ ⎥ ⎢ 1 ⎥
⎥ ⎢ ⎥
2
⎢ μ21 1 2 μ2S 2 τpool
Requirements at Sites ⎢ μ2 S 2 2 1 − · · · 2 2 ⎥ ⎢ ⎢ τ 2 − μ2 S ⎥
2
⎢
S μ2 S
⎥ ⎢ ⎥
⎢ .. .. .. .. ⎥ .. ⎥
Recall that CAPE achieves the smallest noise variance at the ⎣ . . . . ⎦ ⎣ ⎢ . ⎥
aggregator in the symmetric setting (see Remark 4). However, 2
⎦
μ21 μ22 1 2
τpool
2
μ S 2 2
μ S 2 · · · 1 − S τ 2
− 2
in practice, there would be scenarios where different sites have S S S μS S
different privacy requirements and/or sample sizes. Addition- Privacy Analysis in Asymmetric Setting: We present an
ally, sites may want the aggregator to use different weights analysis of privacy for the aforementioned scheme in asym-
for different sites (possibly according to the quality of the metric setting. Recall that theadversary can observe â =
output from a site). A simple scheme for doing so is shown [â1 , . . . , âSH ] ∈ RSH and ê = s∈SH ês . In other words, the
in [18]. In this work, we propose a generalization of the CAPE adversary observes the vector y = [â , ê] ∈ RSH +1 to make
scheme that can be applied in asymmetric settings. Additionally, inference about the non-colluding sites. As before, we must
the proposed scheme in this paper is more robust against site g(y|a)
show that | log g(y|a ) | ≤ holds with probability (over the
dropouts and does not require a trusted third-party. Note that the
challenge of this analysis is due to the correlated noise terms randomness of the mechanism) at least 1 − δ for guaranteeing
with different variances (or sample sizes). differential privacy. Recall that we release âs = f (xs ) + es +
Let us assume that site s requires local noise standard devi- gs from each site. We observe E(âs ) = f (xs ), Var(âs ) =
μs σ 2
ation τs . To initiate the CAPE protocol, each site will generate τs2 , ∀s ∈ [S] and E(âs1 âs2 ) = f (xs1 )f (xs2 ) − μ1s Ss1 −
2
ês ∼ N (0, σs2 ) and gs ∼ N (0, τgs2
). The aggregator intends to 2
μ s 2 σs 2 1
S 2 2
compute a weighted average of each site’s data/output with μ s1 S + μ s1 μ s2 S 2 i=1 μi σi , ∀s1 = s2 ∈ [S]. Without loss

weights selected according to some quality measure. For ex- of generality, we can assume [59] that a = 0 and a = a − v,
ample, if the aggregator knows that a particular site is suf- where v = [ N1s , 0, . . . , 0] . That is, the random variable â is
fering from more noisy observations than other sites, it can N (0, Σâ ), where
choose to give the output from that site less weight while ⎡ ⎤
τ12 Ψ(1, 2) · · · Ψ(1, S)
combining the site results. Let us denote the weights by {μs } ⎢ ⎥
⎢ Ψ(2, 1) τ22 · · · Ψ(2, S)⎥
such that Ss=1 μs = 1 and μs ≥ 0. First, the aggregator com- ⎢
Σâ = ⎢ . .. ⎥
.. .. ⎥,
putes Ss=1 μs ês using Algorithm 1 and broadcasts it to all ⎣ .. . . . ⎦

sites. Each site then sets es = ês − μs1S Si=1 μi êi , to achieve Ψ(S, 1) Ψ(S, 2) · · · τS2H
S
s=1 μs es = 0 and releases â s = f (xs ) + es + gs . Now, the
S
μi σi2 μj σj2 2 2
and Ψ(i, j) = − S1 ( + s=1 μs σs
aggregator computes acape = Ss=1 μs âs = Ss=1 μs f (xs ) + μj μi ) + μi μj S 2 .
Additionally,
S S H 2
the random variable ê is N (0, τê2 ),
where = Ss=1 σs . τê2
s=1 μs gs , where we used s=1 μs es = 0. In order to achieve
2
the same utility as the pooled data scenario (i.e. τpool 2
= τcape ), Therefore,
g(y|a)
is the density of N (0, Σ), where Σ =
S 2
S 2 2 2 Σâ Σâê
we need Var[ s=1 μs gs ] = τpool ⇒ s=1 μs τgs = τpool . Ad- ∈ R(SH +1)×(SH +1) . With some simple algebra,
ditionally, for guaranteeing the same local noise variance as Σ
âê τ 2
ê

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6362 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021

we can find the expression

SH 2 for each entry of Σâê ∈ RSH : this gradient satisfying differential privacy. To that end, let us
2 1 2
[Σâê ]s = σs − μs S i=1 μi σi . The rest of the proof proceeds consider that the norm of the gradient due to each subject is
as the proof of Theorem 1. Note that, due to the complex nature bounded by BG Nm , where Nm is the time course length for each
of the expression of Σ, we do not have a closed form solution subject’s fMRI scan and BG is some constant. This implies that
for μz and σz (but we can numerically compute the values and (I + ŷs,n zs,n )WF ≤ BG . It is easy to see that by changing
thus, the resulting δ). one subject (i.e., for a neighboring dataset), the gradient at site
s can change by at most 2BN G Nm
s
= 2BMs . Therefore, the L2 sen-
G

IV. IMPROVED DIFFERENTIALLY PRIVATE DJICA sitivity of the function f (Xs ) = Gs is ΔsG = 2B G
Ms . In addition
In this section, we propose an algorithm that improves to the unmixing matrix W, we update a bias term b using a
upon our previous decentralized DP djICA algorithm [1] and gradient descent [13]. The gradient of the empirical average
achieves the same noise variance as the DP pooled-data scenario loss function
with respect to the bias at site s is given [13] by
in certain regimes. Recall that we are considering the joint ICA hs = N1s N n=1 s,n . Similar to the case of Gs , we can find the
s
ŷ
(jICA) [56] of decentralized fMRI data, which assumes a global L2 sensitivity of the function f (Xs ) = hs as Δsh = 2B h
Ms , where
mixing process (common spatial maps). More specifically, the ŷs,n 2 ≤ Bh . Note that for other neighborhood definitions
global mixing matrix A ∈ RD×R is assumed to generate the (time point level instead of subject level), one should consider the
time courses in Xs from underlying sources Ss ∈ RR×Ns at temporal correlation in the data [70]. According to the Gaussian
each site s ∈ [S]. Each site has data from Ms individuals, which mechanism [59], computing (, δ) DP approximates of Gs and
are concatenated temporally to form the local data matrix Xs ∈ hs requires noise standard deviations τG s
and τhs satisfy
RD×Ns . That is: X = [AS1 . . . ASS ] ∈ RD×N . We estimate
the DP global unmixing matrix W ∈ RR×D ≈ A+ by solving ΔsG 1.25 s Δs 1.25
s
τG = 2 log , τh = h 2 log . (3)
the Infomax ICA problem (see Section II) in the decentralized δ δ
setting with a multi-round gradient descent that employs CAPE.
Neuroimaging data is generally very high dimensional. As mentioned before, we employ the CAPE protocol to combine
We therefore use the DP decentralized PCA algorithm the gradients from the sites at the aggregator to achieve the same
(capePCA) [18] as an efficient and privacy-preserving utility level as that of the pooled data scenario. More specif-
dimension-reduction step of our proposed capeDJICA algo- ically, each site generates two noise terms: EG s ∈R
R×R
and
rithm. For simplicity, we assume that the observed samples es ∈ R , collectively among all sites (element-wise, according
h R

are mean-centered. We present a slightly modified version of to Algorithm 1) at each iteration round. Additionally, each site s
generates the following two noise terms locally at each iteration:
the original capePCA algorithm in Algorithm 3 (Appendix L r KG s ∈R
R×R
; [KG s ]ij i.i.d. ∼ N (0, τ
2 2
); τGk s2
= S1 τG
in the Supplement) to match the robust CAPE scheme from r khs ∈ RR ; [khs ]i i.i.d. ∼ N (0, τ 2 ); τ Gk
2 1 s2
Section III. Note that the scheme proposed in [68] was limited hk hk = S τh .
by the larger variance of the additive noise at the sites due to At each iteration round, the sites compute the noisy esti-
the smaller sample size. The capePCA alleviates this problem mates of the gradients of W and b: Ĝs = Gs + EG s + Ks ,
G

ĥs = hs + es + ks . These two terms are then sent to the ag-

h h
using the CAPE scheme and achieves the same noise variance
as the pooled-data scenario in the symmetric setting [18]. gregator and the aggregator computes: ΔW = ρ S1 Ss=1 Ĝs

Let the output of capePCA to be VR ∈ RD×R , which is and Δb = ρ S1 Ss=1 ĥs , where ρ is the learning rate. These
sent to the sites from the aggregator. Then the reduced di- gradient estimates are then used to update the variables W and
mensional data matrix (R × Ns ) at site s is denoted by: Xrs = b. By Lemma 1, the variances of the noise of the two estimates:
ΔW and Δb , are exactly the same as the pooled-data scenario
VR Xs . These projected samples are the inputs to the proposed
capeDJICA algorithm that estimates the unmixing matrix W in the symmetric setting. The complete algorithm is shown in
through a gradient descent [69]. Our proposed capeDJICA Algorithm 4 in Appendix M in the Supplement.
algorithm employs the CAPE protocol to perform the privacy- Note that, one does not need to explicitly find the bounds
preserving iterative message-passing between sites and the ag- BG and Bh . Instead, the gradients due to each subject can be
gregator to solve for W. We start the algorithm by initializing clipped to some pre-determined BG Nm or Bh Nm in L2 norm
W. At each iteration j, the sites adjust the local source estimates sense (where Nm is known apriori from the data collection
Zs (j) = W(j − 1)Xs by their bias estimate b(j − 1)1 . Local stage). That is, we can replace Gs,n = (I + ŷs,n z s,n )W with
Gs,n
gradients of the empirical average loss function are computed Gs,n = Gs,n F . Similarly, we can replace hs,n = ŷs,n
max(1, )
with respect to W and b [13]. More specifically, the gradient BG
hs,n
with respect to W at site s is given [13] by Gs = N1s (Ns I + with hs,n = hs,n 2 .
max(1, Bh )
(1 − 2Ys )Z
s )W, where Zs = WXs + b1 , Ys = g(Zs );
r
Remark 6 (Consequences of Norm Clipping): The norm clip-
b ∈ RR is the bias and 1 is a vector of ones. If we denote ping destroys the unbiasedness of the gradient estimate [17].
1 − 2Ys with Ŷs then we have Gs = N1s (Ns I + Ŷs Z s )W = If we choose BG and Bh to be too small, the average clipped
1
N s
Ns n=1 (I + ŷs,n zs,n )W, where (I + ŷs,n zs,n )W is the gra- gradient may be a poor estimate of the true gradient. Moreover,
dient contribution of one time point of a subject’s data matrix. BG and Bh dictates the additive noise level. In general, clipping
Note that this gradient estimate is needed to be sent to the prescribes taking a smaller step “downhill” towards the optimal
aggregator from the site. Therefore, we need to approximate point [30] and may slow down the convergence.

A. Privacy Analysis Using Rényi Differential Privacy offered a “pure” DP djICA procedure, there are a few short-
comings. The cost of achieving pure differential-privacy (i.e.,
We now analyze the capeDJICA algorithm with Rényi Dif-
ferential Privacy [16]. Analyzing the total privacy loss of a employing the Laplace mechanism [9]) was that the neighboring
dataset condition was met by restricting the L2 -norm of the
multi-shot algorithm, each stage of which is DP, is a chal-
samples to satisfy xn 2 ≤ 2√1D , which can be too limiting for
lenging task. It has been shown [16], [17] that the advanced
composition theorem [59] for (, δ)-differential privacy can be datasets with large ambient dimensions. The effect of this is
apparent from the experiments. Last but not the least, the DP
loose. The main reason is that one can formulate infinitely
PCA preprocessing step was less fault tolerant because of the
many (, δ)-DP algorithms for a given noise variance τ 2 . RDP
offers a much simpler composition rule that is shown to be pass the parcel or cyclic style message passing among the sites,
where site dropouts are more drastic than the one employed in
tight [16]. We review some necessary properties of RDP in
Appendix D. Recall that at each iteration j of capeDJICA, this paper (a certain number of site dropouts is permitted [43]).
we compute the noisy estimates of the gradients: ΔW (j) and By employing the CAPE protocol in the preprocessing stage and
also in the optimization process, we expect to gain a significant
Δb (j). As we employed the CAPE scheme in the symmetric
setting, the variances of noise at the aggregator for ΔW (j) performance boost. We validate the performance gain in the
pool
ρ2 τ G
2
ρ2 τhpool
2 Experimental Results (Section V).
2
and Δb (j) are: σW = ΔG and σb2 = Δh , respectively, Convergence of capeDJICA Algorithm: We note that the
ΔsG Δs
where ΔG = Sand Δh = Sh . From Proposition 5, we have gradient estimate at the aggregator
(Step 14 in Algorithm 4)
that the computation of ΔW (j) is (α, α/(2σW2
))-RDP. Sim- essentially contains the noise Sρ Ss=1 KG s , which is zero mean.
ilarly, the computation of Δb (j) is (α, α/(2σb2 ))-RDP. By Although this does not provide guarantees on the excess error,
Proposition 4, we have that each iteration step of capeDJICA the estimate of the gradient converges in expectation to the
is (α, α2 ( σ12 + σ12 ))-RDP. Denoting the number of required true gradient [71]. However, if the batch size is too small,
W b
iterations for convergence by J ∗ then, under J ∗ -fold compo- the noise can be too high for the algorithm to converge [27].
∗
sition of RDP, the overall capeDJICA algorithm is (α, 2σαJ )- Since the total additive noise variance is smaller for capeDJICA
2

1
RDP than the conventional case by a factor of S, the convergence rate
RDP, where 2
σRDP
= ( σ12 + 1
σb2 ). From Proposition 3, we can is faster. Note that a theoretical analysis of intricate relation
W
∗
conclude that the capeDJICA algorithm satisfies ( 2σαJ
2 + between the excess error and the privacy parameters is beyond
RDP
1 the scope of the current paper. We refer the reader Bassily et
logδr
, δr )-differential privacy for any 0 < δr < 1. For a given
α−1 al. [30] for further details.
δr , we find the optimal αopt as: αopt = 1 + J2∗ σRDP 2 log δ1r . Communication Cost of capeDJICA: We analyze the total
α J∗ log δ1r communication cost associated with the proposed capeDJICA
Therefore, capeDJICA algorithm is ( 2σopt
2 + αopt −1 , δr )-DP algorithm. At each iteration round, we need to generate two
RDP
for any 0 < δr < 1. zero-sum noise terms, which entails O(S + R2 ) communica-
tion complexity of the sites and O(S 2 + SR2 ) communication
B. Privacy Accounting Using Moments Accountant complexity of the aggregator [43]. Each site computes the noisy
In this section, we use the moments accountant [17] frame- gradient and sends one R × R matrix and one R dimensional
work to compute the overall privacy loss of our capeDJICA vector to the aggregator. And finally, the aggregator sends the
algorithm. Moments accountant can be used to achieve a much R × R updated weight matrix and R dimensional bias estimate
smaller overall than the strong composition theorem [59]. As to the sites. The total communication cost is O(S + R2 ) for the
mentioned before, naïvely employing the additive nature of the sites and O(S 2 + SR2 ) for the central node. This is expected as
privacy loss results in the worst case analysis, i.e., assumes that we are estimating an R × R matrix in a decentralized setting.
each iteration step exposes the worst privacy risk and this exag-
gerates the total privacy loss. However, in practice, the privacy V. EXPERIMENTAL RESULTS
loss is a random variable that depends on the dataset and is
typically well-behaved (concentrated around its expected value). In this section, we empirically show the effectiveness of our
Due to space constraints, we presented the detailed analysis of proposed CAPE scheme through the capeDJICA algorithm,
capeDJICA in Appendix I in the Supplement. Briefly, we can demonstrating the benefit of enabling correlated noise gener-
formulate a quadratic equation in terms of and then find the best ation. We note the intricate relationship between and δ (see
2 ∗ 2
for a given δtarget : 2 Jσ∗ Δ2 2 − + J8σΔ2 + log δtarget = 0. Theorem 2) due to the correlated noise scheme and the challenge
2
Here, the noise variance σ consists of two parts: σW 2
and σb2 of characterizing the overall privacy loss in our multi-round
for capeDJICA. capeDJICA algorithm. We designed the experiments to better
demonstrate the trade-off between performance and several pa-
rameters: , δ and M . We show the simulation results to compare
C. Performance Improvement With Correlated Noise
the performance of our capeDJICA algorithm with the existing
The existing DP djICA algorithm [1] achieved J ∗ - DP djICA algorithm [1] (DP-djICA), the non-private djICA
differential privacy (where J ∗ is the total number of iterations algorithm [13] and a DP ICA algorithm operating on only
required for convergence) by adding a noise term to the local local data (local DP-ICA). We modified the base non-private
estimate of the source (i.e., Zs (j)). Although the algorithm djICA algorithm to incorporate the gradient bounds BG and

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6364 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021

Bh . Although we are proposing an algorithm for decentralized (with solid lines on the right y-axis) along with q NGI (with
setting, we included the performance indices for the local setting dashed lines on the left y-axis) as a means for visualizing how
to show the effect of smaller sample sizes on the performance. the privacy-utility trade-off varies with different parameters. For
We note that the DP-djICA algorithm [1] offers -differential a given privacy budget (performance requirement), the user can
privacy as opposed to (, δ)-differential privacy offered by use the overall plot on the right y-axis, shown with solid
capeDJICA. For both synthetic and real datasets, we consider lines, (q NGI plot on the left y-axis, shown with dashed lines)
the symmetric setting (i.e., Ns = N S , τG = τG and τh = τh ).
s s
to find the required i or M on the x-axis and thereby, find the
We limited the maximum number of iterations J to be 1000 corresponding performance (overall ). Although we ensured
(however, the number of iterations varies with the algorithm i ∈ (0, 1) [59], we are more interested in the overall spent ,
and amount
√ of noise). We chose the norm bounds BG = 30, which we computed using the RDP technique (Section IV-A)
Bh = BG , number of sites S = 4, SC = /ceil ∗ S3 − 1 [43], for the capeDJICA and the local DP-ICA algorithms. For the
0.015
the target δ = 10−5 and the learning rate ρ = log(R) . We show DP-djICA algorithm, we used the composition theorem [59].
the average performance over 10 independent runs. Note that, the Note that, the primary reason for using the RDP/MA techniques
choice of hyper-parameters is non-trivial [72] and corresponding is the inherent ambiguity of the approximate DP algorithms
end-to-end privacy analysis is still an open problem. (existence of infinitely many (, δ) pairs that yield the same
Synthetic Data: We generated the synthetic data from the noise variance τ 2 ). For pure (, 0)-DP, this is not an issue. If
same model as [13]. The source signals S were simulated using the end goal is to have a pure (, 0)-DP algorithm, we need to
the generalized autoregressive (AR) conditional heteroscedastic stick to the basic composition [59]. We are reporting the privacy
(GARCH) model [73], [74]. We used M = 1024 simulated sub- spent during the course of the gradient descent. The total spent
jects in our experiments. For each subject, we generated R = 20 including PCA would be slightly higher.
time courses with 250 time points. The data samples are equally Performance Variation with i : First, we explore how the
divided into S = 4 sites. For each subject, the fMRI images are privacy-utility trade-off between q NGI and the overall “privacy
30 × 30 dimensional. We employ the capePCA algorithm [18] risk” varies with i . As mentioned before, we compare the
(Appendix L in the Supplement) as a preprocessing stage to performance of capeDJICA with those of the djICA, the
reduce the sample dimension from D = 900 to R = 20. The DP-djICA and local DP-ICA. In Figs. 1(a)–(d), we show the
capeDJICA is carried out upon the R-dimensional samples. variation of q NGI and overall for different algorithms with
Real Data: We use the same data and preprocessing as Baker i on synthetic and real data. For both datasets, we show the
et al. [13]: the data were collected using a 3-T Siemens Trio performance indices for two different M values: M = 256 and
scanner with a 12-channel radio frequency coil, according to the M = 1024. We observe from the figures that the proposed
protocol in Allen et al. [55]. In the dataset, the resting-state scan capeDJICA outperforms the existing DP-djICA by a large
durations range from 2 min 8 sec to 10 min 2 sec, with an average margin for the range of i values that results in q NGI ≤ 0.1.
of 5 min 16 sec [13]. We used a total of M = 1548 subjects This is expected as DP-djICA suffers from too much noise (see
from the dataset and estimated R = 50 independent compo- Section IV-C for the explanation). capeDJICA also guaran-
nents using the algorithms under consideration. For details on tees the smallest overall among the privacy-preserving meth-
the preprocessing, please see [13]. We also projected the data ods. capeDJICA can reach the utility level of the non-private
onto a 50-dimensional PCA subspace estimated using pooled djICA for some parameter choices and naturally outperforms
non-private PCA. As we do not have the ground truth for the real local DP-ICA as estimation of the sources is much accurate
data, we computed a pseudo ground truth [13] by performing when more samples are available. For the same privacy loss (i.e.,
a pooled non-private analysis on the data and estimating the for a fixed ), one can achieve better performance by increasing
unmixing matrix. The performance of capeDJICA, djICA, M . For both synthetic and real data, we note that assigning
DP-djICA and local DP-ICA algorithms are evaluated against a higher i may provide a good q NGI but does not guarantee
this pseudo ground truth. a small overall . This is because the overall is an implicit
Δs 1.25 2
Performance Index: We set τG s
= iG 2 log 10 −2 and τh =
s function of the added noise variance at each iteration σRDP and
the total number of iterations required for convergence J ∗ (see
Δsh 1.25
i 2 log 10 −2 for our experiments, where i is the privacy Section IV-A for details). The user needs to choose i based on
parameter per iteration, ΔsG and Δsh are the L2 sensitivities of privacy budget and required performance.
Gs and hs , respectively. To evaluate the performance of the Performance Variation with M : Next, in Figs. 2(a)–(d), we
algorithms, we consider the quality of the estimated unmixing show the variation of q NGI and the overall with the total
matrix W. More specifically, we utilize the normalized gain number of subjects M for two different i values on synthetic
index q NGI [13], [65] that quantizes the quality of W. The and real data. We observe similar trends in performance as in
normalized gain index q NGI varies from 0 to 1, with lower the case of varying i . The capeDJICA algorithm outperforms
values indicating a better estimation of a set of ground-truth the DP-djICA and the local DP-ICA: with respect to both q NGI
components (i.e. the unmixing matrix times the mixing matrix and the overall . For the q NGI , the capeDJICA performs very
is closer to an identity matrix [65]). For practical usability of the closely to the non-private djICA, even for moderate M values,
recovered A, we need to achieve q NGI ≤ 0.1 [13]. We consider while guaranteeing the smallest overall . The performance
the overall as a performance index. We plotted the overall gain over DP-djICA is particularly noteworthy. For a fixed M ,

Fig. 1. Variation of q NGI and overall with privacy parameter i for: (a)–(b) synthetic fMRI data, (c)–(d) real fMRI data. For capeDJICA, higher i results
a smaller q NGI , but not necessarily a small overall , i.e., an optimal i can be chosen based on q NGI or overall requirement.

Fig. 2. Variation of q NGI and overall with total number of subjects M for: (a)–(b) synthetic fMRI data, (c)–(d) real fMRI data. For capeDJICA, higher M
results a smaller q NGI and a smaller overall .

Fig. 3. Recovered spatial maps from synthetic data: the ground truth and the ones resulting from djICA, local DP-ICA, DP-djICA and capeDJICA.

Fig. 4. Spatial maps (synthetic data) resulting from capeDJICA for different parameters. capeDJICA estimates spatial maps that closely resemble the true ones,
even for strict privacy guarantee (small overall ).

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6366 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021

increasing results in a slightly better utility, albeit at the cost empirically compared the performance of the proposed algo-
of greater privacy loss. We show the performance variation with rithms with those of conventional, non-private and local algo-
δ in Appendix J in the Supplement. rithms on synthetic and real datasets. We varied privacy param-
Reconstructed Spatial Maps: Finally, we intend to demon- eters and relevant dataset parameters to show that the proposed
strate how the estimated spatial maps (the estimated global mix- algorithms outperformed the conventional and local algorithms
ing matrix A, see Section II) actually look like, as interpretability comfortably and matched the performance of the non-private
is one of the most important concerns for fMRI applications. algorithms for some parameter choices. In general, the proposed
In Fig. 3, we show the true spatial map, the ones estimated algorithm offered very good utility even for strong privacy guar-
from the non-private djICA [13], local DP-ICA, DP-djICA antees – indicating we may be able to achieve meaningful privacy
and capeDJICA algorithms. It is evident from the figure that even without losing much utility. An interesting future work
the spatial map recovered by the proposed capeDJICA is very could be to extend the CAPE framework to (, 0)-differential
close to that of the ground truth. The overall is also very privacy, perhaps using the Staircase Mechanism [75] for differ-
small. The local DP-ICA, although can achieve a small , cannot ential privacy. Another possible direction is to extend CAPE for
recover the spatial maps well enough for practical purposes. arbitrary tree-structured networks.
However, increasing the i and/or increasing the number of
subjects would certainly improve the quality of the spatial maps.
Finally, for the DP-djICA, the algorithm fails to converge to APPENDIX A
anything meaningful due to excessive amount of noise. In Fig. 4, CONVENTIONAL DECENTRALIZED DP COMPUTATIONS
we show the estimated spatial maps resulting from the proposed As mentioned before, DP algorithms often introduce noise
capeDJICA algorithm along with the overall for a variety of in the computation pipeline to induce the randomness. For
combinations of i and M . We observe that when sufficiently additive noise mechanisms, the standard deviation of the noise
large number of subjects are available, the estimated spatial maps is scaled to the sensitivity of the computation
[59]. To illustrate,
closely resemble the true one, even for strict privacy guarantee consider estimating the mean f (x) = N1 N n=1 xn of N scalars
(small overall ). For smaller number of samples, we may need to x = [x1 , . . . , xN −1 , xN ] with each xi ∈ [0, 1]. The sensitiv-
compensate by allowing larger values to achieve good utility. In ity of the function f (x) is N1 . Therefore, for computing the
general, we observe that capeDJICA can achieve very good ap- (, δ)-DP estimate of the average a = f (x), we can follow the
proximate to the true spatial map, almost indistinguishable from Gaussian mechanism [9] to release âpool = a + epool , where
the non-private spatial map. This emphasizes the effectiveness 2
epool ∼ N (0, τpool ) and τpool = N1 2 log 1.25 δ .
of the proposed capeDJICA in the sense that very meaningful
Suppose now that the N samples are equally distributed
utility can be achieved even with strict privacy guarantee.
among S sites. That is, each site s ∈ {1, . . . , S} holds a disjoint
dataset xs of Ns = N/S samples. An aggregator wishes to
VI. CONCLUSION
estimate and publish the mean of all the samples. For pre-
We proposed a novel decentralized DP computation scheme, serving privacy, the conventional DP approach is for each site
CAPE, which uses correlated randomness at the sites to im- to release (or send to the aggregator node) an (, δ)-DP esti-
prove the performance of decentralized signal processing and s ) as: âs = f (x
mate of the function f (x s ) + es , where es ∼
ML applications involving locally held private data. Example N (0, τs2 ) and τs = N1s 2 log 1.25 2 log 1.25
δ = N
S
δ . The ag-
scenarios include health care research with legal and ethical
gregator can then compute the (, δ)-DP approximate average as
limitations on the degree of sharing the “raw” data. CAPE can
greatly improve the privacy-utility trade-off when (a) all parties âconv = S1 Ss=1 âs = S1 Ss=1 as + S1 Ss=1 es . The variance
τs2 τs2 2
follow the protocol and (b) the number of colluding sites is of the estimator âconv is S · S2 = S τconv . We observe the
2
not more than S/3 − 1 [43]. Our proposed CAPE protocol is ratio:
τpool
2 = τs2 /S 2
= S1 . That is, the decentralized DP averag-
τconv τs2 /S
based on an estimation-theoretic analysis of the noise addition ing scheme will always result in a poorer performance than the
process for differential privacy and therefore, provides different pooled data case.
guarantees than cryptographic approaches such as SMC. CAPE
guarantees (, δ) probabilistic differential privacy and hence
(, δ) differential privacy, where and δ satisfy a specific re- APPENDIX B
lation. It can achieve the same level of additive noise variance as VARIANCE OF THE ESTIMATOR UNDER CAPE AND
the pooled data scenario in certain regimes. We further extend the POOLED-DATA SCENARIO
CAPE scheme to asymmetric network/privacy settings. These Proof of Lemma 1: We prove the lemma according to [18].
benefits of CAPE come in-part from assuming that sites can Recall that in the pooled data scenario, the sensitivity of
generate zero-sum correlated noise. This can be accomplished the function f (x) is N1 , where x = [x1 , . . . , xS ]. There-
by various methods, each of which comes with additional costs fore, to approximate f (x) satisfying (, δ)-DP, we need to
or assumptions on trust. We applied CAPE to DP decentralized
joint independent component analysis (capeDJICA) for col- additive Gaussian noise with standard deviation τpool =
have
1
laborative source separation. To handle the privacy composition N 2 log 1.25
δ [59]. Next, consider the decentralized data set-
for multi-round algorithms, we analyzed capeDJICA using the ting (as in Section
II) with local noise standard deviation given
1 1.25 1.25
Rényi differential privacy and the moments accountant. We by τs = Ns 2 log δ = S
N 2 log δ = τ . We observe

2
2
τpool = τSs ⇒ τpool = Sτ 2 . We will now show that the CAPE Definition 5 (Majorization): Consider two vectors a ∈ RS
algorithm will yield the same noise variance of the estimator at and b ∈ RS with non-increasing entries (i.e., ai ≥ aj and
the aggregator. Recall that at the aggregator we compute acape = bi ≥ bj for i < j). Then a is majorized by b, denoted a ≺ b,
S N S
1
s=1 âs = N
1
n=1 xn + S
1
s=1 gs . The variance of the
if and only if the following holds: Ss=1 as = Ss=1 bs and
S J J
s=1 as ≤ s=1 bs ∀J ∈ [S]
2 τ2 τ2 2
estimator acape is: τcape S · Sg2 = Sg = Sτ 2 , which is exactly
the same as the pooled data scenario. Therefore, the CAPE al- Consider nsym N S [1, . . . , 1] ∈ R for some positive N .
S

gorithm allows us to achieve the same additive noise variance as Then any vector n = [N1 , . . . , NS ] ∈ RS with non-increasing

the pooled data scenario in the symmetric setting (Ns = N entries and Ss=1 |Ns | = N majorizes nsym , or nsym ≺ n.
S and
τs2 = τ 2 ∀s ∈ [S]), while satisfying (, δ) differential privacy at Definition 6 (Schur-convex functions): The function K :
the sites and for the final output from the aggregator, where (, δ) RS → R is Schur-convex if K(a) ≤ K(b) ∀ a ≺ b ∈ RS .
satisfy δ = 2 −μ
σz
z σz ).
φ( −μ z
Lemma 2: If K is symmetric and convex, then K is Schur-
convex. The converse does not hold.
APPENDIX C Proof of Proposition 2: As the sites are computing the func-
PERFORMANCE IMPROVEMENT OF CAPE tion f with L2 sensitivity Δ(N ), the local noise standard devia-
tion for preserving privacy is proportional to Δ(Ns ) by Gaussian
Proof of Proposition 1: The local noise variances are mechanism [9]. It can be written as: τs = Δ(Ns )C, where C is
{τs2 } for s ∈ [S]. In the conventional decentralized DP a constant for a given (, δ) pair. Similarly, the noise standard
scheme,
1
S we compute 1
S the following at the aggregator: aconv = deviation in the pooled data scenario can be written as: τpool =
a
s=1 s + es . The variance of the estimator is: Δ(N )C. Now, the final noise variance at the aggregator for
S
s Sτs2 s=11 s
τconv = s=1 S 2 = S 2 s=1 τs2 . In the CAPE scheme, we
2
CAPE protocol is: τcape2 τ2
= Ss=1 Sg2 = S13 Ss=1 Δ2 (Ns )C 2 .
compute the following quantity at the aggregator: acape = τ2
S
Δ2 (N )
1
S 1
S 1
S Observe: H(n) = τcape = Ss=1 s
. As we want to achieve
s=1 as + S s=1 es + S
2 3 Δ2 (N )
S s=1 gs . The variance of the pool
s τg the same noise variance as the pooled-data scenario, we need
2
estimator is: τcape = s=1 S 2 = S13 ss=1 τs2 . Therefore, the
2
2
S 3 Δ2 (N ) = Ss=1 Δ2 (Ns ), which proves the case for general
CAPE scheme provides a reduction G = ττconv
2 = S in noise sensitivity function Δ(N ). Now, if Δ2 (N ) is convex then the
cape
variance over conventional decentralized DP approach in the by Lemma 2 the function K(n) = Ss=1 Δ2 (Ns ) is Schur-
2 2
symmetric setting (Ns = NS and τs = τ ∀s ∈ [S]). convex. Thus, the minimum of K(n) is obtained when n =
S 2 N
nsym = [ N S , . . . , S ]. We observe: Kmin (n) =
N
s=1 Δ ( S ) =
2 N
APPENDIX D S · Δ ( S ). Therefore, for convex Δ(N ), we have H(n) = 1 if
ADDITIONAL BACKGROUND CONCEPTS Δ( NS ) = SΔ(N ).
Rényi Differential Privacy: Analyzing the total privacy loss
of a multi-shot algorithm, each stage of which is DP, is a REFERENCES
challenging task. It has been shown [16], [17] that the advanced
[1] H. Imtiaz et al., “Privacy-preserving source separation for distributed data
composition theorem [59] for (, δ)-differential privacy can be using independent component analysis,” in Proc. Annu. Conf. Inf. Sci.
loose. The main reason is that one can formulate infinitely many Syst., 2016, pp. 123–127. [Online]. Available: http://dx.doi.org/10.1109/
(, δ)-DP algorithms for a given noise variance τ 2 . RDP offers CISS.2016.7460488
[2] P. M. Thompson et al., “ENIGMA and the individual: Predicting fac-
a much simpler composition rule that is shown to be tight [16]. tors that affect the brain in 35 countries worldwide,” Neuroimage,
We review some properties of RDP [16]. vol. 145, pp. 389–408, 2017. [Online]. Available: https://doi.org/10.1016/
Proposition 3 (From RDP to differential privacy [16]): If j.neuroimage.2015.11.057
[3] S. M. Plis et al., “COINSTAC: A privacy enabled model and prototype
A is an (α, r )-RDP mechanism, then it also satisfies (r + for leveraging and processing decentralized brain imaging data,” Front.
log δ1r Neurosci., vol. 10, 2016, Art. no. 365. [Online]. Available: https://doi.org/
α−1 , δr )-differential privacy for any 0 < δr < 1. 10.3389/fnins.2016.00365
Proposition 4 (Composition of RDP [16]): Let A : D → T1 [4] K. W. Carter et al., “ViPAR: A software platform for the virtual pooling and
be (α, r1 )-RDP and B : T1 × D → T2 be (α, r2 )-RDP, then analysis of research data,” Int. J. Epidemiol., vol. 45, no. 2, pp. 408–416,
2015. [Online]. Available: https://doi.org/10.1093/ije/dyv193
the mechanism defined as (X, Y ), where X ∼ A(D) and Y ∼ [5] A. Gaye et al., “DataSHIELD: Taking the analysis to the data. not the data
B(X, D), satisfies (α, r1 + r2 )-RDP. to the analysis,” Int. J. Epidemiol., vol. 43, no. 6, pp. 1929–1944, 2014.
Proposition 5 (RDP and Gauss. Mech. [16]): If A has L2 sen- [Online]. Available: https://doi.org/10.1093/ije/dyu188
[6] A.Narayanan and V. Shmatikov, “How to break anonymity of the Netflix
sitivity 1, then the Gaussian mechanism Gσ A(D) = A(D) + E prize dataset,” 2006. [Online]. Available: http://arxiv.org/abs/cs/0610105
satisfies (α, 2σα2 )-RDP, and a composition of J Gaussian mech- [7] L.Sweeney, “Only you, your doctor, and many others may know,” Technol.
2
anisms satisfies (α, 2σ αJ
2 )-RDP, where E ∼ N (0, σ ).
Sci., vol. 2015092903, no. 9, p. 29, 2015. [Online]. Available: https://
techscience.org/a/2015092903
The proofs of the Propositions 3, 4 and 5 are provided in [16]. [8] J. L. Ny and G. J. Pappas, “Differentially private Kalman filter-
ing,” in Proc. 50th Annu. Allerton Conf. Commun., Control, Com-
put., 2012, pp. 1618–1625. [Online]. Available: https://doi.org/10.1109/
APPENDIX E Allerton.2012.6483414
APPLICABILITY OF CAPE IN MACHINE LEARNING [9] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to
sensitivity in private data analysis,” in Proc. 3rd Conf. Theory Cryptog-
First, we review some definitions and lemmas [66, Proposition raphy, 2006, pp. 265–284. [Online]. Available: http://dx.doi.org/10.1007/
C.2] necessary for the proof. 11681878_14

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6368 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021

[10] V. D. Calhoun et al., “Independent component analysis for brain fMRI [33] A. Rajkumar and S. Agarwal, “A differentially private stochastic gradient
does indeed select for maximal independence,” PLoS One, vol. 8, 2013, descent algorithm for multiparty classification,” in Artif. Intell. Statist.,
Art. no. e73309. [Online]. Available: http://dx.doi.org/10.1371/journal. 2012, pp. 933–941.
pone.0073309 [34] R. Bassily et al., “Practical locally private heavy hitters,” in Adv. Neural
[11] V. D.Calhoun and T. Adali, “Multisubject independent component analysis Inf. Process. Syst., I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R.
of fMRI: A decade of intrinsic networks, default mode, and neurodiagnos- Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc.,
tic discovery,” IEEE Rev. Biomed. Eng., vol. 5, pp. 60–73, Aug. 2012. 2017, pp. 2288–2296. [Online]. Available: http://papers.nips.cc/paper/
[Online]. Available: https://doi.org/10.1109/RBME.2012.2211076 6823-practical-locally-private-heavy-hitters.pdf
[12] P. Comon, “Independent component analysis, a new concept?,” Signal [35] X. Wu et al., “Bolt-on differential privacy for scalable stochastic gradient
Process., vol. 36, no. 3, pp. 287–314, 1994. descent-based analytics,” in Proc. ACM Int. Conf. Manage. Data. New
[13] B. T. Baker et al., “Decentralized temporal independent component analy- York, NY, USA, 2017, pp. 1307–1322. [Online]. Available: http://doi.acm.
sis: Leveraging fMRI data in collaborative settings,” NeuroImage, vol. 186, org/10.1145/3035918.3064047
pp. 557–569, 2019. [36] A.Smith, “Privacy-preserving statistical estimation with optimal conver-
[14] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber, gence rates,” in Proc. 43rd Annu. ACM Symp. Theory Comput., New York,
“Privacy: Theory meets practice on the map,” in Proc. IEEE 24th Int. NY, USA, 2011, pp. 813–822. [Online]. Available: http://doi.acm.org/10.
Conf. Data Eng., 2008, pp. 277–286. 1145/1993636.1993743
[15] S. Meiser, “Approximate and probabilistic differential privacy definitions,” [37] F.McSherry and K. Talwar, “Mechanism design via differential privacy,”
IACR Cryptol. ePrint Arch., vol. 2018, 2018, Art. no. 277. in Proc. 48th Annu. IEEE Symp. Found. Comput. Sci., 2007, pp. 94–103.
[16] I.Mironov, “Rényi differential privacy,” in Proc. 30th Comput. Secur. [Online]. Available: http://dx.doi.org/10.1109/FOCS.2007.41
Found. Symp. (CSF), 2017, pp. 263–275. [38] B. McMahan et al., “Communication-efficient learning of deep networks
[17] M. Abadi et al., “Deep learning with differential privacy,” in Proc. ACM from decentralized data,” in Proc. 20th Int. Conf. Artif. Intell. Statist., Ser.
SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, 2016, Proc. Mach. Learn. Res., A. Singh and J. Zhu, Eds., vol. 54. Fort Laud-
pp. 308–318. [Online]. Available: http://doi.acm.org/10.1145/2976749. erdale, FL, USA, 20-22 Apr. 2017, pp. 1273–1282. [Online]. Available:
2978318 http://proceedings.mlr.press/v54/mcmahan17a.html
[18] H.Imtiaz and A. D. Sarwate, “Distributed differentially-private algorithms [39] P. Kairouz et al., “Advances and open problems in federated learning,”
for matrix and tensor factorization,” IEEE J. Sel. Topics Signal Process., Dec. 2019. [Online]. Available: https://arxiv.org/abs/1912.04977
vol. 12, no. 6, pp. 1449–1464, Dec. 2018. [Online]. Available: https://doi. [40] M. Heikkilä, E. Lagerspetz, S. K. Kaski, S. Shimizu Tarkoma, and A.
org/10.1109/JSTSP.2018.2877842 Honkela, “Differentially private bayesian learning on distributed data,” in
[19] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed Adv. Neural Inf. Process. Syst., I. Guyon, U. V. Luxburg, S. H. Bengio, R.
optimization and statistical learning via the alternating direction method Wallach, S. Fergus Vishwanathan, and R. Garnett, Eds. Curran Associates,
of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, Inc., 2017, pp. 3229–3238.
Jan. 2011. [Online]. Available: http://dx.doi.org/10.1561/2200000016 [41] S. Goryczka, L. Xiong, and V. Sunderam, “Secure multiparty aggregation
[20] D. K. Molzahn et al., “A survey of distributed optimization and control with differential privacy: A comparative study,” in Proc. Joint EDBT/ICDT
algorithms for electric power systems,” IEEE Trans. Smart Grid, vol. 8, Workshops. New York, NY, USA, 2013, pp. 155–163. [Online]. Available:
no. 6, pp. 2941–2962, Nov. 2017. http://doi.acm.org/10.1145/2457317.2457343
[21] C. A. Uribe, S. Lee, A. Gasnikov, and A. Nedić, “Optimal algorithms for [42] P. Kairouz, S. Oh, and P. Viswanath, “Secure multi-party differential
distributed optimization,” 2017, arXiv:1712.00232. privacy,” in Adv. Neural Inf. Process. Syst., C. Cortes, N. D. Lawrence,
[22] S. Han, U. Topcu, and G. J. Pappas, “Differentially private distributed D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc.,
constrained optimization,” IEEE Trans. Autom. Control, vol. 62, no. 1, 2015, pp. 2008–2016.
pp. 50–64, Jan. 2017. [43] K. Bonawitz et al., “Practical secure aggregation for privacy-preserving
[23] E. Nozari, P. Tallapragada, and J. Cortés, “Differentially private distributed machine learning,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur..
convex optimization via objective perturbation,” in Proc. Amer. Control New York, NY, USA, 2017, pp. 1175–1191. [Online]. Available: http://doi.
Conf., 2016, pp. 2061–2066. acm.org/10.1145/3133956.3133982
[24] J. Zhu, C. Xu, J. Guan, and D. O. Wu, “Differentially private distributed [44] S. A. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A.
online algorithms over time-varying directed networks,” IEEE Trans. Smith, “What can we learn privately?,” in Proc. IEEE 49th Annu. IEEE
Signal Inf. Process. Netw., vol. 4, no. 1, pp. 4–17, Mar. 2018. Symp. Found. Comput. Sci., 2008, pp. 531–540. [Online]. Available: http:
[25] K. Chaudhuri and C. Monteleoni, “Privacy-preserving logistic regression,” //dx.doi.org/10.1109/FOCS.2008.27
in Proc. 21st Int. Conf. Neural Inf. Process. Syst. 21, Vancouver, British [45] A. Beimel, K. Nissim, and U. Stemmer, “Characterizing the sample
Columbia, Canada: Curran Associates, Inc., 2008, pp. 289–296. complexity of private learners,” in Proc. 4th Conf. Innov. Theor. Com-
[26] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, “Differentially pri- put. Sci., New York, NY, USA: Association for Computing Machinery,
vate empirical risk minimization,” J. Mach. Learn. Res., vol. 12, 2013, pp. 97–110. [Online]. Available: https://doi.org/10.1145/2422436.
pp. 1069–1109, Jul. 2011. [Online]. Available: http://dl.acm.org/citation. 2422450
cfm?id=1953048.2021036 [46] B. Balle, G. Barthe, and M. Gaboardi, “Privacy amplification by sub-
[27] S. Song, K. Chaudhuri, and A. D. Sarwate, “Stochastic gradient descent sampling: Tight analyses via couplings and divergences,” in Adv. Neural
with differentially private updates,” in Proc. IEEE Glob. Conf. Signal Inf. Inf. Process. Syst., S. Bengio, H. Wallach, H. Larochelle, K. Grauman,
Process., 2013, pp. 245–248. N. Cesa-Bianchi, and R. Garnett, Eds. Curran Associates, Inc., 2018,
[28] Z. Ji, Z. C. Lipton, and C. Elkan, “Differential privacy and machine pp. 6277–6287.
learning: A survey and review,” 2014, arXiv:1412.7584. [47] B. B., B. J., G. A., and N. K., “The privacy blanket of the shuffle model,”
[29] C. Li, P. Zhou, L. Xiong, Q. Wang, and T. Wang, “Differentially private in Adv. Cryptology - CRYPTO 2019. CRYPTO 2019, ser. Lecture Notes
distributed online learning,” IEEE Trans. Knowl. Data Eng., vol. 30, no. 8, Comput. Sci., B. A. and M. D., Eds. Cham: Springer, 2019, vol. 11693.
pp. 1440–1453, Aug. 2018, doi: 10.1109/TKDE.2018.2794384. [48] Ú. Erlingsson et al., “Encode, shuffle, analyze privacy revisited: Formal-
[30] R. Bassily, A. Smith, and A. Thakurta, “Private empirical risk minimiza- izations and empirical evaluation,” Jan. 2020, arXiv:2001.03618.
tion: Efficient algorithms and tight error bounds,” in Proc. IEEE 55th Annu. [49] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, “Our
Symp. Found. Comput. Sci., Washington, DC, USA, 2014, pp. 464–473. data, ourselves: privacy via distributed noise generation,” in Adv. Cryp-
[Online]. Available: http://dx.doi.org/10.1109/FOCS.2014.56 tology, vol. 4004. Saint Petersburg, Russia: Springer Verlag, May 2006,
[31] K. Ligett, S. A. Neel, B. Roth Waggoner, and S. Z. Wu, “Accuracy first: pp. 486–503.
Selecting a differential privacy level for accuracy constrained ERM,” in [50] B. Anandan and C. Clifton, “Laplace noise generation for two-party
Adv. Neural Inf. Process. Syst., I. Guyon, U. V. Luxburg, S. H. Bengio, R. computational differential privacy,” in Proc. 13th Annu. Conf. Privacy,
Wallach, S. Fergus Vishwanathan, and R. Garnett, Eds. Curran Associates, Secur. Trust, 2015, pp. 54–61.
Inc., 2017, pp. 2563–2573. [51] J. Sui, T. Adalı, G. D. Pearlson, and V. D. Calhoun, “An ICA-based
[32] D. Wang, M. Ye, and J. Xu, “Differentially private empirical risk min- method for the identification of optimal fMRI features and components
imization revisited: Faster and more general,” in Proc. Adv. Neural Inf. using combined group-discriminative techniques,” NeuroImage, vol. 46,
Process. Syst., Long Beach, California, USA, 2017, pp. 2719–2728. no. 1, pp. 73–86, 2009. [Online]. Available: http://dx.doi.org/10.1016/j.
neuroimage.2009.01.026

[52] J.Liu and V. Calhoun, “Parallel independent component analysis for mul- [73] R.Engle, “Autoregressive conditional heteroscedasticity with estimates
timodal analysis: Application to fMRI and EEG data,” in Proc. 4th IEEE of the variance of United Kingdom inflation,” Econometrica, vol. 50,
Int. Symp. Biomed. Imaging: From Nano to Macro, 2007, pp. 1028–1031. no. 4, pp. 987–1007, 1982. [Online]. Available: http://dx.doi.org/10.2307/
[Online]. Available: http://dx.doi.org/10.1109/ISBI.2007.357030 1912773
[53] C. Bordier, M. Dojat, and P. L. de Micheaux, “Temporal and spatial [74] T.Bollerslev, “Generalized autoregressive conditional heteroskedasticity,”
independent component analysis for fMRI data sets embedded in a r. J. Econometrics, vol. 31, pp. 307–327, 1986. [Online]. Available: http:
package,” 2010, arXiv:1012.0269. //dx.doi.org/10.1.1.161.7380
[54] V. Calhoun, T. Adali, G. Pearlson, and J. Pekar„ “A method for making [75] Q. Geng, P. Kairouz, S. Oh, and P. Viswanath, “The staircase mechanism
group inferences from functional MRI data using independent component in differential privacy,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 7,
analysis,” Hum. Brain Mapping, vol. 14, no. 3, pp. 140–151, 2001. [On- pp. 1176–1184, Oct. 2015.
line]. Available: https://doi.org/10.1002/hbm.1048 [76] A.Shamir, “How to share a secret,” Commun. ACM, vol. 22, no. 11,
[55] E. A. Allen et al., “A baseline for the multivariate comparison of resting pp. 612–613, Nov. 1979. [Online]. Available: http://doi.acm.org/10.1145/
state networks,” Front. Syst. Neurosci., vol. 5, no. 2, 2011. [Online]. 359168.359176
Available: http://dx.doi.org/10.3389/fnsys.2011.00002 [77] W. Diffie and M. Hellman, “New directions in cryptography,” IEEE Trans.
[56] V. Calhoun et al., “Method for multimodal analysis of independent source Inf. Theory, vol. 22, no. 6, pp. 644–654, Nov. 1976.
differences in schizophrenia: Combining gray matter structural and audi- [78] J. So, B. Guler, and A. Salman Avestimehr, “CodedPrivateML: A fast and
tory oddball functional data,” Hum. Brain Mapping, vol. 27, no. 1, pp. 47– privacy-preserving framework for distributed machine learning,” IEEE J.
62, 2006. [Online]. Available: http://dx.doi.org/10.1002/hbm.20166 Sel. Areas Inf. Theory, vol. 2, no. 1, pp. 441–451, 2021.
[57] A. J.Bell and T. J. Sejnowski, “An information-maximization approach to [79] I.Mironov, “On significance of the least significant bits for differential
blind separation and blind deconvolution,” Neural Computation, vol. 7, privacy,” in Proc. ACM Conf. Comput. Commun. Secur., New York,
no. 6, pp. 1129–1159, Nov. 1995. [Online]. Available: http://dx.doi.org/ NY, USA, 2012, pp. 650–661. [Online]. Available: http://doi.acm.org/10.
10.1162/neco.1995.7.6.1129 1145/2382196.2382264
[58] C. Dwork, “Differential privacy,” in Automata, Lang. Program., Berlin, [80] V. Balcer and S. Vadhan, “Differential privacy on finite computers,” in
Heidelberg: Springer, 2006, pp. 1–12. Proc. 9th Innov. Theor. Comput. Sci. Conf., Leibniz Int. Proc. Inform.,
[59] C.Dwork and A. Roth, “The algorithmic foundations of differential pri- vol. 94. Dagstuhl, Germany: Schloss Dagstuhl-Leibniz-Zentrum fuer In-
vacy,” Found. Trends Theor. Comput. Sci., vol. 9, no. 3-4, pp. 211–407, formatik, 2018, pp. 43: 1–43:21.
2013. [Online]. Available: http://dx.doi.org/10.1561/0400000042 [81] C. Dwork, G. N. Rothblum, and S. Vadhan, “Boosting and differential
[60] G. R. Kurri, V. M. Prabhakaran, and A. D. Sarwate, “Coordination privacy,” in Proc. IEEE 51st Annu. Symp. Found. Comput. Sci., 2010,
through shared randomness,” IEEE Trans. Inf. Theory, vol. 67, no. 8, pp. 51–60.
pp. 4948–4974, Aug. 2021. [Online]. Available: https://doi.org/10.1109/ [82] H. Imtiaz, J. Mohammadi, and A. D. Sarwate, “Distributed differentially
TIT.2021.3091604 private computation of functions with correlated noise,” Apr. 2019. [On-
[61] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. New York, NY, line]. Available: https://arxiv.org/abs/1904.10059
USA: Cambridge Univ. Press, 2012. [83] S. Song, K. Chaudhuri, and A. D. Sarwate, “Learning from data with
[62] S. Malluri and V. K. Pamula, “Gaussian q-function and its approxima- heterogeneous noise using SGD,” in Proc. 18th Int. Conf. Artif. Intell.
tions,” in Proc. Int. Conf. Commun. Syst. Netw. Technol., 2013, pp. 74–77. Statist., 2015, vol. 38, pp. 894–902. [Online]. Available: http://jmlr.org/
[63] D.Desfontaines and B. Pejó, “SoK: Differential privacies,” Proc. Privacy proceedings/papers/v38/song15.html
Enhancing Technol., vol. 2020, no. 2, pp. 288–313, 2020. [Online]. Avail-
able: https://doi.org/10.2478/popets-2020-0028
[64] J. Zhao et al., “Reviewing and improving the gaussian mechanism for
differential privacy,” 2019. [Online]. Available: http://arxiv.org/abs/1911. Hafiz Imtiaz received the B.Sc. and first M.Sc. de-
12060 grees from the Bangladesh University of Engineer-
[65] K. Nordhausen, E. Ollila, and H. Oja, “On the performance indices of ICA ing and Technology (BUET), Dhaka, Bangladesh,
and blind source separation,” in Proc. 12th IEEE Int. Workshop Signal in 2009 and 2011, respectively, the second M.Sc.
Process. Adv. Wireless Commun., 2011, pp. 486–490. [Online]. Available: degree from Rutgers University New Brunswick, NJ,
http://dx.doi.org/10.1109/SPAWC.2011.5990458 USA, in 2017, and the Ph.D. degree from Rutgers
[66] A. W.Marshall, l. Olkin, and B. C. Arnold, Inequalities: Theory of Ma- University, New Brunswick, NJ, USA, in 2020. He is
jorization and Its Applications. New York, NY, USA: Springer-Verlag, currently an Assistant Professor with the Department
1979. [Online]. Available: https://link.springer.com/book/10.1007/978-0- of Electrical and Electronic Engineering, BUET. Pre-
387-68276-1 viously, he worked as an Intern with Qualcomm and
[67] W.Rudin, Principles of Mathematical Analysis. McGraw-Hill Higher Intel Labs, focusing on activity/image analysis and
Education, 1976. [Online]. Available: https://www.mheducation. adversarial attacks on neural networks, respectively. His primary area of research
com/highered/product/principles-mathematical-analysis-rudin/ interests include developing privacy-preserving machine learning algorithms for
M007054235X.html decentralized data settings. More specifically, he focuses on matrix and tensor
[68] H.Imtiaz and A. D. Sarwate, “Differentially private distributed principal factorization, and optimization problems, which are core components of many
component analysis,” in Proc. IEEE Int. Conf. Acoust., Speech Signal modern machine learning algorithms.
Process., 2018, pp. 2206–2210. [Online]. Available: https://doi.org/10.
1109/ICASSP.2018.8462519
[69] S. Amari, A. Cichocki, and H. H. Yang, “A new learning algorithm Jafar Mohammadi received the Doctoral degree in
for blind signal separation,” in Proc. 8th Int. Conf. Neural Inf. Process. electrical engineering from the Technical University
Syst., Cambridge, MA, USA: MIT Press, 1995, pp. 757–763. [Online]. of Berlin, Germany, in 2016. He has been with Hein-
Available: http://dl.acm.org/citation.cfm?id=2998828.2998935 rich Hertz Institute Fraunhofer since 2011 as a Doc-
[70] Y. Cao, M. Yoshikawa, Y. Xiao, and L. Xiong, “Quantifying differential toral Candidate and later as a Researcher. In 2017,
privacy under temporal correlations,” in Proc. IEEE 33rd Int. Conf. Data he joined Rutgers University, NJ, USA, as a Post-
Eng., 2017, pp. 821–832. doctoral Researcher working on differential privacy
[71] L.Bottou, “On-line Learning in Neural Networks,” in On-line Learn. Neu- for distributed machine learning. He is currently a
ral Netw., D. Saad, Ed New York, NY, USA: Cambridge Univ. Press, 1998, Researcher with Nokia Bell Labs working on the
ch. On-line learning and stochastic approximations, pp. 9–42. [Online]. intersection of machine learning and wireless com-
Available: http://dl.acm.org/citation.cfm?id=304710.304720 munications. During his career, he was contributing
[72] K. Chaudhuri and S. A. Vinterbo, “A stability-based Validation procedure to flagship European funded projects, such as mmMAGIC and Hexa-X. His
for differentially private machine learning,” in Adv. Neural Inf. Process. main areas of interest could be summarized as using mathematical and machine
Syst., C. J. C. L. Burges, M. Bottou, Z. Welling Ghahramani, and K. Q. learning tools to optimize wireless communications systems.
Weinberger, Eds. Curran Associates, Inc., 2013, pp. 2652–2660.

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.
6370 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 69, 2021

Rogers Silva (Member, IEEE) received the B.Sc. Anand D. Sarwate (Senior Member, IEEE) re-
degree in electrical engineering in 2003 from the ceived the B.S. degrees in electrical engineering and
Pontifical Catholic University, Porto Alegre, Brazil, computer science and mathematics from the Mas-
and both the M.S. degree in computer engineering sachusetts Institute of Technology, Cambridge, MA,
(with minors in statistics and in mathematics) in 2011, USA, in 2002, and the M.S. and Ph.D. degrees in
and the Ph.D. degree in computer engineering (with electrical engineering from the Department of Electri-
distinction) in 2017, from The University of New cal Engineering and Computer Sciences, University
Mexico, Albuquerque, NM, USA. He is currently a of California, Berkeley (U.C. Berkeley), Berkeley,
Research Scientist with the Tri-institutional Center CA, USA. He is a currently an Associate Professor
for Translational Research in Neuroimaging and Data with the Department of Electrical and Computer En-
Science (TReNDS) with Georgia State University, gineering, The State University of New Jersey, New
Georgia Institute of Technology, and Emory University. He is also the Leader of Brunswick, NJ, USA, where he has been since January 2014. He was previously
the #BSIsubspace Section of the Brain Space Initiative. Previously, he was a Research Assistant Professor from 2011 to 2013 with the Toyota Technological
a Postdoctoral Fellow with The Mind Research Network, a Data Scientist with Institute at Chicago. Prior to this, he was a Postdoctoral Researcher from 2008
Datalytic Solutions, and worked as an Engineer, Lecturer, and Consultant. As a to 2011 with the University of California, San Diego, CA, USA. His research
Multidisciplinary Scientist, he develops algorithms for statistical and machine interests include information theory, machine learning, signal processing, opti-
learning, image analysis, numerical optimization, memory efficient large scale mization, and privacy and security. Dr. Sarwate was the recipient of the Rutgers
data reduction, and distributed analyses, focusing on multimodal, multi-subject Board of Governors Research Fellowship for Scholarly Excellence in 2020 and
neuroimaging data from thousands of subjects. the NSF CAREER award in 2015. He is a member of Phi Beta Kappa and Eta
Kappa Nu.

Bradley Baker received a Bachelor of Arts in applied

mathematics/philosophy from the New College of
Florida, and a Master of Science in Computer Science
from the University of New Mexico. He is currently a
PhD student in the department of Computer Science Vince D. Calhoun (Fellow, IEEE) received his M.
and Engineering at Georgia Institute of Technology. S. in information systems and M.A. in biomedical
He is primarily interested in machine learning and its engineering in 1996 and 1993, respectively, from
intersections with complex applications and theory. Johns Hopkins University, Baltimore, MD. He also
His current research interests center around the use holds a B.S. degree in electrical engineering from
of information theory for interpreting deep neural University of Kansas, Lawrence, KS. and received
networks, the application of deep learning to neu- the Ph.D. in electrical engineering from University of
roimaging data, and algorithms for distributed learning. His previous work Maryland Baltimore County, Baltimore, MD in 2002.
has focused on the application of machine learning algorithms in privacy and He is the Founding Director of the Tri-Institutional
security-sensitive settings. He has also worked on projects in recommender Center for Translational Research in Neuroimaging
systems, information ethics, and mathematical pedagogy. and Data Science (TReNDS) and a Georgia Research
Alliance eminent scholar in brain health and image analysis where he holds
appointments with Georgia State University, Georgia Institute of Technology
and Emory University. He was previously the President of the Mind Research
Sergey M. Plis received the Ph.D. degree in computer Network and Distinguished Professor of Electrical and Computer Engineering
science from The University of New Mexico, Albu- with the University of New Mexico. He is the author of more than 900 full
querque, NM, USA, in 2007. He is currently an Asso- journal articles and more than 850 technical reports, abstracts and conference
ciate Professor of computer science with the Georgia proceedings. His work includes the development of flexible methods to analyze
State University and the Director of machine learning functional magnetic resonance imaging data such as independent component
core with the Tri-Institutional Center for Transla- analysis, deep learning for neuroimaging, data fusion of multimodal imaging and
tional Research in Neuroimaging and Data Science. genetics data, neuroinformatics tools, and the identification of biomarkers for
His research interests include developing novel and disease. His research is funded by the NIH and NSF among other funding agen-
applying existing techniques and approaches in ma- cies. Dr. Calhoun is a fellow of The American Association for the Advancement
chine learning and data science to analyzing large of Science, The American Institute of Biomedical and Medical Engineers, The
scale datasets in multimodal brain imaging and other American College of Neuropsychopharmacology, The Organization for Human
domains. One of his key goals is to take advantage of the strengths of imaging Brain Mapping (OHBM) and the International Society of Magnetic Resonance in
modalities and infer structure and patterns that are hard to obtain non-invasively Medicine. He was the Chair for the OHBM from 2018–2019 and is the Past Chair
and/or that are unavailable for direct observation. In the long term this amounts of the IEEE Machine Learning for Signal Processing Technical Committee. He
to developing methods capable of revealing mechanisms used by the brain to is currently the IEEE BISP Technical Committee and is also a member of IEEE
form task-specific transient interaction networks and their cognition-inducing Data Science Initiative Steering Committee as well as the IEEE Brain Technical
interactions via multimodal fusion at features and interaction levels. His ongoing Committee.
work is focused on inferring multimodal probabilistic and causal descriptions
of these function-induced networks based on fusion of fast and slow imaging
modalities. This includes feature estimation via deep learning-based pattern
recognition and learning causal graphical models.

Authorized licensed use limited to: Rutgers University. Downloaded on May 19,2022 at 14:27:13 UTC from IEEE Xplore. Restrictions apply.

Heigl Et Al. - 2021 - On The Improvement of The Isolation Forest Algorit
No ratings yet
Heigl Et Al. - 2021 - On The Improvement of The Isolation Forest Algorit
26 pages
A Differential Privacy Protection Based Federat - 2024 - Engineering Application
No ratings yet
A Differential Privacy Protection Based Federat - 2024 - Engineering Application
12 pages
Efficient Clustering Techniques On Hadoop and Spark
No ratings yet
Efficient Clustering Techniques On Hadoop and Spark
22 pages
Ijdns 2024 7
No ratings yet
Ijdns 2024 7
14 pages
An Efficient Privacy-Enhancing Cross-Silo Federated Learning and Applications For False Data Injection Attack Detection in Smart Grids
No ratings yet
An Efficient Privacy-Enhancing Cross-Silo Federated Learning and Applications For False Data Injection Attack Detection in Smart Grids
15 pages
Scalable Privacy Preservation in Big Data A Sur - 2015 - Procedia Computer Scien
No ratings yet
Scalable Privacy Preservation in Big Data A Sur - 2015 - Procedia Computer Scien
5 pages
1Differentially_Private_Federated_Learning_With_an_Adaptive_Noise_Mechanism
No ratings yet
1Differentially_Private_Federated_Learning_With_an_Adaptive_Noise_Mechanism
14 pages
1-s2.0-S0020025523013130-main
No ratings yet
1-s2.0-S0020025523013130-main
18 pages
Personalized Mobile App Recommendation by Learning User's Interest From Social Media
No ratings yet
Personalized Mobile App Recommendation by Learning User's Interest From Social Media
5 pages
Survey On Categorical Data For Neural Networks: Open Access Survey Paper
No ratings yet
Survey On Categorical Data For Neural Networks: Open Access Survey Paper
41 pages
Enhancing protection in high-dimensional data_ Distributed differential privacy with feature selection
No ratings yet
Enhancing protection in high-dimensional data_ Distributed differential privacy with feature selection
20 pages
Single Multi-Source Black-Box Domain Adaption For Sensor Time Series Data
No ratings yet
Single Multi-Source Black-Box Domain Adaption For Sensor Time Series Data
12 pages
A Graph Based Approach To Manage CAE-Data in A Data-Lake
No ratings yet
A Graph Based Approach To Manage CAE-Data in A Data-Lake
6 pages
1 s2.0 S0925231220319032 Main
No ratings yet
1 s2.0 S0925231220319032 Main
11 pages
Li 20
No ratings yet
Li 20
14 pages
A Neural-Based Architecture For Small Datasets Classification
No ratings yet
A Neural-Based Architecture For Small Datasets Classification
9 pages
Zhao 2020
No ratings yet
Zhao 2020
32 pages
Reinforcement learning-based dynamic pruning for distributed inference via explainable AI in healthcare IoT systems
No ratings yet
Reinforcement learning-based dynamic pruning for distributed inference via explainable AI in healthcare IoT systems
17 pages
1 s2.0 S0925231223010202 Main
No ratings yet
1 s2.0 S0925231223010202 Main
18 pages
Review On Health Care Database Mining in Outsourced Database
No ratings yet
Review On Health Care Database Mining in Outsourced Database
4 pages
Anamoly Detection
No ratings yet
Anamoly Detection
7 pages
A Multisource Domain Adaptation Network For Process Fault Diagnosis Under Different Working Conditions
No ratings yet
A Multisource Domain Adaptation Network For Process Fault Diagnosis Under Different Working Conditions
12 pages
Big Data Stream Mining Using Integrated Framework With Classification and Clustering Methods
No ratings yet
Big Data Stream Mining Using Integrated Framework With Classification and Clustering Methods
9 pages
Sensors 22 07726 With Cover
No ratings yet
Sensors 22 07726 With Cover
18 pages
Generative-artificial-intelligence-for-distribut_2024_International-Journal-
No ratings yet
Generative-artificial-intelligence-for-distribut_2024_International-Journal-
8 pages
Energy-Efficient Secure Data Aggregation Framework (Esdaf) Protocol in Heterogeneous Wireless Sensor Networks
No ratings yet
Energy-Efficient Secure Data Aggregation Framework (Esdaf) Protocol in Heterogeneous Wireless Sensor Networks
10 pages
SEA Multi-Graph-Based Higher-Order Sensor Alignment For Multivariate Time-Series Unsupervised Domain Adaptation
No ratings yet
SEA Multi-Graph-Based Higher-Order Sensor Alignment For Multivariate Time-Series Unsupervised Domain Adaptation
16 pages
2020_BigData_Hybrid_Decision_Tree-Neural_Network
No ratings yet
2020_BigData_Hybrid_Decision_Tree-Neural_Network
9 pages
Multi-Dimensional Feature Fusion and Stacking Ensemble Mechanism
No ratings yet
Multi-Dimensional Feature Fusion and Stacking Ensemble Mechanism
14 pages
Secure and Collaborative Network Intrusion Detection - A Federated Approach (Final)
No ratings yet
Secure and Collaborative Network Intrusion Detection - A Federated Approach (Final)
5 pages
1804.05271
No ratings yet
1804.05271
20 pages
DL - 2019 - Deep Neural Networks For Recommender Systems
No ratings yet
DL - 2019 - Deep Neural Networks For Recommender Systems
5 pages
Network Data Security For The Detection System in The Internet of Things With Deep Learning Approach
No ratings yet
Network Data Security For The Detection System in The Internet of Things With Deep Learning Approach
6 pages
CCWC2025_Goyal_Mahmoud_CameraReady
No ratings yet
CCWC2025_Goyal_Mahmoud_CameraReady
8 pages
Feature Level Fusion of Multi-Source Data For Network Intrusion Detection
No ratings yet
Feature Level Fusion of Multi-Source Data For Network Intrusion Detection
7 pages
Scalable Machine-Learning Algorithms For Big Data Analytics: A Comprehensive Review
No ratings yet
Scalable Machine-Learning Algorithms For Big Data Analytics: A Comprehensive Review
21 pages
Recommendation Systems Using Graph Neural Networks
No ratings yet
Recommendation Systems Using Graph Neural Networks
6 pages
Heterogeneous Sensor Data Fusion by Deep Multimodal Encoding
No ratings yet
Heterogeneous Sensor Data Fusion by Deep Multimodal Encoding
14 pages
A Novel Deep Anomaly Detection Approach For Intrusion Detection in Futuristic Network
No ratings yet
A Novel Deep Anomaly Detection Approach For Intrusion Detection in Futuristic Network
11 pages
A Data-Driven Soft Sensor Modeling Method Based On Deep Learning and Its Application
No ratings yet
A Data-Driven Soft Sensor Modeling Method Based On Deep Learning and Its Application
9 pages
Format Synopsis
No ratings yet
Format Synopsis
15 pages
1-s2.0-S1746809424000211-main
No ratings yet
1-s2.0-S1746809424000211-main
8 pages
V3i201 PDF
No ratings yet
V3i201 PDF
5 pages
Unsupervised Anomaly Detection in Multivariate Time Series
No ratings yet
Unsupervised Anomaly Detection in Multivariate Time Series
32 pages
5ANFIS and Deep Learning based missing sensor data prediction in IoT
No ratings yet
5ANFIS and Deep Learning based missing sensor data prediction in IoT
15 pages
(IJCST-V12I1P6) :kaushik Kashyap, Rinku Moni Borah, Priyanku Rahang, DR Bornali Gogoi, Prof. Nelson R Varte
No ratings yet
(IJCST-V12I1P6) :kaushik Kashyap, Rinku Moni Borah, Priyanku Rahang, DR Bornali Gogoi, Prof. Nelson R Varte
5 pages
1 s2.0 S2665917422000411 Main
No ratings yet
1 s2.0 S2665917422000411 Main
6 pages
Example
No ratings yet
Example
8 pages
Reference Paper - Page 85
No ratings yet
Reference Paper - Page 85
30 pages
1-s2.0-S0020025521012391-main
No ratings yet
1-s2.0-S0020025521012391-main
16 pages
Hafiz Imtiaz Sir
No ratings yet
Hafiz Imtiaz Sir
16 pages
Privacy Protection Techniques in Cloud Computing
No ratings yet
Privacy Protection Techniques in Cloud Computing
4 pages
Comparison of Single and Ensemble Intrusion Detection Techniques Using Multiple Datasets
No ratings yet
Comparison of Single and Ensemble Intrusion Detection Techniques Using Multiple Datasets
10 pages
M Phil Computer Science Thesis Download
100% (2)
M Phil Computer Science Thesis Download
4 pages
Unsupervised and Self-Adaptative Techniques For Cross-Domain Person Re-Identification
No ratings yet
Unsupervised and Self-Adaptative Techniques For Cross-Domain Person Re-Identification
21 pages
Group Secret Key Generation Via Received
No ratings yet
Group Secret Key Generation Via Received
14 pages
May2024(Volume45Issue01)_IJCSMS_Ankush_6_14
No ratings yet
May2024(Volume45Issue01)_IJCSMS_Ankush_6_14
9 pages
SSRN Id4469457
No ratings yet
SSRN Id4469457
39 pages
PREUNN
No ratings yet
PREUNN
12 pages
Lexicon of Computer Science Terminology: Lexicon of Tech and Business, #16
From Everand
Lexicon of Computer Science Terminology: Lexicon of Tech and Business, #16
Mustafa Al-Dori
4/5 (1)
Empotech - q2 - Mod2 - Planning and Conceptualizing ICT Project
100% (4)
Empotech - q2 - Mod2 - Planning and Conceptualizing ICT Project
19 pages
Carver Practical Data Analysis With JMP Student Solutions
No ratings yet
Carver Practical Data Analysis With JMP Student Solutions
116 pages
Transportation Problem Note
No ratings yet
Transportation Problem Note
8 pages
Passwords For Digital Drives
No ratings yet
Passwords For Digital Drives
3 pages
KPMG It Procedure - SDLC 1.0
No ratings yet
KPMG It Procedure - SDLC 1.0
11 pages
Apply Lighting Effect To 3D Object
No ratings yet
Apply Lighting Effect To 3D Object
5 pages
Matlab Manual
No ratings yet
Matlab Manual
70 pages
rsch_pdf_30297368
No ratings yet
rsch_pdf_30297368
83 pages
Hysys
No ratings yet
Hysys
227 pages
Pumpsmart: Configuration & Operation Guide
No ratings yet
Pumpsmart: Configuration & Operation Guide
177 pages
Healthcare 11 00863 Compressed
No ratings yet
Healthcare 11 00863 Compressed
17 pages
Theano Cheatsheet: Imports
No ratings yet
Theano Cheatsheet: Imports
4 pages
Ict Lesson Plans 1 Week 3 Intro
No ratings yet
Ict Lesson Plans 1 Week 3 Intro
4 pages
Firewall Ufw
No ratings yet
Firewall Ufw
10 pages
1KW-60280-1 6517B Electrometer High-Resistance-Meter Datasheet 041122
No ratings yet
1KW-60280-1 6517B Electrometer High-Resistance-Meter Datasheet 041122
14 pages
SPPM Final Notes
No ratings yet
SPPM Final Notes
62 pages
TEAC Srlxi User Manual
No ratings yet
TEAC Srlxi User Manual
12 pages
Java Swing Tutorial: Difference Between AWT and Swing
No ratings yet
Java Swing Tutorial: Difference Between AWT and Swing
3 pages
StoreSim Optimizing Information Leakage in Multi-Cloud Storage Services
No ratings yet
StoreSim Optimizing Information Leakage in Multi-Cloud Storage Services
6 pages
CHPT 2
No ratings yet
CHPT 2
16 pages
William Stallings Computer Organization and Architecture
No ratings yet
William Stallings Computer Organization and Architecture
18 pages
McMurdo FastFind 220 PLB Datasheet
No ratings yet
McMurdo FastFind 220 PLB Datasheet
4 pages
Section of CRT TV
No ratings yet
Section of CRT TV
41 pages
Combined Graduate Level Recruitment OSSC Notice
No ratings yet
Combined Graduate Level Recruitment OSSC Notice
12 pages
Acceptance Letter
No ratings yet
Acceptance Letter
1 page
William Crum - CS 159 - C Programming (Applications For Engineers) - Fall 2022-Stipes Publishing L.L.C (2022)
No ratings yet
William Crum - CS 159 - C Programming (Applications For Engineers) - Fall 2022-Stipes Publishing L.L.C (2022)
216 pages
M7721v6 3
No ratings yet
M7721v6 3
186 pages
Viral Video Hacks-How to Get Millions of Views on Your YouTube Shorts
No ratings yet
Viral Video Hacks-How to Get Millions of Views on Your YouTube Shorts
10 pages
Integration With Oracle's WMS and MSCA: White Paper
No ratings yet
Integration With Oracle's WMS and MSCA: White Paper
7 pages
OPLAN KALUSUGAN English Complete
No ratings yet
OPLAN KALUSUGAN English Complete
1 page

Final Version on IEEE

Uploaded by

Final Version on IEEE

Uploaded by

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL.

69, 2021 6355

A Correlated Noise-Assisted Decentralized

algorithm. Additive noise mechanisms such as the Gaussian or

we can find the expression

ĥs = hs + es + ks . These two terms are then sent to the ag-

Bradley Baker received a Bachelor of Arts in applied

You might also like