
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING

How to Democratise and Protect AI: Fair and Differentially Private Decentralised Deep Learning

Lingjuan Lyu, Member, IEEE, Yitong Li, Karthik Nandakumar, Senior Member, IEEE, Jiangshan Yu, and Xingjun Ma

arXiv:2007.09370v1 [cs.CR] 18 Jul 2020

Abstract—This paper is the first to consider the research problem of fairness in collaborative deep learning while ensuring privacy. A novel reputation system is proposed through digital tokens and local credibility to ensure fairness, in combination with differential privacy to guarantee privacy. In particular, we build a fair and differentially private decentralised deep learning framework called FDPDDL, which enables parties to derive more accurate local models in a fair and private manner using our two-stage scheme: during the initialisation stage, artificial samples generated by a Differentially Private Generative Adversarial Network (DPGAN) are used to mutually benchmark the local credibility of each party and to generate initial tokens; during the update stage, Differentially Private SGD (DPSGD) is used to facilitate collaborative privacy-preserving deep learning, and the local credibility and tokens of each party are updated according to the quality and quantity of its individually released gradients. Experimental results on benchmark datasets under three realistic settings demonstrate that FDPDDL achieves high fairness, yields accuracy comparable to the centralised and distributed frameworks, and delivers better accuracy than the standalone framework.

Index Terms—Decentralised deep learning; Fairness; Credibility; Privacy.

• L. Lyu is with the Department of Computer Science, National University of Singapore. E-mail: [email protected].
• Y. Li is with the School of Computing and Information Systems, The University of Melbourne, Parkville, Australia, 3010. E-mail: [email protected].
• Karthik Nandakumar is with IBM Singapore Lab, 018983. E-mail: [email protected].
• J. Yu is with the Faculty of Information Technology, Monash University, Clayton, Australia. E-mail: [email protected].
• X. Ma is with the School of Information Technology, Deakin University, Geelong, Australia. E-mail: [email protected].

1 INTRODUCTION

In the real world, many practical applications would benefit from large-scale deep learning across sensitive datasets owned by different parties; data sharing and analysis across parties are thus of paramount importance to accelerate scientific discovery, facilitate quality improvement initiatives, speed up hypothesis testing, and boost accuracy, especially when there are not enough local examples to test a hypothesis [1]. This trend is motivated by the fact that the data from a single organization may be very homogeneous, ending up with an unsatisfactory model that fails to generalise to other data, as shown in Fig. 1a. Therefore, there is much demand to train a global model by a central server on the combined data collected from independent parties, so as to ensure sufficient statistical power to test hypotheses (Fig. 1b). On the other hand, deep learning can be performed in a collaborative manner, where a parameter server is required to maintain the latest parameters available to all parties (Fig. 1c). However, such central server-based learning frameworks suffer from the following weaknesses:

• Untrusted server. Due to privacy concerns, a party may not trust a central server [2] and is thus reluctant to transfer either data or model parameters to the server.
• Single point of failure. Once the central server is shut down, the whole network stops working. Moreover, if the central server is attacked, the entire network is at risk of being compromised [2].
• Malicious attack. The data being disseminated is mutable. An attacker could arbitrarily change its local model without being detected, and no audit trail is available to identify such malicious behaviour.
• Lack of fairness and vulnerability to free-riders. Existing frameworks assume that all parties contribute equally. This is typically impractical, given differences in the data quality and quantity of each party. It is thus unfair that, at the end of the collaboration, all parties get access to the same global model regardless of their contributions. In an extreme case, even free-riders can successfully join the system and enjoy the system's global model for free. This lack of fairness might discourage collaboration among parties.

To address the first two issues, we make use of Blockchain to provide a fully decentralised framework, i.e., each participant does not trust any third party or any other participant, as illustrated in Fig. 1d. In particular, we explicitly study two types of malicious parties: the free-rider without any data, and the GAN attacker. We claim that both of these malicious parties belong to the category of "non-credible" parties, but not all "non-credible" parties are malicious: a "non-credible" party might follow the protocol honestly, yet have limited data or a totally different data distribution from the majority of parties; it is then still reasonable for all the other parties to assign low local credibility to this "non-credible" party, or even isolate it.

However, the last two issues are yet to be solved, and collaboration might be significantly hindered due to privacy and confidentiality restrictions.

Fig. 1: Different deep learning frameworks: (a) Standalone; (b) Centralised; (c) Distributed; (d) Decentralised.

To overcome these problems, we develop a decentralised collaborative learning framework that respects collaborative fairness, data privacy and utility at the same time, so as to encourage more parties to collaborate. Our main contributions are summarised as follows:

• We initiate the research problem of collaborative fairness in collaborative learning, and propose a fair and private framework named Fair and Differentially Private Decentralised Deep Learning (FDPDDL).
• To address the fairness and privacy problems, we propose a two-stage scheme, realised by two algorithms: i) local credibility and tokens initialisation, and ii) local credibility and tokens update. In particular, we build a novel reputation system, which reflects the relative contribution of each party, thus ensuring fairness; we use Differentially Private GAN (DPGAN) in the initialisation stage and Differentially Private Stochastic Gradient Descent (DPSGD) in the update stage to mitigate privacy leakage.
• Our framework provides a viable solution to detect and isolate free-riders and GAN attackers both before and during the collaborative learning process.
• We evaluate our framework on several benchmark datasets under three realistic settings. Extensive experiments demonstrate that FDPDDL achieves high fairness, delivers accuracy comparable to the centralised and distributed deep learning frameworks, and outperforms the standalone deep learning framework, confirming the superiority of FDPDDL.

2 RELATED WORK AND PRELIMINARIES

This section first reviews the relevant deep learning frameworks and the privacy and fairness issues in deep learning. It then introduces the relevant techniques used in this paper, including differential privacy and Blockchain.

2.1 Deep Learning Frameworks

In general, deep learning frameworks fall into the following four categories:

Standalone deep learning: Participants individually train standalone models on their training data without any collaboration, as shown in Fig. 1a.

Centralised deep learning: Centralised deep learning forces multiple participants to pool their data into a centralised server, which trains a global model on the combined data, as depicted in Fig. 1b.

Distributed deep learning: Shokri et al. [3] first introduced the concept of Distributed Selective Stochastic Gradient Descent (DSSGD) for distributed deep learning. It allows each party to keep its local model private while iteratively updating the model by integrating differentially private gradients from other parties via a parameter server, as illustrated in Fig. 1c. The communication cost within each round of parameter update is addressed by sharing only a fraction (e.g., 1%-10%) of local model gradients, either those whose values exceed a certain threshold or those with the largest absolute values.

Federated learning is a special case of distributed deep learning, tailored to deal with Non-IID, unbalanced and massively distributed data in mobile applications [4], [5], [6]. The goal is to train a shared global model while leaving the training data on users' mobile phones. Mobile phones with relatively powerful and fast processors (including GPUs) are required to download the current model, compute updates by performing local computation, then send local model updates to the trusted Google Cloud server in each communication round.

Decentralised deep learning: The decentralised framework differs fundamentally from server-based frameworks in that it is purely decentralised, relying on no central server, as exemplified in Fig. 1d. The first decentralised machine learning model is ModelChain [7], which applies Blockchain technology to machine learning by incorporating the idea of boosting, i.e., samples that are more difficult to classify are more likely to improve the model significantly. Follow-up works [8], [9], [10] integrated blockchain into deep learning. For example, Kang et al. [10] proposed an effective incentive mechanism to motivate high-reputation mobile devices with high-quality data to participate in model learning, but overlooked the privacy issues.

It should be noted that in both the distributed and decentralised frameworks, all parties are involved in the iterative process of building a global or consensus model; hence we call them collaborative deep learning frameworks. A succinct comparison among different deep learning frameworks is provided in Table 1.

TABLE 1: Feature comparison of different deep learning frameworks.

Deep learning frameworks      | Standalone | Centralised | Distributed [3], [4], [11] | Decentralised [7] | Decentralised (ours)
Architecture                  | Fig. 1a    | Fig. 1b     | Fig. 1c                    | Fig. 1d           | Fig. 1d
Global/Consensus model        | No         | Yes         | Yes                        | Yes               | No
Local models                  | Yes        | No          | Yes                        | Yes               | Yes
Fairness                      | NA         | NA          | No                         | No                | Yes
"Non-credible" party detection| NA         | No          | No                         | No                | Yes

2.2 Privacy-preserving Deep Learning

Privacy-preserving Centralised deep learning: The centralised model is very effective; however, it is not privacy-preserving, since the central server has direct access to all sensitive information. Shokri et al. [3] pointed out that centralised deep learning poses serious privacy threats: (i) all the sensitive training data are exposed to a susceptible third party who can permanently keep the collected data; (ii) data owners have no control over the learning objective or over what can be inferred from their data; (iii) the learned model is not directly available to data owners.

Privacy-preserving Distributed deep learning: Distributed deep learning generally suffers from the common
issue of privacy leakage from the shared gradients. As demonstrated in [12], even a small proportion of local gradients can reveal a significant amount of local data information. In the case of a local network with only one neuron, the server can extract local data with non-negligible probability. Even for complex neural networks trained with regularisation, the gradients can still expose certain label information of the local data [12]. Moreover, if a party turns out to be malicious, it can easily sabotage the learning process (e.g., by spoofing random data samples) or violate the privacy requirements by inferring information about the victim party's private data, which the attacker is not supposed to know. Hitaj et al. [13] devised an active inference attack on deep neural networks in a collaborative setting, referred to as the Generative Adversarial Networks (GAN) attack. It exploits the real-time nature of the learning process, which allows the adversarial party to train a GAN that generates prototypical samples of the targeted training data that was meant to be private; the generated samples are intended to come from the same distribution as the training data. The malicious party is able to attack other parties successfully as long as the global model is still in the process of learning. The GAN attack makes the distributed setting even more undesirable: in centralised learning only the server may pose a privacy threat, but in distributed learning any party can violate the privacy of any other party in the system, even without involving the server [13]. It is worth noting that the GAN attack succeeds only if the following three conditions hold: (i) the adversary has knowledge of the labels of the victim party; (ii) the class distributions of the adversary and the victim party are non-independent and identically distributed (Non-IID); (iii) the victim party is not secured by any privacy protection mechanism, or it adopts per-parameter privacy in DSSGD, which results in meaningless privacy.

To tackle the privacy issue, secure multiparty computation (SMC) has been used to build privacy-preserving neural networks in a distributed manner. For example, SecureML [11] allows clients to distribute their private training data among two non-colluding servers during the setup phase; these two servers then employ SMC to train a global model on the clients' encrypted joint data. In general, SMC techniques achieve a high level of privacy and accuracy, at the expense of high computational and communication overhead for participants, thereby doing a disservice to attracting participation. Alternatively, Shokri et al. [3] perturbed the shared local model gradients by adding noise to satisfy differential privacy. However, their privacy bounds are given per parameter, and the large number of parameters prevents the technique from providing a meaningful privacy guarantee. In federated learning, to protect individual model updates from an adversarial server who might scrutinize individual updates, instead of using differential privacy as in [3], Bonawitz et al. [5] proposed a secure and failure-robust protocol based on SMC to securely aggregate local model updates as the weighted average used to update the global model on the server. Another, more efficient, method is to use differential privacy to conceal user participation, as demonstrated by McMahan et al. [6]. However, it requires a large number of users (on the order of thousands) to ensure model convergence and an acceptable trade-off between privacy and utility. Moreover, the default trusted Google server is entitled to see all users' updates in the clear, aggregate these updates, and add noise to the aggregation; hence their scheme is even weaker than DSSGD when the server is untrusted.

Overall, all the current distributed deep learning frameworks need to be coordinated by a central server, thus falling under the umbrella of server-based frameworks.

Privacy-preserving Decentralised deep learning: The first decentralised machine learning model, i.e., ModelChain, stated that privacy is preserved because zero patient data is exchanged; however, the exchanged model-level information can still largely leak local data information [12]. Furthermore, the proposed logic of ModelChain is sound only if all the participants are honest. More recently, Kim et al. [9] proposed blockchain-based privacy-preserving deep learning and utilized a consensus mechanism to verify local model updates. Zhu et al. [8] provided a proof-of-concept for managing security issues in federated learning systems via blockchain technology. However, none of these works considered the fairness problem in collaborative learning.

2.3 Fairness in Collaborative Deep Learning

There has been a long line of work studying fairness in machine learning; however, to the best of our knowledge, existing research on fairness mostly focuses on the protection of specific attributes, or aims to reduce the variance of the accuracy distribution across participants [14], [15], while none of the previous works addressed the problem of collaborative fairness in collaborative learning.

Overall, all the current collaborative deep learning frameworks focus on how to learn a global or consensus model with higher accuracy than standalone models, while losing the ability to verify the contribution of each individual participant, because participants can access the same global or consensus model no matter how differently they contribute. In extreme cases, there may exist free-riders in the collaborative learning system, who aim to benefit from the global model but do not want to contribute any real information. For clarity, Example 2.1 showcases how a free-rider party C (with no data or model in particular) can obtain the global model even though it fails to make any practical contribution to the global learning process.

Example 2.1. Suppose that three parties A, B, and C are involved in server-based deep learning:
• A (honest) has data D_A and model M_A with accuracy 90%.
• B (honest) has data D_B and model M_B with accuracy 80%.
• C (free-rider) has no data or model to start with.
Suppose that whenever it is C's turn to upload gradients, it always uploads random or carefully crafted gradients, so that the global learning process is not affected. Finally, all three participants have access to the same global model.

This lack of fairness is an essential problem that persists in all the existing collaborative learning frameworks but has so far been overlooked. Lack of fairness can be an obstacle to the widespread adoption of collaborative learning as a new type of powerful learning platform. On the other hand, the fairness issue could be addressed by swinging existing learning frameworks to the other extreme, where all gradients are openly published. In that case, fairness might be achieved, but at the cost of privacy, which is highly undesirable in collaborative deep learning.

2.4 Differential Privacy

Definition 1. A randomised algorithm A satisfies (ε, δ)-approximate Differential Privacy (DP) if

Pr{A(D1) ∈ S} ≤ e^ε · Pr{A(D2) ∈ S} + δ, (1)

for all sets S ⊆ range(A) and all pairs of datasets D1, D2, where D1 can be obtained from D2 by adding or removing one tuple. Further, if δ = 0, we say A preserves ε-differential privacy.

Unlike earlier empirical criteria for privacy [16], differential privacy is based on a solid theoretical foundation [17]. The formal definition of DP has two parameters: i) the privacy budget ε measures the incurred privacy leakage: a lower ε means less information leakage and a higher privacy guarantee; ii) δ bounds the probability that the privacy loss exceeds ε, with the recommended value δ ≪ 1/N, where N is the number of training examples. The values of (ε, δ) accumulate as the algorithm repeatedly accesses the private data [18].

Theorem 1 ([17], Theorem 3.16). Composition for (ε, δ)-differential privacy (the epsilons and the deltas add up): the composition of k differentially private mechanisms is (Σ_i ε_i, Σ_i δ_i)-differentially private, where for any 1 ≤ i ≤ k, the i-th mechanism is (ε_i, δ_i)-differentially private.
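As a concrete illustration of Theorem 1, the following minimal Python sketch composes the per-party budgets that FDPDDL allocates later in Section 4 (the helper function is our own illustrative code, not part of any DP library):

```python
def compose(*mechanisms):
    """Basic composition (Theorem 1): the epsilons and the deltas add up."""
    eps = sum(e for e, _ in mechanisms)
    delta = sum(d for _, d in mechanisms)
    return eps, delta

# Per-party budgets used in FDPDDL: (4, 1e-5)-DP for DPGAN and
# (2, 1e-5)-DP for DPSGD on MNIST/Adult/Hospital.
print(compose((4, 1e-5), (2, 1e-5)))  # -> (6, 2e-05), i.e. (6, 2*10^-5)-DP
```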
2.5 Blockchain Technology

Blockchain was first proposed as a proof-of-work consensus protocol implementing a peer-to-peer timestamp server on a decentralised basis in the Bitcoin crypto-currency [19]. As a new form of distributed database, it can store arbitrary data in the transaction metadata. Specifically, an electronic coin (e.g., Bitcoin) is defined as a chain of transactions. A block contains multiple transactions to be verified, and the blocks are chained into a blockchain using hash functions to achieve the timestamp feature. Such a Blockchain-based distributed database is known as Blockchain 2.0, including technologies such as smart properties (properties with blockchain-controlled ownership) and smart contracts (programs that manage smart properties) [20]. In the context of a distributed database, smart properties are data entries, and smart contracts are stored procedures. Therefore, our FDPDDL can be implemented using Blockchain 2.0 technologies, where the transaction metadata is utilised to disseminate local DPGAN samples or model gradients among parties, all upload and download transactions are recorded immutably on the blockchain, and algorithms such as token/credibility assignment are executed as smart contracts, which makes all transactions among all parties fully visible. Compared with current server-based frameworks, the peer-to-peer architecture of Blockchain allows each party to remain modular while interoperating with other parties. In addition, instead of ceding control to a central server, Blockchain enhances security by avoiding a single point of failure: each party in the Blockchain system has control over how its data should be accessed, hence obeying institutional policies.

3 FDPDDL FRAMEWORK

This section details our proposed Fair and Differentially Private Decentralised Deep Learning (FDPDDL) framework, including the main focuses of FDPDDL and an investigation of Blockchain as the decentralised architecture for FDPDDL. For the readers' convenience, Table 2 lists the notations used throughout the paper.

TABLE 2: Table of notations.

Symbol          | Meaning
D_i, V_i, M_i   | local training data, validation data and standalone model of party i
p_i, d_i        | tokens and gradient download budget of party i
c_i^j           | local credibility of party j given by party i, based on the usefulness of party j to party i
u_i             | number of DPGAN samples or gradients uploaded by party i
d_ij            | number of gradients of party j downloaded by party i
λ_j             | sharing level of party j
∆w_j            | gradients of party j
∆(w_j^i)_S      | selected gradients of party j sent to party i
w_i             | parameter of party i at the previous communication round
w'_i            | updated parameter of party i at the current round, combining all parties' selected gradients
w'_ji           | temporary parameter of party i after removing party j's gradients ∆(w_j^i)_S from w'_i, used to update c_i^j
acc             | validation accuracy of party i
acc_j           | validation accuracy of party i excluding party j's ∆(w_j^i)_S
n               | number of participating parties
c_th            | lower bound of the credibility threshold agreed by the majority of parties
C               | credible party set, with local credibility above c_th
m_j             | number of matches between majority labels and party j's predicted labels
e_i             | gap between download budget d_i and current downloads Σ_{j∈C\i} d_ij of party i
r_ji            | extra gradients of party j that can be provided to party i
L_i             | parties in C that can provide additional gradients to party i
(sk'_i, pk'_i)  | party i's key pair for signing and verification, respectively
fsk             | fresh symmetric encryption key used in the hybrid cryptosystem
(sk_i, pk_i)    | party i's key pair for decryption and encryption used in the hybrid cryptosystem

3.1 Main Focuses of FDPDDL

Privacy: In FDPDDL, we assume parties trust neither each other nor any third-party server. To remove the deterrents for parties to collaborate, instead of publishing the original data or model parameters, each party leverages DPGAN to publish differentially private samples for mutual evaluation during the initialisation stage, and publishes differentially private gradients during the update stage.

Fairness: The basic idea of fairness is that a party who contributes more to other parties should be given higher local credibility and be rewarded with a better-performing final model than a low-contribution party. To ensure fairness, we build a reputation system through digital tokens and local credibility. Each party in the system participates in evaluating the usefulness of the other parties, and requests more samples or gradients from parties with higher local credibility. In this way, participants are motivated to release more in order to earn more tokens, which can be used to download gradients from other parties. For example, if a local model has 100K parameters, a participant with sharing level λ_j of 0.1 can publish at most 10%, i.e., 10K gradients, in a privacy-preserving manner and be rewarded 10K tokens. If any of the other participants want to download gradients, they need to pay tokens. Uploading more samples or gradients gives a participant more tokens, and with these tokens the participant can download more gradients published by others. This is the incentive for publishing more, as long as it stays within the limits of privacy. Similarly, downloading more gradients consumes more tokens. Fairness is achieved by rewarding each party as per its relative contribution to the other parties during the download and upload processes, as follows:

• Download as per local credibility: Since one party might contribute differently to different parties, the credibility of one party may differ from the perspective of different parties. Therefore, each party i keeps a local credibility list, sorting all parties by their local credibilities in descending order; this list is known only to party i. The higher the local credibility of party j in party i's credibility list, the more likely party i is to download gradients from party j, and the more tokens will be rewarded to party j.
• Upload as per request and sharing level: Once a party receives a download request (a demand for a number of gradients), how many gradients it uploads depends on both the download request and its sharing level λ_j.

By enforcing fairness, our FDPDDL allows parties to (i) independently converge to different parameters, and (ii) critically, avoid overfitting their parameters to a single party's local training data. Once the local models are collaboratively trained, each party can independently evaluate its model on unseen data, without interacting with other parties.

3.2 Blockchain Investigation for FDPDDL

For the investigation of Blockchain, we first formulate the notion of a "digital token" as a currency for transactions. Second, we make use of the blockchain to record and supervise data exchange in a distributed manner that is robust, fair, and transparent. In particular, the differentially private samples or gradients are traded using digital tokens and recorded as transactions in the blockchain. Tokens can be consumed by downloading, or earned by uploading, differentially private samples or gradients. In this way, we guarantee fair exchange among participants.

Depending on the application scenario, the blockchain in FDPDDL can be either a consortium blockchain that requires permission to participate in the system, such as Hyperledger Fabric [21], or a permissionless blockchain that anyone can join at any time, such as Ethereum [22].

3.2.1 Genesis Block

In our blockchain, the first block, i.e., the genesis block, initialises the system. The genesis block also records the verification key pk'_i of party i's signing key sk'_i. Whenever a new party joins the network, or an existing party adds new data during training, initialisation is restarted and a new block is added to the blockchain to update the relevant data. Meanwhile, based on the Blockchain mechanism, we do not need to deal with party departure: when a party leaves the private blockchain network, the other parties just need to remove it from their local credibility lists.

To initialise a genesis block, we propose Algorithm 1 to initialise local credibility and tokens, where participants contribute their artificial samples in a privacy-preserving manner, then mutually evaluate the quality of each other's samples, and gain a reward in the form of tokens. All released samples are authenticated through a digital signature scheme, where the public key pk_i of party i is advertised together with the signed samples. This key will later be included in the genesis block, and the corresponding signing key sk'_i will be used to claim the associated reward. The agreed reward, in the form of tokens for each party, together with their verification key pk'_i, will be recorded in the genesis block through an initial blockchain consensus process, which is specific to each blockchain.

3.2.2 Operation Block

After initialisation, events of trading (i.e., purchasing) differentially private gradients are recorded as transactions in the blocks. To order gradients from a party j, party i needs to create a purchase order, as a transaction, and record it in a block. The transaction includes the tokens party i is willing to pay for a specified number of gradients, and party i's public key pk_i, which will be used later by party j for encryption. Considering that the released gradients are of high dimensionality, a standard hybrid cryptosystem can be used to take advantage of the efficiency of symmetric-key cryptography: a freshly generated symmetric key fsk is used to encrypt the gradients, while pk_i is used to encrypt fsk. In this way, we minimize the computational cost incurred by asymmetric-key encryption. Once the order is placed in the blockchain, party j agrees and completes the order in two steps: (1) party j encrypts the selected gradients with fsk, and sends both the encrypted gradients and the encrypted fsk to publicly accessible storage; (2) party j creates a transaction that contains the hash value of the encrypted gradients, together with a pointer to the transaction containing party i's request. Once this transaction is included in the blockchain, the agreed tokens are transferred from party i to party j automatically through
the blockchain. Note that if party i maliciously denies that party j has honestly shared gradients, party j can reveal the provided gradients to the other participants for verification. Since both pk_i and the hash value of the encrypted gradients are recorded in the blockchain, anyone can verify, from the gradients revealed by party j, whether party j has provided the requested data to party i, and party i will then be punished through a special transaction in the blockchain. Similarly, thanks to the transparent blockchain, if party j misbehaves, it will be detected and punished in a similar way.
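The hybrid encryption step of the operation block can be sketched as follows. This is a minimal illustration using the Python cryptography package, with Fernet standing in for the symmetric key fsk and RSA-OAEP for pk_i; the choice of primitives is our own assumption, as the paper does not prescribe specific ciphers:

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Party i's asymmetric key pair (sk_i, pk_i) from Table 2.
sk_i = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pk_i = sk_i.public_key()

# Party j: encrypt the selected gradients with a fresh symmetric key fsk,
# then encrypt fsk itself under pk_i (hybrid cryptosystem of Section 3.2.2).
fsk = Fernet.generate_key()
gradients = b"serialized selected gradients (set S)"
enc_gradients = Fernet(fsk).encrypt(gradients)
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
enc_fsk = pk_i.encrypt(fsk, oaep)

# Party i: recover fsk with sk_i, then decrypt the gradients.
fsk_dec = sk_i.decrypt(enc_fsk, oaep)
assert Fernet(fsk_dec).decrypt(enc_gradients) == gradients
```

Only the small key fsk is processed with the expensive asymmetric operation; the bulky gradient vector is handled by the fast symmetric cipher, which is exactly the cost argument made above.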
4 FDPDDL REALIZATION

This section details the two-stage realization of FDPDDL: local credibility and tokens initialisation, and local credibility and tokens update, as shown in Fig. 2.

4.1 Local Credibility and Tokens Initialisation

Algorithm 1 Local credibility and tokens initialisation
Input: number of participating parties n, C = {1, ..., n}
Output: local credibility and tokens of all parties
1: Pre-train a prior model: Each party i trains a standalone model M_i on its local training data.
2: Artificial sample generation: Party i releases u_i = λ_i * |D_i| artificial samples generated by DPGAN to any party j.
3: Local credibility initialisation: Party j labels the received artificial samples with its standalone model M_j, then returns the predicted labels to party i. Meanwhile, party i also predicts labels for its own DPGAN samples using M_i. Party i then applies majority voting to all the predicted labels, and initialises the local credibility of party j as c_i^j = m_j / u_i, where m_j is the number of matches between the majority labels and party j's predicted labels, and u_i is the number of DPGAN samples generated by party i.
4: Local credibility normalisation: c_i^j = c_i^j / Σ_{j∈C\i} c_i^j. If c_i^j < c_th, party i reports party j as "non-credible".
5: Credible party set: If the majority of parties report party j as "non-credible", the blockchain removes party j from the credible party set C, and all parties rerun step 4.
6: Tokens initialisation to download gradients: p_i = λ_i * |w_i| * (n − 1).

As stated in Algorithm 1, to initialise local credibility and tokens, each participant first trains a DPGAN on its local training data to generate artificial samples with a differential privacy guarantee. These artificial samples are generated in a way that discloses neither the true sensitive image instances nor the true distribution of the data; rather, they provide only an implicit density estimate within the tolerable privacy budget used in DPGAN [23]. Each participant then publishes its individually generated samples, with size proportional to its sharing level, without publishing any labels. After receiving DPGAN samples from one participant, all the other participants run their pre-trained standalone models on the received artificial samples and send the predicted labels back to the sender for local credibility initialisation. Below, we detail the main tasks in Algorithm 1: sharing level and digital token initialisation, and local credibility initialisation according to the number of released artificial samples and the relative contribution of each party.

Sharing Level and Tokens Initialisation: Based on the number of artificial samples u_i that party i publishes at the beginning, each party autonomously determines a sharing level it is comfortable with, quantified as λ_i = u_i / |D_i|, where D_i is the local training data of party i. A more private party prefers to release fewer samples, while a less private party is comfortable with releasing more. Similarly, during the update stage, a more private party prefers to release fewer gradients. Tokens of party i are initialised as p_i = λ_i * |w_i| * (n − 1), where λ_i is the sharing level of party i, |w_i| is the number of model parameters, and n is the number of parties. The gained tokens will be used to download gradients in the update stage.

Local Credibility Initialisation: For local credibility initialisation, each party compares the majority vote over all the combined labels with an individual party's predicted labels to evaluate the effect of that party. This relies on the fact that the majority vote over all the combined labels reflects the outcome of the majority of parties, while the predicted labels of party j reflect only the outcome of party j. For example, when party i initialises local credibilities for the other parties, party i broadcasts its artificial samples generated by DPGAN to the other parties, who label these samples using their pre-trained standalone models and then send the corresponding predicted labels back to party i. Meanwhile, party i also labels its own artificial samples using its pre-trained standalone model, then combines the predicted labels of all parties into a label matrix with n columns, each column corresponding to one party's predicted labels. From this label matrix, party i can initialise the local credibility of party j as c_i^j = m_j / u_i, where m_j is the number of matches between the majority labels and party j's predicted labels, and u_i is the number of DPGAN samples released by party i. Afterwards, party i normalises c_i^j within [0, 1]. If the majority of parties report that the local credibility of one party is lower than the threshold c_th, implying a "non-credible" party, it will be banned from the local credibility lists of all parties. Here, c_th should be agreed upon by the majority of parties.

In addition, Algorithm 1 automatically takes care of the scenario in which an honest participant publishes some gradients while all the other honest participants assign it very low credibility. In this case, the data distribution of the publisher is completely different from that of the other participants, hence it is still reasonable to reduce the credibility of the publisher, because the other participants are unlikely to gain much from the updates released by the publisher anyway.
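A minimal NumPy sketch of the majority-voting credibility initialisation in step 3 of Algorithm 1 (function and variable names are our own; labels are assumed to be non-negative integers):

```python
import numpy as np

def init_credibility(label_matrix):
    """label_matrix: (u_i, n) int array; column k holds party k's predicted
    labels for party i's u_i DPGAN samples (party i's own column included)."""
    u_i, n = label_matrix.shape
    # Majority vote per sample across all n parties' predictions.
    majority = np.array([np.bincount(row).argmax() for row in label_matrix])
    # c_i^j = m_j / u_i: fraction of party j's labels matching the majority.
    cred = (label_matrix == majority[:, None]).sum(axis=0) / u_i
    # Normalisation as in step 4 (party i would exclude its own column here).
    return cred / cred.sum()

labels = np.array([[1, 1, 0], [2, 2, 2], [0, 0, 1], [3, 3, 3]])  # toy example
print(init_credibility(labels))  # third party disagrees twice -> lowest score
```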

Fig. 2: Two-stage realization of FDPDDL (1st stage: local credibility and tokens initialisation; 2nd stage: local credibility and tokens update). SD_1: DPGAN samples randomly chosen by Party 1 from the pool of local DPGAN samples generated offline; M_1: standalone model of Party 1; ∆(w_2^1)_S: selected gradients of Party 2 sent to Party 1 (d_12 = min(c_1^2 × d_1, λ_2 × |∆w_2|) gradients are selected from ∆w_2 and grouped into set S, where c_1^2 × d_1 is the download request from Party 1); M'_2: local model of Party 2 at the current communication round.

Differentially Private GAN (DPGAN): During the initialisation stage, we use a Differentially Private Generative Adversarial Network (DPGAN) to generate differentially private artificial samples, which are used to mutually benchmark the local credibility of each party and to generate initial tokens. Each party individually trains a DPGAN using GANobfuscator, which adds tailored noise to the gradients during GAN training [24]. The main idea relies on the post-processing property of differential privacy: a differentially private discriminator, combined with the computation of the generator, yields a differentially private generator. To counter the stability and scalability issues of training DPGAN, we apply adaptive pruning, which significantly improves both training stability and utility [24]. DPGAN can generate an unlimited number of samples for the intended analysis while rigorously guaranteeing (ε, δ)-differential privacy of the training data. Without loss of generality, we exemplify DPGAN in the context of the improved WGAN framework [25]. As demonstrated by recent work [23], [24], DPGAN is able to synthesize data with inception scores fairly close to both the real data and the samples generated by non-private GANs. As evidenced by Fig. 3, although the generated artificial samples are not real training samples, and the digits clearly vary in shape, colour or surroundings, they still keep the general characteristics of the class, ensuring utility. Due to the limited data size of each party, we let each party apply data augmentation to expand its local data size 100 times, which helps DPGAN generate more reliable samples within a moderate privacy budget for local credibility initialisation. In particular, we augment the image datasets with a rotation range of 1 and width and height shift ranges of 0.01. For the text datasets, we repeat each record 100 times. Samples generated by DPGAN with ε = 4 and δ = 10^-5 for MNIST, and ε = 4, δ = 10^-6 for SVHN, are illustrated in Fig. 3. Each party individually trains a DPGAN on 60,000 augmented MNIST examples and 100,000 augmented SVHN examples, respectively. Note that each party can generate massive numbers of DPGAN samples offline without affecting the collaboration.

Fig. 3: Generated samples by DPGAN with ε = 4, δ = 10^-5 for MNIST and ε = 4, δ = 10^-6 for SVHN. Each party trains a DPGAN on 60,000 augmented MNIST examples and 100,000 augmented SVHN examples, respectively.
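The augmentation recipe above (rotation range 1, shift ranges 0.01, 100x expansion) can be sketched with Keras' ImageDataGenerator; the choice of this particular generator class is our assumption, as the paper does not name the library it used:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(rotation_range=1,          # degrees
                               width_shift_range=0.01,    # fraction of width
                               height_shift_range=0.01)   # fraction of height

def expand_100x(images):
    """Expand a party's local image set 100x before DPGAN training."""
    out = [images]
    for _ in range(99):
        out.append(next(augmenter.flow(images, batch_size=len(images),
                                       shuffle=False)))
    return np.concatenate(out)

local = np.random.rand(600, 32, 32, 1)   # e.g., one party's MNIST share
print(expand_100x(local).shape)          # (60000, 32, 32, 1)
```

With 600 local MNIST examples per party, the 100x expansion yields exactly the 60,000 augmented examples each party trains its DPGAN on.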
4.2 Local Credibility and Tokens Update

For local credibility and tokens update, each party i takes 20% of its local training data as validation data and uses a leave-one-out strategy to evaluate the local credibility of party j, based on the usefulness of party j's gradients in each round of the training process. Specifically, party i evaluates the change in validation accuracy caused by removing party j's gradients from the updated model parameter w'_i that combines all parties' gradients; i.e., the combined gradients with and without party j's gradients yield validation accuracies acc and acc_j respectively, and the difference between acc and acc_j reflects how party j affects validation accuracy. Party i computes the local credibility c_i^j of party j at the current round by passing an "accuracy factor" x = acc / (acc + acc_j) through a sigmoid function f as in Eq. (2):

c_i^j = f(x) = 1 / (1 + exp(−15 * (x − 0.5)))  (2)

The incentive can be explained explicitly as follows. Since x is the ratio of the validation accuracy using the combined gradients of all parties to the sum of that accuracy and the validation accuracy without party j's gradients, it can be further expressed as:

x = acc / (acc + acc_j) = acc / (2 * acc + ∆)  (3)

where acc_j = acc + ∆, and ∆ indicates the impact of removing party j: the more positive the value of ∆, the better the validation accuracy after removing party j, hence the lower the contribution of party j. To be more specific, if party j has no impact, ∆ = 0, so x = 0.5 and c_i^j = 0.5; if party j contributes negatively, acc_j > acc, then ∆ > 0, x < 0.5 and c_i^j < 0.5; if party j contributes positively, acc_j < acc, then ∆ < 0, x > 0.5 and c_i^j > 0.5. Each party i keeps updating its local credibility list based on the contributions of all the other parties in each round, and integrates their historical local credibilities by averaging the local credibility of the current round with that of the previous round. In the follow-up rounds, the number of
gradients to be downloaded will depend on the sharing level, local credibility and download budget.

For the tokens update, one token is consumed/rewarded for each download/upload of a gradient. In the subsequent rounds, party i is more likely to download gradients from more credible parties, while downloading less from, or even ignoring, those published by less credible parties. If the credibility of one party falls below the threshold c_th, it can even be banned from the local credibility list of party i.
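A one-function Python sketch of the credibility mapping in Eqs. (2)-(3) (names are our own):

```python
import math

def credibility(acc, acc_j):
    """Map the accuracy factor x = acc / (acc + acc_j) through the
    sigmoid of Eq. (2): f(x) = 1 / (1 + exp(-15 * (x - 0.5)))."""
    x = acc / (acc + acc_j)
    return 1.0 / (1.0 + math.exp(-15.0 * (x - 0.5)))

print(credibility(0.90, 0.90))  # no impact:     x = 0.5 -> 0.5
print(credibility(0.90, 0.80))  # helpful party: x > 0.5 -> above 0.5
print(credibility(0.80, 0.90))  # harmful party: x < 0.5 -> below 0.5
```

The steep slope (factor 15) makes the mapping sensitive around x = 0.5, so even small accuracy gains or losses move a party's per-round credibility noticeably.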
Algorithm 2 Local credibility and tokens update
Input: C, c_i^j, p_i, p_j, d_i, λ_j, ∆w_j, w_i, V_i
Output: updated parameters w'_i, credibility c_i^j', and tokens p_j', p_i'
In each round, suppose party i aims to download a total of d_i gradients from all parties in C, while party j ∈ C\i can upload at most λ_j × |∆w_j| gradients. Party i updates its local credibility list, model parameters and tokens based on the gradients of party j ∈ C\i as follows:
if d_i < p_i then
  for j ∈ C\i do
    d_ij = min(c_i^j * d_i, λ_j * |∆w_j|)
    p'_j = p_j + d_ij, p'_i = p_i − d_ij
    The d_ij gradients of ∆w_j are grouped into set S, selected according to the "largest values" criterion: sort the gradients in ∆w_j and upload d_ij of them, starting from the largest.
    Parameter update: w'_i = w_i + ∆w_i + Σ_{j∈C\i} ∆(w_j^i)_S, and w'_ji = w'_i − ∆(w_j^i)_S, where w_i is party i's local parameter from the previous communication round.
    acc ← (w'_i, V_i), acc_j ← (w'_ji, V_i)
    x = acc / (acc + acc_j)
    c_i^j' = (c_i^j + f(x)) / 2, where f is the sigmoid credibility mapping function in Eq. (2).
  end for
  Credibility normalisation: c_i^j' = c_i^j' / Σ_{j∈C\i} c_i^j'
  if c_i^j' < c_th then
    party i reports party j as "non-credible"
  end if
  Credible party set: If the majority of parties report party j as "non-credible", the blockchain removes party j from the credible party set C, and all parties remove party j's model updates ∆(w_j^i)_S from their updated w'_i and rerun credibility normalisation.
end if

The detailed local credibility and tokens update procedure is elaborated in Algorithm 2. Note that for any party i, the received gradients Σ_{j∈C\i} d_ij may differ from the download budget d_i, as party j can provide at most λ_j * |∆w_j| gradients, while party i plans to download c_i^j * d_i gradients from party j. To fill the gap between d_i and Σ_{j∈C\i} d_ij, we design a supplement mechanism, described in the supplementary material.

Differentially Private SGD (DPSGD): To facilitate collaborative privacy-preserving deep learning, we use DPSGD [6], [18] to enable information exchange in a differentially private manner. DPSGD consists of two parts: a sanitizer and a moments accountant. The sanitizer performs two operations: (1) it limits the sensitivity of each individual example by clipping the norm of its gradient; and (2) it adds noise to the gradient of a lot (several mini-batches) before updating the network parameters. The moments accountant keeps track of a bound on the moments of the privacy loss random variable to compute the spent privacy over the course of training.
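A minimal NumPy sketch of the sanitizer's two operations, per-example clipping followed by Gaussian noise at the lot level; the function and parameter names are our own, and a real implementation would follow [18]:

```python
import numpy as np

def sanitize(per_example_grads, clip_bound, sigma):
    """Clip each example's gradient to L2 norm <= clip_bound, average over
    the lot, then add Gaussian noise calibrated to the clipping bound."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g / max(1.0, norm / clip_bound))
    lot_mean = np.mean(clipped, axis=0)
    noise = np.random.normal(0.0, sigma * clip_bound / len(per_example_grads),
                             size=lot_mean.shape)
    return lot_mean + noise

# Gaussian-mechanism noise scale per step: sigma = sqrt(2*ln(1.25/delta))/eps.
eps, delta = 2.0, 1e-5
sigma = np.sqrt(2 * np.log(1.25 / delta)) / eps
lot = [np.random.randn(100) for _ in range(32)]   # toy per-example gradients
print(sanitize(lot, clip_bound=1.0, sigma=sigma).shape)  # (100,)
```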
Different from DPSGD applied to the whole database in the centralised framework [18], the decentralised framework lets each party individually train a local model, and we are concerned with the privacy leakage from the local model before publication in each round. To limit the sensitivity of updates, we follow the DPSGD algorithm [18] and clip the gradient of each example such that its L2 norm is bounded by the chosen gradient norm upper bound. Model training that satisfies differential privacy with respect to example-adjacent datasets satisfies the intuitive notion of privacy: the presence or absence of any specific example in the training data has an imperceptible impact on the parameters of the learned model [26]. It follows that an adversary inspecting the trained model cannot infer whether any specific example was used in the training, irrespective of what auxiliary information they may have.

We choose σ = √(2 ln(1.25/δ)) / ε for DPSGD, where ε ≤ 1; by the standard arguments [17], each step is (ε, δ)-DP with respect to each lot. Since each lot is randomly sampled with replacement, the privacy amplification theorem [27] implies that each step is (O(qε), qδ)-DP w.r.t. the full database, where q = L/N is the sampling ratio. Compared with the strong composition theorem [28], the moments accountant delivers a tighter bound in two ways [18]: it saves a √(log(1/δ)) factor in the ε part and a Tq factor in the δ part. For an appropriately chosen noise scale and clipping threshold, DPSGD is (O(qε√T), δ)-differentially private, where T is the total number of iterations over the training data. Because of this tighter bound on privacy spending, DPSGD can iterate over the training data a sufficient number of times before exhausting a moderate privacy budget. This explains why DPSGD is able to train deep models that offer good model utility.

We remark that one attractive consequence of applying DPGAN and DPSGD is that integrating DP into training generalizes well [29]. Like a normal GAN and normal SGD, DPGAN and DPSGD need to iterate over the training data and apply gradient computations multiple times. However, each access to the training data causes information leakage and thus incurs privacy loss from the overall privacy budget ε. To apply DPGAN and DPSGD in the distributed/decentralised settings, we follow recent work [30], [31], [32], [33] and conduct local gradient computation and privacy accounting on a per-party basis, where each party individually applies the moments accountant [18] to keep track of the spent privacy budget. Each party repeats the local training process until the allocated privacy budget is used up. In particular, for the local training processes of DPGAN and DPSGD over the local dataset of each party, we allocate privacy budgets of (4, 10^-5)-DP and (2, 10^-5)-DP respectively (with the exception of SVHN, where δ = 10^-6). As per the composition property of DP in Theorem 1, this results in a total of (6, 2×10^-5)-DP for MNIST, Adult and Hospital, and (6, 2×10^-6)-DP for SVHN, for each party.

5 PERFORMANCE EVALUATION

5.1 Datasets

MNIST (http://yann.lecun.com/exdb/mnist/). This dataset is for handwritten digit recognition, consisting of 60,000 training examples and 10,000 test
examples. Each example is a 32x32 gray-level image, with the digit located at the center of the image.

SVHN (http://ufldl.stanford.edu/housenumbers/). This dataset is obtained from Google's street view images, containing over 600,000 examples, from which we use 100,000 for training and 10,000 for testing. Each example is a 32x32 centered image with RGB channels. SVHN is more challenging than MNIST, as most images are noisy and contain distractors at the sides. The classification objective for both MNIST and SVHN is to classify the input image as one of the 10 possible digits ["0"-"9"].

Adult (http://archive.ics.uci.edu/ml/datasets/Adult). The Adult Census dataset includes 48,842 records with 14 sensitive attributes, including age, race, education level, marital status, occupation, etc. This dataset is commonly used to predict whether an individual makes over 50K dollars in a year (binary). Of the 48,842 records in total, 24% (11,687) are over 50K and 76% (37,155) are under 50K. We manually balance the dataset to 11,687 records over 50K and 11,687 records under 50K by random sampling, resulting in 23,374 records, and allocate 80% of the records as the training set and 20% as the test set.

Hospital. The Diabetic Hospital dataset contains data on diabetic patients from 130 US hospitals and integrated delivery networks. We directly derived the dataset from [34], which balances the training set to 10k positives and 10k negatives. The record of each patient is represented by 127 features, such as demographic (e.g., gender, race, age), administrative (e.g., length of stay) and medical (e.g., test results) attributes. The task is to predict whether a patient will be readmitted to hospital within 30 days (binary).

5.2 SGD Frameworks

To show the effectiveness of our proposed FDPDDL, we compare it with three baselines, as outlined in [3]. SGD is adopted in all frameworks.

Centralised framework: All the local training data are pooled into a trusted server, which trains a global model on the combined data using standard SGD.

Standalone framework: Participants train standalone models on their local training data without any collaboration. When training alone, each participant is susceptible to falling into local optima.

Distributed framework: Participants train independently and concurrently, and choose a fraction of parameters to upload per round. The distributed framework using selective SGD (DSSGD) can achieve performance equivalent to, or even higher than, the centralised framework, because updating a small fraction of parameters acts as a regularisation technique that prevents the neural network from "memorizing" training data, hence avoiding overfitting [3]. Therefore, we also use DSSGD in the distributed framework. As DSSGD with the round-robin parameter exchange protocol results in the highest accuracy in [3] and facilitates the fairness calculation, we follow the round-robin protocol for DSSGD, where participants run SSGD sequentially: a party downloads a fraction of the most up-to-date parameters from the server, runs local training, and uploads selected gradients; the next party follows in the fixed order [3]. Gradients are selected and uploaded according to the "largest values" criterion, which is consistent throughout the entire learning process in DSSGD.

5.3 Communication Protocol

An asynchronous protocol may lead to concurrency issues, also known as the staleness effect [35], caused by speed inconsistencies between parties. For slow parties, downloaded parameters may lose their usefulness if other parties perform parameter updates much more frequently. This staleness effect can slow down convergence or even destroy learning.

In contrast, synchronous SGD typically works better than asynchronous SGD: as demonstrated in [36], synchronous training achieves around 0.5% to 0.9% higher accuracy, needs fewer epochs to converge, and scales better. Therefore, we use a synchronous parameter exchange protocol in all our experiments. However, we observed that both federated learning [4], [5] and distributed learning with DSSGD [3] suffer from certain non-convergence and accuracy degradation problems under this synchronous protocol. That partly explains why DSSGD adopts asynchronous, round-robin or random-order protocols rather than a synchronous protocol [3]. We also observed that our FDPDDL framework is less sensitive to various hyper-parameter settings; e.g., it does not suffer from the non-convergence problem even when using the same hyper-parameter settings as DSSGD. We hypothesize that the downloading strategy based on accumulated credibility contributes to model convergence, a by-product of our framework.

5.4 Experimental Setup

For the implementation on the image datasets (MNIST and SVHN), we use multi-layer perceptron (MLP) and convolutional neural network (CNN) architectures as in [3]. The detailed architecture description is deferred to the supplementary file. For the text datasets (Adult and Hospital), we use an MLP with a single hidden layer of 128 units. To reduce the impact of random initialisation and counter non-convergence, each party initialises its local model with the same parameter w_0, then runs training on its local data to update the local model parameter w_i. This contributes to a fair and consistent local credibility initialisation. For local model training, we follow the preliminary study in [18] and choose the lot size as √N, where N is the total number of local training examples (including the augmented examples in our case), and set the initial learning rate to 0.1 with decay 10^-7. During the training of DPGAN and DPSGD, we dynamically adjust the clipping bounds to achieve faster convergence and better utility [24]. To boost fairness and enable local models to move towards their respective model minima, we let each party individually train 10 local epochs before collaborative learning starts. For all the experiments, we empirically set the local credibility threshold as c_th = (1/n) * (2/3) via grid search, where n is the number of parties.
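The two hyper-parameter formulas just stated can be checked with a tiny Python sketch (our own helper names):

```python
import math

def lot_size(num_local_examples):
    """Lot size sqrt(N) following [18]; N counts augmented examples too."""
    return round(math.sqrt(num_local_examples))

def credibility_threshold(n_parties):
    """Empirical threshold c_th = (1/n) * (2/3) found via grid search."""
    return (2.0 / 3.0) / n_parties

print(lot_size(60_000))            # 245 for a party's augmented MNIST set
print(credibility_threshold(4))    # ~0.167 for 4 parties
```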
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING 10

For applicability, we mainly investigate three realistic settings as follows, among which the first two settings belong to the balanced partition, while the last setting belongs to the unbalanced partition. In particular, for the balanced partition of image datasets, we randomly sample 1% of the entire examples as the local training data of each party, i.e., 600 examples for MNIST and 1000 examples for SVHN (http://ufldl.stanford.edu/housenumbers/); for the balanced partition of text datasets, we randomly sample 370 examples as the local training data of each party for the Adult dataset (http://archive.ics.uci.edu/ml/datasets/Adult), and 400 examples as the local training data of each party for the Hospital dataset.

• Same sharing level λi, same data size |Di|: we set the sharing level of each party as 0.1, where each party releases 10% artificial samples during the initialisation stage, and 10% gradients during the update stage;
• Different sharing level λi, same data size |Di|: we randomly assign sharing levels from [0.1, 0.5] to each party; each party releases artificial samples and gradients as per its individual sharing level;
• Different data size |Di|, same sharing level λi: the difference of this setting from the previous two settings lies in that different parties are allocated different numbers of examples, i.e., imbalanced data sizes. For example, for MNIST, we randomly partition a total of {2400, 9000, 18000, 30000} examples among {4, 15, 30, 50} parties respectively. Similarly, for SVHN, a total of {4000, 15000, 30000, 50000} examples are randomly partitioned among {4, 15, 30, 50} parties respectively. The sharing level of each party is set equally to 0.1.

5.5 Quantification of Fairness
In collaborative learning, collaborative fairness should be quantified from the view of the whole system. In this work, we quantify collaborative fairness through the correlation coefficient between party contributions (i.e., standalone model accuracies, which characterize the learning capability of different parties on their own data, and sharing levels, which characterize the sharing willingness of different parties) and party rewards (i.e., final model accuracies of different parties).

Specifically, we take party contributions as the X-axis, which represents the contributions of different parties from the system view. In particular, in Setting 2, we characterize different parties' contributions by their sharing levels and standalone model accuracies, as the party who is less private and has local data with better generalization empirically contributes more. In Setting 1 and Setting 3, we characterize different parties' contributions by their standalone model accuracies, as the party who has local data with better generalization empirically contributes more. Specifically, in Setting 3, the party with more local data typically yields higher standalone model accuracy in IID scenarios. In summary, the X-axis can be expressed by Eq. 4, where λj and saccj denote the sharing level and standalone model accuracy of party j respectively:

$$x = \begin{cases} \left\{\frac{\lambda_1}{\sum_j \lambda_j}, \cdots, \frac{\lambda_n}{\sum_j \lambda_j}\right\} + \left\{\frac{sacc_1}{\sum_j sacc_j}, \cdots, \frac{sacc_n}{\sum_j sacc_j}\right\}, & \text{Setting 2} \\ \{sacc_1, \cdots, sacc_n\}, & \text{Settings 1 \& 3} \end{cases} \quad (4)$$

Similarly, we take party rewards (i.e., final model accuracies of different parties) as the Y-axis, y = {acc1, · · · , accn}, where accj denotes the final model accuracy of party j. As the Y-axis measures the local model performance of different parties after collaboration, it is expected to be positively correlated with the X-axis to deliver good fairness. Hence, we formally quantify collaborative fairness in Eq. 5:

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y} \quad (5)$$

where x̄ and ȳ are the sample means of x and y, and sx and sy are the corrected standard deviations. The range of fairness is within [-1, 1], with higher values implying good fairness. Conversely, a negative coefficient implies poor fairness.

5.6 Experimental Results

TABLE 3: Fairness test of distributed framework and FDPDDL on MNIST, with different party numbers (P-k) and different settings.

           Different λi, same |Di|         Different |Di|, same λi
       Distributed      FDPDDL         Distributed      FDPDDL
        CNN    MLP    CNN    MLP        CNN    MLP    CNN    MLP
P4    -0.68   0.30   0.92   0.96      -0.97   0.28   0.95   0.98
P15    0.20  -0.15   0.90   0.92       0.03  -0.07   0.91   0.90
P30   -0.02   0.02   0.87   0.85       0.04   0.13   0.84   0.78
P50   -0.16  -0.05   0.78   0.76       0.14   0.07   0.75   0.71

TABLE 4: Fairness test on SVHN.

           Different λi, same |Di|         Different |Di|, same λi
       Distributed      FDPDDL         Distributed      FDPDDL
        CNN    MLP    CNN    MLP        CNN    MLP    CNN    MLP
P4     0.27   0.26   0.89   0.85       0.38   0.20   0.98   0.97
P15    0.16   0.19   0.83   0.79      -0.13   0.36   0.90   0.89
P30   -0.14   0.12   0.75   0.69       0.04  -0.27   0.85   0.84
P50   -0.25  -0.37   0.72   0.66      -0.23   0.15   0.77   0.73

TABLE 5: Fairness test on Adult and Hospital.

           Different λi, same |Di|         Different |Di|, same λi
       Distributed      FDPDDL         Distributed      FDPDDL
       Adult  Hosp   Adult  Hosp       Adult  Hosp   Adult  Hosp
P4     0.13   0.15   0.97   0.94       0.15   0.18   0.99   0.95
P15    0.02   0.07   0.90   0.85       0.07   0.10   0.92   0.88
P30   -0.08  -0.12   0.75   0.71      -0.02   0.03   0.77   0.74
P50   -0.12  -0.21   0.68   0.65      -0.15  -0.18   0.69   0.67

Fairness Test. For collaborative fairness comparison, we only analyze our FDPDDL and the distributed framework using DSSGD, neglecting the centralised framework and standalone framework, because parties do not collaborate in the standalone framework, and parties cannot get access to the trained global model in the centralised framework, where the global model is only available in the form of “machine learning as a service” (MLaaS). Table 3 and Table 4 list the calculated fairness of the distributed framework and our FDPDDL on MNIST and SVHN datasets using CNN and MLP architectures, under settings of different sharing level and imbalanced data partition. Similarly, Table 5 lists the fairness results on Adult and Hospital datasets. In particular, we omit the results for the same sharing level setting, as fairness is less of a concern in this setting. All the results are averaged over five random trials. As is evidenced by the high positive correlation coefficients, with all of them above 0.5, FDPDDL achieves reasonably good fairness, which confirms the intuition behind fairness: the party who is less private and has more training data delivers higher accuracy. In contrast, as evidenced in Table 3, Table 4, and Table 5, the distributed framework exhibits poor fairness with significantly lower values than those of FDPDDL in all cases, with even negative values in some cases, manifesting the lack of fairness in the distributed framework. This is because in the distributed framework, all the participating parties can derive similarly good local models no matter how much they contribute.
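The fairness scores reported in Tables 3–5 above follow directly from Eq. 4 and Eq. 5. A minimal sketch of the computation; the numbers below are made up for illustration and are not results from the paper:

```python
import numpy as np

def fairness(contributions, rewards):
    """Collaborative fairness (Eq. 5): sample Pearson correlation between
    party contributions (X-axis) and final model accuracies (Y-axis)."""
    x = np.asarray(contributions, dtype=float)
    y = np.asarray(rewards, dtype=float)
    n = len(x)
    sx, sy = x.std(ddof=1), y.std(ddof=1)   # corrected standard deviations
    return float(np.sum((x - x.mean()) * (y - y.mean())) / ((n - 1) * sx * sy))

def x_axis_setting2(sharing_levels, standalone_accs):
    """X-axis of Eq. 4 for Setting 2: normalised sharing levels plus
    normalised standalone accuracies. Settings 1 & 3 use the standalone
    accuracies directly."""
    lam = np.asarray(sharing_levels, dtype=float)
    sacc = np.asarray(standalone_accs, dtype=float)
    return lam / lam.sum() + sacc / sacc.sum()

# Illustrative numbers for a hypothetical 4-party run:
lam  = [0.1, 0.2, 0.4, 0.5]       # sharing levels
sacc = [0.85, 0.87, 0.90, 0.92]   # standalone accuracies
acc  = [0.88, 0.91, 0.95, 0.96]   # final accuracies after collaboration
print(fairness(x_axis_setting2(lam, sacc), acc))   # near +1: good fairness
```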
Learning Accuracy. For accuracy comparison, we implement FDPDDL using the synchronous SGD protocol, and set the sharing level of each party to 0.1 (λj = 0.1). Similarly, for the distributed framework, we implement DSSGD without differential privacy using the round robin protocol, and set the upload rate θu to 0.1. It is worth noting that we did not apply any privacy-preserving techniques to the other three baseline frameworks in order to assess the impact of our FDPDDL on accuracy. For the MNIST dataset, Fig. 4 demonstrates that FDPDDL does not sacrifice much model utility when compared to the distributed or the centralised framework, meanwhile it delivers better accuracy than the standalone framework.

Detailed accuracy comparison over varying participating parties (n = {4, 15, 30, 50}) can be found in Table 6 for the MNIST dataset, Table 7 for the SVHN dataset, and Table 8 for the Adult and Hospital datasets. As can be observed, the best test accuracy is reported by either the centralised framework or the distributed framework using DSSGD without differential privacy, while the worst accuracy is given by the standalone framework (minimum utility, maximum privacy). In contrast, we observe that FDPDDL allows all parties to derive higher accuracies than those given by the standalone models trained on their local data alone, under all the investigated scenarios. This confirms the benefits brought to every party by the collaborative learning in our FDPDDL. Meanwhile, our FDPDDL also achieves comparable accuracy to the centralised framework and the distributed framework using DSSGD without differential privacy, substantiating the competitive effectiveness of our decentralised framework.

Combining the above fairness results in Table 3, Table 4, and Table 5, and the accuracy results in Table 6, Table 7 and Table 8, we conclude that FDPDDL achieves both fairness and privacy without severely harming accuracy. This proves that our FDPDDL is a promising framework for effective, privacy-preserving and, more importantly, fair collaborative learning.

Complexity Analysis. Considering complexity, the main communication cost occurs when each party sends its differentially private samples or the selected differentially private gradients to the other (n − 1) parties, resulting in (n − 1) ∗ L cost, where n and L are the number of parties and the average size of the released samples and gradients. It should be noted that parties do not share all their model updates with other parties; they selectively share model updates as per download request and their sharing levels, as explained in Section 3.1. Therefore, we remark that our FDPDDL is more relevant to practical applications in horizontally federated learning (HFL) to businesses (H2B) [37], such as biomedical or financial institutions, where the number of parties n is not too large, while collaborative fairness is a major concern. On the other hand, the main computation cost occurs at each party, who needs to train a local DPGAN and local model to initialise local credibility and tokens during the first stage, and to conduct local training and mutual evaluation of local credibility during the second stage. However, we remark that parties can train their DPGAN models and generate massive DPGAN samples offline; as parties are required to share their DPGAN samples only once during the first stage of initialisation, this does not affect the second stage of update, as shown in Fig. 2. For the update stage, all parties can individually update their local models in parallel. Moreover, using DP instead of an encryption-based technique [38] during the second stage results in less communication cost.

5.7 Malicious Party Detection
We further demonstrate how our framework can provide robustness to two specific malicious parties: “free-riders” and the GAN attacker.

Robustness to “free-riders”. In a collaborative system, free-riders may pretend to be contributing by generating fake information to release to the requester. The main incentives for a free-rider to submit fake information may include: (1) one party may not have any data to train a local model; (2) one party is too concerned about data privacy to release any information that may compromise privacy; (3) one party may not want to consume any local computation power to train any model. As demonstrated in Example 2.1, it is possible for a free-rider without any data or model to have access to the same global model. We simulate two possible strategies that such a free-rider party could exploit to achieve its goals.

Release random labels: During the initialisation stage, the free-rider can release random labels for the received DPGAN samples. The initialisation stage allows each participant to evaluate the data quality of other participants before collaborative learning starts. If a participant does not have a reasonable amount of training data to produce a decent model, it will perform poorly in the evaluation of DPGAN samples sent from other parties, thus other parties would assign low local credibility to this party to ensure fairness. More specifically, when the publisher receives the random labels from the free-rider, it will find that these random labels are not consistent with the majority voting, i.e., u_ij^m ≪ cth, and the free-rider will be reported as “malicious”. If the majority of parties report one party as “malicious”, then the blockchain will opt this party out of future communications. Even if the free-rider somehow succeeds in initialisation, its credibility is significantly lower compared with the other honest parties, and the other parties will download fewer gradients from this free-rider.

Release random or carefully crafted gradients: During the update stage, the free-rider may publish meaningless gradients, such as random or carefully crafted gradients, to pretend that it is contributing, while not wishing to be “caught” cheating (keeping stealthy). However, such meaningless gradients will further downgrade its local credibilities to all the other parties during the local credibility and tokens update stage (as described in Section 4.2). Consequently, the free-rider will gradually lose its chance to earn more tokens as more and more parties downgrade its local credibility. The tokens of the free-rider will drain out faster, and it will eventually be blocked out from the learning process when its tokens are used up. This can be automatically done by our reputation system through digital tokens and local credibility, and by the Blockchain protocol itself.

We simulate all the above malicious behaviors of the free-rider and track the collaboration process among parties. We notice that in most cases, the free-rider can be detected and excluded at the initialisation stage, and no free-riders can survive both stages. A minimal sketch of the majority-vote credibility check follows.
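The agreement function below is a simplified stand-in for the paper's local credibility u_ij; the toy data, threshold handling, and function names are our own assumptions, not the paper's implementation.

```python
from collections import Counter
import random

def majority_labels(all_labels):
    """Majority-voted label for each evaluated DPGAN sample."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*all_labels)]

def agreement(labels, majority):
    """Fraction of a party's labels that agree with the majority vote;
    a simplified stand-in for the local credibility u_ij."""
    return sum(l == m for l, m in zip(labels, majority)) / len(majority)

def flag_malicious(all_labels, c_th):
    """Report parties whose agreement falls below the credibility threshold."""
    majority = majority_labels(all_labels)
    return [i for i, labels in enumerate(all_labels)
            if agreement(labels, majority) < c_th]

# Toy run with n = 4 parties and c_th = (2/3) * (1/n): parties 0-2 label the
# samples correctly, while party 3 is a free-rider answering at random.
random.seed(0)
n = 4
c_th = (2 / 3) * (1 / n)
truth = [random.randrange(10) for _ in range(100)]
labels = [list(truth), list(truth), list(truth),
          [random.randrange(10) for _ in range(100)]]
print(flag_malicious(labels, c_th))   # expected: [3], the free-rider
```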
[Fig. 4: MLP convergence on MNIST dataset for different frameworks and varying number (n) of parties. Four panels (n = 4, 15, 30, 50) plot test accuracy against training epoch for FDPDDL (synchronous, λj = 0.1), Distributed (round robin, θu = 0.1), Standalone, and Centralised frameworks.]
TABLE 6: Maximum accuracy [%] on MNIST under varying party number settings, achieved by Centralised, Distributed (DSSGD without DP, round robin, θu = 10%), Standalone and FDPDDL (see Section 5.4 for details of the three settings) frameworks using MLP and CNN architectures. P-k indicates there are k parties participating in the learning process.

Framework                                       MLP                          CNN
                                     P4     P15    P30    P50     P4     P15    P30    P50
Centralised                        91.68  95.17  96.28  96.85   96.58  98.19  98.52  98.58
Distributed                        91.67  95.17  96.33  97.35   96.25  98.04  98.63  98.83
Standalone                         87.39  88.06  88.64  88.80   93.81  93.46  94.04  94.05
FDPDDL (same λi, same |Di|)        88.44  92.18  93.50  95.33   94.34  96.89  97.62  97.67
FDPDDL (different λi, same |Di|)   89.84  92.82  93.50  95.37   94.62  96.15  97.78  98.05
FDPDDL (different |Di|, same λi)   88.92  91.44  92.69  95.02   94.39  96.46  96.92  97.47
TABLE 7: Maximum accuracy [%] on SVHN under varying party number settings using MLP and CNN architectures.

Framework                                       MLP                          CNN
                                     P4     P15    P30    P50     P4     P15    P30    P50
Centralised                        75.40  83.08  85.77  87.15   90.50  91.88  93.42  95.44
Distributed                        78.34  85.49  87.64  89.21   91.78  93.03  95.75  96.19
Standalone                         57.85  58.77  57.90  59.18   80.24  80.74  81.29  81.60
FDPDDL (same λi, same |Di|)        67.74  76.55  81.86  84.51   88.07  90.18  90.74  92.83
FDPDDL (different λi, same |Di|)   68.16  76.67  79.25  83.57   88.91  90.15  91.29  93.18
FDPDDL (different |Di|, same λi)   68.57  74.15  80.37  83.34   89.53  90.03  92.13  93.82
TABLE 8: Maximum accuracy [%] on Adult and Hospital under varying party number settings using MLP.

Framework                                      Adult                       Hospital
                                     P4     P15    P30    P50     P4     P15    P30    P50
Centralised                        80.69  81.54  82.75  83.43   65.21  69.12  74.50  76.21
Distributed                        80.73  81.89  82.81  83.49   65.58  69.50  74.73  77.12
Standalone                         78.49  78.50  78.52  78.54   53.51  53.71  53.89  53.95
FDPDDL (same λi, same |Di|)        79.08  80.05  81.16  82.21   63.21  67.35  72.38  74.55
FDPDDL (different λi, same |Di|)   79.15  80.08  81.20  82.28   63.38  67.42  72.29  74.62
FDPDDL (different |Di|, same λi)   79.21  80.17  81.25  82.39   63.35  67.58  72.41  74.58
Robustness to GAN Attacker. We next discuss the robustness of our FDPDDL framework against GAN attacks [13]. As argued in [13], GAN attacks can only succeed if the class distributions of the adversary and the victim party are Non-IID. Therefore, following the same setting as the GAN attack [13] on the MNIST dataset, we assume the victim parties own local data of classes {0, 1, 2, 3, 4} and the adversary has data of classes {5, 6, 7, 8, 9}, that is, a Non-IID class distribution between the adversary and the victim parties. We confirmed empirically that our FDPDDL framework can successfully detect and isolate this kind of adversary. In particular, the initial local credibility of the adversary should be rated quite low by most parties during the initialisation stage, i.e., u_ij^m ≪ cth. Therefore, the GAN adversary can be detected and excluded mostly at the initialisation stage. Even if the GAN adversary somehow survives the initialisation stage, in the subsequent update stage the accumulated local credibility of the adversary is rated even lower when it iteratively publishes false and meaningless gradients. Eventually, the GAN adversary can be successfully detected and isolated by our FDPDDL when it is agreed to be “malicious” through the blockchain consensus.

Recall that for the GAN attack, the adversary needs to learn an extra GAN network during the collaborative learning process, and this requires expensive computation. This inevitably results in suspiciously longer training time than the honest parties. Therefore, the response time characteristic can be further incorporated into our credibility mechanism to greatly reduce the chance of privacy leakage. If one party does not respond within a reasonable amount of time, other parties should anyway assume that this party is down and discard its submission at the current round. Moreover, even if we assume that the malicious party somehow manages to
circumvent the above mechanism, there is little harm, because the subsequent gradients submitted by the malicious party will not be rated highly by the honest parties, and its local credibility will get progressively reduced.

6 DISCUSSION
Attacking Fairness in FDPDDL. Here, we discuss some possible strategies that can be exploited by a free-rider or a party with very little data to deceive other parties and gain unfairly from our FDPDDL. For a free-rider owning no local data, it can manually label the received DPGAN examples as its local data and publish the labels to cheat the initialisation stage and the subsequent update stage. But this can be extremely expensive and practically unachievable for more complex collaborative learning tasks.

Similarly, a malicious party having very little local data can make use of the received DPGAN examples to first train a good representation extractor via unsupervised representation learning (e.g., an autoencoder), then build a good local classification model on top of the representation extractor using its labelled local data. Implementing this may improve the local model quality of this party and increase its credibility, thus seemingly increasing the risk of privacy leakage towards this party. However, we remark that the privacy of honest parties in FDPDDL should not be affected under such malicious behaviors, as secured by our majority voting mechanism used in the initialisation stage. That is, the local improvement of a malicious party does not guarantee it high credibility unless it conforms with the majority of honest parties; otherwise, it even decreases its credibility.

In practice, there may also exist honesty challenges imposed by a group of malicious parties. For instance, some malicious parties might collude with each other to downgrade the credibility of an honest party and to block it out from the learning process. Or they can upgrade each other's credibilities. However, as far as honesty is concerned, FDPDDL is always able to detect such malicious behaviors as long as a majority of the participants are honest.

Collaborative Learning and GANs. We apply data augmentation to expand the local data size to help DPGAN generate reliable samples for local credibility initialisation, and to facilitate the implementation of DPSGD in collaborative learning, as a larger amount of local data allows for more iterations in training a DPGAN or a differentially private local model within a moderate privacy budget. One natural question is: if we can generate infinite examples using GANs, why do we still need collaborative learning? This is because GANs can only learn the local data distribution, which means the examples generated by GANs are restricted to the local data distribution, while collaborative learning is specially designed to break such local restrictions by benefiting from global collaboration [3]. Note that using DPGAN and DPSGD in FDPDDL instead of the standard GAN and SGD not only preserves the training data privacy, but also preserves the privacy of the augmented data [39].

7 CONCLUSION AND FUTURE WORK
This work is the first step in bringing fairness and privacy to democratise and protect AI, hence providing better incentive for more parties to collaborate. Our proposed Fair and Differentially Private Decentralised Deep Learning (FDPDDL) framework demonstrates the following properties: (1) it inherently solves the single-point-of-failure problem existing in all server-based frameworks; (2) it achieves high fairness by creating a reputation system through digital tokens and local credibility, which considers the relative contributions of all the parties through two novel algorithms: local credibility and tokens initialisation, and local credibility and tokens update; (3) it provides a viable solution to detect and even isolate the “non-credible” party both before the collaborative learning process starts and during the collaborative learning process; in this way, our scheme provides robustness to the malicious “free-riders” and the GAN attacker; (4) Differentially Private GAN and Differentially Private SGD are used to guarantee the local privacy of each party. The experimental results on benchmark datasets in three realistic settings demonstrate that the FDPDDL framework consistently outperforms the standalone framework and achieves comparable accuracy to the centralised framework and the distributed framework without differential privacy, confirming the applicability of FDPDDL. We believe our findings could be inspiring for follow-up research in decentralised learning, especially as we initiate a new field of collaborative fairness in such an environment. For future work, we would like to consider more advanced model architectures, different attacks in distributed/decentralised learning, and Non-IID data.

ACKNOWLEDGMENT
This work is partially supported by the Faculty of Information Technology, Monash University; and an IBM PhD Fellowship.

REFERENCES
[1] L. Ohno-Machado, V. Bafna, A. A. Boxwala, B. E. Chapman, W. W. Chapman, K. Chaudhuri, M. E. Day, C. Farcas, N. D. Heintzman, X. Jiang et al., “idash: integrating data for analysis, anonymization, and sharing,” Journal of the American Medical Informatics Association, vol. 19, no. 2, pp. 196–201, 2012.
[2] T. McConaghy, R. Marques, A. Müller, D. De Jonghe, T. McConaghy, G. McMullen, R. Henderson, S. Bellemare, and A. Granzotto, “Bigchaindb: a scalable blockchain database,” white paper, BigChainDB, 2016.
[3] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015, pp. 1310–1321.
[4] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics, 2017, pp. 1273–1282.
[5] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secure aggregation for privacy-preserving machine learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 1175–1191.
[6] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learning differentially private recurrent language models,” in Proceedings of the 5th International Conference on Learning Representations, 2018.
[7] T.-T. Kuo, C.-N. Hsu, and L. Ohno-Machado, “Modelchain: Decentralized privacy-preserving healthcare predictive modeling framework on private blockchain networks,” in ONC/NIST Blockchain in Healthcare and Research Workshop, Gaithersburg, MD, September 26-7, 2016.
[8] X. Zhu, H. Li, and Y. Yu, “Blockchain-based privacy preserving deep learning,” in International Conference on Information Security and Cryptology. Springer, 2018, pp. 370–383.
[9] H. Kim, J. Park, M. Bennis, and S.-L. Kim, “Blockchained on-device federated learning,” IEEE Communications Letters, 2019.
[10] J. Kang, Z. Xiong, D. Niyato, S. Xie, and J. Zhang, “Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory,” IEEE Internet of Things Journal, 2019.
[11] P. Mohassel and Y. Zhang, “Secureml: A system for scalable privacy-preserving machine learning,” in Security and Privacy (SP), 2017 IEEE Symposium on. IEEE, 2017, pp. 19–38.
[12] Y. Aono, T. Hayashi, L. Wang, S. Moriai et al., “Privacy-preserving deep learning via additively homomorphic encryption,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 5, pp. 1333–1345, 2018.
[13] B. Hitaj, G. Ateniese, and F. Pérez-Cruz, “Deep models under the gan: information leakage from collaborative deep learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 603–618.
[14] R. Cummings, V. Gupta, D. Kimpara, and J. Morgenstern, “On the compatibility of privacy and fairness,” 2019.
[15] M. Jagielski, M. Kearns, J. Mao, A. Oprea, A. Roth, S. Sharifi-Malvajerdi, and J. Ullman, “Differentially private fair learning,” in Proceedings of the 36th International Conference on Machine Learning, 2019, pp. 3000–3008.
[16] L. Lyu, X. He, Y. W. Law, and M. Palaniswami, “Privacy-preserving collaborative deep learning with application to human activity recognition,” in Proceedings of the 2017 ACM Conference on Information and Knowledge Management. ACM, 2017, pp. 1219–1228.
[17] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, 2014.
[18] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 308–318.
[19] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” Manubot, Tech. Rep., 2019.
[20] G. Wood, “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum Project Yellow Paper, vol. 151, pp. 1–32, 2014.
[21] “IBM Blockchain: Hyperledger Fabric,” https://www.ibm.com/blockchain/hyperledger, 2017.
[22] “Ethereum,” https://www.ethereum.org/, 2017.
[23] X. Zhang, S. Ji, and T. Wang, “Differentially private releasing via deep generative model,” arXiv preprint arXiv:1801.01594, 2018.
[24] C. Xu, J. Ren, D. Zhang, Y. Zhang, Z. Qin, and K. Ren, “Ganobfuscator: Mitigating information leakage under gan via differential privacy,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 9, pp. 2358–2371, 2019.
[25] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 214–223.
[26] B. K. Beaulieu-Jones, W. Yuan, S. G. Finlayson, and Z. S. Wu, “Privacy-preserving distributed deep learning for clinical data,” arXiv preprint arXiv:1812.01484, 2018.
[27] A. Beimel, S. P. Kasiviswanathan, and K. Nissim, “Bounds on the sample complexity for private learning and private data release,” in Theory of Cryptography Conference. Springer, 2010, pp. 437–454.
[28] C. Dwork, G. N. Rothblum, and S. Vadhan, “Boosting and differential privacy,” in Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on. IEEE, 2010, pp. 51–60.
[29] R. Bassily, K. Nissim, A. Smith, T. Steinke, U. Stemmer, and J. Ullman, “Algorithmic stability for adaptive data analysis,” in Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing. ACM, 2016, pp. 1046–1059.
[30] T. Zhang and Q. Zhu, “Dynamic differential privacy for admm-based distributed classification learning,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 1, pp. 172–187, 2016.
[31] Z. Huang, R. Hu, Y. Guo, E. Chan-Tin, and Y. Gong, “Dp-admm: Admm-based distributed learning with differential privacy,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1002–1012, 2019.
[32] H. Kim, S.-H. Kim, J. Y. Hwang, and C. Seo, “Efficient privacy-preserving machine learning for blockchain network,” IEEE Access, vol. 7, pp. 136481–136495, 2019.
[33] L. Lyu, J. C. Bezdek, X. He, and J. Jin, “Fog-embedded deep learning for the internet of things,” IEEE Transactions on Industrial Informatics, vol. 15, no. 7, pp. 4206–4215, 2019.
[34] P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, 2017, pp. 1885–1894.
[35] W. Zhang, S. Gupta, X. Lian, and J. Liu, “Staleness-aware async-sgd for distributed deep learning,” in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 2350–2356.
[36] J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz, “Revisiting distributed synchronous sgd,” arXiv preprint arXiv:1604.00981, 2016.
[37] L. Lyu, H. Yu, and Q. Yang, “Threats to federated learning: A survey,” arXiv preprint arXiv:2003.02133, 2020.
[38] L. Lyu, J. Yu, K. Nandakumar, Y. Li, X. Ma, J. Jin, H. Yu, and K. S. Ng, “Towards fair and privacy-preserving federated deep models,” IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 11, pp. 2524–2541, 2020.
[39] L. Xie, K. Lin, S. Wang, F. Wang, and J. Zhou, “Differentially private generative adversarial network,” arXiv preprint arXiv:1802.06739, 2018.

Lingjuan Lyu (IEEE M'18) is currently a Research Fellow with the Department of Computer Science, NUS. She received her Ph.D. degree from the University of Melbourne. Her current research interests span machine learning, privacy, fairness, and edge intelligence.

Yitong Li is currently a Ph.D. student in the School of Computing and Information Systems, the University of Melbourne. He received his B.S. degree from Shanghai Jiao Tong University. His research interests cover privacy and adversarial learning with NLP applications.

Karthik Nandakumar (IEEE SM'02) is a Research Staff Member at IBM Research, Singapore. Prior to joining IBM in 2014, he was a Scientist at the Institute for Infocomm Research, A*STAR, Singapore for more than six years. He received his B.E. degree (2002) from Anna University, Chennai, India, M.S. degrees in Computer Science (2005) and Statistics (2007), and Ph.D. degree in Computer Science (2008) from Michigan State University, and M.Sc. degree in Management of Technology (2012) from the National University of Singapore. His research interests include computer vision, statistical pattern recognition, biometric authentication, image processing, machine learning and blockchain.

Jiangshan Yu received the Ph.D. degree from the University of Birmingham (UK) in 2016. He is currently Associate Director (Research) at the Monash Blockchain Technology Centre at Monash University, Australia. Previously, he was a research associate at SnT, University of Luxembourg (LU). The focus of his research has been on the design and analysis of cryptographic protocols, cryptographic key management, blockchain consensus, and ledger-based applications.

Xingjun Ma is a lecturer in the School of Information Technology, Deakin University, and also an honorary fellow in the School of Computing and Information Systems, The University of Melbourne. He received Ph.D., M.E., and B.E. degrees from the University of Melbourne, Tsinghua University and Jilin University respectively. He works in the areas of adversarial machine learning and robust optimisation.