
Federated Unlearning

Gaoyang Liu¹, Xiaoqiang Ma¹, Yang Yang², Chen Wang¹, Jiangchuan Liu³
¹Huazhong University of Science and Technology, Wuhan, China
²Hubei University, Wuhan, China
³Simon Fraser University, British Columbia, Canada
¹{liugaoyang, maxiaoqiang, chenwang}@hust.edu.cn, ²[email protected], ³[email protected]

arXiv:2012.13891v3 [cs.LG] 6 May 2021

Abstract—Federated learning (FL) has recently emerged as a promising distributed machine learning (ML) paradigm. Practical needs of the “right to be forgotten” and countering data poisoning attacks call for efficient techniques that can remove, or unlearn, specific training data from the trained FL model. Existing unlearning techniques in the context of ML, however, are no longer in effect for FL, mainly due to the inherent distinction in the way FL and ML learn from data. Therefore, how to enable efficient data removal from FL models remains largely under-explored. In this paper, we take the first step to fill this gap by presenting FedEraser¹, the first federated unlearning methodology that can eliminate the influence of a federated client’s data on the global FL model while significantly reducing the time used for constructing the unlearned FL model. The basic idea of FedEraser is to trade the central server’s storage for the unlearned model’s construction time, where FedEraser reconstructs the unlearned model by leveraging the historical parameter updates of federated clients that have been retained at the central server during the training process of FL. A novel calibration method is further developed to calibrate the retained updates, which are then used to promptly construct the unlearned model, yielding a significant speed-up to the reconstruction of the unlearned model while maintaining the model efficacy. Experiments on four realistic datasets demonstrate the effectiveness of FedEraser, with an expected speed-up of 4× compared with retraining from scratch. We envision our work as an early step in FL towards compliance with legal and ethical criteria in a fair and transparent manner.

Index Terms—Federated learning, machine unlearning, data removal, parameter calibration.

I. INTRODUCTION
As a distributed machine learning (ML) framework, federated learning (FL) has been recently proposed to address the problem of training ML models without direct access to diverse training data, especially for privacy-sensitive tasks [1], [2]. FL allows multiple clients to jointly train a shared ML model by sending locally learned model parameter updates instead of their data to the central server. With the distributed nature of such a computing paradigm, clients can thus benefit from obtaining a well-trained aggregated ML model while keeping their data in their own hands [3], [4], [5].

Given an FL model jointly trained by a group of clients, there are many settings where we would like to remove specific training data from the trained model. One example of the need is the “right to be forgotten” requirements enacted by recent legislations such as the General Data Protection Regulation (GDPR) in the European Union [6] and the California Consumer Privacy Act (CCPA) in the United States [7]. The “right to be forgotten” stipulates and sometimes legally enforces that individuals can request at any time to have their personal data cease to be used by a particular entity storing it.

Beyond the “right to be forgotten”, data removal from FL models is also beneficial when certain training data is no longer valid, which is especially common due to the distributed learning methodology and the inherently heterogeneous data distribution across parties. In FL, some training data may be polluted or manipulated by data poisoning attacks [8], [9], [10], become outdated over time, or even be identified as mistakes after training. The ability to completely forget such data and its lineage can greatly improve the security, responsiveness and reliability of FL systems. These practical needs call for efficient techniques that enable FL models to unlearn, or to forget what has been learned from the data to be removed (referred to as the target data). Directly deleting the target data is proved to be unserviceable, as the trained FL model has potentially memorized the training data [11], [12], [13]. A naive way to satisfy the requested removal would be to simply retrain the model from scratch on the remaining data after removing the target one(s). For many applications, however, the costs (in time, computation, energy, etc.) can be prohibitively expensive, especially when several rounds of alternations between training and aggregating among multiple participators are involved in FL settings.

Existing studies on unlearning in the context of ML (a.k.a. machine unlearning) [14], [15], [16], [17] cannot eliminate the influence of the target data on the global FL model either, mainly due to the inherent distinction in the way FL and ML learn from data. In particular, FL employs iterative training involving multiple rounds, where the clients’ initial model for each round of training is obtained from the parameter updates in the previous round, and thus contains the information of all clients including the target one. Such a forward coupling of information in parameter updates leads to fundamental challenges for machine unlearning techniques, which are designed for the (arguably one-round) centralized ML framework, rendering them no longer in effect for FL. Therefore, how to efficiently forget the target data from FL models remains largely under-explored.

In this paper, we take the first step to fill this gap by presenting FedEraser, an efficient federated unlearning methodology that can eliminate the influences of a federated client’s data on the global model while significantly reducing the unlearning time. The basic idea of FedEraser is to trade the central server’s storage for the unlearned model’s construction time, where FedEraser reconstructs the unlearned model by leveraging the historical parameter updates of clients that have been retained at the central server during the training process of FL. Since the retained updates are derived from the global model which contains the influence of the target client’s data, these updates have to be calibrated for information decoupling before using them for unlearning. Motivated by the fact that the client updates indicate in which direction the parameters of the global model need to be changed to fit the model to the training data [18], we calibrate the retained client updates by performing only a few rounds of calibration training to approximate the direction of the updates without the target client, and the unlearned model can then be constructed promptly using the calibrated updates.

¹The code of FedEraser has been publicly released at https://www.dropbox.com/s/1lhx962axovbbom/FedEraser-Code.zip?dl=0
We summarize our major contributions as follows:
• We frame the problem of federated unlearning, and present FedEraser, the first efficient unlearning algorithm in FL that enables the global model to “forget” the target client’s data. FedEraser is non-intrusive and can serve as an opt-in component inside existing FL systems.
• We develop novel storage-and-calibration techniques to tackle the forward coupling of information in parameter updates, which can provide a significant speed-up to the reconstruction of the unlearned model while maintaining the model efficacy.
• We propose a new indicator based on the layer parameters’ deviation between the unlearned model and the retrained model, to measure the effectiveness of FedEraser on the global model.
• We evaluate the performance of FedEraser on four realistic datasets, and compare it with two baselines. The results demonstrate that FedEraser can remove the influence of the target client’s data, with an expected speed-up of 4× compared with retraining from scratch.
II. PRELIMINARY

A. Federated Learning

FL is proposed recently by Google [19], [20] as a promising solution that can train a unified deep learning (DL) model across multiple decentralized clients holding local data samples, under the coordination of a central server. In FL, each client’s data is stored on its local storage and not transferred to other clients or the central server; only locally learned model parameter updates are exposed to the central server for aggregation to construct the unified global model. FL embodies the principles of focused collection and data minimization, and can mitigate many of the systemic privacy risks resulting from traditional, centralized DL approaches.

Although there are different forms of FL, most existing works mainly focus on horizontal FL, or sample-based FL, in which case the datasets of multiple clients share the same feature space but differ in sample space. On the contrary, vertical FL, or feature-based FL, is applicable to the scenarios in which datasets coming from different domains share the same sample space but differ in feature space. We focus on horizontal FL in this paper.
B. Architecture of FL

In a typical architecture of FL, there are K federated clients with the same data structure and feature space that collaboratively train a unified DL model with the coordination of a central server. The central server organizes the model training process by repeating the following steps until the training is stopped. At the i-th training round (i ∈ {1, 2, · · · , E}, where E is the number of training rounds), the updating process of the global model M is performed as follows:

Step 1. Federated clients download the current global model M^i and the training setting from the central server.
Step 2. Each client C_k (k ∈ {1, 2, · · · , K}) trains the downloaded model M^i on its local data D_k for E_local rounds based on the training setting, and then computes an update U_k^i with respect to M^i.
Step 3. The central server collects all the updates U^i = {U_1^i, U_2^i, · · · , U_K^i} from the K clients.
Step 4. The central server updates the global model on the basis of the aggregation of the collected updates U^i, thereby obtaining an updated model M^{i+1} that will play the role of the global model for the next training round.

When the termination criterion has been satisfied, the central server will stop the above iterative training process and get the final FL model M.
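To make these four steps concrete, the following is a minimal PyTorch-style sketch of one training round under this architecture. It is our own illustration rather than the paper’s released code; the helper names (local_train, training_round) and the equal-weight averaging in Step 4 are assumptions.

```python
import copy
from typing import List

import torch
import torch.nn as nn


def local_train(model: nn.Module, loader, epochs: int, lr: float = 0.01) -> List[torch.Tensor]:
    """Step 2: train the downloaded global model on local data D_k and
    return the update U_k^i = trained parameters - downloaded parameters."""
    initial = [p.detach().clone() for p in model.parameters()]
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return [p.detach() - p0 for p, p0 in zip(model.parameters(), initial)]


def training_round(global_model: nn.Module, client_loaders, e_local: int) -> list:
    """Steps 1, 3 and 4: broadcast M^i, collect the updates U^i, and apply
    their aggregate (a plain average here, i.e., equal client weights)."""
    updates = []
    for loader in client_loaders:                       # Step 1: download M^i
        local_model = copy.deepcopy(global_model)
        updates.append(local_train(local_model, loader, e_local))   # Step 2
    with torch.no_grad():                               # Steps 3-4: aggregate
        for j, p in enumerate(global_model.parameters()):
            p += torch.stack([u[j] for u in updates]).mean(dim=0)
    return updates   # FedEraser later retains these at the server (Sec. III)
```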
III. DESIGN OF FEDERASER

A. Overview

In order to efficiently eliminate the influence of the target client’s data from the trained global model, we add one extra function to the central server for FedEraser in the current architecture of FL, while the original functions of the central server remain unchanged. Specifically, during the training process of the global model, the central server retains the updates of the clients at intervals of regular rounds, as well as the index of the corresponding round, so as to further calibrate the retained updates to reconstruct the unlearned global model, rather than retraining from scratch.

For clarity, we denote the round interval as ∆t, and the retained updates of the client C_k as U_k^{t_j} (j ∈ {1, 2, · · · , T}, where t_1 = 1, t_{j+1} = t_j + ∆t, T is the number of retaining rounds that equals ⌊E/∆t⌋, and ⌊·⌋ is the floor function). Thus, the whole set of retained updates can be denoted as U = {U^{t_1}, U^{t_2}, · · · , U^{t_T}}, where U^{t_j} = {U_1^{t_j}, U_2^{t_j}, · · · , U_K^{t_j}}.

Given the retained updates U and the target client C_{k_u} whose data is required to be removed from the FL model, FedEraser mainly involves the following four steps: (1) calibration training, (2) update calibrating, (3) calibrated update aggregating, and (4) unlearned model updating. The first step is performed on the calibrating clients C_{k_c} (k_c ∈ {1, 2, · · · , K} \ k_u, i.e., the federated clients excluding the target one), while the rest of the steps are executed on the central server (c.f. Algorithm 1).
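The only server-side change that FedEraser requires at training time is this retention function. A minimal sketch under the notation above; the dict-based storage layout is our assumption, not the paper’s released code:

```python
# Server-side retention: keep the K client updates every Delta_t rounds,
# keyed by the round index t_j (t_1 = 1, t_{j+1} = t_j + Delta_t).
retained_updates = {}

def maybe_retain(round_idx: int, client_updates, delta_t: int = 2) -> None:
    """Store a deep copy of this round's updates if the round falls on the
    retention schedule; otherwise they can be discarded after aggregation."""
    if (round_idx - 1) % delta_t == 0:
        retained_updates[round_idx] = [
            [layer.clone() for layer in update] for update in client_updates
        ]
```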
Algorithm 1 FedEraser
Require: Initial global model M^1; retained client updates U
Require: Target client index k_u
Require: Number of global calibration rounds T
Require: Number of local calibration training epochs E_cali

Central server executes:
  for each round R^{t_j}, j ∈ {1, 2, · · · , T} do
    for each client C_{k_c}, k_c ∈ {1, 2, · · · , K} \ k_u in parallel do
      Û_{k_c}^{t_j} ← CaliTrain(C_{k_c}, M̃^{t_j}, E_cali)
      Ũ_{k_c}^{t_j} ← |U_{k_c}^{t_j}| · Û_{k_c}^{t_j} / ‖Û_{k_c}^{t_j}‖    ▷ Update Calibrating
    end
    Ũ^{t_j} ← (1 / ((K − 1) Σ_{k_c} w_{k_c})) Σ_{k_c} w_{k_c} Ũ_{k_c}^{t_j}    ▷ Update Aggregating
    M̃^{t_{j+1}} ← M̃^{t_j} + Ũ^{t_j}    ▷ Model Updating
  end

CaliTrain(C_{k_c}, M̃^{t_j}, E_cali):   // Run on client C_{k_c}
  for each local training round j from 1 to E_cali do
    M̃_{k_c}^{t_j}|_{j+1} ← Train(M̃_{k_c}^{t_j}|_j, D_{k_c})
  end
  Û_{k_c}^{t_j} ← CalculatingUpdate(M̃_{k_c}^{t_j}|_{E_cali}, M̃_{k_c}^{t_j}|_1)
  return Û_{k_c}^{t_j} to the central server

B. Design Details

1) Calibration Training: Specifically, at the t_j-th training round, we let the calibrating clients run E_cali rounds of local training with respect to the calibrated global model M̃^{t_j} that is obtained by FedEraser in the previous calibration round. It should be noticed that FedEraser can directly update the global model without calibration of the remaining clients’ parameters at the first reconstruction epoch. The reason for this operation is that the initial model of the standard FL has not been trained by the target client, and thus this model does not contain the influence brought by the target client.

After the calibration training, each calibrating client C_{k_c} calculates the current update Û_{k_c}^{t_j} and sends it to the central server for update calibrating.

2) Update Calibrating: After the calibration training, the central server can get each client’s current update Û_{k_c}^{t_j} with respect to the calibrated global model M̃^{t_j}. Then FedEraser leverages Û_{k_c}^{t_j} to calibrate the retained update U_{k_c}^{t_j}. In FedEraser, the norm of U_{k_c}^{t_j} indicates how much the parameters of the global model need to be changed, while the normalized Û_{k_c}^{t_j} indicates in which direction the parameters of M̃^{t_j} should be updated. Therefore, the calibration of U_{k_c}^{t_j} can be simply expressed as:

    Ũ_{k_c}^{t_j} = |U_{k_c}^{t_j}| · Û_{k_c}^{t_j} / ‖Û_{k_c}^{t_j}‖    (1)
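Interpreted per layer, Eq. (1) rescales the direction of the newly computed update to the magnitude of the retained one. A minimal sketch of this calibration follows; the per-layer granularity of the normalization is our assumption, since the paper does not pin it down:

```python
import torch

def calibrate_update(retained, current, eps: float = 1e-12):
    """Eq. (1): keep the magnitude |U_kc^{t_j}| of the retained update but
    take the direction of the freshly computed update Û_kc^{t_j}."""
    calibrated = []
    for u_old, u_new in zip(retained, current):      # per-layer tensors
        direction = u_new / (u_new.norm() + eps)     # Û / ||Û||
        calibrated.append(u_old.norm() * direction)  # |U| · Û / ||Û||
    return calibrated
```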

3) Calibrated Update Aggregating: Given the calibrated client updates Ũ^{t_j} = {Ũ_{k_c}^{t_j} | k_c ∈ {1, 2, · · · , K} \ k_u}, FedEraser next aggregates these updates for unlearned model updating. In particular, FedEraser directly calculates the weighted average of the calibrated updates as follows:

    Ũ^{t_j} = (1 / ((K − 1) Σ_{k_c} w_{k_c})) Σ_{k_c} w_{k_c} Ũ_{k_c}^{t_j}    (2)

where w_{k_c} is the weight for the calibrating client obtained from the standard architecture of FL, and w_{k_c} = N_{k_c} / Σ_{k_c} N_{k_c}, where N_{k_c} is the number of records the client C_{k_c} has. It is worth noting that this aggregation operation is consistent with the standard FL.

4) Unlearned Model Updating: With the aggregation of the calibrated updates, FedEraser can thus renovate the global FL model as:

    M̃^{t_{j+1}} = M̃^{t_j} + Ũ^{t_j}    (3)

where M̃^{t_j} (resp. M̃^{t_{j+1}}) is the current global model (resp. updated global model) calibrated by FedEraser.
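A minimal server-side sketch of Eqs. (2)–(3), assuming each update is stored as a list of per-layer tensors as in the earlier sketches (the helper layout is ours, not the released code):

```python
import torch

def aggregate_and_update(global_params, calibrated_updates, weights):
    """Eqs. (2)-(3): weighted average of the K-1 calibrated updates, then one
    additive update of the calibrated global model. `weights` holds the
    data-size weights w_kc = N_kc / sum(N_kc) of the calibrating clients."""
    norm = len(calibrated_updates) * sum(weights)    # (K-1) * sum(w_kc)
    with torch.no_grad():
        for j, p in enumerate(global_params):
            agg = sum(w * u[j] for w, u in zip(weights, calibrated_updates))
            p += agg / norm                          # Eq. (2), then Eq. (3)
```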
The central server and the calibrating clients collaboratively repeat the above process until the original updates U have all been calibrated and applied to the global model M̃. Finally, FedEraser gets the unlearned global model M̃ that has removed the influence of the client C_{k_u}’s data.

Once the unlearned global model M̃ is obtained, the standard deployment process of the deep learning model can be performed, including manual quality assurance and live A/B testing (by using the unlearned model M̃ on some clients’ data and the original model M on other clients’ data to compare their performance).

It is worth noting that FedEraser does not require far-reaching modifications of either the existing architecture of FL or the training process on the federated clients, making it very easy to deploy in existing FL systems. In particular, the process of calibration training executed on the federated clients can directly reuse the corresponding training process in the standard FL framework. The aggregating and updating operations in FedEraser do not need to modify the existing architecture of FL; only the additional retaining functionality is required at the central server side. In addition, FedEraser can be performed unwittingly, as it does not involve any information about the target client, including his/her updates and local data, during the unlearning process.

C. Time Consumption Analysis

One crucial feature of FedEraser is that it can speed up the reconstruction of the unlearned model, compared with retraining the model from scratch. Thus, we provide an elementary analysis of the speed-up significance of FedEraser here. For ease of presentation, we use the time consumption required for retraining from scratch as the baseline.

In FedEraser, there are two settings that can speed up the reconstruction of the unlearned model. First, we modify the standard FL to retain the client updates at intervals of regular rounds. We use a hyper-parameter ∆t to control the size of the retaining interval. Since FedEraser only processes retained updates, the larger ∆t is, the fewer retaining rounds are involved,
and the less reconstruction time FedEraser would require. This setting could provide FedEraser with a speed-up of ∆t times. Second, FedEraser only requires the calibrating clients to perform a few rounds of local training in order to calibrate the retained updates. Specifically, the round number of the calibration training is controlled by the calibration ratio r = E_cali / E_local. This setting can directly reduce the time consumed by training on the client, and provides FedEraser with a speed-up of r⁻¹ times. Overall, FedEraser can reduce the time consumption by r⁻¹·∆t times compared with retraining from scratch.

In our experiments, we empirically find that when r = 0.5 and ∆t = 2, FedEraser can achieve a trade-off between the performance of the unlearned model and the time consumption of model reconstruction (detailed in the following section). In such a case, FedEraser can achieve an expected speed-up of r⁻¹·∆t = 2 × 2 = 4× compared with retraining from scratch.
IV. PERFORMANCE EVALUATION

In this section, we evaluate the performance of FedEraser on different datasets and models. Besides, we launch membership inference attacks (MIAs) against FedEraser to verify its unlearning effectiveness from a privacy perspective.

A. Experimental Setup

1) Datasets: We utilize four datasets in our experiments, including UCI Adult², Purchase³, MNIST⁴, and CIFAR-10⁵.

Adult (Census Income). This dataset includes 48,842 records with 14 attributes such as age, gender, education, marital status, occupation, working hours, and native country. The classification task of this dataset is to predict whether a person earns over $50K a year based on the census attributes.

Purchase. The Purchase dataset is obtained from Kaggle’s “acquire valued shoppers” challenge, whose purpose is to design accurate coupon promotion strategies. The Purchase dataset contains shopping histories of several thousand shoppers over one year, including many fields such as product name, store chain, quantity, and date of purchase. In particular, the Purchase dataset (with 197,324 records) does not contain any class labels. Following [16], [21], [22], we adopt an unsupervised clustering algorithm to assign each data record a class label. We cluster the records in the Purchase dataset into 2 classes.

MNIST. This is a dataset of 70,000 handwritten digits formatted as 32 × 32 images and normalized so that the digits are located at the center of the image. It includes sample images of handwritten digits from 0 to 9. Each pixel within the image is represented by 0 or 1.

CIFAR-10. CIFAR-10 is a benchmark dataset used to evaluate image recognition algorithms. This dataset consists of 60,000 color images of size 32 × 32 and has 10 classes such as “airplane”, “dogs”, and “cats”. Particularly, CIFAR-10 is a balanced dataset with 6,000 randomly selected images for each class. Within the CIFAR-10 dataset, there are 50,000 training images and 10,000 testing images.

²https://archive.ics.uci.edu/ml/datasets/Adult
³https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data
⁴http://yann.lecun.com/exdb/mnist
⁵http://www.cs.toronto.edu/~kriz/cifar.html

TABLE I
THE ARCHITECTURES OF FEDERATED MODELS.

Dataset     Model Architecture
Adult       2 FC layers
Purchase    3 FC layers
MNIST       2 Conv. and 2 FC layers
CIFAR-10    2 Conv., 2 Pool., and 2 FC layers

2) Global Models: In the paradigm of FL and federated unlearning, the global model will be broadcast to all clients and serve as the initial model for each client’s training process. We make use of 4 global models with different structures for different classification tasks. The details of these models are summarized in Table I, where an FC layer means a fully connected layer in the deep neural network (DNN) models, and a Conv. (resp. Pool.) layer represents a convolutional (resp. max-pooling) layer in the convolutional neural network (CNN) models.

3) Evaluation Metrics: We evaluate the performance of FedEraser using standard metrics in the ML field, including the accuracy and the loss. We also measure the unlearning time consumed by FedEraser to make a given global model forget one of the clients.

Furthermore, in order to assess whether or not the unlearned model still contains the information about the target client, we adopt the following three extra metrics. One metric is the prediction difference, denoted as the L2 norm of the prediction probability difference between the original global model and the unlearned model:

    P_diss = (1/N) Σ_{i=1}^{N} ‖M(x_i) − M̃(x_i)‖₂,  x_i ∈ D_{k_u}    (4)

where N is the number of the target client’s samples D_{k_u}, and M(x_i) (resp. M̃(x_i)) is the prediction probability of the sample x_i obtained from the original (resp. unlearned) model.
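A minimal sketch of Eq. (4), assuming the models output logits that are turned into prediction probabilities with a softmax (the paper does not state the output form):

```python
import torch

@torch.no_grad()
def prediction_difference(original, unlearned, target_loader) -> float:
    """Eq. (4): mean L2 distance between the prediction probabilities of the
    original model M and the unlearned model on the target samples D_ku."""
    total, count = 0.0, 0
    for x, _ in target_loader:
        p_orig = torch.softmax(original(x), dim=1)   # M(x_i)
        p_unlr = torch.softmax(unlearned(x), dim=1)  # unlearned model's M̃(x_i)
        total += (p_orig - p_unlr).norm(dim=1).sum().item()
        count += x.size(0)
    return total / count                             # P_diss
```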
The remaining two metrics are obtained from the MIAs that we perform against the unlearned global model. The goal of MIAs is to determine whether a given piece of data was used to train a given ML model. Therefore, the performance of MIAs can measure the information that still remains in the unlearned global model. We utilize the attack precision of the MIAs against the target data as one metric, which presents the proportion of the target client’s data that are predicted to have participated in the training of the global model. We also use the attack recall of the MIAs, which presents the fraction of the data of the target client that we can correctly infer as a part of the training dataset. In other words, attack precision and attack recall measure the privacy leakage level of the target client.

4) Comparison Methods: In our experiments, we compare FedEraser with two different methods: Federated Retrain (FedRetrain) and Federated Accumulating (FedAccum).

FedRetrain: a simple method for unlearning by retraining the model from scratch on the remaining data after removing the target client’s data, which will serve as the baseline in our experiments.

Fig. 1. Prediction performance comparison: (a) prediction accuracy on the testing data; (b) prediction accuracy on the target data.

Empirically, FedRetrain provides an upper bound on the prediction performance of the unlearned model reconstructed by FedEraser.

FedAccum: a simple method for unlearning by directly accumulating the previously retained updates of the calibrating clients’ local model parameters, and leveraging the accumulated updates to update the global model. The update process can be expressed as follows:

    M̃_accum^{t_{j+1}} = M̃_accum^{t_j} + Ũ_accum^{t_j}    (5)

where Ũ_accum^{t_j} is the accumulation of the model updates at the t_j-th round, and Ũ_accum^{t_j} = (1 / ((K − 1) Σ_{k_c} w_{k_c})) Σ_{k_c} w_{k_c} U_{k_c}^{t_j}. Besides, M̃_accum^{t_j} (resp. M̃_accum^{t_{j+1}}) represents the global model before (resp. after) updating with Ũ_accum^{t_j}. The main difference between FedEraser and FedAccum is that the latter does not calibrate the clients’ updates.
In order to evaluate the utility of the unlearned global model, we also compare FedEraser with the classical FL without unlearning. We employ the most widely used FL algorithm, federated averaging (FedAvg) [1], to construct the global model. FedAvg executes the training procedure in parallel on all federated clients and then exchanges the updated model weights. The updated weights obtained from every client are averaged to update the global model.

5) Experiment Environment: In our experiments, we use a workstation equipped with an Intel Core i7 9400 CPU and an NVIDIA GeForce GTX 2070 GPU for training the deep learning models. We use PyTorch 1.4.0 as the deep learning framework with CUDA 10.1 and Python 3.7.3.

We set the number of clients to 20, the calibration ratio r = E_cali / E_local = 0.5, and the retaining interval ∆t = 2. As for other training hyper-parameters, such as learning rate, training epochs, and batch size, we use the same settings to execute our algorithm and the comparison methods.

B. Performance of FedEraser

In this section, we evaluate the performance of FedEraser from two perspectives: model utility and client privacy. We have to emphasize here that there is no overlap between the testing data and the target data.

1) Performance on Testing Data: Fig. 1(a) shows the prediction accuracy of FedEraser and the three comparison methods on the testing data and the target data. From the results we can see that FedEraser performs closely to FedRetrain (baseline) on all datasets, with an average difference of only 0.61%. Especially, for the Adult dataset, FedEraser achieves a prediction accuracy of 0.853, which is higher than that of FedAccum by 5.76% and lower than that of FedRetrain by 0.8%. On the MNIST dataset, FedEraser achieves an accuracy of 0.986, which only has a 0.52% difference from that of FedRetrain. For the Purchase dataset, FedEraser can achieve a testing accuracy of 0.943, while FedAccum and FedRetrain achieve mean testing accuracies of 0.913 and 0.949, respectively. As for the CIFAR-10 dataset, FedEraser gets a testing accuracy of 0.562, which is lower than that of FedRetrain by 0.52%. In such a case, FedAccum only achieves a testing accuracy of 0.408. Overall, FedEraser can achieve a prediction performance close to FedAvg and FedRetrain, but better than that of FedAccum, indicating high utility of the obtained unlearned model.

Table II shows the time consumption of FedEraser and the comparison methods in constructing the global models. According to the results, it is obvious that FedRetrain takes the same order of magnitude of time as FedAvg to reconstruct the global model. On the contrary, FedEraser can significantly speed up the removal procedure, improving the time consumption by 3.4× for the Adult dataset. As for the MNIST and Purchase datasets, FedEraser reduces the reconstruction time by 4.8× and 4.1×, respectively. Besides, FedEraser also provides a speed-up of 4.0× in reconstructing the global model for complex classification tasks. As for FedAccum, since it only aggregates the retained parameters of the calibrating clients’ models in every global epoch and updates the global model with the aggregations, it does not involve the training process on the clients. Consequently, FedAccum could significantly reduce the time consumption of model reconstruction, but at the cost of prediction accuracy.
TABLE II
TIME CONSUMPTION OF FEDERATED MODEL CONSTRUCTION.

Method       Time Consumption (second)
             Adult         MNIST         Purchase      CIFAR-10
FedAvg       1.74 × 10²    1.16 × 10³    3.87 × 10²    3.38 × 10³
FedRetrain   1.69 × 10²    1.08 × 10³    3.75 × 10²    3.01 × 10³
FedAccum     2.14 × 10⁻²   6.2 × 10⁻¹    1.56 × 10⁻¹   1.29 × 10⁻¹
FedEraser    4.97 × 10¹    2.23 × 10²    9.21 × 10¹    7.51 × 10²

TABLE III
PREDICTION LOSS ON THE TARGET CLIENT’S DATA.

Method       Prediction Loss on Target Data
             Adult         MNIST         Purchase      CIFAR-10
FedAvg       5.26 × 10⁻³   3.85 × 10⁻⁴   2.49 × 10⁻⁵   1.19 × 10⁻²
FedRetrain   5.49 × 10⁻³   7.85 × 10⁻⁴   2.28 × 10⁻³   3.51 × 10⁻²
FedAccum     1.45 × 10⁻²   1.26 × 10⁻³   1.45 × 10⁻²   1.66 × 10⁻¹
FedEraser    5.42 × 10⁻³   1.03 × 10⁻³   3.85 × 10⁻³   2.03 × 10⁻²

2) Performance on Target Data: In addition, we compare the prediction performance of FedEraser and the comparison methods on the target client’s data. The experiment results are shown in Fig. 1(b). For the target data, FedEraser achieves a mean prediction accuracy of 0.831 over all datasets, which is close to that of FedRetrain but much lower than that of FedAvg. Compared with FedAccum, FedEraser performs 11.5% better. As shown in Fig. 1(b), on the Adult and MNIST datasets, the performance of our method is slightly worse than the baseline by 0.52% and 0.65%, respectively. However, in these two cases, FedEraser still performs better than FedAccum by 10.9% and 4.54%. For the Purchase dataset, FedEraser can achieve a mean accuracy of 0.934 on the target client’s data, which is better than that of FedAccum by 8.82%. Nevertheless, FedRetrain achieves an accuracy of 0.952 on the target data and performs better than FedEraser by 1.8%. As for the performance on the target client’s data of CIFAR-10, FedEraser obtains a mean accuracy of 0.556 while FedRetrain achieves 0.577. However, FedAccum can only get a prediction accuracy of 0.339 on the target data. In general, an ML model has a higher prediction accuracy on its training data than on testing data. Therefore, the prediction similarity between the unlearned model and the retrained model further reflects the removal effectiveness of FedEraser.
Furthermore, we measure the loss values of the target client’s data obtained from the different models trained by FedEraser and the comparison methods. The experiment results are shown in Table III. In general, the prediction loss of FedEraser is relatively close to that of FedRetrain, and FedAccum has the largest prediction loss among all comparison methods. For the Adult dataset, FedEraser achieves a prediction loss of 5.42 × 10⁻³, which is very close to that of FedAvg and FedRetrain. However, the loss of FedAccum is 2.7× larger than that of FedEraser. As for the MNIST dataset, FedEraser gets a mean loss on the target data of 1.03 × 10⁻³, which is 1.3× greater than the baseline but 0.8× smaller than directly accumulating. For the Purchase dataset, the loss of FedEraser is 3.85 × 10⁻³, which is much closer to the baseline than that of FedAccum. Besides, FedEraser even achieves a prediction loss of 2.03 × 10⁻² on CIFAR-10, which is smaller than that of FedRetrain.

3) Evaluation from the Privacy Perspective: In our experiments, we leverage MIAs towards the target client’s data to assess how much information about these data is still contained in our unlearned model. Since the attack classifier is trained on the data derived from the original global model, the attack classifier can distinguish the information related to the target data precisely. The worse the performance of the MIA is, the less influence of the target data is stored in the global model.

For executing MIAs towards the unlearned model, we adopt the strategy of shadow model training [21] to derive the data for constructing an attack classifier. For ease of presentation, we treat the original model trained by FedAvg as the shadow model. Then we execute the attack against the global models trained by FedEraser and FedRetrain.
Fig. 2. Performance of membership inference attacks: (a) attack precision; (b) attack recall.

From the results in Fig. 2, we can see that the attack achieves similar performance on our unlearned model and the retrained model. Over all datasets, the inference attacks can only achieve a mean attack precision (resp. recall) of around 0.50 (resp. 0.726) on the global models reconstructed by FedEraser. Specifically, for the Adult dataset, the attack against the original model can achieve an F1-score of 0.714. The F1-score of the attack on the unlearned model (resp. retrained model) is 0.563 (resp. 0.571). As for the MNIST and Purchase datasets, the F1-scores of the attacks against the unlearned and retrained models differ by only 0.34% (resp. 0.15%). Besides, compared with FedRetrain, FedEraser can effectively erase the target data even for a complex model trained on CIFAR-10. The inference attack can achieve an F1-score of 0.951 on the original model. Moreover, when attacking the unlearned model, the F1-score can just reach 0.629, which is even lower than the attack on the retrained model by 2.02%. These results illustrate that the unlearned model’s prediction contains little information about the target data, just as the retrained model’s does, and FedEraser can remove the influence of the target data from the original global model.
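A minimal sketch of how attack precision and recall could be computed in this evaluation, assuming membership features (e.g., the victim model’s prediction vectors) have been extracted upstream and an sklearn-style attack classifier has already been fitted on shadow-model data; all names here are our own illustration:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def mia_report(attack_clf, member_feats, nonmember_feats):
    """Attack precision/recall against the target client's data: members are
    the target samples (label 1), non-members are held-out samples (label 0)."""
    X = np.vstack([member_feats, nonmember_feats])
    y_true = np.concatenate([np.ones(len(member_feats)),
                             np.zeros(len(nonmember_feats))])
    y_pred = attack_clf.predict(X)
    return {"attack_precision": precision_score(y_true, y_pred),
            "attack_recall": recall_score(y_true, y_pred)}
```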
C. Parameter Deviation of Unlearned Model

In this section, we dive into the global model trained by FedEraser and analyze the model parameters. To obtain insight on the parameter deviation between the retrained and the unlearned model, we conduct an experiment by tracking the last layer’s weights of these models at each global training epoch. We also compare the parameters of the global model trained by FedAccum.

In Fig. 3, we visualize a histogram of the deviation θ = arccos(w_u · w_r / (‖w_u‖ ‖w_r‖)), where w_u (resp. w_r) is the last layer weight of the model trained by FedEraser (resp. FedRetrain). The deviation between w_a and w_r is also represented in Fig. 3, where w_a is the last layer weight of the model trained by FedAccum.

Fig. 3. Deviation of the global model parameters (Adult).
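The deviation can be computed as follows; a minimal sketch, where flattening the weight matrices into vectors before taking the inner product is our assumption:

```python
import torch

def layer_angle_deg(w_u: torch.Tensor, w_r: torch.Tensor) -> float:
    """theta = arccos(w_u · w_r / (||w_u|| ||w_r||)), in degrees."""
    w_u, w_r = w_u.flatten(), w_r.flatten()
    cos = torch.dot(w_u, w_r) / (w_u.norm() * w_r.norm() + 1e-12)
    return torch.rad2deg(torch.arccos(cos.clamp(-1.0, 1.0))).item()
```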
From the results we can observe that the parameters of the unlearned model are much closer to those of the retrained model than those of the model reconstructed by FedAccum. As shown in Fig. 3, the mean angle deviation between w_u and w_r reaches 3.99° (the red dashed line in Fig. 3). The deviation of w_a is higher than that of FedEraser by 2.6× and reaches 10.56° (the green dashed line in Fig. 3). The parameters of our unlearned model are mainly distributed within a 30° difference from those of the retrained model. However, for the accumulated model, there are high deviations (0° ∼ 90°) between w_a and w_r. Furthermore, the deviation range greater than 30° mainly contains the parameters of the accumulated models rather than those of our unlearned models.

D. Impact of the Local Calibration Ratio

To quantify the impact that the local calibration ratio r has on the performance of FedEraser, we reconstruct the original global model trained on three different datasets with r from 0.1 to 1. When r = 1.0, FedEraser degenerates to FedRetrain, which is the baseline in our experiments.

Fig. 4 shows the relationship between the calibration ratio and the performance of FedEraser. The relationship is a little complex, but in general, as r increases, the prediction accuracy on the target data becomes worse while the reconstruction time increases almost linearly. Specifically, for the Adult dataset, when r = 0.1, FedEraser achieves a prediction accuracy of 85.8% (resp. 84.9%) on the target (resp. testing) data, and the time consumption for unlearned model reconstruction is just 10.1s, which attains a speed-up of 10× compared with FedRetrain. When r increases to 1.0, the accuracy on the target (resp. testing) data degrades by 0.5% (resp. 0.2%). However, the reconstruction time increases and reaches 100.1s.

For the MNIST dataset, FedEraser can achieve a prediction accuracy of 91.9% (resp. 93.5%) on the target (resp. testing) data when r = 0.1. In this case, the time consumption of FedEraser is merely 40.3s, which is faster than FedRetrain by 9.6×. As r increases, the prediction accuracy on the testing data decreases slightly, which is not the case for that on the target data. Specifically, with different calibration ratios, the prediction accuracy on the target data fluctuates around 92.1% with a standard deviation of 0.0011.

As for the Purchase dataset, when treating the target data, FedEraser confronts a reduction of 5.2% in the prediction accuracy with the increasing calibration ratio. Nevertheless, in this case, the testing accuracy of FedEraser increases from 93.9% to 96.2%. As for the reconstruction time of the unlearned model, FedEraser only needs 19.6s when r = 0.1, yielding a speed-up of 8.4× compared with the time consumption of FedRetrain.
Fig. 4. The impact of the calibration ratio on the performance of FedEraser: (a) Adult; (b) MNIST; (c) Purchase.

Fig. 5. The impact of the retaining interval on the performance of FedEraser: (a) Adult; (b) MNIST; (c) Purchase.

E. Impact of the Retaining Interval

In this section, we evaluate the performance of FedEraser on three different datasets with the retaining interval ∆t increasing from 1 to 10. The relationship between the performance of FedEraser and the retaining interval is demonstrated in Fig. 5. From the results we can find that with an increasing retaining interval, the time consumption of FedEraser decays while the prediction accuracy on the target data steadily improves. One possible reason for this phenomenon is that with a large retaining interval, a part of the influence of the target data still remains in the unlearned model.

Recall that the objective of FedEraser is to eliminate the influence of a certain client’s data on the original global model. This influence is introduced by training the original model on these data, and could help the model accurately classify the target data. Therefore, the higher the accuracy on the target data, the worse the performance FedEraser achieves. According to the results in Fig. 5, FedEraser brings about a poor unlearning performance but an obvious speed-up when the retaining interval is set to a large number.

As shown in Fig. 5(a), with the retaining interval increasing, the prediction accuracy on the target data increases from 83.8% to 84.4%, but the testing accuracy decreases by 0.21%. When ∆t = 1, FedEraser spends 36.7s to reconstruct the original global model. But when ∆t = 10, it consumes 19.1s to derive the unlearned model, which brings a 12× speed-up.

As shown in Figs. 5(b) and 5(c), the prediction accuracy of our unlearned models improves on both the target and testing data when FedEraser confronts the MNIST and Purchase datasets. For the MNIST dataset, our unlearned model can achieve a prediction accuracy of 96.7% (resp. 96.8%) on the target (resp. testing) data. As the interval increases, the accuracy on the target data increases by 2.3%, while that on the testing data increases by 1.2%. As for the Purchase dataset, the accuracy of our unlearned model increases from 81.3% to 93.7% on the target data, and the accuracy also grows from 80.1% to 92.1% on the testing data. As for the time consumption, FedEraser can yield a 12× speed-up on both datasets when ∆t = 10.

F. Impact of the Number of Federated Clients

In this section, we evaluate the performance of FedEraser on the Purchase and CIFAR-10 datasets with different numbers of federated clients. From the results in Fig. 6, we can observe that the performance of the unlearned model gradually degrades with an increasing number of clients. Specifically, for the Purchase dataset (c.f. Fig. 6(a)), when there are 5 federated clients, the model reconstructed by FedEraser can achieve a prediction accuracy of 98.0% (resp. 98.1%) on the target (resp. testing) data. When the number of clients increases to 25, the testing accuracy decreases by 1.8% but can still reach 96.3%. However, for the target data, the performance of FedEraser would degrade by 9.7% and achieve a prediction accuracy of 88.3%.
Fig. 6. The impact of the number of federated clients: (a) prediction accuracy on Purchase; (b) prediction accuracy on CIFAR-10.

For the CIFAR-10 dataset (c.f. Fig. 6(b)), with 5 federated clients in total, the unlearned model can achieve a prediction accuracy of 32.4% (resp. 27.3%) on the testing (resp. target) data. As the number of clients increases, our unlearned model gradually performs worse on both the testing and target data. When there are 25 clients, the prediction accuracy on the target (resp. testing) data decreases by 5.4% (resp. 7.2%).

Overall, all the results demonstrate that FedEraser can achieve a satisfactory performance on different datasets with different settings. In general, if a data sample has taken part in a model’s training process, it would leave its unique influence on this model so that the model can correctly classify it. Therefore, the prediction accuracy on the target data can measure how much influence of these data is left in the unlearned model. The less influence of the target data left on the model reconstructed by FedEraser, the lower the prediction accuracy on the target data this model can achieve, and the better the performance of FedEraser will be.

V. RELATED WORK

A. Machine Unlearning

The term “machine unlearning” is introduced by Cao et al. [14], where an efficient forgetting algorithm in the restricted context of statistical query learning is proposed. Thereafter, machine unlearning for different ML models has been explored. Ginart et al. [15] examine the problem of data removal for stochastic algorithms, in particular for variants of k-means clustering, but their approach cannot be applied to supervised learning. Izzo et al. [23] focus on supervised linear regression and develop the projective residual update technique that scales linearly in the dimension of the data. Baumhauer et al. [24] propose a forgetting procedure for logit-based classification models by applying a linear transformation to the output logits, but do not remove information from the weights. Most recently, Bourtoule et al. [16] introduce a more general algorithm named SISA, which takes advantage of sharding and slicing during the training. Nevertheless, existing machine unlearning studies focus on ML models in traditional centralized settings, and are ineligible for unlearning in FL scenarios.

B. Differential Privacy

Differential privacy [25] provides a way to preserve the privacy of a single sample in a dataset such that an upper bound on the amount of information about any particular sample can be obtained. There have been a series of differentially private versions of ML algorithms, including linear models [26], principal component analysis [27], matrix factorization [28], and DNNs [29], the parameters of which are learned via adding noise in the training phase. In the setting of data forgetting, however, the removal is expected to be done after the training. Drawing on the indistinguishability of differential privacy, Guo et al. [17] define the notion of ε-certified removal and provide an algorithm for linear and logistic regression. Golatkar et al. [30] propose a selective forgetting procedure for DNNs by changing information (adding noises) in the trained weights. They further extend this framework to disturb activations [31], using a neural tangent kernel based scrubbing procedure. The major challenge in differential privacy based unlearning is how to balance the protected information and the model utility.

C. MIAs against ML Models

MIAs against ML models were first studied by Shokri et al. [21], where multiple shadow models with the same structure as the victim model are constructed to facilitate training the attack models. Later on, Salem et al. [22] show that it is possible to achieve comparable attack performance with only one shadow model. Liu et al. [32] leverage the idea of generative adversarial networks (GAN) to train a mimic model instead of the shadow model. Apart from mimicking the prediction behavior, other knowledge of the victim model is adopted to launch MIAs, including the training loss [33], model parameters [12], model gradients [18], and output distributions [34]. It is universally acknowledged that such attacks can serve as one of the best befitting manners for measuring the quality of unlearning [35], [36], especially given the few eligible metrics available for such an evaluation. Therefore, in this paper, we also adopt MIAs to evaluate the effectiveness of FedEraser.

VI. CONCLUSION

In this paper, we have presented FedEraser, the first federated unlearning methodology that can eliminate the influences
of a federated client’s data on the global model while signif- [13] M. Song, Z. Wang, Z. Zhang, Y. Song, Q. Wang, J. Ren, and H. Qi,
icantly reducing the time consumption used for constructing “Analyzing user-level privacy attack against federated learning,” IEEE
Journal on Selected Areas in Communications, vol. 38, no. 10, pp. 2430–
the unlearned model. FedEraser is non-intrusive and can serve 2444, 2020. 1
as an opt-in component inside existing FL systems. It does [14] Y. Cao and J. Yang, “Towards making systems forget with machine
not involve any information about the target client, enabling unlearning,” in Proceedings of IEEE S&P, 2015, pp. 463–480. 1, 9
[15] T. Ginart, M. Y. Guan, G. Valiant, and J. Zou, “Making AI forget you:
the unlearning process performed unwittingly. Experiments Data deletion in machine learning,” in Proceedings of NeurIPS, 2019,
on four realistic datasets demonstrate the effectiveness of pp. 3513–3526. 1, 9
FedEraser, with an obvious speed-up of unlearning compared [16] L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia,
A. Travers, B. Zhang, D. Lie, and N. Papernot, “Machine unlearning,”
with retraining from scratch. We envision our work as an in Proceedings of IEEE S&P, 2020. 1, 4, 9
early step in FL towards compliance with legal and ethical [17] C. Guo, T. Goldstein, A. Hannun, and L. van der Maaten, “Certified
criteria in a fair and transparent manner. There are abundant data removal from machine learning models,” in Proceedings of ICML,
2020. 1, 9
interesting directions opened up ahead, e.g., instance-level fed- [18] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy
erated unlearning, federated unlearning without client training, analysis of deep learning: Passive and active white-box inference attacks
federated unlearning verification, to name a few. We plan to against centralized and federated learning,” in Proceedings of IEEE S&P,
2019, pp. 739–753. 2, 9
investigate these appealing subjects in the near future. [19] J. Konečnỳ, H. B. McMahan, D. Ramage, and P. Richtárik, “Federated
optimization: Distributed machine learning for on-device intelligence,”
ACKNOWLEDGEMENT CoRR, arXiv:1610.02527, 2016. 2
[20] J. Konečnỳ, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and
D. Bacon, “Federated learning: Strategies for improving communication
This work was supported in part by the National Natural efficiency,” CoRR, arXiv:1610.05492, 2016. 2
Science Foundation of China under Grants 61872416 and [21] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership
62002104; by the Fundamental Research Funds for the Central inference attacks against machine learning models,” in Proceedings of
IEEE S&P, 2017, pp. 3–18. 4, 6, 9
Universities of China under Grant 2019kfyXJJS017; by the [22] A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes,
Natural Science Foundation of Hubei Province of China under “ML-Leaks: Model and data independent membership inference attacks
Grant 2019CFB191; and by the special fund for Wuhan Yellow and defenses on machine learning models,” in Proceedings of NDSS,
2019. 4, 9
Crane Talents (Excellent Young Scholar). The corresponding [23] Z. Izzo, M. A. Smart, K. Chaudhuri, and J. Y. Zou, “Approximate data
author of this paper is Chen Wang. deletion from machine learning models: Algorithms and evaluations.”
CoRR, arXiv:2002.10077, 2020. 9
[24] T. Baumhauer, P. Schöttle, and M. Zeppelzauer, “Machine unlearning:
R EFERENCES Linear filtration for logit-based classifiers.” CoRR, arXiv:2002.02730,
2020. 9
[1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, [25] C. Dwork, “Differential privacy,” in Proceedings of International Col-
“Communication-efficient learning of deep networks from decentralized loquium on Automata, Languages and Programming, M. Bugliesi,
data,” in Proceedings of AISTATS, 2017, pp. 1273–1282. 1, 5 B. Preneel, V. Sassone, and I. Wegener, Eds., 2006, pp. 1–12. 9
[2] J. Konečný, H. B. McMahan, F. X. Yu, A. T. Suresh, D. Bacon, and [26] Y. Chen, A. Machanavajjhala, J. P. Reiter, and A. F. Barrientos, “Differ-
P. Richtárik, “Federated learning: Strategies for improving communica- entially private regression diagnostics,” in Proceedings of IEEE ICDM,
tion efficiency,” CoRR, arXiv:1610.05492, 2016. 1 2016, pp. 81–90. 9
[3] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: [27] W. Jiang, C. Xie, and Z. Zhang, “Wishart mechanism for differentially
Concept and applications,” ACM Transactions on Intelligent Systems and private principal components analysis.” in Proceedings of AAAI, 2016,
Technology, vol. 10, no. 2, pp. 1–19, 2019. 1 pp. 1730–1736. 9
[4] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: [28] H. Imtiaz and A. D. Sarwate, “Distributed differentially private algo-
Challenges, methods, and future directions,” IEEE Signal Processing rithms for matrix and tensor factorization,” IEEE Journal of Selected
Magazine, vol. 37, no. 3, pp. 50–60, 2020. 1 Topics in Signal Processing, vol. 12, no. 6, pp. 1449–1464, 2018. 9
[5] G. Liu, C. Wang, X. Ma, and Y. Yang, “Keep your data locally: Federated [29] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov,
learning based data privacy preservation in edge computing,” IEEE K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in
Network Magazine, vol. 35, no. 2, pp. 60–66, 2021. 1 Proceedings of ACM CCS, 2016, pp. 308–318. 9
[6] P. Voigt and A. Bussche, The Eu General Data Protection Regulation [30] A. Golaktar, A. Achille, and S. Soatto, “Eternal sunshine of the spotless
(GDPR): A Practical Guide. Springer, 2017. 1 net: Selective forgetting in deep networks,” in Proceedings of CVPR,
[7] E. Harding, J. J. Vanto, R. Clark, L. H. Ji, and S. C. Ainsworth, 2020. 9
“Understanding the scope and impact of the California Consumer [31] A. Golatkar, A. Achille, and S. Soatto, “Forgetting outside the box:
Privacy Act of 2018,” Journal of Data Protection & Privacy, 2019. Scrubbing deep networks of information accessible from input-output
1 observations,” in Proceedings of ECCV, 2020. 9
[8] C. Xie, K. Huang, P.-Y. Chen, and B. Li, “DBA: Distributed backdoor [32] G. Liu, C. Wang, K. Peng, H. Huang, Y. Li, and W. Cheng, “SocInf:
attacks against federated learning,” in Proceedings of ICLR, 2020. 1 Membership inference attacks on social media health data with machine
[9] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How learning,” IEEE Transactions on Computational Social Systems, vol. 6,
to backdoor federated learning,” in Proceedings of AISTATS, 2020, pp. no. 5, pp. 907–921, 2019. 9
2938–2948. 1 [33] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha, “Privacy risk in ma-
[10] C. Fung, C. J. Yoon, and I. Beschastnikh, “The limitations of federated chine learning: Analyzing the connection to overfitting,” in Proceedings
learning in sybil settings,” in Proceedings of RAID, 2020, pp. 301–316. of IEEE CSF, 2018, pp. 268–282. 9
1 [34] B. Hui, Y. Yang, H. Yuan, P. Burlina, N. Z. Gong, and Y. Cao, “Practical
[11] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy blind membership inference attack via differential comparisons,” in
analysis of deep learning: Stand-alone and federated learning under Proceedings of NDSS, 2021. 9
passive and active white-box inference attacks,” in Proceedings of IEEE [35] D. M. Sommer, L. Song, S. Wagh, and P. Mittal, “Towards probabilistic
S&P, 2019, pp. 739–753. 1 verification of machine unlearning,” CoRR, arXiv:2003.04247, 2020. 9
[12] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploiting [36] A. Sablayrolles, D. Matthijs, C. Schmid, and H. Jegou, “Radioactive
unintended feature leakage in collaborative learning,” in Proceedings of data: tracing through training,” in Proceedings of ICML, 2020. 9
IEEE S&P, 2019, pp. 691–706. 1, 9
