A Novel Edge-Based Multi-Layer Hierarchical Architecture For Federated Learning
Abstract—In the last few years, Internet of Things (IoT) devices have been multiplying their presence in our daily life. As a consequence, the data generated in our houses, offices, and common places is becoming too large to be processed in a limited number of places. In this scenario, the advent of Edge Computing in general, and Edge Intelligence in particular, is favoring the scalability and efficiency of IoT systems. Such paradigms use devices placed at the edge of the network to process the data created by IoT devices in a distributed way, so that only synthetic information has to be transmitted to the cloud. Edge Intelligence supports the so-called Federated Learning (FL), a novel paradigm that allows the distributed training of neural network models. Such models are initially distributed from the cloud to edge nodes and, on such edge nodes, they are refined based on data gathered from IoT nodes. The refined models are then sent back to the cloud and merged with models elaborated on different edge nodes.
This paper presents a novel architecture for Federated Learning, called Multi-Layer Hierarchical Federated Learning (MLH-FL), that extends traditional FL with model aggregation at different layers. The proposed approach is evaluated through simulations, and the accuracy and loss of the resulting models are compared with those of the traditional FL approach.

Index Terms—Internet of Things, Federated Learning, Edge Computing, Edge Intelligence, Hierarchical FL, Neural Networks, Machine Learning.

I. INTRODUCTION

Recently, several technologies have been developed to enhance people's quality of life. People are now aware of living in Smart Environments [1] [2] [3] where several smart devices, belonging to the Internet of Things (IoT) [4] [5] world, monitor the environment, actuate actions, and process data. All of this provides the basis of the so-called Cognitive Environments [6], which use the gathered data to learn, adapt to, and predict user needs. Cognitive environments need a large amount of computational power to execute their algorithms and, moreover, they require the fast execution of some tasks. One of the fundamental building blocks of cognitive environments is Edge Computing [7], which enables the distributed processing of data, increasing system reactivity and reducing the interactions/communications with a (sometimes expensive) cloud system [8] [9]. One of the techniques leveraging edge computing, and recently born, is Federated Learning (FL) [10]. FL is a machine learning approach that permits distributed learning across edge devices. These edge devices collaborate to train a common model that has been set by a high-level cloud entity. Such a model, once refined, is sent back to the cloud, where it can be aggregated with other models elaborated on different edge devices. Once the model is aggregated, it can be distributed again to all the edge nodes involved in the training.

FL was first conceived by Google engineers to tackle privacy problems with Android devices [11]. The privacy of such edge devices is largely preserved thanks to FL, which allows exchanging over the Internet only the weights of a common model shared among the edge (mobile/IoT) devices. Moreover, FL is a good instrument to reduce the data sent to the cloud, decreasing the size of the packets sent to remote locations. This is an important point of FL: by reducing communications, it also helps save energy on the communication side. However, a limit of traditional FL is that it does not conceptualize an edge part comprising several layers, where aggregations could be performed at different levels to further reduce the communications towards the cloud.

This paper introduces a multi-layer architecture, called Multi-Layer Hierarchical Federated Learning (MLH-FL). MLH-FL uses a hierarchical architecture at the edge to take advantage of model aggregations at different layers. This makes it possible to apply MLH-FL to multi-layered networks and to execute model aggregations even when such networks are not connected to the Internet. Moreover, in our architecture we introduce the concept of low level round, which repeats aggregations at the edge several times, greatly reducing the number of communications towards the cloud and, consequently, the communication energy. Experiments will show that the accuracy and the loss of the models obtained with the proposed approach are comparable with those of the traditional FL approach, while considerably reducing the communications towards the cloud.

The remainder of the paper is organized as follows. Section II discusses some related work. Section III explains the proposal of the paper, Section IV describes the case study scenario, and Section V presents some comparison results between the proposal and traditional FL. Finally, some conclusions are drawn.

F. De Rango and P. Raimondo are with University of Calabria, Rende (CS), Italy, e-mail: (derango,p.raimondo)@dimes.unical.it
A. Guerrieri and G. Spezzano are with ICAR-CNR - Institute for high performance computing and networking - National Research Council of Italy, Rende (CS), Italy, e-mail: (guerrieri,spezzano)@icar.cnr.it

II. RELATED WORK
Federated Learning (FL) [12] is a decentralized and collaborative approach specifically conceived to preserve the privacy of the data collected at the edge and, at the same time, to limit the data exchanged towards cloud datacenters. FL is a topic of great interest for both the research and the industrial communities. In the last few years, many studies have been conducted and many experiments have been performed to highlight the power of FL and its applicability in several fields [13].

However, considering the communication load towards the cloud, FL could be enhanced by performing aggregations of the considered model, which has been sent from the cloud, already at the edge. In this direction, several recent works have made some steps forward. The paper in [14] proposes a hierarchical FL framework for heterogeneous cellular networks (HCNs) which uses cell base stations to perform distributed aggregations of models. In that paper, however, just one layer for model aggregation is added to the traditional FL architecture. In [15] the authors propose a LAN-aware FL, a new FL paradigm that takes advantage of local-area networks to aggregate models at the edge (namely, in the LAN) before sending the model itself to the cloud. This work also does not take into consideration the possibility of adding more than one layer between the cloud and the IoT devices. The work in [16] proposes a client-edge-cloud hierarchical Federated Learning system that allows multiple edge servers to perform partial model aggregation. There, the authors show that, by introducing the intermediate edge servers, the model training time and the energy consumption of the end devices can be simultaneously reduced compared to cloud-based traditional FL. Also this paper does not introduce the benefits of a multi-layered edge architecture for FL. The authors of [17] begin conceptualizing a multi-layer FL, but they do not accurately define how it can be used and, moreover, they do not perform experiments to compare their proposal with the traditional FL one.

In the direction of this last related work, our proposal presents a Multi-Layer Hierarchical Federated Learning (MLH-FL) architecture where aggregations can be performed at different layers, and several times, before sending the resulting models to the cloud. We also present some preliminary experiments to show the effectiveness of the proposal.

III. PROPOSAL

This section details the proposed MLH-FL architecture, which merges models for object classification at different levels, leveraging data collected by IoT devices. The proposed architecture is shown in Figure 1, where the network taken into consideration is simplified as a tree in which the IoT devices are in the bottom edge layer, the orange nodes are the edge devices, the blue node is the cloud node, and the links among these nodes represent the communication links.

As can be seen, the nodes at layer 1 (N1 nodes) are directly connected with the IoT devices (layer 0 - N0 nodes). Layer 0 nodes collect IoT data and train starting from the model they received from the cloud (such training is usually repeated for a number of iterations called epochs). Nodes at the upper layers only work as aggregators of models. In particular, we adopt Federated Averaging (FedAvg) [18] as the aggregation algorithm because it is the most used algorithm in the literature for aggregating trained models. We assume that the N0 nodes at layer 0 and the N1 nodes at layer 1 are of a significant amount, and that every layer aggregates the models coming from the layer below it. This allows (i) having a small number of nodes in the upper layers residing at the edge and, consequently, (ii) reducing the communications towards the cloud, which receives only the already aggregated models coming from the different sub-networks. The communication towards the cloud can be further decreased by introducing the concept of round. A round (of level i) represents the number of times a model is aggregated at layer i before sending the resulting model to the upper layers. In this paper we distinguish between low level rounds, which are the iterations of model aggregation executed at layer 1, and high level rounds, which are the times the models are aggregated in the cloud. It is worth noting that, by increasing the number of low level rounds, the communications towards the cloud are further reduced. Obviously, in traditional FL only the high level round (hereafter called just "round") is possible because, in that case, the IoT devices directly communicate with the cloud node, which iterates some rounds to refine its model.

To better compare the traditional FL and the MLH-FL approaches, the algorithms executed in the two cases are reported in the following. The algorithm in Alg.1 represents the traditional FL approach, which executes on networks where only the IoT devices and the cloud layers exist. Here, after the initial model is sent from the cloud to the IoT devices (edge nodes), such devices collect IoT data and update their own internal model (these two operations can be repeated epochs times). All of this is repeated Rounds times, together with the migration of the edge models to the cloud, their aggregation, and the final dissemination of such an aggregation to the edge nodes.

Alg.2 represents the algorithm specifically designed for the proposed MLH-FL architecture. In this case, it is essential to know the total number of layers Y of the architecture, since the dissemination of the model from the cloud has to involve these layers. The biggest differences of this algorithm with respect to the previous one are (i) the while at lines 7-13, which performs some aggregations (LowLevelRounds aggregations) at level 1, and (ii) the while at lines 15-18, where we have several model aggregations towards the cloud. These two while loops clearly reduce the communication between the edge nodes and the cloud, increasing energy saving in the communication. This last algorithm could be enhanced by also adding rounds at different layers between layer 2 and layer Y. Such an enhancement will be investigated in future work.
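As a concrete illustration of the aggregation step, the following PyTorch sketch shows a minimal FedAvg-style average of the state dictionaries received by an aggregator node. It is a sketch under our own assumptions: the function name fed_avg, the equal client weighting, and the usage lines are not taken from the authors' implementation.

    import copy
    import torch

    def fed_avg(state_dicts, weights=None):
        """Weighted average of a list of PyTorch state_dicts (FedAvg-style).

        Assumes all state_dicts come from the same architecture and that the
        entries are floating-point tensors; equal weights are used by default.
        """
        if weights is None:
            weights = [1.0 / len(state_dicts)] * len(state_dicts)
        avg = copy.deepcopy(state_dicts[0])
        for key in avg:
            avg[key] = torch.stack(
                [w * sd[key].float() for w, sd in zip(weights, state_dicts)]
            ).sum(dim=0)
        return avg

    # Hypothetical usage at an aggregator node:
    #   child_states = [child.state_dict() for child in child_models]
    #   own_model.load_state_dict(fed_avg(child_states))

With this kind of aggregation performed at every layer above layer 0, the cloud receives, per high level round, only the models produced by the layer Y-1 nodes, instead of one model per IoT device as in traditional FL.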
[Fig. 1. The multi-layer tree network considered by MLH-FL: the cloud node/aggregator at the top layer, edge aggregators (edge nodes) at the intermediate layers, and the N0 IoT devices at layer 0.]
IV. CASE STUDY SCENARIO

The case study described here compares the traditional FL with the proposed MLH-FL approach.

For the experiments and the analysis in the following, we consider a network configuration comprising 4 layers of nodes (layer 0 to layer 3, Y = 3). At layer 0 there are 20 IoT devices (split into 4 groups of 5 elements each) collecting and labelling data, and performing the training on their model; at layer 1 there are 4 edge nodes (each corresponding to a group of IoT devices) merging the models from layer 0; at layer 2 there are 2 edge nodes aggregating the models from layer 1; and at layer 3 there is one single cloud aggregator that receives the models from the two edge nodes of layer 2, aggregates them, and disseminates the resulting model to the edge nodes down to layer 0.

The dataset we used is the CIFAR-10 dataset1. CIFAR-10 is broadly used in many cases in which image classification models are trained and evaluated [19]. The dataset contains 60000 images of 10 different classes and is divided into 50000 images for training and 10000 for testing purposes. In our experiments, in order to emulate a real case study, we preprocessed the dataset so as to realize a NON-Independent and Identically Distributed (non-IID) distribution of CIFAR-10 among the twenty IoT devices at layer 0. In this way, we were able to give each client a heterogeneous part of the dataset with a number of classes between 2 and 3.

For the experiments, we used a Neural Network that is well known in the literature, called Visual Geometry Group (VGG) 11 [20], which is typically used for image recognition.

In the following, the proposed MLH-FL approach, applied to the network introduced above, is compared to the traditional FL one, which is applied on a 2-layer network (layer 0 and layer 1) with the same number of nodes at layer 0 (i.e., 20) and one node only in the cloud.

V. SIMULATION RESULTS

All the experiments reported in the following have been simulated by using Google Colaboratory2 (or "Colab"), which allows writing and executing Python code in any browser, while PyTorch3 has been used for Neural Network training and aggregation.

1 The CIFAR-10 Dataset: https://www.cs.toronto.edu/~kriz/cifar.html
2 Google Colaboratory: https://research.google.com/colaboratory/
3 PyTorch: https://pytorch.org/
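A possible way to reproduce such a non-IID split with torchvision is sketched below. The helper non_iid_split, the random class assignment, and the seed are our assumptions rather than the authors' preprocessing code, and the standard torchvision VGG-11 is used as a stand-in for the VGG 11 network mentioned above; note that clients sharing a class here also share all of its images, which is a simplification.

    import random
    import torchvision
    import torchvision.transforms as T
    from torch.utils.data import Subset, DataLoader

    def non_iid_split(dataset, num_clients=20, classes_per_client=(2, 3), seed=0):
        """Give each client the samples of 2 or 3 randomly chosen CIFAR-10 classes."""
        rng = random.Random(seed)
        by_class = {c: [] for c in range(10)}
        for idx, label in enumerate(dataset.targets):
            by_class[label].append(idx)
        clients = []
        for _ in range(num_clients):
            chosen = rng.sample(range(10), rng.choice(classes_per_client))
            indices = [i for c in chosen for i in by_class[c]]
            clients.append(Subset(dataset, indices))
        return clients

    transform = T.Compose([T.ToTensor()])  # images are often also resized/normalized for VGG
    train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                             download=True, transform=transform)
    client_sets = non_iid_split(train_set)
    client_loaders = [DataLoader(s, batch_size=32, shuffle=True) for s in client_sets]

    # VGG-11 with a 10-class output for CIFAR-10 (hypothetical instantiation)
    model = torchvision.models.vgg11(num_classes=10)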
Algorithm 2 Multi-Layer Hierarchical Federated Learning (MLH-FL) Algorithm
1: procedure MLH-FL(HighLevelRounds, LowLevelRounds, Y)
2:   currHighLR = 0, currLowLR = 0, currLayer = Y
3:   while currLayer > 0 do                       ⊲ the model is propagated from the cloud down to edge layer 0
4:     sendModel(currLayer, currLayer - 1)
5:     currLayer--
6:   while currHighLR < HighLevelRounds do
7:     while currLowLR < LowLevelRounds do
8:       collectIoTData(currLayer)                ⊲ data is collected on the IoT devices
9:       updateInternalModel(currLayer)           ⊲ the current internal model is updated based on the data collected
10:      sendModel(currLayer, currLayer + 1)
11:      aggregateModel(currLayer + 1)            ⊲ the model is aggregated at layer 1
12:      sendModel(currLayer + 1, currLayer)
13:      currLowLR++
14:    currLayer++
15:    while currLayer < Y do
16:      sendModel(currLayer, currLayer + 1)      ⊲ the updated model is sent to the upper layers
17:      aggregateModel(currLayer + 1)            ⊲ the updated model is merged with the other models at layer currLayer + 1
18:      currLayer++
19:    while currLayer > 0 do                     ⊲ the model is again propagated from the cloud down to layer 0
20:      sendModel(currLayer, currLayer - 1)
21:      currLayer--
22:    currHighLR++
23:    currLowLR = 0
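To make the schedule of Alg. 2 more concrete, the sketch below mirrors its loops on the case-study tree (20 IoT devices grouped under 4 layer-1 aggregators, 2 layer-2 aggregators, one cloud node, Y = 3). It is a sketch under stated assumptions: fed_avg is the helper from the earlier sketch, and clients[i].train(state, epochs) is the same assumed placeholder for local training used in the traditional FL sketch.

    def mlh_fl(global_state, clients, high_level_rounds, low_level_rounds, epochs,
               n_layer1=4, n_layer2=2):
        """Hedged sketch of the MLH-FL schedule of Alg. 2 on the case-study tree."""
        group_size = len(clients) // n_layer1          # 5 IoT devices per layer-1 aggregator
        for _ in range(high_level_rounds):             # high level rounds (cloud aggregations)
            # lines 3-5 / 19-21: the current global model is propagated down to layers 1 and 0
            layer1_states = [global_state] * n_layer1
            for _ in range(low_level_rounds):          # lines 7-13: low level rounds at layer 1
                updated = []
                for g in range(n_layer1):
                    local_states = [clients[g * group_size + j].train(layer1_states[g], epochs)
                                    for j in range(group_size)]   # local training on IoT data
                    updated.append(fed_avg(local_states))          # aggregation at layer 1
                layer1_states = updated                # aggregated models sent back to the group
            # lines 15-18: aggregate upwards, layer 1 -> layer 2 -> cloud
            layer2_states = [fed_avg(layer1_states[2 * h: 2 * h + 2]) for h in range(n_layer2)]
            global_state = fed_avg(layer2_states)      # only n_layer2 models reach the cloud
        return global_state

Per high level round, this schedule performs low_level_rounds aggregations at layer 1 but only 2 model uploads towards the cloud, whereas the traditional FL baseline of Section IV uploads all 20 client models to the cloud at every round.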
Fig. 3. Loss comparison between the traditional FL and the MLH-FL approaches.

VI. CONCLUSIONS

… updated models to the cloud. Such an innovation allows further savings in the number of communications towards the cloud.

Experiments have shown that the accuracy and the loss of the models obtained by using the proposed approach are comparable with those of the traditional FL approach. However, these models have been obtained while greatly reducing the communications towards the cloud.

Future work will evaluate the energy saved by applying the proposed approach. Moreover, more experiments will be carried out to better analyze the differences, in terms of accuracy and loss, between the proposed MLH-FL and traditional FL. Also, model aggregations at different layers will be introduced. Finally, in the future we will consider running the tree model on nodes with different computational capabilities and variable communication abilities.

ACKNOWLEDGMENT

This work has been partially supported by the COGITO (A COGnItive dynamic sysTem to allOw buildings to learn and adapt) project, funded by the Italian Government (PON ARS01 00836), and by the CNR project "Industrial transition and resilience of post-Covid19 Societies - Sub-project: Energy Efficient Cognitive Buildings (TIRS-EECB)".

REFERENCES

[1] A. Souri, A. Hussien, M. Hoseyninezhad, and M. Norouzi, "A systematic review of IoT communication strategies for an efficient smart environment," Transactions on Emerging Telecommunications Technologies, p. e3736, 2019.
[2] D. Singh, E. Merdivan, S. Hanke, J. Kropf, M. Geist, and A. Holzinger, "Convolutional and recurrent neural networks for activity recognition in smart environment," in Towards Integrative Machine Learning and Knowledge Extraction. Springer, 2017, pp. 194–205.
[3] F. Cicirelli, A. Guerrieri, C. Mastroianni, G. Spezzano, and A. Vinci, The Internet of Things for Smart Urban Ecosystems. Springer International Publishing, 2019.
[4] L. Atzori, A. Iera, and G. Morabito, "The internet of things: A survey," Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[5] A. Guerrieri, V. Loscri, A. Rovella, and G. Fortino, Management of Cyber Physical Objects in the Future Internet of Things. Springer International Publishing, 2016.
[6] F. Cicirelli, A. Guerrieri, A. Mercuri, G. Spezzano, and A. Vinci, "Itema: A methodological approach for cognitive edge computing IoT ecosystems," Future Generation Computer Systems, vol. 92, pp. 189–197, 2019.
[7] W. Shi and S. Dustdar, "The promise of edge computing," Computer, vol. 49, no. 5, pp. 78–81, 2016.
[8] A. F. Santamaria, F. De Rango, A. Serianni, and P. Raimondo, "A real IoT device deployment for e-health applications under lightweight communication protocols, activity classifier and edge data filtering," Computer Communications, vol. 128, pp. 60–73, 2018.
[9] A. F. Santamaria, P. Raimondo, M. Tropea, F. De Rango, and C. Aiello, "An IoT surveillance system based on a decentralised architecture," Sensors, vol. 19, no. 6, p. 1469, 2019.
[10] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu, "Federated learning," Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 13, no. 3, pp. 1–207, 2019.
[11] T. Yang, G. Andrew, H. Eichner, H. Sun, W. Li, N. Kong, D. Ramage, and F. Beaufays, "Applied federated learning: Improving Google keyboard query suggestions," arXiv preprint arXiv:1812.02903, 2018.
[12] L. Li, Y. Fan, M. Tse, and K.-Y. Lin, "A review of applications in federated learning," Computers & Industrial Engineering, p. 106854, 2020.
[13] J. C. Jiang, B. Kantarci, S. Oktug, and T. Soyata, "Federated learning in smart city sensing: Challenges and opportunities," Sensors, vol. 20, no. 21, p. 6230, 2020.
[14] M. S. H. Abad, E. Ozfatura, D. Gunduz, and O. Ercetin, "Hierarchical federated learning across heterogeneous cellular networks," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020, pp. 8866–8870.
[15] J. Yuan, M. Xu, X. Ma, A. Zhou, X. Liu, and S. Wang, "Hierarchical federated learning through LAN-WAN orchestration," arXiv preprint arXiv:2010.11612, 2020.
[16] L. Liu, J. Zhang, S. Song, and K. B. Letaief, "Client-edge-cloud hierarchical federated learning," in ICC 2020 - 2020 IEEE International Conference on Communications (ICC). IEEE, 2020, pp. 1–6.
[17] A. Wainakh, A. S. Guinea, T. Grube, and M. Mühlhäuser, "Enhancing privacy via hierarchical federated learning," in 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). IEEE, 2020, pp. 344–347.
[18] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
[19] B. Recht, R. Roelofs, L. Schmidt, and V. Shankar, "Do CIFAR-10 classifiers generalize to CIFAR-10?" 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada, 2020.
[20] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in International Conference on Learning Representations, 2015.