
Received 31 March 2023, accepted 6 May 2023, date of publication 16 May 2023, date of current version 23 May 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3276859

A Deep Reinforcement Learning Approach for Competitive Task Assignment in Enterprise Blockchain

GAETANO VOLPE, AGOSTINO MARCELLO MANGINI, (Senior Member, IEEE), AND MARIA PIA FANTI, (Fellow, IEEE)
Department of Electrical and Information Engineering, Polytechnic University of Bari, 70125 Bari, Italy
Corresponding author: Gaetano Volpe ([email protected])

ABSTRACT With the advent of Industry 4.0, the demand for high computing power for tasks such as data mining, 3D rendering, file conversion and cryptography is continuously growing. To this end, distributed and decentralized environments play a fundamental role by dramatically increasing the amount of available resources. However, several issues remain in existing resource sharing solutions, such as the uncertainty of task running time, the renting price and the security of transactions. In this work, we present a blockchain-enabled task assignment platform based on performance prediction, built on Hyperledger Fabric, an open-source solution for private and permissioned blockchains in enterprise contexts that outperforms other technologies in terms of modularity, security and performance. We propose a model-free deep reinforcement learning framework to predict task runtime in the agent's current load state, while the agent is engaged in multiple concurrent tasks. In addition, we let clients choose between prediction accuracy and price saving on each request. This way, we implicitly give inaccurate agents a chance to get assignments by competing on price rather than on time, allowing them to collect new experiences and improve future predictions. We conduct extensive experiments to evaluate the performance of the proposed scheme.

INDEX TERMS Blockchain, cloud, deep reinforcement learning (DRL), resource sharing.

I. INTRODUCTION
The migration from on-premise to Cloud-based platforms has been rising over the past years. The public Cloud infrastructure as a service (IaaS) spending forecast for 2023 is 156 billion U.S. dollars, whereas in 2019 it was only 45 billion [1]. Consequently, due to the high requirements of several application categories, the rapid increase of company investments in the Cloud leads to continuously growing computing power, memory and storage demands.

In this context, scientific workflows, data mining algorithms, file format conversion (e.g. video and audio files), 3D rendering and cryptography-based solutions are only some examples. All of these applications are resource intensive in terms of CPU and memory, and it is often required in enterprise environments that they are completed in the least time possible, especially when they are part of a production flow.

(The associate editor coordinating the review of this manuscript and approving it for publication was Abderrahmane Lakas.)

However, due to the high volatility of resource availability in the Cloud, application runtime is not always static and strictly depends on the current workload of the assigned environment; therefore, it needs to be estimated. This problem is well known in the literature as performance prediction and has been addressed since 2005 with different techniques based on the continuous collection of resource consumption data such as CPU and memory usage. In [2], a systematic review of performance prediction of parallel applications is provided. The authors observe that in 81.7% of the considered papers, the estimation is based on analytic methods in which the corresponding equations are either manually or automatically derived, mostly by stochastic linear and non-linear approaches. On the other hand, in the remaining 18.29%,

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ (IEEE Access, Volume 11, 2023)

non-analytic methods, such as Support Vector Machines or Artificial Neural Networks, are used. Although most of these approaches are quite accurate, they cannot be easily used in modern Cloud environments for at least three reasons:

i) they are too application-specific or platform-dependent and, therefore, they have a limited field of application;
ii) they are intended for single-core processors or for a specific class of multi-core architectures and do not consider the vast class of available platforms;
iii) they assume a free workload context in which the underlying resources are not being used by other concurrent processes, which is unrealistic in a typical Cloud environment.

Indeed, these limitations are reflected in some recent works. For example, in [3], the proposed performance prediction approach is restricted to scientific applications on HPC Cluster platforms. In contrast, the method proposed in [4] is not application-specific but is restricted to Grid environments. Moreover, many of the existing works, such as [5] and [6], suggest an offline approach, whereas all modern applications require online, continuous incremental learning.

As a result, due also to the heterogeneity of the applications and the volatility of configuration and availability of the underlying resources, the existing methods are not generally applicable in a Cloud environment.

Alternatively, the adoption of Reinforcement Learning (RL)-based techniques is a valid approach. In [7] and [8], the authors propose RL-based methods for solving specific resource management and task mapping problems and minimizing application runtime by prediction. However, [7] is restricted to the problem of optimizing the performance of communication-bound applications on parallel computing systems. In contrast, [8] proposes a mapping problem for generic tasks in a multi-resource cluster but, in its prediction approach, it assumes that the resource demand for each task is known in advance. When it comes to Cloud environments, this is certainly a strong assumption, as the task demands are not necessarily known upon arrival on the system.

In addition, decentralization is another key factor in Cloud environments. In fact, in a typical scenario multiple resources are available from different locations for a variable time-based fee. The resulting resource trading problem among nodes poses several security and integrity issues for transactions, which have been addressed in recent literature with auction systems implemented on Blockchain, a popular secure distributed ledger technology, combined with RL-based techniques for various optimization objectives such as maximizing participant payoff, minimizing energy consumption or adjusting block size [9].

In this paper, we overcome the limitations of the existing methods and formulate a task assignment problem by performance prediction for Cloud applications in a multi-agent environment, based on an incremental online learning process. Differently from the approaches proposed in [3] and [4], which are restricted to only some classes of platforms and applications, the novel method is neither application-specific nor platform-dependent and is able to work in a generic cloud environment in which, for each agent, a certain number of different concurrent processes are running. The main objective of the proposed approach is to find, according to the client preferences, the agent that completes the execution in the least time or, alternatively, the cheapest agent.

Differently from [9], which is focused mostly on the use of Blockchain for the implementation of optimal auction and bidding strategies, we leverage Hyperledger Fabric to manage both the agent selection process by auction and the task execution. In addition, we adopt a DRL incremental learning approach [10] to enable each agent to predict task runtime and therefore place a bid for a submitted task. The proposed DRL-based algorithm overcomes the limitations of [8] as it does not assume any preliminary consumption-related information.

In more detail, the paper contributions are listed as follows:

1) A Blockchain-based trading platform is designed on top of Hyperledger Fabric to orchestrate client requests, agent bids and the resulting task assignment. We formulate a double bidding strategy according to client preferences. On the one hand, agents provide both a runtime estimation and a price. On the other hand, clients choose between the least price and the least estimated time according to their current setting.
2) A model-free DRL framework for task runtime estimation supports the agent selection process. The proposed algorithm enables the agent to incrementally learn how to make predictions for generic tasks represented by only two parameters, considering its current load in terms of resource consumption and already running tasks.

To the best of our knowledge, it is the first time that Hyperledger Fabric Blockchain and DRL are combined for task assignment orchestration and performance prediction. In this regard, in our previous works [11] and [12], we propose a similar framework, which is based on Ethereum [13] and implements a traditional offline deep learning-based prediction strategy, rather than the incremental online approach proposed in this paper.

We conduct extensive experiments to evaluate the performance of the proposed DRL algorithm with different training factors. In particular, we show how it works by varying the discount factor, the number of episodes and the policy exploration vs. exploitation probability.

The rest of the paper is organized as follows: Section II gives some preliminaries on Blockchain, Hyperledger Fabric and DRL. Section III describes some works about performance prediction, DRL-based task assignment algorithms and resource trading systems in Blockchain environments combined with DRL. Section IV introduces the proposed system model. The DRL-based task assignment approach is


described in Section V, whereas in Section VI experimental results are presented in detail. Finally, Section VII concludes the paper.

II. BACKGROUND
A. BLOCKCHAIN
Blockchain is a distributed ledger technology which is used to record transactions without the need for a central authority. Each transaction is signed at least by the issuer and is verifiable by the nodes participating in the peer-to-peer network [14]. Transactions are also grouped in blocks and each block is connected to the previous one by including its Secure Hash Algorithm (SHA-256) digest in the header, thus creating a tamper-proof chain [15]. Moreover, all nodes must agree on which transactions are valid and on the order in which they have to be stored in the ledger. This is the goal of the so-called consensus mechanism, and different methods exist to reach such consensus in a Blockchain [16].

A Smart Contract is decentralized program code deployed in the Blockchain that enforces the terms and conditions of a specific agreement between two or more parties [17]. It is executed automatically when triggering conditions are satisfied.

B. HYPERLEDGER FABRIC
Hyperledger Fabric is an enterprise-grade Blockchain platform which has great advantages, compared to other platforms such as Ethereum, in terms of transactional privacy, flexibility and data-query capabilities [18]. Since it is mainly designed for permissioned networks, a new node must be authorized to join the network by another node with appropriate permissions. In the same way, a user can be granted access to all or part of the data according to its role. Moreover, in order to ease deployment, it leverages a series of single micro-services based on Docker [19] containers. The list of services includes peer, Certification Authority (CA) and orderer, each with a specific role. A CouchDB [20] NoSQL database is often used to deploy the ledger, while Smart Contracts are called Chaincodes. Some of the most popular languages are supported to develop chaincodes, including Java, NodeJS and GoLang. Finally, on the consensus layer, Hyperledger Fabric supports practical leader-based algorithms, such as RAFT [21], in which a recognized leader node publishes the blocks while all other nodes verify and validate transactions.

C. DEEP REINFORCEMENT LEARNING
DRL is a technique that combines traditional RL and deep learning. One of the first approaches of DRL was DQN for Atari games [10], in which a deep neural network (DNN) was used as the function approximator in place of traditional Q-learning [22]. A more recent application is Deep Deterministic Policy Gradient (DDPG) [23], which is designed for scenarios with an exponentially large continuous action space.

III. RELATED WORK
The prediction of performance for a vast class of tasks with machine learning (ML) techniques has been extensively studied over the past years. Some of the earliest works suggest how to predict resource consumption over time, such as CPU utilization, amount of used memory, disk space and network bandwidth, given a set of environment attributes.

In [24], the suitability of several machine learning techniques for predicting the spatio-temporal utilization of resources by two bioinformatics applications, BLAST and RAxML, is studied. Some of the investigated methods are: k-nearest neighbor, linear regression, decision table, Radial Basis Function network, Predicting Query Runtime and Support Vector Machine. The authors conclude that different algorithms perform better in different situations and that including as many attributes as available, even from monitoring systems, can improve prediction performance.

In [25], a novel approach for learning ensemble prediction models of task runtime is presented. The authors use bagging techniques to produce ensembles of regression trees to be used in scientific workflows, such as gene expression analysis, made of different sub-tasks, and show that their method leads to significant prediction-error reductions when compared with standalone models.

Still referring to scientific workflows, an online incremental learning approach to predict the runtime of tasks in cloud environments is suggested in [26]. The incremental approach enables the capture of some typical cloud features, such as the continuous environmental changes and the heterogeneity of the different platforms. Recurrent Neural Network (RNN) and K-Nearest Neighbors (KNN) are used for estimation. To improve predictions, fine-grained resource monitoring data related to CPU utilization, memory usage, and I/O activities are collected for each unique task in the form of time-series records. The authors show that their approach significantly outperforms state-of-the-art solutions in terms of estimation error.

A similar approach, called two-stage prediction, is proposed in [27]. In this work, first the resource consumption in terms of CPU utilization, memory, storage, bandwidth and I/O is estimated for a single task instance in a cloud environment, and then the result is used for runtime prediction. However, that is an offline machine learning approach and has some limitations compared to an online approach, in which data are processed as soon as they arrive in a near real-time fashion; it does not reflect the streaming nature of workloads in cloud environments.

Furthermore, in [2] a systematic literature review of performance prediction methods for parallel applications is presented that includes analytic and non-analytic methods and indicates future research trends and some unsolved issues. In particular, this work shows that performance prediction has been applied to a wide range of domains and reviews 82 different approaches developed between 2005 and 2020. However, the authors conclude that most of these methods


focus on a specific application type and platform and, therefore, there is a lack of independent solutions able to work in unknown environments.

Resource management and task assignment are two other problems that have received great attention in the related literature. In this context, the DRL approach, often used for intelligent-robot related problems such as optimal path planning and obstacle avoidance [28], has recently become very popular. For instance, in [8], a multi-resource cluster scheduler named DeepRM is presented that is able to learn how to manage resources directly from experience in an online fashion and to optimize various objectives such as minimizing average job slowdown or completion time. The authors show that their method performs comparably to state-of-the-art heuristics, adapts to different conditions and converges relatively quickly.

In [7], a DRL approach for solving task mapping problems with dynamic traffic on parallel systems is discussed. The algorithm explores better task mappings by using a network simulator that predicts performance and runtime communication behaviors. Since communication patterns are often changing and unknown, network performance is difficult to estimate accurately; therefore, the authors claim that DRL is an efficient solution in this dynamic context and show that their method performs comparably to or better than previous approaches.

Differently from the aforementioned works, the DRL approach proposed in this paper does not rely on a specific class of application, but is intended to be an abstract framework combined with modern container-based technologies, such as Docker, with the purpose of learning the behavior and estimating the execution time of a generic software task that can be packed in a container image and monitored through Docker resource metrics in a competitive environment.

Blockchain-enabled solutions for a vast range of problems have recently been arising. For example, in our previous works [11] and [12], a Smart Contract-based platform is proposed for improving digital processes in a Cloud manufacturing environment. In detail, we combine Blockchain with Docker and Cloud Storage and introduce a deep learning approach in a task mapping framework. In [29], a blockchain-based two-stage secure spectrum intelligent sensing and sharing auction mechanism for mobile devices is designed in a consortium blockchain to guarantee a secure and efficient spectrum auction with low complexity. In [30], the authors introduce a novel scalable and multi-layer blockchain-based energy trading framework for cooperative microgrid systems that considers the issue of perceiving the status of block generation over temporary network disruption and improves the consensus and the reliability of energy trading.

Moreover, several recent works have been proposed addressing security, data integrity and optimization problems with a combination of Blockchain and DRL. For instance, in [31], a reliable data collection and secure sharing scheme for smart mobile terminals is proposed. The authors introduce an Ethereum-based Blockchain to safely manage data sharing with a DRL approach to achieve the maximum amount of collected data and geographic fairness and to minimize energy consumption. Several simulation experiments show that their method outperforms traditional database-based approaches in terms of reliability and security.

In [32], a smart grid blockchain combined with fog computing is suggested. This work includes a Hyperledger Fabric Blockchain in which the nodes are part of a fog computing environment. A verifiable random function is proposed to ensure randomness and increase safety in the selection of the primary node while keeping the probability proportional to the computing power provided by each member. Based on storage cost and security constraints, a DRL scheme is implemented to adjust the block size and the block interval in the proposed Blockchain. By conducting extensive simulations, the authors show the superiority of their scheme in terms of throughput and latency. A similar approach, though intended to decrease energy consumption and to improve the efficiency of the consensus process in Blockchain-enabled Industrial Internet of Things systems by adjusting the block size and offloading some tasks to computing servers, is suggested in [33]. The problem is formulated as a high-dynamic and high-dimensional Markov Decision Process, for which a DRL approach is used to converge to an optimal solution.

A peer-to-peer energy trading problem among microgrids is investigated in [34]. In this work, a multi-agent deep deterministic policy gradient-based energy trading algorithm is proposed to enable each microgrid to maximize its own utility in a local market. Given the uncertainties and the constraints in renewable energy and power demand, the authors claim that the DRL-based approach is suitable to help each microgrid find its optimal policy. An Ethereum Blockchain is adopted to ensure the integrity of transaction data.

A Blockchain-enabled computing resource trading system is proposed in [9]. This system takes into account pricing and bidding strategies to enable providers and customers to trade computing resources in a safe and tamper-proof environment. A decision-making problem in the continuous double auction is formulated with the goal of maximizing each participant's payoff, while a DRL approach is adopted to help them build their optimal bidding strategies. The authors conduct extensive simulations and show that their scheme outperforms other existing methods.

Finally, in [35], a DRL approach is used to solve a joint optimization problem to enhance adaptivity and scalability in Blockchain environments. The proposed approach considers the optimal selection of consensus protocols and the allocation of computation and bandwidth resources. The authors show through extensive simulations the effectiveness of their scheme.

IV. SYSTEM MODEL
In this section, we introduce a new system that combines a novel runtime estimation algorithm based on DRL with a competitive task assignment framework safely managed by


FIGURE 1. System architecture.

a Hyperledger-based Enterprise Blockchain. The proposed architecture is depicted in Fig. 1. The system is made up of three layers: the Blockchain layer in the middle, and the Agent and Client layers on the sides.

A. BLOCKCHAIN LAYER
Let us consider a set of pre-authorized organizations O = {o1, . . . , on, . . . , o|O|}, called providers, that constitute the nodes of the permissioned Hyperledger Blockchain. An organization in Hyperledger Fabric is simply a firm that decides to join a network and that is authorized by other existing members. Note that the symbol |A| denotes the cardinality of a generic set A. Moreover, each provider on delivers a set of peers Pn = {pn,1, . . . , pn,i, . . . , pn,|Pn|} joining the network, where pn,i is the i-th peer of provider on. Each provider on has one Certification Authority CAn to provide and renew certificates for Agents, Clients and other components. All peers in the set Pn hold a copy of the chaincode CH and of the ledger L. The chaincode CH is the main Smart Contract that regulates the requests of the clients and the bids received from the agents and coordinates the task assignment process. The ledger L stores the current state of the tasks and the related transactions. Finally, the Orderer node OD is responsible for packaging transactions into blocks and distributing them to the peers in the sets Pn across the network.

B. CLIENT LAYER
In the proposed system, we define the set of clients C = {c1, . . . , ci, . . . , c|C|}, where ci implements a step of a generic process flow for which it is required to run computationally intensive tasks. Since ci may not have enough resources to run tasks locally, it interacts with the Blockchain to submit execution requests. For each successful task execution, a variable fee is charged.

For each ci, we define the set

Ri = {ri,1, . . . , ri,j, . . . , ri,|Ri|}

where ri,j represents the j-th execution request of client ci. In addition, for each request ri,j, the client ci has to specify the strategy and the task details. Considering the used strategy, we define two alternative options: time-sensitive and price-sensitive. In the first case, we assume that the client objective is to complete the task in the least time possible, regardless of the charged fee. In the second case, the requirement is to pay the lowest fee, regardless of the execution time. In order to specify the task details, we define the set T = {t1, . . . , tk, . . . , t|T|}, where tk represents a generic task that can be requested by the clients with an arbitrary input parameter.

Now, each request ri,j can be defined by a triple ri,j = ⟨gi,j, ti,j, pi,j⟩, where gi,j ∈ {0, 1} is the strategy, with 0 representing time-sensitive and 1 price-sensitive, ti,j = tk ∈ T is the required task and pi,j is the input argument.

C. AGENT LAYER
Each request ri,j submitted by a client requires an Agent to execute the associated task. In this fully distributed environment, we define the set of agents A = {a1, . . . , am, . . . , a|A|}, where am is a node that joins the network and provides its computational resources, such as CPU and memory, to the system. For the sake of generalization, we use Docker container-based technology to embed the tasks in self-dependent images. In this way, since everything is packed in the Docker image, the nodes do not have any constraint in terms of operating system and dependencies.

Now, at each request ri,j, we associate an agent am ∈ A that is denoted ai,j. Such an agent is paid a variable fee upon successful execution of the related task. Moreover, since for each agent am we allow the execution of concurrent processes, the execution time strictly depends on its current load state.

As illustrated in Section IV-B, the client can choose between time-sensitive and price-sensitive strategies. More specifically, when the former strategy is chosen, the client goal is to minimize the runtime of its processes and thus each agent am is required to provide a reliable prediction of the execution time for each request ri,j based on its current state. Conversely, in case the latter strategy is selected, only the proposed price is relevant for the assignment. The main


Algorithm 1 Agent Selection Algorithm


Input: A Client Request ri,j = ⟨gi,j , ti,j , pi,j ⟩
Output: Winner Agent ai,j ∈ A
Step 1:
foreach am ∈ A do
Forward request ri,j to Agent am ;
Collect bid bi,j,m = ⟨eti,j,m , pri,j,m ⟩;
B ← B ∪ {bi,j,m };
end
Step 2:
if gi,j is time-sensitive then
Sort set B by estimated time eti,j,m ;
else if s is price-sensitive then
Sort set B by bid price pri,j,m ;
Step 3:
Set h ← 1;
while B ̸= ∅ do
Take bid bi,j,m (h) ∈ B;
if Agent am is available then
Agent am is the winner;
ai,j ← am ;
return ai,j ;
FIGURE 2. Agent selection UML sequence diagram. else
B ← B \ {bi,j,m };
Set h = h + 1;
scope of the price-sensitive option is to allow an agent that end
has recently joined the network and it is not able to provide a
reliable execution time prediction to compete in price rather
than in time and collect new experiences to progressively
improve its predictions. proposed price. All received bids are collected in the set B =
{bi,j,1 , . . . , bi,j,m , . . . , bi,j,|A| }. In Step 2, given the strategy
V. DRL-BASED TASK ASSIGNMENT PROCESS gi,j in ri,j , the members of B are sorted by et i,j,m or pr i,j,m
In this section, we introduce a competitive task assignment accordingly. Finally, in Step 3, the first available agent am
process by expected runtime prediction that implements associated to bi,j,m in the sorted rank is elected as the winner
the two client strategies: time-sensitive and price-sensitive. The agents leverage a novel DRL approach to provide the predictions.

A. TASK ASSIGNMENT PROCESS

The main process is designed in the chaincode CH and is depicted in the UML sequence diagram in Fig. 2. Both the clients C and the agents A continuously interact with the Blockchain for requests and tasks management. More specifically, in the first part of the diagram the agent selection is performed, and in the second part the post-assignment actions are implemented.

The agent selection process is detailed in Algorithm 1, which is implemented in the chaincode in the Go [36] programming language. The input is the client request ri,j and the output is the winner agent ai,j ∈ A.

In Step 1, a request ri,j for a bid bi,j,m is sent to each available agent am ∈ A, where bi,j,m is the bid of agent m for the request j of client ci. In addition, bi,j,m is defined as bi,j,m = ⟨eti,j,m, pri,j,m⟩, where eti,j,m is the estimated completion time provided by the agent am and pri,j,m is the bid price. […] If the bid bi,j,m is selected, then agent am is the winner and therefore ai,j = am. The estimated time eti,j,m leverages the DRL approach that is described in Section V-B. On the contrary, the bid price pri,j,m is determined autonomously by the agent to compete in the current race.

After a request ri,j has been successfully assigned, the winner agent ai,j executes the task ti,j with the parameter pi,j and notifies the completion to the chaincode CH. Finally, the instance metrics, such as the effective runtime, are stored in the tamper-proof Blockchain and the client ci is charged according to the offset between the actual execution time and the prediction.

B. DRL APPROACH FOR RUNNING-TIME PREDICTIONS

In this section, we propose a DRL approach for task runtime prediction. In particular, the Deep-Q-Network (DQN) algorithm proposed in [10] is applied to the considered problem. The triple (S, A, R) defines a deterministic RL model, where S is the State Space, A is the Action Space and R is the Reward. In the considered application, the triple (S, A, R) is specified in the following.
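The selection step of Section V-A can be sketched in Python as follows. This is illustrative only: the paper implements Algorithm 1 as Go chaincode, and the names used here are hypothetical.

```python
# Hypothetical sketch of the winner-selection step of Section V-A:
# each agent submits a bid <et, pr>; the client strategy picks either the
# least estimated time or the cheapest price among the collected bids.
from dataclasses import dataclass

@dataclass
class Bid:
    agent: str
    est_time: float   # et_{i,j,m}: estimated completion time (seconds)
    price: float      # pr_{i,j,m}: price proposed autonomously by the agent

def select_winner(bids, strategy):
    """Return the winning agent a_{i,j} for a request under a client strategy."""
    if strategy == "time-sensitive":
        return min(bids, key=lambda b: b.est_time).agent
    if strategy == "price-sensitive":
        return min(bids, key=lambda b: b.price).agent
    raise ValueError(f"unknown strategy: {strategy}")

bids = [Bid("a1", 12.0, 0.5), Bid("a2", 8.0, 0.9)]
assert select_winner(bids, "time-sensitive") == "a2"   # least estimated time
assert select_winner(bids, "price-sensitive") == "a1"  # cheapest price
```

The two strategies mirror the double bidding scheme described in the paper, where a client may optimize either for the least predicted time or for the lowest price.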

VOLUME 11, 2023 48241


G. Volpe et al.: DRL Approach for Competitive Task Assignment in Enterprise Blockchain

1) State Space (S): We assume that each agent is able to run a fixed number k of concurrent tasks as individual containers. As a consequence, execution time is potentially impacted by three combined parameters to be considered on a new incoming request ri,j:
   I) the type of task ti,j ∈ T, its associated input parameter pi,j and submission time tsi,j;
   II) the total available resources for each agent am: cpu_speedm, number of cores n_coresm, maximum memory max_memorym, hdd_typem (rotational or high-speed SSD), available network bandwidth bwm;
   III) the current resources consumption of the previous k − 1 processes running on am: cpu_timem,n, memm,n, net_i/om,n, block_i/om,n, number of processes n_pidsm,n, where n = 1, 2, ..., k − 1.

   Now, we introduce the following three tuples:
   I) sm,1 = ⟨ti,j, pi,j, tsi,j⟩;
   II) sm,2 = ⟨cpu_speedm, n_coresm, max_memorym, hdd_typem⟩;
   III) sm,3 = ⟨cpu_timem,n, memm,n, net_i/om,n, block_i/om,n, n_pidsm,n⟩, where n = 1, 2, ..., k − 1.

   Moreover, we define Sm = {sm,1, sm,2, sm,3}, which represents the current state of agent am in the proposed DQN framework. The structure of sm,2 reflects the current node configuration. Those features are assumed to be static and are described in Table 1. In addition, sm,3 denotes the resources usage of the currently running containers and is continuously updated. Since in the proposed system each task is represented by a Docker image and, consequently, each instance is essentially a container based on that image, we can leverage the live metrics exposed by Docker to collect live resources usage data. In Table 2, all those metrics are described.

   TABLE 1. Node configuration features.
   TABLE 2. Resources usage metrics.

2) Action Space (A): Given the current state of the candidate agent am, the target is to predict the expected runtime in seconds for the incoming client request ri,j. In the proposed DRL approach, the estimated bid value eti,j,m constitutes the action. In order to ease the training, we consider a discrete set of execution times in steps of 1 second, with a predefined upper bound value that constitutes a global timeout for each instance processed by the system.

3) Reward (R): Once task ti,j has been executed, each state Sm and action eti,j,m produce the actual elapsed time as observation and contribute to the reward by assessing the level of prediction accuracy. Thus, if we set a constant token TKmax as a maximum reward for each successful calculation, we can calibrate the reward value RDi,j,m by considering the relative error between the actual execution time ati,j,m and the estimated time eti,j,m. In detail, RDi,j,m is defined by the following formula:

   RDi,j,m = TKmax · (ati,j,m / eti,j,m)   if ati,j,m ≤ eti,j,m,
   RDi,j,m = TKmax · (eti,j,m / ati,j,m)   otherwise.   (1)

The adopted DQN scheme is described by Algorithm 2. Firstly, according to the approach in [10], a number of episodes E for training and the condition for a state Sm to be terminal are defined. In the proposed approach, the number of episodes is arbitrary and is strictly related to the number and types of tasks in the set T. Basically, it should be large enough to guarantee an accurate prediction of the execution runtime eti,j,m in every agent load condition. Similarly, a maximum number of iterations I is set, after which the state Sm is terminal and a single episode is completed.

Step 1 initializes the replay memory set D with capacity N and the Q network that will be trained.

Moreover, in Step 2, for each iteration of each episode, whenever a new request ri,j is submitted to agent am, a new state Sm is built and a value for eti,j,m is either randomly selected (with probability ϵ) or predicted. Subsequently, the price pri,j,m is arbitrarily determined by agent am and the bid bi,j,m is definitely set. As shown in Fig. 2, bi,j,m is sent to the Smart Contract, on which Algorithm 1 is executed after all bids from all agents are collected.

In Step 3, if the agent am is the winner, ti,j is run locally and the actual execution time ati,j,m is observed from the Docker metrics to calculate the step reward RDi,j,m. At the same time, a new training step begins.

In Step 4, on the next incoming request ri,j′, the future state Sm′ is determined. Then, the whole transition trh is stored in the experience replay memory set D. Subsequently, M transitions are randomly selected from set D into the subset Dh. Then, for each sampled transition, a new target value yn is computed.
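The reward in (1) can be written as a small Python function. This is a sketch; the value TK_max = 10 matches the Cmax value used later in the performance evaluation.

```python
# Sketch of the step reward RD_{i,j,m} in Eq. (1): the reward is maximal
# (TK_max) when the estimate matches the actual runtime, and is scaled down
# by the relative error on either side of the estimate.
TK_MAX = 10.0  # constant token TK_max; 10 is the value used in the evaluation

def step_reward(actual, estimated, tk_max=TK_MAX):
    """Reward for actual runtime at_{i,j,m} given estimate et_{i,j,m}."""
    if actual <= estimated:
        return tk_max * actual / estimated
    return tk_max * estimated / actual

assert step_reward(10.0, 10.0) == 10.0  # exact prediction: full reward
assert step_reward(5.0, 10.0) == 5.0    # finished early: at/et = 0.5
assert step_reward(20.0, 10.0) == 5.0   # overran: et/at = 0.5
```

Note that the reward is symmetric in the ratio, so over- and under-estimation by the same factor are penalized equally.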


Algorithm 2 Deep Q Learning
Step 1:
  Initialize replay memory set D with capacity N
  Initialize Q(Sm, ebt) arbitrarily
Step 2:
  Set e = 1
  while e ≤ E do
    Set î = 1
    while î ≤ I do
      Wait for request ri,j
      With probability ϵ select a random action ebt,
      otherwise select ebt = max_ebt Q(Sm, ebt; θ)
      Set eti,j,m = ebt
      Set arbitrary price pri,j,m
      Set bi,j,m = ⟨eti,j,m, pri,j,m⟩
      Send bi,j,m to Smart Contract and wait for Algorithm 1
      Step 3:
      if agent am is the winner then
        Take action eti,j,m = ebt, observe RDi,j,m, Sm′
        Step 4:
        Wait for next request ri,j′
        Store transition trh = ⟨Sm, eti,j,m, RDi,j,m, Sm′⟩ in D
        Sample random minibatch of M transitions Dh = {tb1, tb2, ..., tbM} where tbn ∈ D
        foreach tbn ∈ Dh do
          Set yn ← RDi,j,m,n   for terminal Sm,n′
          Set yn ← RDi,j,m,n + γ max_et′i,j,m Q(Sm,n′, et′i,j,m; θ)   for non-terminal Sm,n′
        end
        Step 5:
        Perform gradient descent step on (yn − Q(Sm,n, eti,j,m,n; θ))², where n = 1, 2, ..., M
        Sm ← Sm′
      end
      Set î = î + 1
    end
    Set e = e + 1
  end

The value yn considers the maximum future reward, as predicted from the target network Q and discounted by γ, and is calculated as the updated reward for each transition tbn ∈ Dh. Finally, in Step 5, a new Q network is trained by performing a gradient descent step on (yn − Q(Sm,n, eti,j,m,n; θ))², in order to get a new θ for the next iteration h + 1. The target network is updated every w iterations to stabilize learning, where w must be preliminarily assigned.

VI. PERFORMANCE EVALUATION

In this section, we evaluate the performance of the proposed DQN-driven task runtime prediction algorithm in a simulated environment.

A. SIMULATION SETTINGS

In the proposed case study, we identify three common software algorithms with different complexities to evaluate the proposed system; therefore, we set T = {t1, t2, t3}. We code the tasks in the Python language in a single Docker image that requires two parameters on launch: task ti,j ∈ T and parameter pi,j. In the considered case, the value of pi,j is restricted to the members of the set P = {1, ..., 5}, with pi,j ∈ P.

In particular, we consider the following three well-known algorithms:
1) Standard Array Sorting (t1): it builds a random big Python integer list whose number of elements is based on pi,j. After building the list, the sort method is called, which implements the Timsort algorithm [37]. This algorithm has a runtime complexity of O(n log n) in the worst case;
2) Fast Array Sorting (t2): instead of constructing a list, a random NumPy [38] integer array is built, which also implements the sort method. However, the NumPy library adopts the quicksort algorithm [39], which has a runtime complexity of O(n²) in the worst case;
3) Dijkstra Shortest Path Search (t3): Dijkstra's algorithm [40] is an algorithm for finding the shortest paths between nodes in a graph. In this implementation, we first build a graph with a large number of vertices V determined by the parameter pi,j. Secondly, we find the shortest path from the first vertex to all other vertices. The standard Dijkstra implementation has a complexity of O((|V| + |E|) log |V|) in the worst case.

TABLE 3. Tasks details.

The proposed algorithms are currently implemented in scientific and industrial applications. For example, sorting tasks are used in operations research to implement both the Shortest Processing Time First and the Longest Processing Time First rules for optimal jobs scheduling and load balancing [41]. Indeed, Dijkstra's algorithm is currently being used for a vast class of problems.
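Minimal, standard-library-only sketches of t1 and t3 are given below (t2 would simply replace the Python list with a NumPy array, whose default sort is quicksort-based [39]). The sizes and graph shapes here are illustrative, not the paper's actual mapping from the parameter pi,j.

```python
# Illustrative versions of benchmark tasks t1 and t3 (sizes are toy values;
# the paper derives element counts from the parameter p_{i,j}).
import heapq
import random

def t1_standard_sort(n):
    """t1: build a random Python integer list and sort it (Timsort, O(n log n))."""
    data = [random.randint(0, n) for _ in range(n)]
    data.sort()
    return data

def t3_dijkstra(adj, src):
    """t3: shortest-path distances from src over adjacency dict {u: [(v, w), ...]}."""
    dist = {src: 0}
    pq = [(0, src)]                       # min-heap of (distance, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

graph = {0: [(1, 4), (2, 1)], 2: [(1, 2)], 1: []}
assert t3_dijkstra(graph, 0) == {0: 0, 2: 1, 1: 3}  # 0->2->1 beats 0->1
```

With a binary heap, this Dijkstra variant matches the O((|V| + |E|) log |V|) worst-case bound quoted in the task description.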


These include vehicle path planning [42] and optimal packet routing in software-defined network environments [43].

FIGURE 3. (a), (b), (c): tasks average runtime for different values of parameter pi,j.

In Table 3, we summarize the three tasks, their runtime complexities, and the minimum and maximum number of elements as determined by their linear combination with the parameter pi,j.

We set the maximum number of concurrent tasks to k = 4 and we run a random sequence of simultaneous instances on a designated agent. On each newly submitted instance, we collect the current Docker metrics from the node, as described in Section V-B. As agent, we use a t2.medium cloud instance from Amazon AWS equipped with 2x Intel Xeon 3.3 GHz vCPUs and 4 GB of memory. Since the number of available CPUs is less than k, we simulate an elevated competition between tasks during runtime and therefore we observe high variability in execution time, which is typical of real cloud environments.

In Fig. 3, the average task ti,j runtime, observed for each pi,j value over a total of 9995 metrics collected in a two-day simulation, is depicted.

We use the metrics set for testing the performance of the DRL approach described in Section V. In particular, on each iteration a new state is built from the current item; then, either a random time is selected or the expected runtime is predicted from the current Q-function, according to the exploration vs. exploitation probability. Finally, the reward is calculated against the actual runtime as described in (1). The current transition is stored in the experience replay, then the rest of the steps in Algorithm 2 update the current Q-function.

In this experiment, the Q-function is approximated by a dense DNN made up of three layers:
1) The input layer represents the state and consists of 21 neurons. In this layer, we combine some elements of the state to reduce the dimension and we use One-Hot Encoding techniques [44] to represent some categorical variables, such as the task IDs.
2) The intermediate layer consists of 70 neurons, as the average between the input and output dimensions.
3) The output layer represents the discretized action space and consists of 120 neurons, where 1 second is the minimum expected runtime and 120 seconds is the last accepted value before raising a timeout error.

FIGURE 4. Deep neural network for Q-function.

The deep neural network approximating the Q-function is depicted in Fig. 4. To evaluate the performance of our DRL strategy, we use a t3.2xlarge cloud instance from Amazon AWS equipped with 8x Intel Xeon 3.1 GHz vCPUs and 32 GB of memory.

B. PERFORMANCE COMPARISON

This section reports a performance comparison between two different scenarios tested in the performed experiment. The first scenario consists of 5000 episodes of 150 iterations each, whereas the second scenario consists of 3000 episodes of 300 iterations each. Since we are using a DRL approach, we have two more critical hyper-parameters to consider:
1) Discount Factor γ: in DRL environments, setting the discount factor is part of the problem [45]. The discount factor essentially determines how much the RL agent cares about rewards in the distant future relative to those in the immediate future. If γ = 0, the agent will be completely myopic and only learn about actions that produce an immediate reward. If γ = 1, the agent will evaluate each of its actions based on the sum total of all of its future rewards. In the simulations, we continuously feed the system with tasks and we are interested in evaluating how the DRL approach performs with different values of γ. Therefore, we test the algorithm for both scenarios with γ = 0.2, γ = 0.5 and γ = 0.8.
2) Exploration vs. Exploitation Probability ϵ: in RL, exploration means that the agent randomly explores the action space.
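The input-layer encoding described in Section VI-A can be illustrated as follows. The exact 21-feature layout is our assumption: the paper states only that categorical variables such as the task IDs are one-hot encoded [44] and that some state elements are combined to reduce the dimension.

```python
# Hypothetical assembly of the 21-neuron input state: 3 one-hot slots for the
# task ID, one slot for the submission parameter, and 17 assumed numeric
# features (static node configuration plus runtime Docker metrics).
TASKS = ["t1", "t2", "t3"]

def one_hot(value, categories):
    """One-hot encode a categorical value over a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]

def build_state(task_id, param, numeric_features):
    """Concatenate the encoded features into a flat input vector."""
    return one_hot(task_id, TASKS) + [float(param)] + [float(x) for x in numeric_features]

state = build_state("t2", 3, [0.5] * 17)
assert state[:3] == [0.0, 1.0, 0.0]  # one-hot slot for t2
assert len(state) == 21              # matches the 21-neuron input layer
```

A vector of this shape would feed the 21-70-120 dense network of Fig. 4, whose 120 outputs correspond to the discretized runtimes from 1 to 120 seconds.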


Exploring the whole action space improves the agent's knowledge about each action for a long-term benefit. On the other hand, exploitation means that the agent uses only its current knowledge to get the most reward. In Algorithm 2, the choice between exploration and exploitation is made with a probability coefficient ϵ that usually varies at each episode. Normally, this value starts from 1, meaning that, since the agent does not know anything about the actions at the beginning, it must explore all available actions for each state. Successively, it slowly decays over future episodes until the end of training, when it is very close to 0, meaning that the agent fully leverages its knowledge. For the performed evaluations, we start from ϵ = 1 and we implement a linear decaying strategy over episodes down to a minimum value of ϵ = 0.001.

Considering the aforementioned hyper-parameters, we are interested in comparing the following two metrics for both scenarios.
1) Cumulative Reward CR is defined for a single episode of n steps as:

   CR = Σn R(n);   CR ∈ [0, Cmax·n].   (2)

FIGURE 5. First scenario performance evaluated for different values of γ: (a) Average reward over episodes, (b) Runtime prediction accuracy summary statistics.
FIGURE 6. Second scenario performance evaluated for different values of γ: (a) Average reward over episodes, (b) Runtime prediction accuracy summary statistics.

   Figures 5a and 6a compare the average values of CR vs. episodes for all three considered discount factors in both scenarios. For all values of γ, as the agent explores new actions and trains its Q-function, the CR value increases almost linearly and results in a 46% performance improvement at the end of the training, compared to the first exploration-only episode. Moreover, though γ = 0.2 appears to learn faster than γ = 0.5 and γ = 0.8, in the end the performance of the highest discount factors outperforms the smallest one. In the last part of the training, the value of ϵ becomes very small and lets the algorithm leverage only exploitation for task runtime prediction. In this case, the comparison shows that for higher values of γ the accuracy of the Q-function slightly improves, proving that the proposed DRL approach is able to catch a sort of correlation between successive submitted tasks. There are no major differences between the performance of the two scenarios. It can only be observed that in the second scenario the learning speed for γ = 0.8 is higher than in the first scenario, compared to other values of γ. This metric can be influenced by the different number of steps in the episode and the resulting value of the exploration vs. exploitation probability coefficient ϵ at each step.
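The linear ϵ-decay strategy used in the evaluations can be sketched as below. The per-episode granularity and the exact endpoint handling are assumptions; the paper specifies only a linear decay from ϵ = 1 down to ϵ = 0.001.

```python
# Sketch of a linear epsilon-decay schedule: epsilon starts at 1 and decays
# linearly over the episodes down to a floor of 0.001 (values from the paper;
# the interpolation details are our assumption).
EPS_START, EPS_MIN = 1.0, 0.001

def epsilon(episode, total_episodes):
    """Exploration probability for a 0-indexed episode."""
    frac = episode / max(total_episodes - 1, 1)
    return max(EPS_START - (EPS_START - EPS_MIN) * frac, EPS_MIN)

assert epsilon(0, 5000) == 1.0                          # pure exploration at start
assert abs(epsilon(4999, 5000) - EPS_MIN) < 1e-9        # floor at the last episode
assert epsilon(2500, 5000) < epsilon(1000, 5000)        # monotonically decreasing
```

In the first scenario this schedule would be evaluated over 5000 episodes and in the second over 3000, which is why the per-step value of ϵ differs between the two, as discussed above.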


   In fact, in the first scenario, ϵ decreases linearly every 150 steps, whereas in the second scenario every 300 steps. As a consequence, although the evaluated performance shows that a higher number of steps per episode improves the algorithm, it is evident that episode length also affects the training time. However, the simulations point out that different values of γ and ϵ may help in minimizing the total computational effort.
2) Runtime Prediction Accuracy is measured in the last training step, in which the algorithm only exploits the approximated Q-function. We set Cmax = 10 as the highest step reward in case of exact runtime prediction; otherwise, the value is determined in a linear fashion as defined in (1). Figures 5b and 6b show summary step reward statistics for both scenarios and for different values of the discount factor γ. As for the previous metric, in all the cases a similar performance is observed among the compared schemes, with the γ = 0.8 case having the best median: x̃ = 6.30 and x̃ = 6.89 for the first and second scenario, respectively. The latter case with γ = 0.8 shows the overall best performance, with 50% of the steps having a prediction accuracy ≥ ≈69%, whereas only a residual 25% of the steps show an accuracy ≤ ≈49%. Finally, the upper quartile shows a prediction accuracy ≥ ≈82% for 25% of the steps.

Given the dimensions of the state space and action space, the evaluated performance is acceptable and confirms the effectiveness of the proposed DRL approach.

VII. CONCLUSION

In this paper, we introduce a Hyperledger Fabric Blockchain-based resources trading platform to orchestrate client requests and agent bids and to provide a solution to a common task assignment problem. A double bidding strategy is proposed that enables the client to choose either the least time among the estimations provided by the agents or the cheapest price. To support node selection and task assignment, we propose a model-free DRL framework for task runtime estimation in the agent's current load state. In the proposed algorithm, both the submitted and the existing tasks are represented by only two parameters combined with some resources usage metrics collected at runtime. As the agent receives new task requests, it incrementally learns how to make predictions in an online fashion.

Simulations show that DRL is suitable for task runtime prediction and provides similar or better performance compared to other more complex existing solutions. Moreover, it is important to highlight that the proposed approach is not tied to a specific application and can be adopted for any type of task in a modern cloud environment.

In future work, we plan to integrate this solution with Kubernetes, an open-source container-orchestration system, and evaluate the performance of the DRL approach for workload operations such as deployment and scaling.

REFERENCES
[1] Public Cloud IaaS Market Worldwide 2023, Gartner, USA, Sep. 2022. [Online]. Available: https://www.statista.com/statistics/505251/worldwide-infrastructure-as-a-service-revenue/
[2] J. Flores-Contreras, H. A. Duran-Limon, A. Chavoya, and S. H. Almanza-Ruiz, "Performance prediction of parallel applications: A systematic literature review," J. Supercomput., vol. 77, no. 4, pp. 4014–4055, Apr. 2021.
[3] A. Mohammed, A. Eleliemy, F. M. Ciorba, F. Kasielke, and I. Banicescu, "An approach for realistically simulating the performance of scientific applications on high performance computing systems," Future Gener. Comput. Syst., vol. 111, pp. 617–633, Oct. 2020.
[4] Y. Cho, S. Oh, and B. Egger, "Performance modeling of parallel loops on multi-socket platforms using queueing systems," IEEE Trans. Parallel Distrib. Syst., vol. 31, no. 2, pp. 318–331, Feb. 2020.
[5] P. Altenbernd, J. Gustafsson, B. Lisper, and F. Stappert, "Early execution time-estimation through automatically generated timing models," Real-Time Syst., vol. 52, no. 6, pp. 731–760, Nov. 2016.
[6] T. Doan and J. Kalita, "Predicting run time of classification algorithms using meta-learning," Int. J. Mach. Learn. Cybern., vol. 8, no. 6, pp. 1929–1943, Dec. 2017.
[7] Y.-C. Wang, J. Chou, and I.-H. Chung, "A deep reinforcement learning method for solving task mapping problems with dynamic traffic on parallel systems," in Proc. Int. Conf. High Perform. Comput. Asia–Pacific Region, Jan. 2021, pp. 1–10.
[8] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, "Resource management with deep reinforcement learning," in Proc. 15th ACM Workshop Hot Topics Netw., Nov. 2016, pp. 50–56.
[9] Z. Xie, R. Wu, M. Hu, and H. Tian, "Blockchain-enabled computing resource trading: A deep reinforcement learning approach," in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), May 2020, pp. 1–8.
[10] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. A. Riedmiller, "Playing Atari with deep reinforcement learning," CoRR, vol. abs/1312.5602, pp. 1–9, Dec. 2013.
[11] G. Volpe, A. M. Mangini, and M. P. Fanti, "An architecture combining blockchain, Docker and cloud storage for improving digital processes in cloud manufacturing," IEEE Access, vol. 10, pp. 79141–79151, 2022.
[12] G. Volpe, A. M. Mangini, and M. P. Fanti, "An architecture for digital processes in manufacturing with blockchain, Docker and cloud storage," in Proc. IEEE 17th Int. Conf. Autom. Sci. Eng. (CASE), Aug. 2021, pp. 39–44.
[13] (Mar. 2023). Intro to Ethereum. Accessed: Mar. 29, 2023. [Online]. Available: https://ethereum.org
[14] F. Casino, T. K. Dasaklis, and C. Patsakis, "A systematic literature review of blockchain-based applications: Current status, classification and open issues," Telematics Informat., vol. 36, pp. 55–81, Mar. 2019.
[15] K. Christidis and M. Devetsikiotis, "Blockchains and smart contracts for the Internet of Things," IEEE Access, vol. 4, pp. 2292–2303, 2016.
[16] G.-T. Nguyen and K. Kim, "A survey about consensus algorithms used in blockchain," J. Inf. Process. Syst., vol. 14, no. 1, pp. 101–128, Feb. 2018.
[17] R. Koulu, "Blockchains and online dispute resolution: Smart contracts as an alternative to enforcement," SCRIPTed, vol. 13, no. 1, pp. 40–69, May 2016.
[18] T. Hewa, M. Ylianttila, and M. Liyanage, "Survey on blockchain based smart contracts: Applications, opportunities and challenges," J. Netw. Comput. Appl., vol. 177, Mar. 2021, Art. no. 102857.
[19] C. Anderson, "Docker [software engineering]," IEEE Softw., vol. 32, no. 3, p. 102, May 2015.
[20] (Aug. 2019). CouchDB: The Definitive Guide. Accessed: Dec. 22, 2021. [Online]. Available: http://guide.couchdb.org/editions/1/en/index.html
[21] D. Ongaro and J. Ousterhout, "In search of an understandable consensus algorithm," in Proc. USENIX Annu. Tech. Conf. (USENIX ATC), Philadelphia, PA, USA: USENIX Association, Jun. 2014, pp. 305–319.
[22] C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, nos. 3–4, pp. 279–292, 1992.
[23] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," 2019, arXiv:1509.02971.
[24] A. Matsunaga and J. A. B. Fortes, "On the use of machine learning to predict the time and resources consumed by applications," in Proc. 10th IEEE/ACM Int. Conf. Cluster, Cloud Grid Comput., May 2010, pp. 495–504.
[25] D. A. Monge, M. Holec, F. Železný, and C. G. Garino, "Ensemble learning of runtime prediction models for gene-expression analysis workflows," Cluster Comput., vol. 18, no. 4, pp. 1317–1329, Dec. 2015.


[26] M. H. Hilman, M. A. Rodriguez, and R. Buyya, "Task runtime prediction in scientific workflows using an online incremental learning approach," in Proc. IEEE/ACM 11th Int. Conf. Utility Cloud Comput. (UCC), Dec. 2018, pp. 93–102.
[27] T. Pham, J. J. Durillo, and T. Fahringer, "Predicting workflow task execution time in the cloud using a two-stage machine learning approach," IEEE Trans. Cloud Comput., vol. 8, no. 1, pp. 256–268, Jan. 2020.
[28] L. Jiang, H. Huang, and Z. Ding, "Path planning for intelligent robots based on deep Q-learning with experience replay and heuristic knowledge," IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1179–1189, Jul. 2020.
[29] R. Zhu, H. Liu, L. Liu, X. Liu, W. Hu, and B. Yuan, "A blockchain-based two-stage secure spectrum intelligent sensing and sharing auction mechanism," IEEE Trans. Ind. Informat., vol. 18, no. 4, pp. 2773–2783, Apr. 2022.
[30] H. Huang, W. Miao, Z. Li, J. Tian, C. Wang, and G. Min, "Enabling energy trading in cooperative microgrids: A scalable blockchain-based approach with redundant data exchange," IEEE Trans. Ind. Informat., vol. 18, no. 10, pp. 7077–7085, Oct. 2022.
[31] C. H. Liu, Q. Lin, and S. Wen, "Blockchain-enabled data collection and sharing for industrial IoT with deep reinforcement learning," IEEE Trans. Ind. Informat., vol. 15, no. 6, pp. 3516–3526, Jun. 2019.
[32] W. Zheng, W. Wang, G. Wu, C. Xue, and Y. Wei, "Fog computing enabled smart grid blockchain architecture and performance optimization with DRL approach," in Proc. IEEE 8th Int. Conf. Comput. Sci. Netw. Technol. (ICCSNT), Nov. 2020, pp. 41–45.
[33] L. Yang, M. Li, P. Si, R. Yang, E. Sun, and Y. Zhang, "Energy-efficient resource allocation for blockchain-enabled industrial Internet of Things with deep reinforcement learning," IEEE Internet Things J., vol. 8, no. 4, pp. 2318–2329, Feb. 2021.
[34] Y. Xu, L. Yu, G. Bi, M. Zhang, and C. Shen, "Deep reinforcement learning and blockchain for peer-to-peer energy trading among microgrids," in Proc. Int. Conf. Internet Things (iThings) IEEE Green Comput. Commun. (GreenCom) IEEE Cyber, Phys. Social Comput. (CPSCom) IEEE Smart Data (SmartData) IEEE Congr. Cybermatics (Cybermatics), Nov. 2020, pp. 360–365.
[35] C. Qiu, X. Ren, Y. Cao, and T. Mai, "Deep reinforcement learning empowered adaptivity for future blockchain networks," IEEE Open J. Comput. Soc., vol. 2, pp. 99–105, 2021.
[36] A. A. Donovan and B. W. Kernighan, The Go Programming Language, 1st ed. Reading, MA, USA: Addison-Wesley, 2015.
[37] N. Auger, V. Jugé, C. Nicaud, and C. Pivoteau, "On the worst-case complexity of TimSort," in Proc. 26th Annu. Eur. Symp. Algorithms (ESA), vol. 112, Y. Azar, H. Bast, and G. Herman, Eds. Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018, pp. 4:1–4:13.
[38] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, and R. Kern, "Array programming with NumPy," Nature, vol. 585, no. 7825, pp. 357–362, Sep. 2020.
[39] C. A. Hoare, "Quicksort," Comput. J., vol. 5, no. 1, pp. 10–16, 1962.
[40] E. W. Dijkstra, "A note on two problems in connexion with graphs," Numerische Math., vol. 1, no. 1, pp. 269–271, Dec. 1959.
[41] R. Sedgewick and K. Wayne, Algorithms, 4th ed. Reading, MA, USA: Addison-Wesley, 2011.
[42] D.-D. Zhu and J.-Q. Sun, "A new algorithm based on Dijkstra for vehicle path planning considering intersection attribute," IEEE Access, vol. 9, pp. 19761–19775, 2021.
[43] A. Buzachis, A. Celesti, A. Galletta, J. Wan, and M. Fazio, "Evaluating an application aware distributed Dijkstra shortest path algorithm in hybrid cloud/edge environments," IEEE Trans. Sustain. Comput., vol. 7, no. 2, pp. 289–298, Apr. 2022.
[44] D. Harris and S. Harris, Digital Design and Computer Architecture, 2nd ed. Oxford, U.K.: Morgan Kaufmann, 2012.
[45] F. S. Perotto and L. Vercouter, "Tuning the discount factor in order to reach average optimality on deterministic MDPs," in Artificial Intelligence XXXV, M. Bramer and M. Petridis, Eds. Cham, Switzerland: Springer, 2018, pp. 92–105.

GAETANO VOLPE received the M.Sc. degree in computer and automation engineering from eCampus University, Novedrate, Italy, in 2020. He is currently pursuing the Ph.D. degree with the Laboratory of Control and Automation, Polytechnic University of Bari. He is also an independent consultant for private companies in the field of cybersecurity, IT infrastructures, and software architectures. His research interests include cybersecurity topics, blockchains in manufacturing environments, discrete-event systems, and Petri nets.

AGOSTINO MARCELLO MANGINI (Senior Member, IEEE) received the Laurea degree in electronics engineering and the Ph.D. degree in electrical engineering from the Polytechnic University of Bari, Bari, Italy, in 2003 and 2008, respectively. He has been a Visiting Scholar with the University of Zaragoza, Zaragoza, Spain. He is currently an Associate Professor with the Department of Electrical and Information Engineering, Polytechnic University of Bari. He has authored or coauthored over 90 printed publications. His current research interests include modeling, simulation, and control of discrete-event systems, Petri nets, supply chains and urban traffic networks, distribution and internal logistics, the management of hazardous materials, the management of drug distribution systems, and healthcare systems. He was on the Program Committees of the 2007–2015 IEEE International SMC Conferences on Systems, Man, and Cybernetics and the 2009 IFAC Workshop on Dependable Control of Discrete Systems. He was on the Editorial Board of the 2017 IEEE Conference on Automation Science and Engineering.

MARIA PIA FANTI (Fellow, IEEE) received the Laurea degree in electronic engineering from the University of Pisa, Pisa, Italy, in 1983. She was a Visiting Researcher with the Rensselaer Polytechnic Institute, Troy, New York, in 1999. Since 1983, she has been with the Department of Electrical and Information Engineering, Polytechnic University of Bari, Italy, where she is currently a Full Professor in system and control engineering and the Chair of the Laboratory of Automation and Control. Her research interests include management and modeling of complex systems, such as transportation, logistics and manufacturing systems, discrete-event systems, Petri nets, consensus protocols, and fault detection. She has published more than 315 papers and two textbooks on her research topics. She was a Senior Editor of the IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING and an Associate Editor of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS. She was a Member at Large of the Board of Governors of the IEEE Systems, Man, and Cybernetics Society. She is currently a member of the AdCom of the IEEE Robotics and Automation Society and the Chair of the Technical Committee on Automation in Logistics of the IEEE Robotics and Automation Society. She was the General Chair of the 2011 IEEE Conference on Automation Science and Engineering, the 2017 IEEE International Conference on Service Operations and Logistics, and Informatics, and the 2019 IEEE Systems, Man, and Cybernetics Conference.

Open Access funding provided by ‘Politecnico di Bari’ within the CRUI CARE Agreement

