Secure Data Analytics For Cloud-Integrated Internet of Things Applications
All content following this page was uploaded by Ibrahim Khalil on 03 October 2017.
Heshan Kumarage, Ibrahim Khalil, Abdulatif Alabdulatif,
Zahir Tari, and Xun Yi, RMIT University
Figure 1. Application domains of cloud-integrated Internet of Things. These applications generate large volumes of data through
ubiquitous sensing. To effectively perform analytics and enable smart functionality, they require access to large storage and high
computational power that’s unavailable at the host application.
communication/transfer, and limited analytical services [1]. The main application areas that are being disrupted through the enabling of IoT include power systems and smart grid, e-health and assisted living systems, and large-scale industrial and environmental monitoring applications [1, 2]. With the advent of constant Internet connectivity coupled with ubiquitous sensing, these applications are now producing vast amounts of data that have to be communicated, stored, processed, and analyzed in a secure and efficient manner to reach the targeted levels of smart functionality envisioned through the IoT.

Cloud computing technology has also been steadily growing, becoming a mature service platform with worldwide spending upward of US$170 billion in 2014 [3]. Cloud resources provide a pervasive, convenient, and reliable platform for high-performance computing and storage that’s scalable and accessible on demand anywhere. The integration of cloud computing services over large-scale IoT implementations will help achieve the required computational and storage needs for effectively analyzing large amounts of generated data. This will enable smart functionality over a wide variety of applications that are set to be revolutionized by the adoption of this emerging paradigm of cloud-assisted IoT implementations.

The integration of cloud and IoT in providing effective service management and composition makes it possible to

• deliver real-life IoT-based services in a distributed, dynamic, and responsive manner;
• provide an effective middle layer between IoT devices/objects and applications by hiding or abstracting the complex functionality of service implementation;
• streamline, enhance, and innovate through communication, computation, and storage drivers for IoT in achieving scalability, flexibility, interoperability, reliability, and efficiency.

Therefore, the cloud facilitates the implementation of complex, data-driven analytic models at low cost in a dynamic and scalable manner by connecting IoT sensing and data collection with powerful communication, computation, and storage. Figure 1 shows the main application domains that can directly benefit from cloud-IoT integration.
[Figure 2 diagram labels: smart meter, consumer, energy supplier, power supply, company, services.]
Figure 2. Secure cloud data analytics for smart grid. Billing services, on-demand usage monitoring, and consumption
information statistics for IoT smart grid applications involve relatively simple mathematical operations and can be implemented
securely using lightweight homomorphic encryption such as an additive and multiplicative privacy mechanism [8].
Here, we introduce three analytical services that can be implemented securely using FHE in the cloud for different IoT applications and propose rudimentary frameworks and technological solutions for performing analytical tasks securely and dependably in the cloud.

Secure Cloud-Based Billing and Consumption Monitoring for Smart Grid

Service provisioning in smart grid systems is a viable application for secure cloud integration using an FHE-based security model. Smart grid is an area under IoT where the stated goals of smart functionality through ubiquitous sensing and communication are set to revolutionize the power industry. It’s also one of the earliest domains where that functionality is being implemented with the widespread adoption of smart meters. The global smart grid data volume is set to increase exponentially from 10,780 terabytes in 2010 to 75,200 terabytes in 2015 [10]. Within this context, cloud computing can serve as a global infrastructure to implement client services such as consumption monitoring and billing. However, as discussed previously, the data’s security and integrity should be preserved when performing data analytic tasks on the public cloud infrastructure. Compromise in the form of security attacks and unauthorized data exposure can lead to energy disruptions and/or energy theft. In addition, specific information about customer behavior can be inferred through close analysis of electricity usage monitoring and can reveal industrial clients’ proprietary information or the personal behavior and habits of individual clients.

Therefore, the implementation of electronic billing and consumption monitoring services on the cloud must be accompanied by strong encryption models that don’t reveal the data to third parties, while still allowing the necessary operations to be performed. Specific actions required to implement these services include simple operations such as aggregation, addition, subtraction, and multiplication. Therefore, a lightweight FHE scheme [8] can be used as opposed to a more computationally expensive model [7].

Figure 2 gives a general overview of the entities involved, communication pathways, and tasks to be performed in the cloud with the necessary mathematical operations performed on encrypted data.

We propose using Domingo-Ferrer’s FHE scheme for this application [8]. This symmetric-key encryption scheme performs the basic arithme-
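The billing arithmetic described in this section can be illustrated with a toy additive and multiplicative privacy homomorphism. The sketch below is loosely modeled on the split-and-mask idea behind the scheme cited as [8]; the parameter names (`M_SMALL`, `M_BIG`, `D`), their sizes, the helper `cloud_bill`, and the sample readings and tariffs are all illustrative assumptions rather than values from that scheme, and the toy is far too small to be secure.

```python
import math
import secrets

# Toy parameters (assumptions for illustration only; not secure).
M_SMALL = 1_000_003                # secret plaintext modulus m'
M_BIG = M_SMALL * 2_147_483_647    # public modulus m, a multiple of m'
D = 2                              # number of shares per plaintext


def keygen():
    """Pick a secret r that is invertible modulo M_BIG."""
    while True:
        r = secrets.randbelow(M_BIG - 2) + 2
        if math.gcd(r, M_BIG) == 1:
            return r


def encrypt(value, r):
    """Split value into D shares that sum to value (mod M_SMALL); mask share j with r**j."""
    shares = [secrets.randbelow(M_BIG) for _ in range(D - 1)]
    shares.append((value - sum(shares)) % M_SMALL)
    # A ciphertext maps the power of r (the "degree") to a masked coefficient.
    return {j + 1: (s * pow(r, j + 1, M_BIG)) % M_BIG for j, s in enumerate(shares)}


def add(c1, c2):
    """Homomorphic addition: add coefficients of equal degree."""
    out = dict(c1)
    for degree, coeff in c2.items():
        out[degree] = (out.get(degree, 0) + coeff) % M_BIG
    return out


def multiply(c1, c2):
    """Homomorphic multiplication: polynomial-style cross product (degrees add up)."""
    out = {}
    for d1, a in c1.items():
        for d2, b in c2.items():
            out[d1 + d2] = (out.get(d1 + d2, 0) + a * b) % M_BIG
    return out


def decrypt(cipher, r):
    """Unmask each coefficient with r**(-degree), sum, and reduce modulo M_SMALL."""
    r_inv = pow(r, -1, M_BIG)
    total = sum(coeff * pow(r_inv, degree, M_BIG) for degree, coeff in cipher.items())
    return total % M_BIG % M_SMALL


def cloud_bill(enc_readings, enc_tariffs):
    """Runs on the cloud: sum of reading * tariff, computed on ciphertexts only."""
    bill = multiply(enc_readings[0], enc_tariffs[0])
    for reading, tariff in zip(enc_readings[1:], enc_tariffs[1:]):
        bill = add(bill, multiply(reading, tariff))
    return bill


key = keygen()
readings = [12, 7, 30]   # kWh per billing interval (made-up values)
tariffs = [25, 25, 40]   # cents per kWh (made-up values)
enc_bill = cloud_bill([encrypt(v, key) for v in readings],
                      [encrypt(t, key) for t in tariffs])
print(decrypt(enc_bill, key))  # 1675
```

The cloud side only ever calls `add` and `multiply`; the key stays with the party that encrypts and decrypts. Results are correct only while the true value stays below `M_SMALL`.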
[Figure 3 diagram labels: network of sensors in the Internet of Things (industrial monitoring, healthcare monitoring); hierarchical network topology for distributed anomaly detection; hierarchical topology replicated in VMs for distributed data processing.]
Figure 3. Anomaly-detection services for wireless sensor network-based IoT monitoring. Large-scale WSNs can be arranged
on a hierarchical topology and detection of anomalies performed over different network tiers. Unsupervised anomaly-detection
methods can then be employed in a distributed manner over subsets of data to reduce complexity of FHE-based approaches by
replicating the hierarchical topology on cloud virtual machines.
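The unsupervised detection method used in this part of the article is based on fuzzy c-means clustering [13]. As a rough plaintext illustration (not the distributed implementation from [13], and with no encryption involved), one iteration of the standard algorithm can be written as follows. The function name `fcm_step`, the fuzzifier default `m=2.0`, the Euclidean distance, and the sample points are assumptions made for this sketch.

```python
def fcm_step(points, centers, m=2.0):
    """One fuzzy c-means iteration: update memberships, then cluster centers."""
    dims = len(points[0])
    weights = []
    for p in points:
        # Squared Euclidean distance from this point to every center.
        d2 = [sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers]
        if 0.0 in d2:
            # The point coincides with a center: give that center full membership.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            weights.append([h / sum(hits) for h in hits])
        else:
            # Standard membership update; each row sums to 1.
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            weights.append([v / sum(inv) for v in inv])
    new_centers = []
    for j in range(len(centers)):
        # Center j is the mean of all points weighted by membership**m.
        wm = [row[j] ** m for row in weights]
        total = sum(wm)
        new_centers.append(tuple(
            sum(w * p[a] for w, p in zip(wm, points)) / total for a in range(dims)
        ))
    return weights, new_centers


# Made-up two-dimensional readings forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
centers = [(0.0, 0.0), (6.0, 6.0)]
for _ in range(10):
    weights, centers = fcm_step(points, centers)
```

Each step uses only sums, products, quotients, and one power. How the quotient and the power would be realized on ciphertexts is outside the scope of this sketch.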
• It provides an unsupervised means of anomaly detection without requiring prior knowledge while being adaptable and scalable in a dynamic manner.
• The granular analysis of a distributed process is naturally suitable for the application domain of discrete sensor nodes that collect data in WSN-based IoT applications.
• The operations in the clustering process can be broken down into simple mathematical operations that are straightforward to implement on encrypted data.
• It supports efficient application of FHE operations owing to the distributed processing of subsets of data over a hierarchy.
• Each processing node in the model deals with a small subset of data, reducing computational costs.
• The iterative anomaly-detection process further reduces communication and computational overhead.

Figure 3 gives a graphical overview of the proposed model for sensor data anomaly detection in the cloud using the distributed data clustering approach. Figure 4 gives a more detailed view, including the different tasks performed on different granular levels in a hierarchical topology. Anomalies are evaluated at different levels to provide more fine-grained detection capability to the application concerned. Therefore, we can adapt the earlier work [13] to accommodate homomorphic encryption over a distributed data-processing model in the cloud without changing the underlying data analytic model.

Having a distributed data clustering approach based on fuzzy c-means [13] for anomaly detection requires a more powerful FHE scheme than the scheme proposed for smart grid applications. A scheme such as Domingo-Ferrer’s, though lightweight, isn’t applicable because it has limited capabilities and offers only limited support for different mathematical operations. Therefore, we need a more complex but capable scheme that leverages a distributed data-processing approach [13] and the significant computational power available in cloud facilities. We propose using the functions available in the HElib library (https://github.com/shaih/HElib) in the implementation of our distributed anomaly-detection model for analyzing encrypted data as a service in the cloud.

HElib is a functional software implementation of homomorphic encryption; however, it only supports operations on encrypted integers. To overcome this limitation, we use the IEEE Standard for Floating-Point Arithmetic (IEEE 754) [14] to perform floating-point computation domain operations in an integer computation domain, making the required data analysis possible with HElib. Therefore, floating-point arithmetic can be performed in a homomorphic
Figure 4. Hierarchical framework for distributed anomaly detection. Different analytic tasks such as clustering
and detection with discrete operations are performed in a distributed manner at different levels over subsets
of data.
Table 1. Accuracy and execution time variation for a floating-point multiplication operation.
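To make the idea behind Table 1 concrete, the following plaintext sketch multiplies two floating-point numbers using only integer operations on their IEEE 754 single-precision fields. It does not call HElib. The function names, the choice of single precision, and the truncating rounding are assumptions of this sketch and are not details reported in the article; zero is handled, but subnormals, infinities, NaN, and exponent overflow are deliberately ignored.

```python
import struct

BIAS = 127       # exponent bias of IEEE 754 single precision
FRAC_BITS = 23   # number of stored fraction bits
FRAC_MASK = (1 << FRAC_BITS) - 1


def to_fields(x):
    """Encode x as single precision; return (sign, exponent, fraction) as integers."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> FRAC_BITS) & 0xFF, bits & FRAC_MASK


def from_fields(sign, exponent, fraction):
    """Reassemble the three integer fields into a Python float."""
    bits = (sign << 31) | (exponent << FRAC_BITS) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def multiply_as_integers(a, b):
    """Multiply two normal floats using integer operations on their fields only."""
    sa, ea, fa = to_fields(a)
    sb, eb, fb = to_fields(b)
    if (ea == 0 and fa == 0) or (eb == 0 and fb == 0):
        return from_fields(sa ^ sb, 0, 0)            # signed zero
    # Restore the hidden leading 1 and multiply the 24-bit significands.
    sig = (fa | (1 << FRAC_BITS)) * (fb | (1 << FRAC_BITS))
    exponent = ea + eb - BIAS                        # exponents add, bias counted once
    if sig >> (2 * FRAC_BITS + 1):                   # product in [2, 4): renormalize
        sig >>= 1
        exponent += 1
    fraction = (sig >> FRAC_BITS) & FRAC_MASK        # truncate and drop the hidden bit
    return from_fields(sa ^ sb, exponent, fraction)


print(multiply_as_integers(3.5, -2.25))  # -7.875
```

The sign is an XOR, the exponent an integer addition, and the significand an integer multiplication followed by shifts, which is why the operation can be expressed in an integer domain. The renormalization test is data dependent, so a homomorphic version would need to express it without branching on encrypted values.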
manner by converting the representation of floating-point numbers to their integer representation based on the IEEE 754 standard. We performed experiments to validate this approach; Table 1 gives some preliminary results. The table shows the accuracy over different floating-point levels with the associated execution time in seconds for a multiplication operation using HElib functions based on the described approach. We tested multiplication because most data analytic operations can be reconstituted as a combination of addition and multiplication operations, with multiplication being the more complex to implement and execute in the current context.

Thus, the HElib library is a feasible means to encrypt the data and perform cloud-based anomaly detection for IoT sensor network applications. The high overheads can be managed by employing cloud resources in a distributed manner over different subsets of the data.

Secure Patient Classification and Diagnosis for E-Health Systems

Cloud-assisted patient classification and abnormality diagnosis in electronic health records (EHRs) is our third key IoT application domain of focus. The availability of an effective system to detect abnormalities in patient health monitoring records as well as to classify patients based on the shared similarity of their health records is extremely attractive for many applications. Such a system will help healthcare professionals make better decisions regarding patient health and will improve patient well-being through quick detection of abnormalities that can be referred to doctors and other medical professionals. The IoT vision for e-health and assisted living environments includes modern body sensor networks that constantly monitor different physiological parameters of people with chronic illnesses [1, 15]. This data is typically stored in a private database that medical professionals can refer to. Secure cloud integration for these applications will significantly enhance the provision of a more efficient service as well as more accurate diagnosis and abnormality detection through an automated process implemented over cloud resources.

Given that the EHRs contain sensitive information, exposure of this data to any third party other
than the patient and relevant medical professional must be severely restricted. Therefore, the data must be encrypted prior to analysis by a third party such as a CSP.

Clustering algorithms are useful in classifying patients into groups based on EHR similarity. We propose using the fuzzy c-means algorithm described earlier [13]. Several features make this algorithm desirable in the current context:

• It provides fully unsupervised classification.
• It provides a soft partitioning rather than a hard (fixed) partitioning of the data, which can then be reviewed by a medical professional. (Importantly, no final decision is made and a matrix of scores is given stating the membership of a particular patient to a given cluster.)
• It’s scalable and adaptive, so can be implemented for a large number of patients using distributed cloud resources.
• It includes an iterative process with simple basic mathematical operations that can be implemented in a homomorphic manner.

The fuzzy clustering of a multidimensional data space such as is available in EHR datasets can be explained as follows [13]. For a set of observations $X = [X_1, X_2, \ldots, X_n]$, where each data point $X_i$ is an m-dimensional observation, and $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})$, a group of fuzzy clusters $F_1, F_2, \ldots, F_k$ is a subset of all possible fuzzy subsets of $X$ where

• The weights for a particular data point add up to 1; that is,

$$\sum_{j=1}^{k} w_{ij} = 1.$$

• Each cluster has at least one data point with nonzero weight, and doesn’t hold every data point with a weight of 1. Therefore,

$$0 < \sum_{i=1}^{n} w_{ij} < n.$$

In this context, the fuzzy c-means algorithm minimizes an objective function (denoted $O_m$), which is the weighted sum of squared errors,

$$O_m(U, F; X) = \sum_{j=1}^{k} \sum_{i=1}^{n} w_{ij}^{m} \, \lVert X_i - c_j \rVert_A^{2}, \quad 1 < m < \infty,$$

where $C = (c_1, c_2, \ldots, c_k)$ is a vector of unknown cluster prototypes and each $c_j$ lies in the same space as the observations. (In $O_m$, the exponent $m$ is the fuzzification parameter, not the dimension of the observations.) Here $w_{ij}$ is the membership degree for data point $X_i$ in the jth cluster. $A$ defines the similarity measure between a particular data point and the cluster center. The Euclidean distance can be chosen as the similarity measure because it provides an effective similarity score with comparatively low computational complexity. Therefore, a fuzzy partitioning of $X$ is derived by the representing matrix $U = [w_{ij}]$. More details are available elsewhere [13].

We can use the fuzzy clustering-based anomaly-detection procedure in the application of medical diagnosis through the detection of abnormalities in observation data, but without the distributed aspects. However, even without the process being distributed, the involved datasets will be small enough to keep computational complexity low because abnormality detection is performed on a per-person basis. That is, at any given instance only the data of a particular person will be subject to the clustering and anomaly-detection procedure. Anomalies can be evaluated by performing the same set of relevant homomorphic operations used for sensor data anomaly detection on the encrypted EHRs that will be sent to the CSP. Given the similarity of the process, we suggest using the HElib FHE system to encrypt the data and perform the analytic tasks in cloud datacenters. Figure 5 gives an overview of the process with the relevant tasks and accompanying operations to be performed in a homomorphic manner.

Figure 5. Abnormality detection and patient classification in electronic health records. Body sensor networks and other health monitoring IoT applications such as assisted living environments generate large volumes of sensitive, private health records that should be processed without any third-party exposure. Tasks such as patient classification and abnormality detection can be performed with fully homomorphic encryption and data clustering models which involve basic mathematical operations.

Secure and efficient data-processing frameworks are vital for cloud-assisted IoT applications. The large volumes of generated data in such applications mandate the use of large computational and storage resources for effective analytical tasks in the provisioning of services. In this context, the application of fully homomorphic encryption schemes capable of performing analytic tasks on ciphertext is of vital importance.

The frameworks we’ve proposed can set the path and open avenues for further research on the nature of the data-processing models, the composition of analytics tasks, and the criteria for encryption schemes that will be vital to achieve effective service provisioning of cloud-assisted IoT applications.

References

1. L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A Survey,” Computer Networks, vol. 54, no. 15, 2010, pp. 2787–2805.
2. R. Roman, J. Zhou, and J. Lopez, “On the Features and Challenges of Security and Privacy in Distributed Internet of Things,” Computer Networks, vol. 57, no. 10, 2013, pp. 2266–2279.
3. IHS Technology, “Cloud-Related Spending by Businesses to Triple from 2011 to 2017,” IHS, news release, 14 Feb. 2014; http://press.ihs.com/press-release/design-supply-chain/cloud-related-spending-businesses-triple-2011-2017.
4. A. Botta et al., “Integration of Cloud Computing and Internet of Things: A Survey,” Future Generation Computer Systems, vol. 56, Mar. 2016, pp. 684–700.
5. C. Fontaine and F. Galand, “A Survey of Homomorphic Encryption for Nonspecialists,” EURASIP J. Information Security, Jan. 2007, article 15; http://dx.doi.org/10.1155/2007/13801.
6. R.L. Rivest, A. Shamir, and L. Adleman, “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems,” Comm. ACM, vol. 21, no. 2, 1978, pp. 120–126.
7. C. Gentry, “Fully Homomorphic Encryption Using Ideal Lattices,” Proc. 41st Ann. ACM Symp. Theory of Computing, 2009, pp. 169–178.
8. J. Domingo-Ferrer, “A Provably Secure Additive and Multiplicative Privacy Homomorphism*,” Information Security, LNCS 2433, Springer, 2002, pp. 471–483.
9. C. Gentry, “Computing Arbitrary Functions of Encrypted Data,” Comm. ACM, vol. 53, no. 3, 2010, pp. 97–105.
10. S. Sudip, “Smart Grid Data Analytics Market to Triple by 2022,” Transparency Market Research, press release, 30 Nov. 2015; www.transparencymarketresearch.com/pressrelease/smart-grid-data-analytics.htm.
11. B. Tian et al., “Self-Healing Key Distribution Schemes for Wireless Networks: A Survey,” Computer J., vol. 54, no. 4, 2011, pp. 549–569.
12. B. Tian et al., “A Mutual-Healing Key Distribution Scheme in Wireless Sensor Networks,” J. Network and Computer Applications, vol. 34, no. 1, 2011, pp. 80–88.
13. H. Kumarage et al., “Distributed Anomaly Detection for Industrial Wireless Sensor Networks Based on Fuzzy Data Modelling,” J. Parallel and Distributed Computing, vol. 73, no. 6, 2013, pp. 790–806.
14. IEEE Std. 754-2008, Floating-Point Arithmetic, IEEE, 2008.
15. A.R.M. Forkan et al., “A Context-Aware Approach for Long-Term Behavioural Change Detection and Abnormality Prediction in Ambient Assisted Living,” Pattern Recognition, vol. 48, no. 3, 2015, pp. 628–641.
Heshan Kumarage is a research associate in the School of Computer Science at RMIT University, Melbourne, Australia. His research interests include distributed systems, data mining for network security, information theory, and data science. Kumarage has a PhD in computer science from RMIT University. Contact him at [email protected].

Ibrahim Khalil is an associate professor in the School of Computer Science and Information Technology at RMIT University, Melbourne, Australia. His research interests include anonymous networks, quality of service, wireless sensor networks, and remote healthcare. Khalil has a PhD from the University of Berne, Switzerland. Contact him at ibrahim.khalil@rmit.edu.au.

Abdulatif Alabdulatif is a PhD student in the School of Computer Science and Information Technology at RMIT University, Melbourne, Australia. His research interests include cryptography techniques, distributed systems and networks, data mining, and cloud computing. Alabdulatif has a master’s degree in computer science from RMIT University. Contact him at [email protected].

Zahir Tari is a professor in the School of Computer Science at RMIT University, Melbourne, Australia. His research interests include core aspects of large-scale distributed systems, such as performance, security and reliability. Tari has a PhD in artificial intelligence from the University of Grenoble, France. Contact him at [email protected].

Xun Yi is a professor in the School of Computer Science at RMIT University, Australia, where he’s a member of the Cyberspace and Security Group. His research interests include privacy protection, cloud security, privacy preserving data mining, and applied cryptography. Yi has a PhD in electronic engineering from Xidian University. Contact him at [email protected].

Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.