IoT Big Data Analytics For Smart Homes With Fog and Cloud Computing
All content following this page was uploaded by Abdulsalam Yassine on 04 September 2018.
Abstract
Internet of Things (IoT) analytics is an essential means of deriving knowledge
and supporting applications for smart homes. Connected appliances and devices
inside the smart home produce a significant amount of data about consumers
and how they go about their daily activities. IoT analytics can aid in
personalizing applications that benefit both homeowners and the ever-growing
industries that need to tap into consumers' profiles. This article presents a
new platform that enables innovative analytics on IoT data captured from
smart homes. We propose the use of fog nodes and a cloud system to enable
data-driven services and to address the complexity and resource demands of
online and offline data processing, storage, and classification analysis. We
discuss the requirements and the design components of the system. To validate
the platform and present meaningful results, we present a case study using a
dataset acquired from a real smart home in Vancouver, Canada. The results of
the experiments clearly show the benefit and practicality of the proposed
platform.
Keywords: Internet of Things (IoT), Cloud Computing, Fog Computing,
Big Data Analytics, Energy Management, Smart Homes
deliver timely decision-making. Fog computing nodes are resource-efficient
because they are equipped with virtual machine technologies capable of
continuously processing fresh IoT data streams and transferring the processed
data to the cloud for further processing [5]. Cloud computing offers a
multitude of benefits, such as Infrastructure as a Service (IaaS), which
provides access to virtually unlimited storage space; Platform as a Service
(PaaS), which makes it possible to execute resource-intensive applications;
Software as a Service (SaaS), which facilitates software access; and Utility
Services, which store massive volumes of data for remote access. Fog computing
plays a critical role in the IoT ecosystem by supporting the processing of big
data for near real-time responses. Furthermore, fog computing processes and
stores data at the edge of the cloud system [6]. This unified architecture
allows us to resolve the latency issues of the underlying transport
communication network of cloud systems, which have a significant impact on
time-sensitive applications [1][7][8][9].
For the evaluation of the proposed system, we present a case study of
analyzing and processing streams of data from a smart home. The smart
home generates continuous streams of massive data at short time intervals.
Processing and analyzing such data is vital for many applications (e.g.,
healthcare systems, smart grid energy management applications, etc.) [10][11].
The main contributions of this paper are as follows:
• Proposing a platform for IoT smart home big data analytics with fog
and cloud computing. The system design allows massive IoT data from
multiple smart homes to be processed in distributed fog nodes, which
accommodate cognitive data mining algorithms that provide insight from
the processed data. This approach is significant for many applications
that require timely access to information and economies of scale, so
that smart home operations can be cost-effectively deployed and used.
• Presenting a case study of an actual smart home. We analyzed the
smart home IoT data for behavioral and predictive analytics of occupants'
energy consumption routines and patterns. We discussed the applications
of these findings within the context of demand response management and
electricity cost reduction. These analyses are considered among the
primary functions and applications of smart homes, which can be scaled
with fog and cloud computing to an entire smart community [34].
2. Related Work
Recently, several studies have proposed systems and frameworks for IoT
data analytics using various architectures involving fog and cloud computing.
In this section, we discuss these studies, especially those that are
representative of the state of the art and close to our work.
Many researchers have tackled issues closely related to our work, such as
those in [14][33][29][24][31] and [25]. For example, the work in [14] focuses
on predictive analytics for smart homes that need access to historical data,
which must be stored in a large database that can only be provided by a cloud
system. The work in [33] investigated smart home services for in-depth
analysis of frequent usage patterns of home appliances, specifically the
discovery of co-utilization behavior of appliances inside smart homes. For
this purpose, the authors propose a multidimensional pattern mining framework
for a large number of residential users connected to an Internet Service
Provider (ISP). The authors in [29] developed a new gateway system to
automatically integrate and configure new home-based IoT devices for seamless
analytics in cloud systems. The SLASH framework in [24] presents a new
approach for smart home adaptivity and self-learning mechanisms. The idea
includes the development of a big data layer with an analytical engine that
supports occupants' behavior. The work in [25] proposes an end-to-end home
automation system that supports multiple IoT protocols for data acquisition
and analysis. The authors claim that their system is capable of handling data
coming from city-wide deployed devices. Similar to the work in [25], a general
smart city paradigm is proposed in [31] for an IoT big data analytics system
that integrates sensors from smart homes, traffic, vehicles, surveillance,
etc., using the Hadoop ecosystem in a real-life environment.
In addition to the above-mentioned studies, the authors in [40] present a
real-time data analytics engine that facilitates the processing of data near
the source of information. The proposed analytical engine ensures that data is
processed before it is offloaded to the central cloud system. The system
coordinates the analytics among the IoT devices according to their physical
location in the vicinity, leading to the creation of a device-to-device
analytical layer under the cloud system. The main issue with this approach is
that it adds complexity to the system to the point of making it practically
prohibitive. Similar to [40], the works in [15][18] and [19] address the
issues of data analytics at the edge of the cloud system, but focus on the
latency problem of processing large amounts of IoT data using fog computing.
The design approach in these systems brings edge computing resources as close
as possible to the source where data is generated. The work in [36] further
investigates this issue and develops mechanisms to estimate the latency for
cloud-fog-IoT continuum systems.
For real-time analytics of IoT data in uncontrolled environments, the work
presented in [32] proposed a general-purpose IoT framework that integrates
wireless hub nodes to support analytical reliability and assure real-time data
acquisition. The work in [35] proposed a system that runs data analytics in
a distributed fashion using fog computing, IoT devices, and edge and central
servers. The main approach is to optimize the decision-making of analytics
such that all IoT devices are treated fairly and satisfied. The results of
this work show a promising solution for enhancing the utilization of fog and
cloud computing systems. To facilitate intelligence at the edge network in
providing robust analytics for IoT systems, the work in [37] outlined a new
approach to dynamically automate the transitions between the central cloud
system and its edge, taking into account the various conditions and
requirements of the applications. The author in [26] proposes a general model
and architecture that ingests IoT data streams into fog computing nodes. The
model addresses the challenges of existing techniques and the shortcomings
pertaining to the essential dimensions of data analytics related to system,
data, human, and optimization.
It is important to note that a platform with fog computing nodes coupled
with cloud computing offers resource-efficient processing of IoT big data
on a near real-time basis while providing insights and processed data to the
cloud for further processing and analysis. This integrated design allows us
to address the latency issues of the cloud system, which can have a
considerable impact on time-sensitive applications.
Our work in this paper is in line with the work presented in [25]; however,
our focus is on a scalable IoT big data analytics platform with fog computing
that is capable of managing, analyzing, and transforming household energy
consumption data into actionable insights. Therefore, we present a holistic
architecture that is suitable for end-to-end analytics of IoT-connected
smart homes. We discuss the validity of the architecture and the
interconnectivity of the analytical modules. For the evaluation of the system
model, we present a case study of data streams collected from an actual smart
home in Vancouver, Canada. Our case study addresses the challenges of data
analytics of smart home energy consumption for smart grid applications (e.g.,
Automatic Demand Response). It must be noted that this work differs from
[4][3][27] and [28]. Those works do not address IoT big data analytics in fog
and cloud computing systems, but focus on analyzing the behavioral energy
consumption that leads to peak hours as in [4], activity recognition for
healthcare applications [3], and prediction models [28]. This paper introduces
a detailed system requirement and component design analysis for an IoT
big data analytics platform for smart homes via fog and cloud computing.
The smart home dataset, the platform, and the results in this paper are
completely different from our previous work.
3. Platform Overview
The deployment of smart homes is taking off rapidly across the world, and
it is becoming a compelling business opportunity for various industrial
applications. Smart homes that are supported by the IoT paradigm generate
large volumes of useful data. However, unlocking the potential of this
information hinges on the development of sophisticated big data analytics
tools and platforms capable of processing, analyzing, and managing these data
in cost-effective ways. In this section, we address the system requirements
for the development of IoT big data analytics with fog and cloud computing,
and we present the components of the proposed platform.
functionalities, and design structures.
• Resource Distribution: Processing the large amount of data generated
by household appliances and devices requires cost-effective and
resource-efficient big data analytics closer to the physical system. Mining
continuous streams of IoT data should meet the timing requirements of
many smart city applications, such as automatic demand response,
sensitive healthcare applications, and safety and surveillance operations,
which require predictable latency for near real-time detection and
notification. These functionalities mostly face serious constraints when
processing data and invoking services from the back-end cloud. The
proximity of resources helps overcome the high latency that is associated
with the provisioning of cloud-based services. Therefore, optimized
scheduling mechanisms are required to coordinate the tasks among fog
computing nodes and to allocate resources from the cloud system
appropriately.
and storage limitations, and therefore, intensive applications are
performed in the cloud servers for better performance [30].
time constraints. The cloud system takes on the heavy lifting of processing
computationally intensive applications.
In the proposed model, smart homes are the source of data. Such data
typically arrive at the smart home IoT gateway from different sources,
including household appliances and smart devices. The acquisition of data
is typically performed by specific IoT protocols, such as machine-to-machine
(M2M)/Message Queuing Telemetry Transport (MQTT), that communicate
with smart home devices and IoT gateways. An IoT gateway acts as an
agent that mediates between the smart home and the cloud system. The IoT
gateway may also provide local processing and storage functions, including
autonomous control and filtering of data streams. In the proposed model,
the IoT gateway can be used to serve multiple households while ensuring
trusted connectivity and security by enforcing policy-based access
mechanisms. The data acquired during this communication process pass
through several stages until they rest on cloud storage devices, where
further processing may be performed in the future. As mentioned in the
Introduction, smart home data have the volume, velocity, and variety
characteristics to be considered big data. The analytics operations include
filtering and cleaning, clustering, and aggregation, where each operation
takes extensive time depending on the nature of the data. The following are
the details of the platform components.
Figure 1: IoT Big Data Analytics with Fog Computing
manage the interaction and the services of the smart home. This tier
allows for context-based privacy and security configuration that
addresses the occupants' concerns. Activity recognition, event detection,
and behavioral and predictive analytics are performed by the fog and cloud
computing system and reported to the smart home applications. For
example, behavioral analytics can be very effective for understanding how
users go about using their appliances and for deriving conclusions about
energy consumption, which can be used to forecast future demand.
Activity recognition can be used to allow caregivers in healthcare
applications [3] to detect abnormal behavior of patients. The applications
and benefits of such analytics and services are countless.
Figure 2: The IoT Big Data management service is responsible for request handling,
authentication and service registration
Figure 3: The IoT Big Data integration service is responsible for smart home functional
and third party services and applications
system. Data arriving at the fog node are unstructured and do not have
a predefined model. During the cleaning/pre-processing process, errors,
redundancies, and outliers are removed to ensure consistency. In
the pre-processing stage, all IoT streams are filtered, parsed, and
translated into a unified data structure for further analysis. At this stage,
raw data, which contain millions of high time-resolution data records,
are transformed into a pre-defined resolution for each device. Frequent
pattern mining techniques are then applied to the data to discover
the occurrence of appliance correlations in the data streams. Frequent
pattern mining searches for recurring patterns in a given dataset to
determine associations and correlations among patterns of interest [39].
In the clustering stage, we employ an unsupervised form of classification
that is capable of distinguishing classes of appliances learned from
the data [39]. Prediction analytics are responsible for forecasting
occupants' activities or the use of certain devices. Visualization provides
an interactive medium for the user to discover knowledge from data to
enhance the decision-making process. Finally, the results of the stages
mentioned above are sent to the cloud system, which has an abundance
of resources for computationally intensive tasks.
It should be noted that configuring the fog node to ensure the
privacy of the smart home is challenging. As fog nodes are
becoming a major computation hub, private smart home data become
vulnerable to various attacks. Therefore, a new breed of trust
management systems and privacy protection mechanisms is required to tackle
this problem. These mechanisms are not considered in this paper.
However, remedies for this problem can be found in [20][21][22]
and [23].
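As an illustration of the pre-processing step in which high time-resolution
records are transformed into a pre-defined resolution, the following minimal
Python sketch averages per-minute readings into fixed-width time buckets. The
function name downsample and the 30-minute bucket width are assumptions made
for this example only; the resolution actually used by the platform is not
stated here.

```python
from collections import defaultdict
from statistics import mean


def downsample(records, resolution_s=1800):
    """Average (unix_timestamp, power) readings into fixed-width buckets.

    resolution_s is an assumed target resolution (30 minutes), chosen
    only for illustration.
    """
    buckets = defaultdict(list)
    for ts, power in records:
        # Align each reading to the start of the bucket it falls into.
        bucket_start = ts - (ts % resolution_s)
        buckets[bucket_start].append(power)
    # One averaged reading per bucket, returned in time order.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Three one-minute readings shaped like the sample rows of the dataset.
raw = [(1360548360, 210), (1360548420, 70), (1360548480, 28)]
reduced = downsample(raw, resolution_s=1800)
print(reduced)
```

Averaging is only one possible aggregation; a sum or a maximum per bucket
could be substituted depending on the analysis that follows.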
Spark, Storm, etc. The cloud system uses its back-end computation to
gain business insight and update the fog nodes about new operational
rules.
4. Case Study
co-utilization of appliances), the clusters of appliances with respect to
the time of use (i.e., spatio-temporal analysis), and the forecast of
appliance usage. The following steps illustrate the life cycle of the
analytics at the fog node.
Data Cleaning and Preparation: The dataset contains millions of
records (a sample of the raw data is shown in Table 1) with a large amount
of data about appliances. Data about appliances were collected every
minute over two years (April 2012-April 2014). The measurements
include the Unix timestamp, line voltage, voltage, and apparent
power. The cleaning process started by importing the data files into
Python scripts. It includes eliminating unnecessary columns, converting
the Unix timestamp to a human-readable date, removing values that are
below the standby power threshold, and removing outliers and duplicate
rows. The entire cleaning process was completed using Python with
regular expressions (RegEx). The preparation of the data includes
comparing all the readings to a pattern; only the matching readings were
stored in a database. The tuples not matching the pattern are considered
noise because the values for power and timestamp are supposed to be
integers only; hence, any other character in these values would represent
an error in the recording process of those tuples. The pattern matching
also ensures the quality of the data, because any tuple that was
incomplete or inconsistent did not match the pattern and was therefore
ignored. For the purpose of training, we developed a synthetic dataset
that includes the appliance, the time of its operation, the date, and the
power. With this information in hand, we can then perform clustering
analysis and frequent pattern mining of appliance usage by, for example,
hour of the day, day of the week, and month of the year.
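The cleaning steps described above can be sketched in Python as follows. This
is a minimal illustration under stated assumptions, not the scripts used in
the case study: the record layout ("timestamp,power"), the pattern RECORD,
the value of STANDBY_THRESHOLD, and the use of UTC for the date conversion
are all hypothetical, and outlier removal is left out.

```python
import re
from datetime import datetime, timezone

# Assumed record layout: "<10-digit Unix timestamp>,<apparent power>",
# both integers. Anything else is treated as noise.
RECORD = re.compile(r"([0-9]{10}),([0-9]+)")
STANDBY_THRESHOLD = 5  # hypothetical standby power threshold


def clean(lines, standby=STANDBY_THRESHOLD):
    """Keep well-formed, non-duplicate readings above the standby threshold."""
    seen = set()
    rows = []
    for line in lines:
        match = RECORD.fullmatch(line.strip())
        if match is None:
            continue  # incomplete or inconsistent tuple: ignore as noise
        ts, power = int(match.group(1)), int(match.group(2))
        if power < standby or ts in seen:
            continue  # below standby power, or a duplicate timestamp
        seen.add(ts)
        # UTC is an assumption; local time could be used instead.
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        rows.append({
            "timestamp": ts,
            "date": when.isoformat(),
            "hour": when.hour,
            "weekday": when.strftime("%A"),
            "month": when.month,
            "power": power,
        })
    return rows


sample = [
    "1360548360,210",
    "1360548420,70",
    "1360548420,70",   # duplicate row
    "1360548480,2x8",  # non-integer power value
    "1360548540,1",    # below the assumed standby threshold
    "bad row",
]
cleaned = clean(sample)
print(cleaned)
```

The hour, weekday, and month fields are derived at this stage so that the
later hour-of-day, day-of-week, and month-of-year analyses can group on them
directly.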
Frequent Pattern Mining: For frequent pattern mining, we are
interested in analyzing when certain appliances are being used by
examining the "ON/OFF" state and the energy consumption. An appliance
being in the ON state allows for the inference that a person is currently
using it. This information can be beneficial in certain applications, and
as a result, the data and the mined patterns have value to industries. For
example, knowing when an individual is likely to have the television
turned on could help companies target advertisements. We would like to
derive these patterns in
Table 1: Sample of IoT Data from the Smart Home
Timestamp*    Apparent power consumption of appliances**
1360548360    210
-             -
1360548420    70
-             -
1360548480    28
* Unix timestamp
** Examples of appliances include Dishwasher, Toaster, TV, Dryer,
Home Theater, Washing Machine, Laptop, etc.
Figure 4: Hour of the Day Energy Consumption Pattern - (a) Dishwasher (DWE), (b)
Clothes Dryer (CDE), (c) Kettle (KE), (d) TV, (e) Home Theater (HTE), (f) Laptop (EBE)
the dataset. Figures 4, 5, and 6 show the energy consumption patterns
of six appliances in the home by hour of the day, day of the week, and
month of the year. We applied a minimum support threshold of 30% to
the dataset, setting all values below the threshold to 0 and all values
above it to 1. This allowed us to obtain a binary matrix for checking
which appliances are in use at a specific time, as shown in Table 2.
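The thresholding step that produces the binary matrix can be sketched as
follows. This is an illustrative sketch under assumptions: each cell is taken
to hold the fraction of observed days on which an appliance was ON in a given
time slot, the helper names usage_fraction and binarize are hypothetical, and
a value exactly equal to the threshold is mapped to 1, a choice the
description above leaves open.

```python
MIN_SUPPORT = 0.30  # the 30% minimum support threshold


def usage_fraction(daily_flags):
    """Fraction of days on which the appliance was ON in each time slot.

    daily_flags is a list of days; each day is a list of 0/1 ON flags,
    one per time slot (an assumed input layout).
    """
    days = len(daily_flags)
    slots = len(daily_flags[0])
    return [sum(day[s] for day in daily_flags) / days for s in range(slots)]


def binarize(matrix, threshold=MIN_SUPPORT):
    """Map every value below the threshold to 0 and every other value to 1."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]


# Four observed days, three time slots, for one appliance.
dishwasher_days = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 0]]
fractions = usage_fraction(dishwasher_days)
binary_row = binarize([fractions])[0]
print(fractions, binary_row)
```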
The final result of the frequent pattern mining is the association among
appliances that results from the simultaneous use of the appliances
by occupants. Figure 7 shows an example of hourly and day-of-the-week
use of appliances. From Figure 7(a), it is apparent that the two
appliances used together the most are the dishwasher and the television,
between 6 pm and 10:30 pm. The three appliances (dishwasher, dryer,
and television) are most likely to be on at the same time between
8 pm and 8:30 pm. The days of the week in Figure 7(b) demonstrate
that the dishwasher and television are frequently on at the same time.
Inspecting each day individually reveals certain patterns; for example,
on Monday and Tuesday nights the dishwasher and television are on for
the longest amount of time, and on Saturdays the television and
dishwasher are on later at night.
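To make the notion of appliance association concrete, the sketch below counts
how often sets of appliances are ON in the same time slot and keeps those that
meet a minimum support. It is a simple level-wise enumeration used here as a
stand-in for illustration, not an FP-tree based miner; the function name
frequent_itemsets and the example time slots are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.30):
    """Return {appliance tuple: support} for all sets meeting min_support.

    transactions is a list of sets; each set holds the appliances that
    were ON during one time slot.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:
            # No frequent set of this size, so no larger set can be frequent.
            break
    return result


# Hypothetical time slots with the appliances that were ON in each.
slots = [
    {"DWE", "TVE"},
    {"DWE", "CDE", "TVE"},
    {"TVE"},
    {"DWE", "TVE"},
    {"CDE"},
]
patterns = frequent_itemsets(slots, min_support=0.30)
print(patterns)
```

In this toy input the dishwasher and television pair is the only frequent
combination, with a support of 3 out of 5 slots. Exhaustive enumeration grows
quickly with the number of appliances, which is the cost that tree-based
miners are designed to avoid.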
Procedure-1: Generating Frequent Patterns - FP-Growth
Figure 5: Day of the Week Energy Consumption Pattern - (a) Dishwasher (DWE), (b)
Clothes Dryer (CDE), (c) Kettle (KE), (d) TV, (e) Home Theater (HTE), (f) Laptop (EBE)
Table 2: Sample of a binary matrix to uncover the frequent pattern of appliance usages
10:30pm-11pm  11pm-11:30pm  11:30pm-12pm  DWE-CDE  DWE-TVE  CDE-TVE  DWE-CDE-TVE
0             0             0             1        1        1        1
0             0             0             1        1        1        1
0             0             0             0        0        0        0
0             0             0             0        0        0        0
0             0             0             1        1        1        1
0             0             0             1        1        1        1
1             0             0             1        1        1        1
0             0             0             1        1        1        1
Figure 6: Month of the Year Energy Consumption Pattern - (a) Dishwasher (DWE), (b)
Clothes Dryer (CDE), (c) Kettle (KE), (d) TV, (e) Home Theater (HTE), (f) Laptop (EBE)
about how the smart home occupants are co-utilizing their appliances.
Clustering analysis allows us to interpret the time intervals associated with
groups of appliances. This is important for uncovering deeper behavior of
appliance energy consumption at specific times (e.g., peak hours). To achieve
this objective, we implement the k-means clustering algorithm in [39]. The
basic principle of the k-means algorithm is that it defines k centers, which
are placed in specific positions away from each other. Then, the function
$G(z) = \sum_{i=1}^{k} \sum_{j=1}^{C_i} \left( \lVert a_i - b_j \rVert \right)^2$
is used to determine the squared error value,
Figure 7: Frequent Pattern of Appliances - Dishwasher (DWE), Clothes Dryer (CDE), TV -
(a) Hour of the Day, (b) Day of the Week [D1 - Sunday - D5 Saturday]
vital for getting better results. There are many methods for determining the
ideal number k, as described in [43]. The approach in this work uses the
silhouette coefficient as a means of calculating the optimal number k [44].
This method measures the quality of a clustering by evaluating how well the
data points are positioned within their clusters. For a data point $y_j$ in
cluster $C_i$, it computes the average distance
$x_j = \mathrm{average}\{\mathrm{dis}(y_j, y_i)\}$ to all other data points
$y_i$ in $C_i$ and then determines $w_j$, the minimum of the average distance
from $y_j$ to the data points of each cluster other than $C_i$. The
silhouette coefficient for $y_j$ is determined as
$r_{y_j} = \frac{w_j - x_j}{\max(x_j, w_j)}$,
and the silhouette coefficients for cluster $C_i$ and for having k clusters as
$r_{C_i} = \mathrm{average}(r_{y_j})$ for $j = d_1..d_n$ and
$r_k = \mathrm{average}(r_{C_i})$ for $i = 1..k$,
respectively. The higher the average silhouette value, the better the
clustering. In other words, the average silhouette provides an observation
about the various values of $k \in \{1, 2, 3, \ldots, m\}$, where m represents
the number of unique objects in a dataset. To find the optimal number of
clusters, the process is executed repeatedly and the average silhouette
coefficient is calculated until the number of clusters k that maximizes $r_k$
is found.
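The selection of k by the silhouette coefficient can be sketched as follows.
This is a minimal illustration under assumptions, not the implementation used
in the case study: kmeans is a plain Lloyd-style routine with a fixed random
seed, average_silhouette follows the definitions of $x_j$, $w_j$, $r_{y_j}$,
$r_{C_i}$, and $r_k$ given above, singleton clusters are scored 0 by
convention, and the one-dimensional usage_hours data are invented for the
example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct starting centers
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return labels


def average_silhouette(points, labels):
    """r_k: the mean over clusters of the mean point-level coefficient."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # not defined for a single cluster; treated as 0 here
    cluster_scores = []
    for lab, members in clusters.items():
        scores = []
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # x: average distance to the other points of the same cluster.
            x = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # w: smallest average distance to the points of another cluster.
            w = min(sum(math.dist(p, q) for q in other) / len(other)
                    for other_lab, other in clusters.items()
                    if other_lab != lab)
            denom = max(x, w)
            scores.append((w - x) / denom if denom > 0 else 0.0)
        cluster_scores.append(sum(scores) / len(scores))
    return sum(cluster_scores) / len(cluster_scores)


# Invented example: hours of the day at which an appliance was switched on.
usage_hours = [(7.0,), (7.5,), (8.0,), (19.0,), (19.5,), (20.0,), (20.5,)]
scores = {k: average_silhouette(usage_hours, kmeans(usage_hours, k))
          for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Because the example data fall into a morning group and an evening group, the
highest average silhouette is obtained for k = 2. Note that candidates start
at k = 2, since the coefficient needs at least two clusters to compare.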
Figure 8 shows the clustering of appliances by hour of the day, where
cluster strength signifies the frequency of use of an appliance; i.e., a
higher cluster strength for an appliance indicates higher use of it during
the period. Higher or lower usage of an appliance, i.e., the pattern of
appliance usage, can be directly representative of the energy consumption
behavior of occupants. Such an analysis can be conducted at various levels,
such as an individual house, a group of houses, a community or neighborhood,
or the system level. When done at a higher level, such as the neighborhood or
system level, the outcomes can help profile houses according to energy
consumption behavior and customize demand response mechanisms to be more
efficient. Further, for a single home, the outcomes can help adapt
recommendations to reduce household energy cost while respecting the
occupants' expected
Figure 8: Smart Home Appliance Clustering - Hour of the Day
residents are frequently watching television at this time. Analytics in fog
nodes increase the ability of the platform to manage an integrated array of
IoT data streams for various applications in highly automated ways, which
results in significant savings for service providers. Also, service providers
can design and develop their applications using fog nodes that offer abundant
elasticity to enhance performance, redundancy, and storage for their
applications.
For future work, we plan to develop optimization mechanisms such as
those in [16][17] to determine the optimal distribution and configuration of
fog nodes while taking into consideration the computational resources and the
capability of processing the required data from multiple homes. Furthermore,
we plan to refine the platform components and test them with different
datasets from various homes. This approach is crucial to validate the
applicability of the platform and its robustness in dealing with all kinds of
IoT data measurements. We also plan to study a benchmarking scheme to assess
and capture the performance of the platform and analytics under different
concerns, including runtime, CPU utilization, data size, incoming requests,
etc.
6. References
[1] H. El-Sayed et al. Edge of Things: The Big Picture on the Integration
of Edge, IoT and the Cloud in a Distributed Computing Environment.
IEEE Access, vol. 6, pp. 1706-1717, 2018
[6] A. Mebrek, L. Merghem-Boulahia and M. Esseghir. Efficient green
solution for a balanced energy consumption and delay in the IoT-Fog-Cloud
computing. IEEE 16th International Symposium on Network Computing
and Applications (NCA), Cambridge, MA, 2017, pp. 1-4.
[14] H. Cai, B. Xu, L. Jiang and A. V. Vasilakos. IoT-Based Big Data Stor-
age Systems in Cloud Computing: Perspectives and Challenges. IEEE
Internet of Things Journal, vol. 4, no. 1, pp. 75-87, Feb. 2017.
[18] N. M. Gonzalez et al. Fog computing: Data analytics and cloud dis-
tributed processing on the network edges. 35th International Conference
of the Chilean Computer Science Society (SCCC), Valparaiso, 2016, pp.
1-9.
[22] A. Paverd, A. Martin, and I. Brown. Security and Privacy in Smart Grid
Demand Response Systems. Series Lecture Notes in Computer Science.
Volume 8448, pp 1-15, Smart Grid Security, Springer, 2014
[23] A. Yassine, S. Shirmohammadi. Privacy and the market for private data:
a negotiation model to capitalize on private data. IEEE/ACS International
Conference on Computer Systems and Applications, Doha, 2008,
pp. 669-678.
[26] S. Yang. IoT Stream Processing and Analytics in the Fog. IEEE
Communications Magazine, vol. 55, no. 8, pp. 21-27, 2017.
[27] S. Singh, A. Yassine. IoT Big Data Analytics with Fog Computing for
Household Energy Management in Smart Grids. SGIoT 2018 - 2nd EAI
International Conference on Smart Grid and Internet of Things, Niagara
Falls, Canada, 2018
[28] S. Singh, A. Yassine. Big Data Mining of Energy Time Series for Behav-
ioral Analytics and Energy Consumption Forecasting. Energies, 2018,
11, 452
[32] G. Daneels et al. Real-Time data dissemination and analytics platform
for challenging IoT environments. Global Information Infrastructure and
Networking Symposium (GIIS), St. Pierre, 2017, pp. 23-30.
[36] J. Li, T. Zhang, J. Jin, Y. Yang, D. Yuan and L. Gao. Latency estima-
tion for fog-based internet of things. 27th International Telecommunica-
tion Networks and Applications Conference (ITNAC), Melbourne, VIC,
2017, pp. 1-6.
[37] P. Patel, M. Intizar Ali and A. Sheth. On Using the Intelligent Edge
for IoT Analytics. IEEE Intelligent Systems vol. 32, no. 5, pp. 64-69,
September/October 2017.
[38] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate
generation. 2000 ACM SIGMOD International Conference on Management
of Data, USA, pp. 1-12, 2000
[39] J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without
candidate generation: A frequent-pattern tree approach. Data Mining
and Knowledge Discovery, vol. 8, no. 1, pp. 53-87, 2004
[41] S. Makonin, B. Ellert, I. V. Bajic, F. Popowich. AMPds2 - Almanac of
Minutely Power dataset: Electricity, water, and natural gas consump-
tion of a residential house in Canada from 2012 to 2014. Scientific Data,
DOI 10.1038/sdata.2016.37, vol. 3, pp. 1-12, 2015.
[43] C.A. Sugar and G.M. James. Finding the Number of Clusters in a Data
Set: An Information Theoretic Approach. J. Am. Statistical Assoc., vol.
98, pp. 750-763, 2003