2-Edge Streaming Analytics
In the world of IoT, vast quantities of data are generated on the fly and often need to
be analyzed and responded to immediately.
Not only is the volume of data generated at the edge immense—meaning the
bandwidth requirements to the cloud or data center need to be engineered to
match—but the data may be so time sensitive that it needs immediate attention, and
waiting for deep analysis in the cloud simply isn’t possible.
The aggregate data generated by IoT devices is generally in proportion to the number
of devices.
The scale of these devices is likely to be huge, and so is the quantity of data they
generate.
Passing all this data to the cloud is inefficient and is unnecessarily expensive in terms
of bandwidth and network infrastructure.
Dr. Syed Mustafa, HKBKCE. 118
Module – 4 Data and Analytics for IoT
Comparing Big Data and Edge Analytics:
Some data is useful only at the edge (such as a factory control feedback system).
In cases such as this, the data is best analyzed and acted upon where it is generated.
3. Time sensitivity:
When timely response to data is required, passing data to the cloud for future
processing results in unacceptable latency.
Edge analytics allows immediate responses to changing conditions.
Whereas big data analytics is focused on large quantities of data at rest, edge
analytics continually processes streaming flows of data in motion.
Streaming analytics at the edge can be broken down into three simple stages:
1. Raw input data:
This is the raw data coming from the sensors into the analytics processing unit (APU).
2. Analytics processing unit (APU):
The APU filters and combines data streams (or separates the streams, as necessary),
organizes them by time windows, and performs various analytical functions.
It is at this point that the results may be acted on by microservices running in the APU.
3. Output streams:
The data that is output is organized into insightful streams and is used to influence the
behavior of smart objects, and passed on for storage and further processing in the
cloud.
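The three stages can be sketched as a small pipeline. This is only an illustrative sketch: the sensor names, the threshold of 80.0, and the "throttle" action are hypothetical and do not come from the notes.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

RawReading = Tuple[str, Optional[float]]  # (sensor name, value or None)


def apu(raw_stream: Iterable[RawReading],
        threshold: float) -> Iterator[Tuple[str, float, bool]]:
    """Stage 2: drop unusable readings and mark those above the threshold."""
    for sensor, value in raw_stream:
        if value is None:  # nothing to analyze
            continue
        yield sensor, value, value > threshold


def run(raw_stream: Iterable[RawReading],
        threshold: float = 80.0) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Stage 3: split the APU results into two output streams."""
    local_actions: List[str] = []              # influences smart objects
    cloud_batch: List[Tuple[str, float]] = []  # passed on for storage
    for sensor, value, alert in apu(raw_stream, threshold):
        if alert:
            local_actions.append(f"throttle {sensor}")
        cloud_batch.append((sensor, value))
    return local_actions, cloud_batch


# Stage 1: raw input data (hypothetical readings).
raw = [("m1", 75.0), ("m2", None), ("m1", 85.5)]
actions, batch = run(raw)
print(actions)
print(batch)
```

One output stream acts locally and the other is forwarded, which mirrors the two uses of output data described above.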
In order to perform analysis in real-time, the APU needs to perform the following
functions:
1. Filter:
The streaming data generated by IoT endpoints is likely to be very large, and most of
it is irrelevant. For example, a sensor may simply poll on a regular basis to confirm
that it is still reachable. Such messages carry no analytical value, so the filter
passes on only the data that matters.
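A minimal sketch of the filter function follows. The message layout and the "keepalive" type are assumptions made for illustration; real devices define their own formats.

```python
from typing import Iterable, Iterator

# Hypothetical message type used by a sensor to say "I am still reachable".
KEEPALIVE = "keepalive"


def filter_relevant(messages: Iterable[dict]) -> Iterator[dict]:
    """Yield only messages that are not keepalives."""
    for msg in messages:
        if msg.get("type") == KEEPALIVE:
            continue
        yield msg


stream = [
    {"sensor": "t1", "type": "keepalive"},
    {"sensor": "t1", "type": "reading", "value": 21.5},
    {"sensor": "t2", "type": "keepalive"},
    {"sensor": "t2", "type": "reading", "value": 22.1},
]
kept = list(filter_relevant(stream))
print(kept)
```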
2. Transform:
In the data warehousing world, Extract, Transform, and Load (ETL) operations are
used to manipulate the data structure into a form that can be used for other
purposes.
Analogous to data warehouse ETL operations, in streaming analytics, once the data is
filtered, it needs to be formatted for processing.
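As a rough illustration of the transform step, the sketch below converts a raw payload into one common record. The field names (`id`, `ts`, `val`, `unit`) and the `Reading` type are hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: float  # seconds since the epoch
    celsius: float


def transform(raw: dict) -> Reading:
    """Convert a raw payload (assumed layout) into a Reading in Celsius."""
    value = float(raw["val"])
    if raw.get("unit", "C").upper() == "F":
        value = (value - 32.0) * 5.0 / 9.0  # Fahrenheit to Celsius
    return Reading(sensor_id=str(raw["id"]),
                   timestamp=float(raw["ts"]),
                   celsius=value)


r = transform({"id": "t1", "ts": "1700000000", "val": "212", "unit": "F"})
print(r)
```

Once every stream uses the same record and the same unit, later steps such as windowing and correlation do not need to know which device produced a reading.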
3. Time:
As the real-time streaming data flows, a timing context needs to be established. This
could be to correlate average temperature readings from sensors on a minute-by-minute
basis.
For example, Figure shows an APU that takes input data from multiple sensors
reporting temperature fluctuations. In this case, the APU is programmed to report the
average temperature every minute from the sensors, based on an average of the past
two minutes.
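The reporting rule in this example (a report every minute, averaged over the past two minutes) can be sketched as a sliding window. The sample timestamps and temperatures below are made up, and the half-open window [now - 120, now) is an assumption about how the boundaries are handled.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, temperature)


def windowed_average(samples: List[Sample], now: float,
                     window_s: float = 120.0) -> Optional[float]:
    """Average of the values whose timestamp lies in [now - window_s, now)."""
    recent = [v for ts, v in samples if now - window_s <= ts < now]
    return sum(recent) / len(recent) if recent else None


# Hypothetical readings from several temperature sensors.
samples = [(10, 20.0), (50, 22.0), (70, 24.0),
           (110, 26.0), (130, 28.0), (170, 30.0)]

# One report per minute, each covering the previous two minutes.
reports = [(now, windowed_average(samples, now)) for now in (60, 120, 180)]
print(reports)  # [(60, 21.0), (120, 23.0), (180, 27.0)]
```

For clarity this version rescans the whole list on every report. An APU handling a continuous stream would more likely keep a queue and evict old samples as new ones arrive.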
4. Correlate:
Streaming data analytics becomes most useful when multiple data streams are
combined from different types of sensors.
For example, in a hospital, several vital signs are measured for patients, including
body temperature, blood pressure, heart rate, and respiratory rate.
These different types of data come from different instruments, but when this data is
combined and analyzed, it provides an invaluable picture of the health of the patient
at any given time.
Live streams can also be correlated with historical data.
For example, historical data may include the patient’s past medical history, such as
blood test results.
Combining historical data gives the live streaming data a powerful context and
promotes more insights into the current condition of the patient (see Figure).
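A small sketch of correlation for the hospital example is shown below. It merges readings from different instruments into one snapshot per patient and attaches historical context. The event fields, patient IDs, and history records are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def correlate(events: Iterable[dict], history: Dict[str, dict]) -> Dict[str, dict]:
    """Keep the latest value of each metric per patient, plus stored history.

    Events are assumed to arrive in time order, so a later value
    overwrites an earlier one for the same metric.
    """
    live = defaultdict(dict)
    for ev in events:
        live[ev["patient"]][ev["metric"]] = ev["value"]
    return {
        pid: {"live": metrics, "history": history.get(pid, {})}
        for pid, metrics in live.items()
    }


events = [
    {"patient": "p1", "metric": "heart_rate", "value": 72},
    {"patient": "p1", "metric": "body_temp_c", "value": 37.0},
    {"patient": "p1", "metric": "heart_rate", "value": 75},
    {"patient": "p2", "metric": "resp_rate", "value": 16},
]
history = {"p1": {"last_blood_test": "normal"}}

view = correlate(events, history)
print(view["p1"])
```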
5. Match patterns:
Once the data streams are properly cleaned, transformed, and correlated with other
live streams as well as historical data sets, pattern matching operations are used to
gain deeper insights into the data.
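For example, say that the APU has been collecting the patient’s vitals for some time
and has gained an understanding of the expected patterns for each variable being
monitored. A reading that departs from those patterns, such as a sudden change in
heart rate, can then be recognized as out of the ordinary and acted on.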
The patterns can be simple relationships, or they may be complex, based on the
criteria defined by the application.
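One simple relationship of this kind is a threshold around a learned baseline. The sketch below uses a mean and standard deviation with a 3-sigma limit. This is just one possible pattern-matching rule, chosen for illustration, and the heart-rate values are invented.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def learn_baseline(values: List[float]) -> Tuple[float, float]:
    """Summarize the expected pattern as (mean, population std deviation)."""
    return mean(values), pstdev(values)


def is_anomalous(value: float, baseline: Tuple[float, float],
                 k: float = 3.0) -> bool:
    """True if the value is more than k standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


# Hypothetical heart-rate readings collected over time.
heart_rate_history = [70, 72, 71, 69, 73, 70, 72, 71]
baseline = learn_baseline(heart_rate_history)

print(is_anomalous(72, baseline))   # within the expected range
print(is_anomalous(120, baseline))  # far outside the expected range
```

A more complex pattern could combine several variables, for example a rising heart rate together with a falling respiratory rate.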
Depending on the application and network architecture, analytics can happen at any
point throughout the IoT system.
Streaming analytics may be performed directly at the edge, in the fog, or in the cloud
data center.
Fog analytics allows you to see beyond a single device, giving visibility into an
aggregation of edge nodes and letting you correlate data from a wider set.
Figure shows an example of an oil drilling company that is measuring both pressure
and temperature on an oil rig.
Distributed Analytics Systems:
Sensors communicate via MQTT through a message broker to the fog analytics node,
which therefore sees a broader data set than any single sensor.
The fog node is located on the same oil rig and performs streaming analytics from
several edge devices, giving it better insights due to the expanded data set.
Once the fog node is finished with the data, it communicates the results to the cloud
(again through a message broker via MQTT) for deeper historical analysis through big
data analytics tools.
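The flow from sensors to fog node to cloud can be sketched with a tiny in-memory publish/subscribe class. This class is only a stand-in for an MQTT broker and is not a real MQTT client. The topic names and readings are hypothetical, and a single broker is used for both hops to keep the sketch short.

```python
from collections import defaultdict
from statistics import mean


class Broker:
    """In-memory stand-in for a message broker (exact topic match only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        for callback in list(self._subscribers[topic]):
            callback(topic, payload)


class FogNode:
    """Collects readings from several edge devices and forwards a summary."""

    def __init__(self, broker):
        self.broker = broker
        self.readings = defaultdict(list)
        for topic in ("rig/pressure", "rig/temperature"):
            broker.subscribe(topic, self.on_reading)

    def on_reading(self, topic, payload):
        self.readings[topic].append(payload)

    def publish_summary(self):
        summary = {topic: mean(vals) for topic, vals in self.readings.items()}
        self.broker.publish("cloud/rig-summary", summary)
        self.readings.clear()


broker = Broker()
fog = FogNode(broker)

cloud_inbox = []  # stands in for the cloud's big data tools
broker.subscribe("cloud/rig-summary", lambda topic, payload: cloud_inbox.append(payload))

# Edge sensors publish readings (made-up values).
broker.publish("rig/pressure", 101.0)
broker.publish("rig/pressure", 103.0)
broker.publish("rig/temperature", 60.0)
broker.publish("rig/temperature", 64.0)

fog.publish_summary()
print(cloud_inbox)
```

Only the summary crosses to the cloud, while the individual readings stay on the rig.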