Internet of Things (15CS81)

MODULE 4

By
Divya K S
Dept. of CSE

Chapter 1: Data and Analytics for IoT


1. Classification of Data
Structured Versus Unstructured Data:
 Structured data and unstructured data are important classifications as they typically
require different toolsets from a data analytics perspective.

 Structured data means that the data follows a model or schema that defines how the data
is represented or organized, meaning it fits well with a traditional relational database
management system (RDBMS).
 In many cases we will find structured data in a simple tabular form—for example, a
spreadsheet where data occupies a specific cell and can be explicitly defined and
referenced.
 IoT sensor data often uses structured values, such as temperature, pressure, humidity, and
so on, which are all sent in a known format.
 Structured data is easily formatted, stored, queried, and processed.
 Examples of commercial software for working with structured data include Microsoft Excel
and Tableau.

 Unstructured data lacks a logical schema for understanding and decoding the data
through traditional programming means.
 Examples of this data type include text, speech, images, and video.
 Any data that does not fit neatly into a predefined data model is classified as unstructured
data.
 According to some estimates, around 80% of a business’s data is unstructured.
 Data analytics methods that can be applied to unstructured data, such as cognitive
computing and machine learning, are deservedly garnering a lot of attention.
 With machine learning applications, such as natural language processing (NLP), we can
decode speech.
 With image/facial recognition applications, we can extract critical information from still
images and video.
 Smart objects in IoT networks generate both structured and unstructured data.

Data in Motion Versus Data at Rest:

 Data in IoT networks is either in transit (“data in motion”) or being held or stored (“data at
rest”).
 Examples of data in motion include traditional client/server exchanges, such as web
browsing, file transfers, and email.
 Data saved to a hard drive, storage array, or USB drive is data at rest.
 From an IoT perspective, the data from smart objects is considered data in motion as it
passes through the network en route to its final destination.
 This is often processed at the edge, using fog computing.

 When data is processed at the edge, it may be filtered and deleted or forwarded on for
further processing and possible storage at a fog node or in the data center.
 Data does not come to rest at the edge.
 When data arrives at the data center, it is possible to process it in real time, just as at the
edge, while it is still in motion.
 Tools with this sort of capability, such as Spark, Storm, and Flink, are relatively nascent
compared to the tools for analyzing stored data.
 Data at rest in IoT networks can be typically found in IoT brokers or in some sort of storage
array at the data center.
 The best-known tool is Hadoop.
 Hadoop helps not only with data processing but also with data storage.

2. IoT Data Analytics Overview:


Data analysis is typically broken down by the types of results that are produced.
There are four types of data analysis results:
1. Descriptive 2. Diagnostic 3. Predictive 4. Prescriptive

1. Descriptive:
 Descriptive data analysis tells us what is happening, either now or in the past.
 For example, a thermometer in a truck engine reports temperature values every second.
 From a descriptive analysis perspective, we can pull this data at any moment to gain insight
into the current operating condition of the truck engine.
 If the temperature value is too high, then there may be a cooling problem, or the engine
may be experiencing too much load.

2. Diagnostic:
 When we are interested in the “why,” diagnostic data analysis can provide the answer.
 Returning to the example of the temperature sensor in the truck engine, we might wonder
why the truck engine failed.
 Diagnostic analysis might show that the temperature of the engine was too high, and the
engine overheated.
 Applying diagnostic analysis across the data generated by a wide range of smart objects can
provide a clear picture of why a problem or an event occurred.
3. Predictive:
 Predictive analysis aims to foretell problems or issues before they occur.
 For example, with historical values of temperatures for the truck engine, predictive
analysis could provide an estimate on the remaining life of certain components in the
engine.
 These components could then be proactively replaced before failure occurs. Or perhaps if
temperature values of the truck engine start to rise slowly over time, this could indicate
the need for an oil change or some other sort of engine cooling maintenance.
4. Prescriptive:
 Prescriptive analysis goes a step beyond predictive and recommends solutions for
upcoming problems.
 A prescriptive analysis of the temperature data from a truck engine might calculate various
alternatives to cost-effectively maintain our truck.
 These calculations could range from the cost necessary for more frequent oil changes and
cooling maintenance to installing new cooling equipment on the engine or upgrading to a
lease on a model with a more powerful engine.

3. IoT Data Analytics Challenges:



IoT data places two specific challenges on a relational database:


1. Scaling problems: Due to the large number of smart objects in most IoT networks that
continually send data, relational databases can grow incredibly large very quickly. This can result
in performance issues that can be costly to resolve, often requiring more hardware and architecture
changes.

2. Volatility of data:
 With relational databases, it is critical that the schema be designed correctly from the
beginning. Changing it later can slow or stop the database from operating.
 Due to this lack of flexibility, revisions to the schema must be kept to a minimum.
 IoT data, however, is volatile; the data model is likely to change and evolve over time.
 A dynamic schema is often required so that data model changes can be made daily or
even hourly. A minimal sketch of this contrast appears below.
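To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module for the relational side and plain dictionaries for the schemaless side; the readings table, sensor IDs, and humidity field are hypothetical examples, not from the text:

```python
import sqlite3

# Relational: the schema is fixed up front; adding a field later
# requires an explicit (and potentially disruptive) migration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (sensor_id TEXT, temperature REAL)")
db.execute("INSERT INTO readings VALUES ('t1', 71.3)")
# A new "humidity" field forces a schema change on the whole table:
db.execute("ALTER TABLE readings ADD COLUMN humidity REAL")

# Schemaless: each record carries its own structure, so the data
# model can evolve per message with no migration step.
readings = [
    {"sensor_id": "t1", "temperature": 71.3},
    {"sensor_id": "t2", "temperature": 70.9, "humidity": 44.2},  # new field, no ALTER
]
```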

Another challenge that IoT brings to analytics is in the area of network data, which is
referred to as network analytics.
With the large numbers of smart objects in IoT networks that are communicating and
streaming data, it can be challenging to ensure that these data flows are effectively
managed, monitored, and secured.
Network analytics tools such as Flexible NetFlow and IPFIX provide the capability to detect
irregular patterns or other problems in the flow of IoT data through a network.

4. Machine Learning:
 Machine learning, deep learning, neural networks, and convolutional networks are related
to big data and IoT.
 ML is central to IoT.
 Data collected by smart objects needs to be analyzed, and intelligent actions need to be
taken based on these analyses.
 Performing this kind of operation manually is almost impossible (or very, very slow
and inefficient).
 Machines are needed to process information fast and react instantly when thresholds
are met.
 For example, every time a new advance is made in the field of self-driving vehicles,
abnormal pattern recognition in a crowd, or any other automated intelligent and machine-
assisted decision system, ML is named as the tool that made the advance possible.
 Machine learning is part of a larger set of technologies commonly grouped under the
term artificial intelligence (AI).
 AI includes any technology that allows a computing system to mimic human intelligence
using any technique, from very advanced logic to basic “if-then-else” decision loops.
 Any computer that uses rules to make decisions belongs to this realm.

 A simple example is an app that can help us find our parked car.
 A GPS reading of our position at regular intervals calculates our speed.
 A basic threshold system determines whether we are driving (for example, “if speed >
20 mph or 30 km/h, then start calculating speed”); a minimal sketch of this rule appears
after this list.
 A typical example is a dictation program that runs on a computer.
 The program is configured to recognize the audio pattern of each word in a dictionary, but it
does not know our voice’s specifics: our accent, tone, speed, and so on.
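As an illustration, here is a minimal sketch of such a basic threshold rule; the GPS coordinates and helper function are hypothetical, and the speed estimate uses a simple equirectangular approximation that is adequate for readings taken a few seconds apart:

```python
import math

def speed_mph(lat1, lon1, lat2, lon2, dt_seconds):
    # Estimate speed from two GPS readings taken dt_seconds apart.
    r_miles = 3958.8  # Earth radius in miles
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    distance = math.sqrt(x * x + y * y) * r_miles
    return distance / (dt_seconds / 3600.0)

speed = speed_mph(37.7749, -122.4194, 37.7760, -122.4194, 5)
if speed > 20:   # the basic "if-then" threshold from the example above
    print("Driving: keep tracking position")
else:
    print("Not driving: remember this spot as the parked location")
```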

ML can be divided into two main categories: supervised and unsupervised learning.

Supervised Learning:
 In supervised learning, the machine is trained with input for which there is a known
correct answer.
 For example, suppose that you are training a system to recognize when there is a human
in a mine tunnel.
 A sensor equipped with a basic camera can capture shapes and return them to a computing
system that is responsible for determining whether the shape is a human or something else
(such as a vehicle, a pile of ore, a rock, a piece of wood, and so on).
 With supervised learning techniques, hundreds or thousands of images are fed into
the machine, and each image is labeled (human or nonhuman in this case).
 This is called the training set.
 An algorithm is used to determine common parameters and common differences between
the images.
 The comparison is usually done at the scale of the entire image, or pixel by pixel.
 Images are resized to have the same characteristics (resolution, color depth, position
of the central figure, and so on), and each point is analyzed.
 Human images have certain types of shapes and pixels in certain locations (which
correspond to the position of the face, legs, mouth, and so on).
 Each new image is compared to the set of known “good images,” and a deviation is
calculated to determine how different the new image is from the average human image
and, therefore, the probability that what is shown is a human figure.
 This process is called classification.
 After training, the machine should be able to recognize human shapes.
 Before real field deployments, the machine is usually tested with unlabeled pictures (called
the validation set or the test set, depending on the ML system used) to verify that the
recognition level is at acceptable thresholds.
 If the machine does not reach the expected level of success, more training is needed.
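Here is a minimal sketch of the deviation-from-average classification described above, assuming NumPy is available; the "images" are synthetic stand-ins already resized to a common resolution, so this illustrates only the idea, not a production recognizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training set: flattened grayscale images (random stand-ins here),
# each labeled human (1) or nonhuman (0).
human = rng.normal(0.6, 0.1, size=(500, 64 * 64))
nonhuman = rng.normal(0.3, 0.1, size=(500, 64 * 64))

# "Learning" here is just computing the average image of each class.
mean_human = human.mean(axis=0)
mean_nonhuman = nonhuman.mean(axis=0)

def classify(image):
    # Deviation from each class average; the smaller deviation wins.
    d_h = np.linalg.norm(image - mean_human)
    d_n = np.linalg.norm(image - mean_nonhuman)
    return "human" if d_h < d_n else "nonhuman"

print(classify(rng.normal(0.6, 0.1, size=64 * 64)))  # likely "human"
```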

Unsupervised Learning:
 In unsupervised learning, the machine is given unlabeled data and must find patterns or
groupings on its own. For example, when monitoring a collection of engines, we may decide
to group the engines by the sound they make at a given temperature.

 A standard function to perform this grouping, K-means clustering, finds the mean values
for a group of engines (for example, mean value for temperature, mean frequency for
sound).
 Grouping the engines this way can quickly reveal several types of engines that all belong
to the same category (for example, small engine of chainsaw type, medium engine of
lawnmower type).
 All engines of the same type produce sounds and temperatures in the same range as
the other members of the same group.
 There will occasionally be an engine in the group that displays unusual characteristics
(slightly out of expected temperature or sound range).
 This is the engine that you send for manual evaluation.

 The computing process associated with this determination is called unsupervised learning.

 This type of learning is unsupervised because there is not a “good” or “bad” answer known
in advance.
 It is the variation from a group behavior that allows the computer to learn that something
is different.
 In practice, the parameters are multidimensional: hundreds or thousands of parameters are
computed, and small cumulative deviations in multiple dimensions are used to identify the
exception.
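A minimal sketch of this kind of grouping and exception detection, assuming scikit-learn and NumPy are available; the engine temperature and sound values are synthetic stand-ins, not measurements from the text:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Each row is one engine: [mean temperature (deg C), mean sound frequency (Hz)].
chainsaws = rng.normal([80, 900], [3, 40], size=(50, 2))
lawnmowers = rng.normal([95, 300], [3, 20], size=(50, 2))
engines = np.vstack([chainsaws, lawnmowers])

# Group the engines with K-means (k chosen to match the expected types).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(engines)

# Distance of each engine to its own cluster center; the largest
# deviation is the engine we send for manual evaluation.
dist = np.linalg.norm(engines - km.cluster_centers_[km.labels_], axis=1)
suspect = int(np.argmax(dist))
print(f"Engine {suspect} deviates most from its group: {engines[suspect]}")
```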

5. Neural Networks
 Neural networks are ML methods that mimic the way the human brain works.
 When we look at a human figure, multiple zones of our brain are activated to recognize
colors, movements, facial expressions, and so on.
 Our brain combines these elements to conclude that the shape we are seeing is human.
 Neural networks mimic the same logic.
 The information goes through different algorithms (called units), each of which is in
charge of processing an aspect of the information.
 The resulting value of one unit computation can be used directly or fed into another unit
for further processing to occur.
 In this case, the neural network is said to have several layers.
 For example, a neural network processing human image recognition may have two units
in a first layer that determine whether the image has straight lines and sharp angles,
because vehicles commonly have straight lines and sharp angles, and human figures do
not.
 If the image passes the first layer successfully (because there are no or only a small
percentage of sharp angles and straight lines), a second layer may look for different
features (presence of face, arms, and so on), and then a third layer might compare the image
to images of various animals and conclude that the shape is a human (or not).
 The great efficiency of neural networks is that each unit processes a simple test, and
therefore computation is quite fast.
 When the result of one layer is fed into another layer, the process is called deep learning.

 One advantage of deep learning is that having more layers allows for richer intermediate
processing and representation of the data.
 At each layer, the data can be formatted to be better utilized by the next layer.
 This process increases the efficiency of the overall result.
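A minimal sketch of information flowing through two layers of units, assuming NumPy; the weights here are random stand-ins rather than trained values, so this shows only the layered structure, not a working recognizer:

```python
import numpy as np

rng = np.random.default_rng(2)

def layer(x, weights, bias):
    # One layer of units: each unit computes a simple weighted test,
    # and a nonlinearity decides how strongly it "fires".
    return np.maximum(0.0, weights @ x + bias)  # ReLU activation

x = rng.random(4096)  # flattened input image (stand-in)

w1, b1 = rng.normal(size=(64, 4096)) * 0.01, np.zeros(64)  # layer 1: low-level features
w2, b2 = rng.normal(size=(2, 64)) * 0.1, np.zeros(2)       # layer 2: combine into classes

h = layer(x, w1, b1)       # output of layer 1 is fed into layer 2
scores = w2 @ h + b2       # [human, nonhuman] scores
print("human" if scores[0] > scores[1] else "nonhuman")
```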

ML operations can be divided into two broad subgroups:


1. Local learning 2. Remote learning
1. Local learning
In this group, data is collected and processed locally, either in the sensor itself (the
edge node) or in the gateway (the fog node).
2. Remote learning
In this group, data is collected and sent to a central computing unit (typically the
data center in a specific location or in the cloud), where it is processed.

Common applications of ML for IoT revolve around four major domains:


1. Monitoring
2. Behavior control
3. Operations optimization
4. Self-healing, self-optimizing

1. Monitoring:
 Smart objects monitor the environment where they operate.
 Data is processed to better understand the conditions of operations.
 These conditions can refer to external factors, such as air temperature, humidity, or
presence of carbon dioxide in a mine, or to operational internal factors, such as the pressure
of a pump, the viscosity of oil flowing in a pipe, and so on.
 ML can be used with monitoring to detect early failure conditions or to better evaluate the
environment (such as shape recognition for a robot automatically sorting material or
picking goods in a warehouse or a supply chain).
2. Behavior control:

 Monitoring commonly works in conjunction with behavior control.

 When a given set of parameters reaches a target threshold, defined in advance (that is,
supervised) or learned dynamically through deviation from mean values (that is,
unsupervised), monitoring functions generate an alarm.
 This alarm can be relayed to a human, but a more efficient and more advanced system
would trigger a corrective action, such as increasing the flow of fresh air in the mine tunnel,
turning the robot arm, or reducing the oil pressure in the pipe; a minimal sketch follows.
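Here is a minimal sketch of this monitoring-to-action loop; the CO2 threshold value and the fan actuator are hypothetical illustrations of the mine-tunnel example above:

```python
# Monitoring feeds behavior control: when a parameter crosses a threshold,
# trigger a corrective action rather than only alerting a human.
CO2_LIMIT_PPM = 1000  # hypothetical supervised (predefined) threshold

def on_reading(co2_ppm, fan):
    if co2_ppm > CO2_LIMIT_PPM:
        fan.increase_airflow()  # corrective action in the mine tunnel
        return "alarm: corrective action taken"
    return "ok"

class Fan:
    def increase_airflow(self):
        print("Increasing fresh-air flow")

print(on_reading(1250, Fan()))
```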

3. Operations optimization:

 Behavior control typically aims at taking corrective actions based on thresholds.


 However, analyzing data can also lead to changes that improve the overall process.
 For example, a water purification plant in a smart city can implement a system to monitor
the efficiency of the purification process based on which chemical (from company A or
company B) is used, at what temperature, and associated to what stirring mechanism
(stirring speed and depth).
 Neural networks can combine multiple such units, in one or several layers, to estimate
the best chemical and stirring mix for a target air temperature.
 This intelligence can help the plant reduce its consumption of chemicals while still
operating at the same purification efficiency level.
 As a result of the learning, behavior control results in different machine actions.
 The objective is not merely to pilot the operations but to improve the efficiency and the
result of these operations.

4. Self-healing, self-optimizing:

 A fast-developing aspect of deep learning is the closed loop.


 ML-based monitoring triggers changes in machine behavior (behavior control) and
operations optimizations.
 In turn, the ML engine can be programmed to dynamically monitor and combine new
parameters (randomly or semi-randomly) and automatically deduce and implement new
optimizations when the results demonstrate a possible gain.
 The system becomes self-learning and self-optimizing.
 It also detects new K-means deviations that result in predetection of new potential defects,
allowing the system to self-heal.
 The healing is not literal, as external factors (typically human operators) have to
intervene, but the diagnosis is automated.

6. Big Data Analytics Tools and Technology:


 Hadoop is at the core of many of today’s big data implementations.
 Big data analytics can consist of many different software pieces that together collect,
store, manipulate, and analyze all different data types.

The “three Vs” to categorize big data:



1. Velocity
2. Variety
3. Volume

1. Velocity:
 Velocity refers to how quickly data is being collected and analyzed.
 The Hadoop Distributed File System (HDFS) is designed to ingest and process data very quickly.
 Smart objects can generate machine and sensor data at a very fast rate and require database
or file systems capable of equally fast ingest functions.
2. Variety:
 Variety refers to different types of data. Data is commonly categorized as structured, semi-
structured, or unstructured.
 Different database technologies may only be capable of accepting one of these types.
 Hadoop is able to collect and store all three types.
3. Volume:
 Volume refers to the scale of the data.
 This is measured from gigabytes on the very low end to petabytes or even exabytes of data
on the other extreme.
 Big data implementations scale beyond what is available on locally attached storage disks
on a single node.
 It is common to see clusters of servers that consist of dozens, hundreds, or even thousands
of nodes for some large deployments.

The three most popular categories of database technology are

 massively parallel processing (MPP) systems,
 NoSQL, and
 Hadoop

Massively Parallel Processing Databases:

 Massively parallel processing (MPP) databases were built on the concept of the relational
data warehouse but are designed to be much faster and more efficient and to support
reduced query times.
 To accomplish this, MPP databases take advantage of multiple nodes (computers)
designed in a scale out architecture such that both data and processing are distributed
across multiple systems.
 MPPs are sometimes referred to as analytic databases because they are designed to allow
for fast query processing and often have built-in analytic functions.
 These database types process massive data sets in parallel across many processors and
nodes.
 An MPP architecture typically contains a single master node that is responsible for the
coordination of all the data storage and processing across the cluster.
 It operates in a “shared-nothing” fashion, with each node containing local processing,
memory, and storage and operating independently.

 Data storage is optimized across the nodes in a structured SQL-like format that allows data
analysts to work with the data using common SQL tools and applications.

NoSQL Databases:

 NoSQL (“not only SQL”) is a class of databases that support semi-structured and
unstructured data, in addition to the structured data handled by data warehouses and
MPPs.
 NoSQL is not a specific database technology; rather, it encompasses several different types
of databases, including the following:
1. Document stores
2. Key-value stores
3. Wide-column stores
4. Graph stores

1. Document stores:
 This type of database stores semi-structured data, such as XML or JSON.
 Document stores generally have query engines and indexing features that allow for many
optimized queries (a minimal sketch appears after this list).

2. Key-value stores:
 This type of database stores associative arrays where a key is paired with an associated
value.
 These databases are easy to build and easy to scale.

3. Wide-column stores:
 This type of database is similar to a key-value store, but the formatting of the values can
vary from row to row, even in the same table.

4. Graph stores:
 This type of database is organized based on the relationships between elements.
 Graph stores are commonly used for social media or natural language processing.

 NoSQL was developed to support the high-velocity, urgent data requirements of modern
web applications that typically do not require much repeated use.
 NoSQL is built to scale horizontally, allowing the database to span multiple hosts.
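As a minimal illustration of the document-store idea from item 1 above, the following pure-Python sketch holds semi-structured JSON documents that need not share a schema and builds a simple index to optimize queries; the field names are hypothetical:

```python
import json

# Two semi-structured documents; note that they need not share a schema.
docs = [
    json.loads('{"id": 1, "type": "sensor", "model": "TH-200", "temp": 21.5}'),
    json.loads('{"id": 2, "type": "gateway", "firmware": "4.2", "uplink": "LTE"}'),
]

# A simple index by "type", analogous to the indexing features that
# document stores provide to optimize queries.
index = {}
for doc in docs:
    index.setdefault(doc["type"], []).append(doc)

print(index["sensor"])  # fast lookup without scanning every document
```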

Hadoop:

Initially, the project had two key elements:


1. Hadoop Distributed File System (HDFS):
 A system for storing data across multiple nodes
2. MapReduce:
 A distributed processing engine that splits a large task into smaller ones that can
be run in parallel.
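A minimal single-machine sketch of the MapReduce idea (split the input, map the pieces in parallel, then reduce by key), here counting words; real Hadoop distributes these same steps across many nodes:

```python
from collections import defaultdict
from multiprocessing import Pool

def map_chunk(chunk):
    # Map: emit (word, 1) pairs for one split of the input.
    return [(w.lower(), 1) for w in chunk.split()]

def word_count(text, workers=4):
    chunks = text.splitlines()
    with Pool(workers) as pool:      # smaller tasks run in parallel
        mapped = pool.map(map_chunk, chunks)
    counts = defaultdict(int)        # shuffle + reduce: sum by key
    for pairs in mapped:
        for word, n in pairs:
            counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    print(word_count("big data\nbig iot data"))  # {'big': 2, 'data': 2, 'iot': 1}
```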

NameNodes:

 These are a critical piece in data adds, moves, deletes, and reads on HDFS.
 They coordinate where the data is stored, and maintain a map of where each block of
data is stored and where it is replicated.
 All interaction with HDFS is coordinated through the primary (active) NameNode, with
a secondary (standby) NameNode kept aware of the changes so that it can take over in the
event of a failure of the primary.
 The NameNode takes write requests from clients and distributes those files across the
available nodes in configurable block sizes, usually 64 MB or 128 MB blocks.
 The NameNode is also responsible for instructing the DataNodes where replication
should occur.

DataNodes:

 These are the servers where the data is stored at the direction of the NameNode.
 It is common to have many DataNodes in a Hadoop cluster to store the data.
 Data blocks are distributed across several nodes and often are replicated three, four, or
more times across nodes for redundancy.
 Once data is written to one of the DataNodes, the DataNode selects two (or more)
additional nodes, based on replication policies, to ensure data redundancy across the
cluster.
 Disk redundancy techniques such as Redundant Array of Independent Disks (RAID) are
generally not used for HDFS because the NameNodes and DataNodes coordinate block-
level redundancy with this replication technique.

7. Edge Streaming Analytics


Comparing Big Data and Edge Analytics:

The key values of edge streaming analytics include the following:

1. Reducing data at the edge:


 The aggregate data generated by IoT devices is generally in proportion to the number of
devices.
 The scale of these devices is likely to be huge, and so is the quantity of data they
generate.
 Passing all this data to the cloud is inefficient and is unnecessarily expensive in terms of
bandwidth and network infrastructure.

2. Analysis and response at the edge:


 Some data is useful only at the edge (such as a factory control feedback system).
 In cases such as this, the data is best analyzed and acted upon where it is generated.

3. Time sensitivity:
 When timely response to data is required, passing data to the cloud for future processing
results in unacceptable latency.

Edge Analytics Core Functions:

 To perform analytics at the edge, data needs to be viewed as real-time flows.


 Whereas big data analytics is focused on large quantities of data at rest, edge analytics
continually processes streaming flows of data in motion.

Streaming analytics at the edge can be broken down into three simple stages:

1. Raw input data
2. Analytics processing unit (APU)
3. Output streams

 Raw input data:

1. This is the raw data coming from the sensors into the analytics processing unit.

 Analytics processing unit (APU):

1. The APU filters and combines data streams (or separates the streams, as
necessary), organizes them by time windows, and performs various analytical
functions.
2. It is at this point that the results may be acted on by microservices running in the
APU.

 Output streams:

1. The data that is output is organized into insightful streams and is used to influence
the behavior of smart objects, or is passed on for storage and further processing in
the cloud.
2. Communication with the cloud often happens through a standard
publisher/subscriber messaging protocol, such as MQTT.
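A minimal sketch of publishing one output-stream record to a cloud-side broker, assuming the paho-mqtt package; the broker address, topic, and record fields are hypothetical, and note that paho-mqtt 2.x additionally requires a callback API version argument when constructing the client:

```python
import json
import paho.mqtt.client as mqtt  # assumes the paho-mqtt package (1.x API shown)

client = mqtt.Client()                      # paho-mqtt 2.x also needs a CallbackAPIVersion
client.connect("broker.example.com", 1883)  # hypothetical cloud-side broker
client.loop_start()

# Publish one insightful output-stream record for storage/processing in the cloud.
record = {"rig": "A-17", "avg_temp_c": 88.4, "window": "60s"}
client.publish("site/a17/engine/temperature", json.dumps(record), qos=1)

client.loop_stop()
client.disconnect()
```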

Figure illustrates the stages of data processing in an edge APU.

To produce these results, the APU needs to perform the following functions:

1. Filter:

 The streaming data generated by IoT endpoints is likely to be very large, and most of it
is irrelevant. For example, a sensor may simply poll on a regular basis to confirm that it
is still reachable.
 The filtering function identifies the information that is considered important.

2. Transform:

 In the data warehousing world, Extract, Transform, and Load (ETL) operations are used
to manipulate the data structure into a form that can be used for other purposes.
 Analogous to data warehouse ETL operations, in streaming analytics, once the data is
filtered, it needs to be formatted for processing.

3. Time:

 As the real-time streaming data flows, a timing context needs to be established. This
could be to correlate average temperature readings from sensors on a minute-by-
minute basis.
 For example, consider an APU that takes input data from multiple sensors reporting
temperature fluctuations. The APU could be programmed to report the average temperature
from the sensors every minute, based on an average of the past two minutes; a minimal
sketch of such a window follows.
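Here is a minimal pure-Python sketch of such a sliding time window; the reading interval and temperature values are synthetic:

```python
from collections import deque

class WindowAverager:
    """Keep the last two minutes of readings; report the average each minute."""
    def __init__(self, window_s=120):
        self.window_s = window_s
        self.readings = deque()          # (timestamp, value) pairs

    def add(self, ts, value):
        self.readings.append((ts, value))
        while self.readings and self.readings[0][0] < ts - self.window_s:
            self.readings.popleft()      # drop readings older than the window

    def average(self):
        values = [v for _, v in self.readings]
        return sum(values) / len(values) if values else None

apu = WindowAverager()
for t in range(0, 180, 10):              # one reading every 10 s for 3 minutes
    apu.add(t, 20.0 + t * 0.01)
    if t and t % 60 == 0:                # report on the minute
        print(f"t={t}s avg={apu.average():.2f}")
```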

4. Correlate:

 Streaming data analytics becomes most useful when multiple data streams are combined
from different types of sensors.
 For example, in a hospital, several vital signs are measured for patients, including body
temperature, blood pressure, heart rate, and respiratory rate.
 These different types of data come from different instruments, but when this data is
combined and analyzed, it provides an invaluable picture of the health of the patient at
any given time.
 Live streams can also be correlated with historical data. For example, historical data may
include the patient’s past medical history, such as blood test results.
 Combining historical data gives the live streaming data a powerful context and promotes
more insights into the current condition of the patient (see Figure).

5. Match patterns:

 Once the data streams are properly cleaned, transformed, and correlated with other live
streams as well as historical data sets, pattern matching operations are used to gain
deeper insights into the data.
 For example, say that the APU has been collecting the patient’s vitals for some time and
has gained an understanding of the expected patterns for each variable being monitored.
 If an unexpected event arises, such as a sudden change in heart rate or respiration, the
pattern matching operator recognizes this as out of the ordinary and can take certain
actions, such as generating an alarm to the nursing staff (a minimal sketch follows).
 The patterns can be simple relationships, or they may be complex, based on the criteria
defined by the application.
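A minimal sketch of this kind of pattern check, using a simple deviation-from-history rule (real systems can use far richer pattern definitions); the heart-rate samples and the three-standard-deviation threshold are synthetic choices:

```python
import statistics

history = [72, 75, 71, 74, 73, 76, 72, 74]  # heart-rate samples the APU has collected

mean = statistics.mean(history)
stdev = statistics.pstdev(history)

def check(sample, k=3):
    # Simple learned pattern: flag anything more than k standard
    # deviations from the expected value as out of the ordinary.
    if abs(sample - mean) > k * stdev:
        return f"ALARM: heart rate {sample} outside expected pattern"
    return "ok"

print(check(74))    # ok
print(check(113))   # ALARM -> notify the nursing staff
```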

6. Improve business intelligence:

 Ultimately, the value of edge analytics is in the improvements to business intelligence
that were not previously available.
 For example, conducting edge analytics on patients in a hospital allows staff to respond
more quickly to the patient’s changing needs and also reduces the volume of
unstructured (and not always useful) data sent to the cloud.

Distributed Analytics Systems:

 Streaming analytics may be performed directly at the edge, in the fog, or in the cloud
data center.
 Fog analytics allows us to see beyond one device, giving visibility into an aggregation of
edge nodes and allowing us to correlate data from a wider set.
 Figure shows an example of an oil drilling company that is measuring both pressure and
temperature on an oil rig.

 Sensors communicate via MQTT through a message broker to the fog analytics node,
allowing a broader data set.
 The fog node is located on the same oil rig and performs streaming analytics from
several edge devices, giving it better insights due to the expanded data set.

 It may not be able to respond to an event as quickly as analytics performed directly on
the edge device, but it is still close to responding in real time as events occur.
 Once the fog node is finished with the data, it communicates the results to the cloud
(again through a message broker via MQTT) for deeper historical analysis through big
data analytics tools.

Network Analytics:
 Network analytics has the power to analyze details of communications patterns made by
protocols and correlate this across the network.
 It quickly identifies anomalies that suggest network problems due to suboptimal paths,
intrusive malware, or excessive congestion.
 Network analytics offers capabilities to cope with capacity planning for scalable IoT
deployment as well as security monitoring in order to detect abnormal traffic volume and
patterns (such as an unusual traffic spike for a normally quiet protocol) for both
centralized and distributed architectures, such as fog computing.

Benefits of flow analytics, in addition to other network management services, are as
follows:

1. Network traffic monitoring and profiling:

 Flow collection from the network layer provides global and distributed near-real-time
monitoring capabilities.
 IPv4 and IPv6 network wide traffic volume and pattern analysis helps administrators
proactively detect problems and quickly troubleshoot and resolve problems when they
occur.

2. Application traffic monitoring and profiling:

 Monitoring and profiling can be used to gain a detailed time-based view of IoT access
services, such as the application-layer protocols, including MQTT, CoAP, and DNP3, as
well as the associated applications that are being used over the network.

3. Capacity planning:

 Flow analytics can be used to track and anticipate IoT traffic growth and help in the
planning of upgrades when deploying new locations or services by analyzing captured
data over a long period of time.
 This analysis affords the opportunity to track and anticipate IoT network growth on a
continual basis.

4. Security analysis:

 Because most IoT devices typically generate a low volume of traffic and always send
their data to the same server(s), any change in network traffic behavior may indicate a
cyber security event, such as a denial of service (DoS) attack.
 Security can be enforced by ensuring that no traffic is sent outside the scope of the IoT
domain.
 For example, with a LoRaWAN gateway, there should be no reason to see traffic sent or
received outside the LoRaWAN network server and network management system.

5. Accounting:

 In field area networks, routers or gateways are often physically isolated and leverage
public cellular services and VPNs for backhaul.
 Deployments may have thousands of gateways connecting the last-mile IoT
infrastructure over a cellular network.
 Flow monitoring can thus be leveraged to analyze and optimize the billing, in
complement with other dedicated applications, such as Cisco Jasper, with a broader
scope than just monitoring data flow.

6. Data warehousing and data mining:

 Flow data (or derived information) can be warehoused for later retrieval and analysis in
support of proactive analysis of multiservice IoT infrastructures and applications.

Common Challenges in OT Security:


The security challenges faced in IoT are as follows:

Erosion of Network Architecture:

 Two of the major challenges in securing industrial environments have been initial design
and ongoing maintenance.
 The initial design challenges arose from the concept that networks were safe due to
physical separation from the enterprise, with minimal or no connectivity to the outside
world, and from the assumption that attackers lacked sufficient knowledge to carry out
security attacks.
 From a security design perspective, it is better to know that communication paths are
insecure than to not know the actual communication paths.
 Over time, organic growth of these networks has led to miscalculations of expanding
networks and the introduction of wireless communication in a standalone fashion, without
consideration of the impact to the original security design.
 These uncontrolled or poorly controlled OT network evolutions have, in many cases, over
time led to weak or inadequate network and systems security.
 In many industries, the control systems consist of packages, skids, or components that are
self-contained and may be integrated as semi-autonomous portions of the network.

 These packages may not be as fully or tightly integrated into the overall control system,
network management tools, or security applications, resulting in potential risk.

Pervasive Legacy Systems:

 Due to the static nature and long lifecycles of equipment in industrial environments, many
operational systems may be deemed legacy systems.
 For example, in a power utility environment, it is not uncommon to have racks of old
mechanical equipment still operating alongside modern intelligent electronic devices
(IEDs).
 From a security perspective, this is potentially dangerous as many devices may have
historical vulnerabilities or weaknesses that have not been patched and updated, or it may
be that patches are not even available due to the age of the equipment.
 Communication methods and protocols may be generations old and must be interoperable
with the oldest operating entity in the communications path.
 This includes switches, routers, firewalls, wireless access points, servers, remote access
systems, patch management, and network management tools.
 All of these may have exploitable vulnerabilities and must be protected.

Insecure Operational Protocols:

 Industrial protocols, such as supervisory control and data acquisition (SCADA),
particularly the older variants, suffer from common security issues.

Three examples of this are a frequent lack of authentication between communication endpoints,
no means of securing and protecting data at rest or in motion, and insufficient granularity of
control to properly specify recipients or avoid default broadcast approaches.

Modbus:

 Modbus is commonly found in many industries, such as utilities and manufacturing
environments, and has multiple variants (for example, serial and TCP/IP).
 It was created by the first programmable logic controller (PLC) vendor, Modicon, and has
been in use since the 1970s.
 It is one of the most widely used protocols in industrial deployments, and its
development is governed by the Modbus Organization.
 Authentication of communicating endpoints is not a default operation, which allows an
inappropriate source to send improper commands to the recipient.
 For example, for a message to reach its destination, nothing more than the proper Modbus
address and function call (code) is necessary.
 Some older and serial-based versions of Modbus communicate via broadcast.
 The ability to curb the broadcast function does not exist in some versions.
 Validation of the Modbus message content is also not performed by the initiating
application.
 Instead, Modbus depends on the network stack to perform this function.
 This could open up the potential for protocol abuse in the system.
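To see why this matters, the following sketch builds a raw Modbus/TCP read-holding-registers request using Python's struct module; note that the frame contains no credential or authentication field at all, only addressing and a function code (the unit ID and register address are hypothetical):

```python
import struct

# Modbus/TCP request: MBAP header + PDU. There is no authentication
# field anywhere in the frame; a valid unit address and function code
# are all that is required for the request to be accepted.
transaction_id = 1
protocol_id = 0          # always 0 for Modbus
unit_id = 17             # target device address (hypothetical)
function = 0x03          # read holding registers
start_register = 0x0000
register_count = 2

pdu = struct.pack(">BHH", function, start_register, register_count)
mbap = struct.pack(">HHHB", transaction_id, protocol_id, len(pdu) + 1, unit_id)
frame = mbap + pdu
print(frame.hex())       # 000100000006 11 03 0000 0002
```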

DNP3 (Distributed Network Protocol):

 DNP3 is found in multiple deployment scenarios and industries.


 It is common in utilities and is also found in discrete and continuous process systems. Like
many other ICS/SCADA protocols, it was intended for serial communication between
controllers and simple IEDs.
 In the case of DNP3, participants allow for unsolicited responses, which could trigger an
undesired response.
 The missing security element here is the ability to establish trust in the system’s state and
thus the ability to trust the veracity of the information being presented.
 This is akin to the security flaws presented by Gratuitous ARP messages in Ethernet
networks, which has been addressed by Dynamic ARP Inspection (DAI) in modern
Ethernet switches.

ICCP (Inter-Control Center Communications Protocol):

 ICCP is a common control protocol in utilities across North America that is frequently
used to communicate between utilities.
 Given that it must traverse the boundaries between different networks, it holds an extra
level of exposure and risk that could expose a utility to cyber attack.
 Initial versions of ICCP had several significant gaps in the area of security.
 One key vulnerability is that the system did not require authentication for
communication.
 Second, encryption across the protocol was not enabled as a default condition, thus
exposing connections to man-in-the-middle (MITM) and replay attacks.

OPC (OLE for Process Control):

 OPC is based on the Microsoft interoperability methodology Object Linking and
Embedding (OLE).
 This is an example where an IT standard used within the IT domain and personal
computers has been leveraged for use as a control protocol across an industrial network.
 In industrial control networks, OPC is limited to operation at the higher levels of the
control space, with a dependence on Windows-based platforms.
 Concerns around OPC begin with the operating system on which it operates.
 Many of the Windows devices in the operational space are old, not fully patched, and at
risk due to a plethora of well-known vulnerabilities.
 A particular concern with OPC is its dependence on the Remote Procedure Call (RPC)
protocol, which creates two classes of exposure.
 The first is the need to clearly understand the many vulnerabilities associated with RPC,
and the second is the need to identify the level of risk these vulnerabilities bring to a
specific network.

ALL THE BEST

