IIoT Notes
COURSE OBJECTIVES:
To develop knowledge of Industrial Internet of Things (IIoT) fundamentals.
To understand the architecture of IoT and its protocols.
To understand the various data analytics techniques.
To understand CPS for Industry 4.0.
To provide students with in-depth knowledge of designing Industrial IoT systems for
various applications.
UNIT-I : Industrial IOT Introduction 9
Introduction to IOT, IOT Vs. IIOT, History of IIOT, Components of IIOT - Sensors and
Actuators for Industrial Processes, Role of IIOT in Manufacturing Processes. Challenges & Benefits in
implementing IIOT.
UNIT-II : IIoT Architecture 9
Industrial IoT: Business Model and Reference Architecture: IIoT-Business Models, Industrial IoT-
Layers: IIoT Sensing, IIoT Processing, IIoT Communication, IIoT Networking
UNIT-III : IIOT ANALYTICS 9
Big Data Analytics and Software Defined Networks, Machine Learning and Data Science, Julia
Programming, Data Management with Hadoop.
UNIT-IV : Industrial IoT: CYBER PHYSICAL SYSTEM 9
Introduction to Cyber Physical Systems (CPS), Architecture of CPS- Components, Data
science and technology for CPS, Emerging applications in CPS in different fields. Case study:
Application of CPS in health care domain.
UNIT-V : Industrial IoT- Application Domains 9
Industrial IoT- Application Domains: Healthcare, Power Plants, Inventory Management &
Quality Control, Plant Safety and Security (Including AR and VR safety applications), Facility
Management.
TOTAL: 45 HOURS
COURSE OUTCOMES:
At end of the course students will be able to:
CO1: Understand the basics of industrial IoT (IIoT).
CO2: Develop various applications using IIoT architectures.
CO3: Recognize the uses of cloud computing and data analytics.
CO4: Analyze privacy and security measures for industry-standard solutions.
CO5: Design and implement IoT applications that manage various technologies.
TEXT BOOKS:
1. Veneri, Giacomo, and Antonio Capasso - Hands-on Industrial Internet of Things: Create a
Powerful Industrial IoT Infrastructure Using Industry 4.0, 1st Ed., Packt Publishing Ltd, 2018
2. Alasdair Gilchrist- Industry 4.0: The Industrial Internet of Things, 1st Ed., Apress, 2017
REFERENCES:
1. Alasdair Gilchrist, Industry 4.0: The Industrial Internet of Things, 1st Edition, Apress, 2017
2. Aboul Ella Hassanien, Nilanjan Dey and Surekha Borra, Medical Big Data and Internet of
Medical Things: Advances, Challenges and Applications, 1st Edition, CRC Press, 2019.
WEBSITE REFERENCES :
1. https://onlinecourses.nptel.ac.in/noc22_cs52/preview
2. https://www.coursera.org/specializations/developing-industrial-iot#courses
3. https://www.coursera.org/learn/industrial-internet-of-things
4. https://www.coursera.org/learn/internet-of-things-sensing-actuation
Internet of things (IoT)
The Internet of things (IoT) is the inter-networking of physical
devices, vehicles (also referred to as “connected devices” and “smart devices”), buildings,
and other items embedded with electronics, software, sensors, actuators, and network
connectivity which enable these objects to collect and exchange data.
1.1 Characteristics:
Unique Identity: Each IoT device has an IP address. This identity is helpful in
tracking the equipment and, at times, in querying its status.
Dynamic and Self-Adapting: An IoT device must dynamically adapt itself to the
changing context. For example, a surveillance camera may have to work in
different conditions and at different light levels (morning, afternoon, night).
Safety: Connecting everything to the Internet poses a major threat,
because our personal data is also exposed and can be tampered with if proper safety
measures are not taken.
Smart cities: The smart city is another powerful application of IoT. It includes
smart surveillance, environment monitoring, automated transportation, urban
security, smart traffic management, water distribution, smart healthcare, etc.
Wearables: Wearables are devices with built-in sensors and software that
collect data about the user, which can later be used to gain insights about
the user. They must be energy efficient and small.
Smart retail: Retailers can enhance the in-store experience of their customers
using IoT. The shopkeeper can also learn which items are frequently bought
together using IoT devices.
Smart healthcare: People can wear IoT devices that collect data about
their health. This helps users monitor themselves and follow tailor-made
techniques to combat illness. The doctor also does not have to visit patients in
person in order to treat them.
The common connectivity technologies used in this kind of solution are Bluetooth, WiFi, and
ZigBee. These technologies offer short-range communication, suitable for
applications deployed in limited spaces such as houses or small offices.
IIoT                                   IoT
IIoT deals with large-scale networks   IoT deals with small-scale networks
3. History of IIOT:
• Industry 1.0 (1784) – The invention of the steam engine kick-started Industry
1.0. However, manufacturing was still purely labor-oriented and tiresome.
• Industry 2.0 (1870) – The first assembly line production was introduced. This
invention was a big relief for workers, as their labor was reduced to the
extent possible. Henry Ford, regarded as the father of mass production and the assembly
line, introduced the process in Ford's car manufacturing plant to improve
productivity using a conveyor belt mechanism.
4. Components of IIOT
SENSOR
Characteristics of Sensors
Classification of sensors:
Based on power requirement, sensors are classified into two types:
Active Sensors and Passive Sensors.
Active Sensors: Do not need any external energy source; they directly
generate an electric signal in response to an external stimulus.
Analog Sensors
An actuator is a device that converts electrical signals into physical
events or characteristics. It takes its input from the system and gives
its output to the environment. For example, motors and heaters are some of
the commonly used actuators.
Types of Actuators
IOT COMPONENTS
An IoT system has four fundamental components, which together explain how IoT works.
i. Sensors/Devices
First, sensors or devices collect very fine-grained data from the
surrounding environment. The collected data can have various degrees
of complexity, ranging from a simple temperature reading to a
complex full video feed.
A device can have multiple sensors bundled together so that it does more
than just sense things. For example, a phone is a device that has
multiple sensors such as GPS, an accelerometer, and a camera, but the phone does
not simply sense things.
ii. Connectivity
Next, that collected data is sent to a cloud infrastructure but it needs a medium
for transport.
iii. Data Processing
Once the data reaches the cloud, software processes it.
This can range from something very simple, such as checking that the
temperature reading on devices such as ACs or heaters is within an
acceptable range. It can sometimes also be very complex, such as identifying
objects (such as intruders in your house) using computer vision on video.
iv. User Interface
A user might also have an interface through which they can
actively check in on their IoT system. For example, a user who has a camera
installed in their house might want to check the video recordings and all
the feeds through a web server.
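The simple end of the processing step described above, checking that a temperature reading is within an acceptable range, can be sketched as follows. The device names, the `Reading` type, and the 18 to 27 °C limits are illustrative assumptions, not values from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    device_id: str          # hypothetical device name
    temperature_c: float    # temperature in degrees Celsius


def out_of_range(readings, low=18.0, high=27.0):
    """Return the readings whose temperature lies outside [low, high]."""
    return [r for r in readings if not (low <= r.temperature_c <= high)]


readings = [
    Reading("ac-1", 22.5),
    Reading("heater-1", 31.0),
    Reading("ac-2", 17.2),
]
alerts = out_of_range(readings)
for r in alerts:
    print(f"ALERT {r.device_id}: {r.temperature_c} C")
```

In a real deployment this check would typically run in the cloud or on a gateway, and the alert would be pushed to the user interface instead of being printed.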
Role of IIOT in Manufacturing Processes
1. Asset Management
IoT technology enables asset management, which simply means monitoring pieces
of equipment for better production, quality control, and worker safety.
As per a report by IBM, industries can achieve a 20% higher product count on
average from their existing line by optimizing their manufacturing process.
A faster and more efficient manufacturing plant also reduces product
cycle time. One of the best examples is the motorbike manufacturing
company Harley-Davidson: by leveraging IIoT, the company is able to
produce a motorbike in just 6 hours, which earlier took around 21 days.
IIoT also helps to ensure a safer workplace, especially in hazardous workplaces such as
chemical or oil manufacturing firms. For example, with the help of sensors and IoT,
gas leakages in the pipe network can be detected easily, eliminating the
manual effort. In addition, IIoT technology combined with wearable devices can help in
monitoring the health status of workers.
If IoT solutions are integrated with the transport management system, they
provide better visibility into the status of moving vehicles. This helps in the
on-time maintenance of vehicles and swift action in case of road accidents, all of
which ultimately results in fast, efficient, and safe transportation.
3. Predictive Maintenance
Maintenance is a tough task, but it becomes far less so with IoT-enabled
maintenance, known as Predictive Maintenance (PdM).
Industries struggle with the burden of
maintenance; in numerical terms, it costs them around 50 billion dollars
per year. With PdM as a solution, they can avoid this large
avoidable cost to a very large extent. An example is Rio Tinto, a
mining company that is able to save 2 million USD daily using
IoT-enabled PdM.
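A very simplified sketch of the idea behind PdM: watch a sensor trend and raise a maintenance flag before the equipment fails. The vibration values, the window of 3 samples, and the 7.0 mm/s limit are hypothetical; real PdM systems usually rely on trained models instead of a fixed threshold.

```python
from collections import deque


def maintenance_alerts(samples, window=3, limit=7.0):
    """Return indices where the rolling mean of the last `window` samples exceeds `limit`."""
    recent = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(samples):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > limit:
            flagged.append(i)
    return flagged


# Hypothetical vibration readings (mm/s) from a motor bearing
vibration_mm_s = [4.0, 4.2, 4.1, 6.5, 7.5, 8.0, 8.4]
flagged = maintenance_alerts(vibration_mm_s)
print(flagged)
```

Averaging over a window keeps a single noisy spike from triggering a maintenance call, at the cost of reacting a few samples later.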
4. Smart Pumping
Now, it’s time to make our pumping systems smart. With the help of an IoT-based
system comprising sensors and switches, you can not only monitor but also
regulate the flow, pressure, and temperature of the fluid and the pumping systems
of your production facility. This efficient pumping system helps save water,
energy costs, and manual labor expenses.
Similarly, groundwater pollution is a critical issue all over the world. Using an
IoT system comprising piezometers and sensors, groundwater can be monitored and
managed efficiently, allowing us to take necessary actions whenever required. In
fact, in India, the regulatory body known as the CGWA (Central Ground Water Authority)
has made it mandatory for
manufacturing industries to install a groundwater monitoring telemetry system and
send the report to the regulatory body.
Lastly, with the help of an IoT-based stack monitoring system, the CO2 emissions
released by various industries can be regulated, which is also a mandatory
guideline issued by the CPCB (Central Pollution Control Board) that has to be
followed by manufacturing plants.
5. Plant Safety and Security: IoT combined with big data analysis can improve
overall worker safety and security in the plant.
6. Quality control: IoT sensors collect aggregate product data and other
third-party syndicated data from various stages of a product cycle.
8. Logistics and Supply Chain Optimization: The Industrial IoT (IIoT) can
provide access to real-time supply chain information by tracking
materials, equipment, and products as they move through the supply
chain.
Challenges & Benefits in implementing IIOT.
1. Security: Security is the most significant challenge for the IoT. Increasing
the number of connected devices increases the opportunity to exploit security
vulnerabilities, as do poorly designed devices, which can expose user data to
theft by leaving data streams inadequately protected and in some cases
people’s health and safety can be put at risk.
2. Privacy: The IoT creates unique challenges to privacy, many of which go
beyond the data privacy issues that currently exist. Much of this stems from
integrating devices into our environments without us consciously using
them. This is becoming more prevalent in consumer devices, such as
tracking devices for phones and cars as well as smart televisions.
3. Scalability: When billions of internet-enabled devices are connected in a huge
network, large volumes of data need to be processed. The system that
stores and analyses the data from these IoT devices needs to be scalable.
4. Interoperability: Technological standards in most areas are still
fragmented. These technologies need to converge, which would help
in establishing a common framework and standard for IoT devices.
As the standardization process is still lacking, interoperability of IoT with
legacy devices should be considered critical. This lack of interoperability is
preventing us from moving towards the vision of truly connected, everyday
interoperable smart objects.
5. Bandwidth: Connectivity is a bigger challenge to the IoT than you might
expect. As the size of the IoT market grows exponentially, some experts are
concerned that bandwidth-intensive IoT applications such as video
streaming will soon struggle for space on the IoT’s current server-client
model.
6. Standards: A lack of standards and documented best practices has a
greater impact than just limiting the potential of IoT devices. Without
standards to guide manufacturers, developers sometimes design products
that operate in disruptive ways on the Internet without much regard to their
impact. If poorly designed and configured, such devices can have negative
consequences for the networking resources they connect to and the broader
Internet.
7. Regulation: The lack of strong IoT regulations is a big part of why the IoT
remains a severe security risk, and the problem is likely to get worse as the
potential attack surface expands to include ever more crucial devices. When
medical devices, cars and children’s toys are all connected to the Internet,
it’s not hard to imagine many potential disaster scenarios unfolding in the
absence of sufficient regulation.
Challenges:
1. Security and data privacy
2. Lack of interoperability
3. Increased complexity
4. Increased cost
IIoT reference architecture:
1. The IIoT reference architecture is governed by the Industrial Internet
Reference Architecture (IIRA).
2. IIRA is the architectural
standard used for most IIoT applications in
industry, so it is a standards-based architecture.
3. Safety is the major concern in the IIRA infrastructure, followed by
security.
IIRA-Architecture Patterns:
Different IIoT architecture implementation patterns are as follows:
1. Three-tier architecture pattern:
The three layers are
1. the edge layer,
2. the platform layer, and
3. the enterprise layer.
1. Edge layer: The edge layer gathers data from the edge nodes. Its architectural
characteristics include
breadth of distribution
governance
location
2. Platform layer: The platform layer is concerned with receiving, processing, and forwarding
control commands from the enterprise layer to the edge layer.
3. Enterprise layer: The enterprise layer receives data flows from the edge layer and the
platform layer. The enterprise layer
implements domain-specific applications,
implements decision support systems, and
provides interfaces to end-users.
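The division of work between the three layers can be sketched as a small data flow: data moves up from the edge, and a control command moves back down through the platform. The function names, the motor readings, and the `REDUCE_LOAD` rule are hypothetical and only illustrate the pattern; they are not part of the IIRA specification.

```python
def edge_collect(sensors):
    """Edge layer: gather raw readings from the edge nodes."""
    return [{"node": name, "value": read()} for name, read in sensors.items()]


def platform_process(readings):
    """Platform layer: aggregate edge data before passing it upward."""
    values = [r["value"] for r in readings]
    return {"count": len(values), "mean": sum(values) / len(values), "max": max(values)}


def enterprise_decide(summary, max_allowed=80.0):
    """Enterprise layer: domain-specific rule that yields a control command."""
    return "REDUCE_LOAD" if summary["max"] > max_allowed else "NO_ACTION"


def platform_forward(command, nodes):
    """Platform layer: forward the enterprise command down to each edge node."""
    return {node: command for node in nodes}


# Hypothetical edge nodes that report a motor temperature in degrees Celsius
sensors = {"motor-1": lambda: 72.0, "motor-2": lambda: 85.5}

readings = edge_collect(sensors)
summary = platform_process(readings)
command = enterprise_decide(summary)
dispatched = platform_forward(command, sensors)
print(summary, dispatched)
```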
Gateway-mediated edge connectivity and management architecture pattern:
1) local control,
2) automation.
3) System of systems allows
4) complex systems,
5) monitoring, and
6) analytic applications
Layered Databus pattern is applicable in the field of
control,
local monitoring, and
analytics.
The databus communicates between applications and devices.
It allows interoperable communication between endpoints.
For communication between machines, another databus is used.
IIoT sensing
IIoT sensors are industrial sensors with integrated sensing and computing
functions that are connected to larger systems via wireless communications
technology. They are a key part of the industrial internet of things (IIoT), the
industrial extension of the internet of things (IoT). In this emerging
paradigm, the connected nature of the internet extends to the physical world,
where individual objects receive their own IP address, technology, and wireless
connectivity. The increasing availability of compact, high-quality, affordable
sensors is a major driver for IIoT. This synergy between the digital and physical
worlds is particularly important for industrial applications, where sensors have
traditionally operated in isolation and required local monitoring.
Temperature Sensor Interfacing Circuit
voltage
Temperature sensor
Magnetostrictive sensor
materials
Torque sensor
Vacuum sensor
Speed sensor
given time
PIR sensor
Detects infrared radiation coming from the human body in its surrounding area
Image sensor
Ultrasonic sensor
nd dynamic body
detection
Optical sensor
Radiation sensor
Level sensor
Flow sensor
Touch sensor
Gas sensor
Industrial Communication
Real-time
Very low duty-cycle
Very low latency
Very low jitter
Industrial Ethernet
Industrial Ethernet protocols for real-time control and automation.
Used in manufacturing processes dealing with clock synchronization and
performance.
Fieldbus
Industrial Ethernet
1. ModBus-TCP
2. EtherCat
3. EtherNet/IP
4. Profinet
5. TSN
ModBus-TCP
Features of ModBus-TCP
It defines 2 units in the data frame: PDU (Protocol Data Unit) and
ADU (Application Data Unit)
The ADU is identified by a header called MBAP (Modbus Application Protocol header)
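To make the PDU/ADU split concrete, the sketch below assembles a Modbus-TCP request for function code 0x03 (Read Holding Registers). The MBAP header carries a transaction identifier, a protocol identifier (0 for Modbus), a length field, and a unit identifier; the transaction ID, unit ID, and register address used here are arbitrary example values.

```python
import struct


def build_read_holding_registers(transaction_id, unit_id, start_address, quantity):
    """Return the ADU (MBAP header + PDU) for a Read Holding Registers request."""
    # PDU: function code 0x03, starting register address, number of registers
    pdu = struct.pack(">BHH", 0x03, start_address, quantity)
    # MBAP header: transaction id, protocol id (0 = Modbus), length, unit id.
    # The length field counts the unit id byte plus the PDU bytes.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


adu = build_read_holding_registers(
    transaction_id=1, unit_id=17, start_address=0x006B, quantity=3
)
print(adu.hex())  # 0001000000061103006b0003
```

The first 7 bytes are the MBAP header and the remaining 5 bytes are the PDU. All fields are big-endian, which is why the format strings start with `>`.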
EtherCat
Data exchange provides a low duty-cycle time and low jitter for better
synchronization.
EtherNet/IP
Communication Type
Explicit
-critical information.
per packet.
-RTU
-Link
Profibus is based on the standard IEC 61158. It was first developed in Germany in the late
1980s and was then adopted by Siemens. It is a fieldbus technology that supports
several protocols. It supports cyclic as well as acyclic data transmission,
isochronous messaging, and alarm handling.
Variants of Profibus
iants:
environment).
It defines 2 layers:
the system.
and can
support branches.
spatially arranged I/O modules which connects to several sensors & actuators.
Features of Interbus
communication.
IIoT Networking
QoS 1 - Also known as "at least once" delivery. Retry is performed until the
acknowledgment of the message is received.
QoS 2 - Also known as "exactly once" delivery. Ensuring that the retry is
performed until the message is delivered exactly once
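The "at least once" behaviour of QoS 1 can be illustrated with a small simulation: the sender retries until it sees an acknowledgment, so the receiver may see the same message more than once. `FlakyLink` is a hypothetical stand-in for a broker connection and is not the API of any real MQTT library.

```python
def publish_at_least_once(message, send, max_retries=5):
    """Resend until send() reports an acknowledgment; return the attempts used."""
    for attempt in range(1, max_retries + 1):
        if send(message):
            return attempt
    raise TimeoutError("no acknowledgment received")


class FlakyLink:
    """Hypothetical link that loses the acknowledgment for the first `drops` sends."""

    def __init__(self, drops):
        self.drops = drops
        self.delivered = []

    def send(self, message):
        self.delivered.append(message)  # the receiver still gets the message
        if self.drops > 0:
            self.drops -= 1
            return False                # acknowledgment lost, sender must retry
        return True


link = FlakyLink(drops=2)
attempts = publish_at_least_once("temp=21.5", link.send)
print(attempts, link.delivered)
```

Here the message arrives three times, which is acceptable under QoS 1. QoS 2 adds a further handshake so that such duplicates are not passed on to the application.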
Advanced Message Queuing Protocol (AMQP) - Like MQTT and XMPP, this is based on the
publish/subscribe model. It supports two types of
communication: point-to-point and
multi-point. It is typically used for applications such as
financial applications and digital finance. It uses a token-based mechanism for
flow control, which ensures that there is no buffer overflow at the receiving end.
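The idea of token-based flow control can be sketched as follows: the receiver hands out one token per free buffer slot, and the sender transmits only as many messages as it holds tokens for. The `Receiver` class and the buffer size are illustrative assumptions; this is not the AMQP wire protocol.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver with a bounded buffer."""

    def __init__(self, buffer_size):
        self.buffer = deque()
        self.buffer_size = buffer_size

    def grant_tokens(self):
        """Grant one token per free buffer slot."""
        return self.buffer_size - len(self.buffer)

    def accept(self, msg):
        if len(self.buffer) >= self.buffer_size:
            raise OverflowError("receive buffer overflow")
        self.buffer.append(msg)

    def consume(self, n):
        """Simulate the application draining up to n messages."""
        for _ in range(min(n, len(self.buffer))):
            self.buffer.popleft()


def send_with_tokens(messages, receiver):
    """Send all messages without ever exceeding the granted tokens."""
    pending = deque(messages)
    rounds = 0
    while pending:
        tokens = receiver.grant_tokens()
        for _ in range(min(tokens, len(pending))):
            receiver.accept(pending.popleft())
        receiver.consume(2)  # receiver frees some slots before the next grant
        rounds += 1
    return rounds


receiver = Receiver(buffer_size=3)
rounds = send_with_tokens([f"msg-{i}" for i in range(7)], receiver)
print(rounds)
```

Because the sender never sends more than the granted tokens, `accept` never raises, which is the "no buffer overflow at the receiving end" property.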
DDS RTPS - This stands for Data Distribution Service Real-Time
Publish-Subscribe. It is attractive for use in IoT networks because it
supports a publish/subscribe framework on top of the UDP transport layer
protocol. It is a data-centric binary protocol in which data is organized into
“topics”. Users subscribe to a particular topic of interest, and listeners
receive the data published on it. A
single topic may have multiple speakers of different priorities. The protocol
supports a rich set of QoS policies for data distribution, covering data persistence,
delivery deadline, reliability, and the freshness of data.
Applications such as military, industrial, and
healthcare monitoring find this protocol
useful.
UNIT-III
IIOT ANALYTICS
Big Data Analytics and Software Defined Networks, Machine Learning and Data Science, Julia
Programming, Data Management with Hadoop.
At first, IoT data is just a curiosity, and it is even useful if handled correctly. However,
given time, as more and more devices are added to IoT networks, the data
generated by these systems becomes overwhelming.
The real value of IoT is not just in connecting things but rather in the data
produced by those things, the new services you can enable via those connected
things, and the business insights that the data can reveal.
In the world of IoT, the creation of massive amounts of data from sensors is
common and one of the biggest challenges—not only from a transport
perspective but also from a data management standpoint.
Analysing large amounts of data in the most efficient manner possible falls
under the umbrella of data analytics.
Data analytics must be able to offer actionable insights and knowledge from
data, no matter the amount or style, in a timely manner, or the full benefits of
IoT cannot be realized.
Example:
Modern jet engines are fitted with thousands of sensors that generate a
whopping 10 GB of data per second; a single engine may be equipped with around 5000
sensors. In fact, a single wing of a modern jumbo jet is equipped with 10,000 sensors.
The potential for a petabyte (PB) of data per day per commercial airplane is
not farfetched—and this is just for one airplane. Across the world, there are
approximately 100,000 commercial flights per day. The amount of IoT data
coming just from the commercial airline business is overwhelming.
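A quick back-of-the-envelope calculation shows how the 10 GB per second figure adds up over a day. The twin-engine aircraft and the 8 operating hours per day are assumptions made only for this illustration.

```python
GB_PER_SECOND_PER_ENGINE = 10   # figure quoted in the text
ENGINES = 2                     # assumption: twin-engine aircraft
HOURS_PER_DAY = 8               # assumption: average daily operating time

seconds = HOURS_PER_DAY * 3600
total_gb = GB_PER_SECOND_PER_ENGINE * ENGINES * seconds
total_tb = total_gb / 1000      # decimal units: 1 TB = 1000 GB

print(f"{total_tb:.0f} TB per day")  # 576 TB per day
```

Under these assumptions the engines alone produce more than half a petabyte per day, before any of the other sensors on the airframe are counted.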
IIoT Analytics: Data Science
modelling approach
Structured data means that the data follows a model or schema that defines
how the data is represented or organized, meaning it fits well with a traditional
relational database management system (RDBMS).
IoT sensor data often uses structured values, such as temperature, pressure,
humidity, and so on, which are all sent in a known format. Structured data is
easily formatted, stored, queried, and processed; for these reasons, it has been
the core type of data used for making business decisions.
From custom scripts to commercial software like Microsoft Excel and Tableau,
most people are familiar and comfortable with working with structured data.
Unstructured data lacks a logical schema for understanding and decoding the
data through traditional programming means.
Examples of this data type include text, speech, images, and video. As a general
rule, any data that does not fit neatly into a predefined data model is classified
as unstructured data. Analytics methods that can handle unstructured data,
such as cognitive computing and machine learning, are
deservedly garnering a lot of attention.
Smart objects in IoT networks generate both structured and unstructured data.
Structured data is more easily managed and processed due to its well-defined
organization.
On the other hand, unstructured data can be harder to deal with and typically
requires very different analytics tools for processing the data.
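The difference can be seen by checking a payload against a known schema. The field names and the sample payloads below are hypothetical; the point is only that a structured reading can be validated and queried field by field, while free text cannot.

```python
import json

# Hypothetical schema for a structured sensor reading
SCHEMA = {"sensor_id": str, "temperature_c": float, "humidity_pct": float}


def is_structured(payload):
    """Return True if payload is JSON whose fields match SCHEMA exactly."""
    try:
        record = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(record, dict)
        and record.keys() == SCHEMA.keys()
        and all(isinstance(record[k], t) for k, t in SCHEMA.items())
    )


structured = '{"sensor_id": "t-17", "temperature_c": 21.5, "humidity_pct": 40.0}'
unstructured = "Operator note: pump 3 sounded rough after the morning shift."

print(is_structured(structured), is_structured(unstructured))
```

The operator note still carries useful information, but extracting it needs text analytics instead of a simple field lookup.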
Data saved to a hard drive, storage array, or USB drive is data at rest.
➢ From an IoT perspective, the data from smart objects is considered data
in motion as it passes through the network en route to its final destination.
➢ This is often processed at the edge, using fog computing. When data is
processed at the edge, it may be filtered and deleted or forwarded on for further
processing and possible storage at a fog node or in the data center.
➢ Tools with this sort of capability, such as Spark, Storm, and Flink, are
relatively nascent compared to the tools for analysing stored data.
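Filtering data in motion at the edge can be as simple as forwarding a reading only when it has changed noticeably since the last forwarded value. The 0.5 deadband and the sample stream are assumptions chosen for illustration.

```python
def edge_filter(stream, deadband=0.5):
    """Forward a reading only if it differs from the last forwarded one by more than deadband."""
    forwarded = []
    last = None
    for value in stream:
        if last is None or abs(value - last) > deadband:
            forwarded.append(value)
            last = value
    return forwarded


# Hypothetical temperature stream arriving at a fog node
stream = [20.0, 20.1, 20.2, 21.0, 21.1, 19.9]
forwarded = edge_filter(stream)
print(forwarded)
```

Only three of the six readings travel on to the data center; the others are dropped at the edge, which reduces both network load and storage.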
Data at rest in IoT networks can be typically found in IoT brokers or in some
sort of storage array at the data center. Myriad tools, especially tools for
structured data in relational databases, are available from a data analytics
perspective.
The best known of these tools is Hadoop. Hadoop not only helps with data
processing but also data storage.
IoT Data Analytics Overview
The true importance of IoT data from smart objects is realized only when the
analysis of the data leads to actionable business intelligence and insights.
Data analysis is typically broken down by the types of results that are
produced.
Descriptive: Descriptive data analysis tells you what is happening, either now or
in the past.
Diagnostic: When you are interested in the “why,” diagnostic data analysis can
provide the answer.
Both predictive and prescriptive analyses are more resource intensive and
increase complexity, but the value they provide is much greater than the value
from descriptive and diagnostic analysis.
Figure 7-4 illustrates the four data analysis types and how they rank as
complexity and value increase. You can see that descriptive analysis is the least
complex and at the same time offers the least value. On the other end,
prescriptive analysis provides the most value but is the most complex to
implement.
Most data analysis in the IoT space relies on descriptive and diagnostic
analysis, but a shift toward predictive and prescriptive analysis is
understandably occurring for most businesses and organizations.
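The contrast between a descriptive and a predictive result can be sketched on a handful of readings. The temperature values are hypothetical, and the straight-line extrapolation is a deliberately naive stand-in for a real predictive model.

```python
import statistics

# Hypothetical hourly engine temperatures in degrees Celsius
temps_c = [70.1, 70.4, 71.0, 73.2, 76.8]

# Descriptive: what is happening (or has happened)?
summary = {"mean": statistics.mean(temps_c), "max": max(temps_c)}

# Predictive (naive): extend the average hourly change one step ahead.
avg_step = (temps_c[-1] - temps_c[0]) / (len(temps_c) - 1)
forecast = temps_c[-1] + avg_step

print(summary, round(forecast, 2))
```

The descriptive summary needs only the data already collected, while the forecast needs a model of how the data evolves, which is where the extra complexity and the extra value come from.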
Scaling problems: Due to the large number of smart objects in most IoT
networks that continually send data, relational databases can grow incredibly
large very quickly. This can result in performance issues that can be costly to
resolve, often requiring more hardware and architecture changes.
must be kept at a minimum. IoT data, however, is volatile in the sense that the
data model is likely to change and evolve over time.
Some other challenges:
• IoT also brings challenges with the live streaming nature of its data and
with managing data at the network level. Streaming data, which is generated as
smart objects transmit data, is challenging because it is usually of a very high
volume, and it is valuable only if it is possible to analyse and respond to it in
real-time.
Open SDN: Uses open protocols to control
both the virtual and physical devices that direct the flow of data
packets.
Hybrid Model SDN: Combines SDN and traditional networking, so that
the most suitable protocol can be selected for each
traffic type. Hybrid SDN can also serve as a phased implementation
strategy for a smooth transition to SDN.
Enhanced Control with Speed and Flexibility: SDN eliminates the
need for manual configuration of various hardware devices from different
vendors. Instead, developers can control network traffic by
programming a software-based controller that adheres to open standards. This
approach gives network managers the freedom to select
networking equipment and to communicate with multiple hardware devices using
a single protocol via a centralized controller, which improves speed and
flexibility.
Robust Security: SDN in IoT offers comprehensive visibility across the entire
network, presenting a holistic view of potential security threats. As the number
of intelligent devices connecting to the Internet continues to proliferate, SDN
surpasses traditional networking in terms of security advantages. Operators
can create distinct zones for devices requiring different security levels or
promptly isolate compromised devices to prevent the spread of infections
throughout the network.
Machine Learning
You need to record a set of predetermined sentences to help the tool match
well-known words to the sounds you make when you say the words. This
process is called machine learning.
ML is concerned with any process where the computer needs to receive a set of
data that is processed to help perform a task with more efficiency. ML is a vast
field but can be simply divided into two main categories: supervised and
unsupervised learning.
1. Unsupervised Learning
2. Supervised Learning
3. Reinforcement Learning
Unsupervised Learning
dataset, based on the inner structure of the data without looking into the
specific outcome.
Supervised learning
In supervised learning, the machine is trained with input for which there is a
known correct answer. For example, suppose that you are training a system to
recognize when there is a human in a mine tunnel.
A sensor equipped with a basic camera can capture shapes and return them to
a computing system that is responsible for determining whether the shape is a
human or something else (such as a vehicle, a pile of ore, a rock, a piece of
wood, and so on.)
With supervised learning techniques, hundreds or thousands of images are fed
into the machine, and each image is labeled (human or nonhuman in this case).
This is called the training set. An algorithm is used to determine common
parameters and common differences between the images.
The comparison is usually done at the scale of the entire image, or pixel by
pixel. Images are resized to have the same characteristics (resolution, color
depth, position of the central figure, and so on), and each point is analyzed.
Human images have certain types of shapes and pixels in certain locations.
Each new image is compared against these learned patterns,
and a deviation is calculated to determine how different the new image is from
the average human image and, therefore, the probability that what is shown is
a human figure. This process is called classification.
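The "average image plus deviation" idea can be sketched with a nearest-centroid classifier. Each image is reduced here to a hypothetical three-value feature vector; a real system would work on many more pixels and would usually use a more capable model.

```python
import math


def mean_vector(vectors):
    """Average the training vectors of one class, feature by feature."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def deviation(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(sample, centroids):
    """Return the label whose average training vector is closest to the sample."""
    return min(centroids, key=lambda label: deviation(sample, centroids[label]))


# Hypothetical labeled training set (the label is the known correct answer)
training = {
    "human": [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2]],
    "nonhuman": [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8]],
}
centroids = {label: mean_vector(vectors) for label, vectors in training.items()}

label = classify([0.85, 0.7, 0.15], centroids)
print(label)
```

The smaller the deviation from the average human vector, the more likely the new sample shows a human.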
After training, the machine should be able to recognize human shapes. Before
real field deployments, the machine is usually tested with unlabelled pictures—
this is called the validation or the test set, depending on the ML system used—
to verify that the recognition level is at acceptable thresholds. If the machine
does not reach the level of success expected, more training is needed.
specific environment.
Data science :
Julia programming
With the help of multiple dispatch, the user can define function behavior across
many combinations of argument types. It has powerful shell-like capabilities that let Julia
manage other processes easily. The user can call C functions directly, without any
wrappers or special APIs. Julia provides efficient support for Unicode.
It also provides Lisp-like macros as well as other metaprogramming
facilities. It provides lightweight green threading, i.e., coroutines.
Code written in Julia is fast because there is no need to vectorize code
for performance.
Open source
Distributed computation and parallelism are possible
Efficient Unicode support
Calls C functions directly
Basic math
Assigning string
Use of $ sign for string interpolation
String concatenation
Data structures
1. Tuples
2. Dictionary
3. Arrays
Data Management
Hadoop
MapReduce
that process large amount of datasets in
parallel
-generation MapReduce
Hadoop cluster.
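The MapReduce programming model can be illustrated on a single machine with a word count. This sketch only mimics the map, shuffle, and reduce phases in plain Python; it does not use the Hadoop API, and the input lines are made up.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Hypothetical log lines; on a cluster each line could be mapped on a different node
lines = ["pump ok", "pump fault", "valve ok"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the calls can run in parallel on different nodes, which is what allows the model to scale to large datasets.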
Namenode
The namenode is commodity hardware that runs the GNU/Linux
operating system and the namenode software. The namenode software can be
run on commodity hardware. The system hosting the namenode acts as
the master server and it does the following tasks −
Datanode
The datanode is commodity hardware running the GNU/Linux operating
system and the datanode software. For every node (commodity
hardware/system) in a cluster, there will be a datanode. These nodes
manage the data storage of their system.
Block
Generally, user data is stored in the files of HDFS. A file in the file
system is divided into one or more segments, which are stored on
individual data nodes. These file segments are called blocks. In other
words, the minimum amount of data that HDFS can read or write is
called a block. The default block size is 64 MB (128 MB in Hadoop 2.x and
later), but it can be changed as needed in the HDFS configuration.
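A short calculation shows how a file maps onto blocks. The 200 MB file size is an arbitrary example, and the 64 MB block size is the default quoted above.

```python
import math


def hdfs_block_count(file_size_mb, block_size_mb=64):
    """Number of HDFS blocks needed to store a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


file_size_mb = 200
blocks = hdfs_block_count(file_size_mb)
last_block_mb = file_size_mb - (blocks - 1) * 64

print(blocks, last_block_mb)  # 4 8
```

A 200 MB file therefore occupies three full 64 MB blocks and a final block holding the remaining 8 MB; the last block takes up only as much space as the data it contains.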
Goals of HDFS
Fault detection and recovery − Since HDFS includes a large number of
commodity hardware components, failure of components is frequent. Therefore HDFS
should have mechanisms for quick and automatic fault detection and
recovery.
Step 1
You have to create an input directory.
Step 1
Initially, view the data from HDFS using cat command.
$ stop-dfs.sh
There are many more commands in "$HADOOP_HOME/bin/hadoop fs"
than are demonstrated here, although these basic operations will get you
started. Running ./bin/hadoop dfs with no additional arguments will list
all the commands that can be run with the FsShell system. Furthermore,
$HADOOP_HOME/bin/hadoop fs -help commandName will display a
short usage summary for the operation in question, if you are stuck.
Users and applications can retrieve data from Hadoop using various
query and analysis tools. SQL-like languages (e.g., Hive’s HQL), scripting
languages (e.g., Pig Latin), and programming languages (e.g., Java,
Python) can be used for data retrieval.
Data Security:
Metadata about data assets, such as data lineage, data definitions, and
data ownership, can be stored in data catalogs and metadata repositories
to aid in data discovery and usage.
Data Compression and Optimization: