Data Analytics and Hadoop
Data Analytics and Hadoop
Structured data means that the data follows a model or schema that defines
how the data is represented or organized, meaning it fits well with a traditional
relational database management system (RDBMS).
Unstructured data lacks a logical schema for understanding and decoding the
data through traditional programming means. Examples of this data type include
text, speech, images, and video. As a general rule, any data that does not fit
neatly into a predefined data model is classified as unstructured data.
Scaling problems: Due to the large number of smart objects in most IoT networks
that continually send data, relational databases can grow incredibly large very
quickly.
Volatility of data: With relational databases, it is critical that the schema be
designed correctly from the beginning. Changing it later can slow or stop the
database from operating
Machine Learning
One of the core subjects in IoT is how to makes sense of the data that is
generated. ML is indeed central to IoT. Data collected by smart objects needs to be
analyzed, and intelligent actions need to be taken based on these analyses.
Unsupervised Learning
The computing process associated with decision making is called unsupervised
learning. This type of learning is unsupervised because there is not a “good” or
“bad” answer known in advance.
Example: Aircraft fault detection.
Neural networks
Neural networks are ML methods that mimic the way the
human brain works. When you look at a human figure,
multiple zones of your brain are activated to recognize
colors, movements, facial expressions, and so on. Your
brain combines these elements to conclude that the
shape you are seeing is human. Neural networks mimic
the same logic. The information goes through different
algorithms (called units), each of which is in charge of
processing an aspect of the information.
Hadoop
NameNodes: They coordinate where the DataNodes: These are the servers where the
data is stored, and maintain a map of data is stored at the direction of the
where each block of data is stored and NameNode. It is common to have many
where it is replicated. DataNodes in a Hadoop cluster to store the
data.
MAP Reduce
Flexible NetFlow (FNF) and IETF IPFIX (RFC 5101, RFC 5102) are examples
of protocols that are widely used for networks.
Flexible NetFlow overview