Big Data Analytics
Let's have a look at the data generated per minute on the internet: 4.5 million videos are watched on YouTube, and 188 million emails are sent.
So, how do you classify any data as big data? This is possible with the concept of the 5 V's:
• Volume
• Velocity
• Variety
• Veracity
• Value
Let us understand this with an example from the health care industry. Hospitals and clinics across the world generate massive volumes of data: 2,314 exabytes of data are collected annually in the form of patient records and test results.
All this data is generated at a very high speed, which corresponds to the velocity of big data.
Variety refers to the various data types, such as structured, semi-structured, and unstructured. Examples include Excel records, log files, and X-ray images:
• Excel records -> structured
• Log files -> semi-structured
• X-ray images -> unstructured
The accuracy and trustworthiness of the generated data is termed veracity.
Analyzing all this data will benefit the medical sector by enabling faster disease detection, better treatment, and reduced cost; this is known as the value of big data.
Big data is stored and processed with tools such as:
• Cassandra
• Hadoop
• Spark
Let us take Hadoop as an example and see how it stores and processes big data. Hadoop uses a distributed file system, known as the Hadoop Distributed File System (HDFS), to store big data. If you have a huge file, it will be broken down into smaller chunks and stored across various machines.
For example, a 300 MB file is split into blocks of 128 MB, 128 MB, and 44 MB.
Not only that: when you break the file, you also make copies of it, which go to different nodes. This stores your big data in a distributed way and ensures that even if one machine fails, your data is safe on another.
For example, the 128 MB, 128 MB, and 44 MB blocks are each replicated on two of the three nodes A, B, and C.
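The splitting and replication described above can be sketched in a few lines of Python. This is a simplified illustration, not real HDFS code: the block size (128 MB) matches the example, and a replication factor of 2 with round-robin placement is assumed to match the three-node figure (HDFS itself defaults to 3 replicas and uses rack-aware placement).

```python
# Minimal sketch of HDFS-style block splitting and replica placement.
BLOCK_SIZE_MB = 128

def split_into_blocks(file_size_mb):
    """Split a file of the given size into 128 MB blocks plus a remainder."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(BLOCK_SIZE_MB, remaining))
        remaining -= BLOCK_SIZE_MB
    return blocks

def assign_replicas(blocks, nodes, replication=2):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, size in enumerate(blocks):
        chosen = [nodes[(i + j) % len(nodes)] for j in range(replication)]
        placement[f"block-{i} ({size} MB)"] = chosen
    return placement

blocks = split_into_blocks(300)
print(blocks)  # [128, 128, 44]
print(assign_replicas(blocks, ["A", "B", "C"]))
```

Because every block lives on two different nodes, losing any single node still leaves one copy of each block available.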
The MapReduce technique is used to process big data: a lengthy task A is broken into smaller tasks B, C, and D. Now, instead of one machine, three machines take up one task each, complete them in parallel, and the results are assembled at the end.
Because of this, the processing becomes easy and fast; this is known as parallel processing.
Now that we have stored and processed our big data, we can analyze it for numerous applications.
Similarly, big data also helped with disaster management during Hurricane Sandy in 2012. It was used to gain a better understanding of the storm's effect on the east coast of the U.S. so that necessary measures could be taken; it could predict the hurricane's landfall five days in advance, which wasn't possible earlier.
These are clear indications of how valuable big data can be once it is accurately processed and analyzed.
What is the Hadoop big data tool in IoT?
Apache Hadoop is an open source framework that is used to efficiently
store and process large datasets ranging in size from gigabytes to
petabytes of data. Instead of using one large computer to store and
process the data, Hadoop allows clustering multiple computers to
analyze massive datasets in parallel more quickly.