Big Data-Hadoop
Deepanshu tyagi
Big data refers to collections of data sets so large and complex, spanning
massive volumes of both structured and unstructured data, that they are
difficult to process using traditional database and software techniques.
The challenges include analysis, curation, capture, search, sharing, storage
and privacy violations.
Big data comes in three forms:
Structured
Semi-Structured
Unstructured
https://brilliantprogrammer.tech/#/blog/3
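To make the three categories a bit more concrete, here is a minimal Python sketch. The sample values and variable names are made up for illustration and do not come from any real dataset; the sketch only shows how each kind of data is typically read.

```python
import csv
import io
import json

# Structured: fixed columns, fits neatly into a relational table.
# (Hypothetical sample data.)
structured = "id,name,age\n1,Asha,31\n2,Ravi,27\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but records may differ in shape.
semi_structured = '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "email": "r@example.com"}]'
records = json.loads(semi_structured)

# Unstructured: free text with no schema; any structure has to be inferred.
unstructured = "Ravi called support on Monday about a late delivery."
words = unstructured.split()

print(rows[0]["name"])            # every row has the same columns
print(sorted(records[1].keys()))  # the second record has different keys than the first
print(len(words))                 # all we get for free is a list of tokens
```

Notice that the further we move from structured data, the less the format itself tells us, and the more work the processing code has to do.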
Every day we upload millions of bytes of data, and big data is growing
exponentially. In fact, 90% of big data was created in the last two years.
Today big data is used everywhere. In fact, we live in a technology
generation where a new technology arrives every month or two.
Now let me tell you a few challenges that come along with Big Data:
Data Quality — The problem here is the 4th V, i.e. Veracity. The
data is very messy, inconsistent and incomplete. Dirty data costs
companies in the United States $600 billion every year.
Discovery — Finding insights in Big Data is like finding a needle
in a haystack. Analyzing petabytes of data with extremely powerful
algorithms to find patterns and insights is very difficult.
Storage — The more data an organization has, the more complex
the problems of managing it can become. The question that arises
here is “Where to store it?” We need a storage system that can
easily scale up or down on demand.
Analytics — In the case of Big Data, most of the time we are
unaware of the kind of data we are dealing with, so analyzing that
data is even more difficult.
Security — Since the data is huge in size, keeping it secure is
another challenge. It includes user authentication, restricting
access on a per-user basis, recording data access histories, proper
use of data encryption, and so on.
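To give a feel for the Data Quality challenge from the list above, here is a small Python sketch that counts messy, inconsistent and incomplete rows. The records, the field names and the `quality_report` helper are all hypothetical, and the rule that a country code should be upper case is an assumption made only for this example.

```python
# Hypothetical sample records; None and "" stand in for missing values.
records = [
    {"id": 1, "country": "US", "revenue": 120.0},
    {"id": 2, "country": "us", "revenue": None},
    {"id": 3, "country": "", "revenue": 75.5},
    {"id": 1, "country": "US", "revenue": 120.0},  # exact duplicate of the first row
]


def is_missing(value):
    """Treat None and the empty string as missing."""
    return value is None or value == ""


def quality_report(rows):
    """Count incomplete rows, exact duplicates and inconsistent country codes."""
    incomplete = sum(1 for r in rows if any(is_missing(v) for v in r.values()))

    seen = set()
    duplicates = 0
    for r in rows:
        key = tuple(sorted(r.items()))  # hashable fingerprint of the whole row
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Assumed rule for this sketch: country codes should be upper case.
    inconsistent = sum(
        1 for r in rows
        if r["country"] and r["country"] != r["country"].upper()
    )
    return {
        "incomplete": incomplete,
        "duplicates": duplicates,
        "inconsistent": inconsistent,
    }


report = quality_report(records)
print(report)
```

On four rows this is trivial, but the same kind of check has to run over petabytes of data in a real big data system, which is part of why veracity is so hard to guarantee.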