The Data Truthfulness: A Big Data understanding
Shubham Gagneja
System Engineer, Infosys Ltd., Hyderabad Campus, India
Santosh Kumar
Assistant Professor, School of Management Presidency University, Bangalore, India
As the smart information age matures, data has become the most powerful resource enterprises have at their disposal. Data is considered the most valuable
commodity in the world, far ahead of crude oil. This paper attempts to provide a complete understanding of the trustworthiness of data in the big data
framework, and offers insight into the meaning, importance and structure of big data. The first part of the paper establishes the need for data and its
truthfulness, while the latter part explains big data's history, framework, challenges and use cases.
Introduction
As the smart information age matures, data has become the most powerful resource enterprises have at their disposal. Data is considered the most valuable
commodity in the world, far ahead of crude oil. Data is the new oil of the digital era, and the companies that deal in it, the titans Alphabet (Google’s parent
company), Amazon, Apple, Facebook and Microsoft, look unstoppable. They are the five most valuable listed firms in the world (Economist, 6 May 2017).
All industries have welcomed this digital transformation, often betting their worth on insights mined from collected data. Data that is not properly used, or
that sits unstructured on an inaccessible IT atoll, can prove detrimental to the reliability of business processes. This raises the question of the truthfulness
of data: if key people inside a business cannot trust their company's data, how can external stakeholders know they are in good hands? More than 80% of a
data professional's time is spent on finding, cleansing, understanding and integrating data for the business problem.
Business houses want to use data as a resource at the epicenter of their decision-making processes to reduce mistakes and take full advantage of their core
competence. The volume of data is growing faster than data-handling technology. Inequalities between organizations in the optimal use of data will become
more glaring, moving from competitive edges to critical business advantages. We are talking about managing the huge flow of data in and around
businesses: big data. In today’s business cycle the impact of big data is not only evolving but increasing at rocket speed, as every facet of business, as well
as the machines, works only on data. Managing data is crucial for business success in the digital era, so the job of Chief Data Officers is not just to manage
data but to maintain its trustworthiness in a changing and evolving space.
https://datascience.foundation/sciencewhitepaper/the-data-truthfulness-a-big-data-understanding 1/8
11/1/22 12:49 The Data Truthfulness: A Big Data understanding
data every day in the form of heart rhythm, blood flow, calories burned, nerve activity in the brain, and so on. To understand its importance, consider the
life of an ordinary college student: his day starts with the wake-up alarm from his smartphone, followed by a peek at his social media account, a check of
the day's class schedule through some digital interaction, and so on.
Even washrooms are full of data-producing machines: smart shower systems that work on Daily Information Impact to store the flow and kind of water we
want each day, toilet seats with automatic temperature control, smart geysers and many more. Similarly, while travelling to the workplace we generate
gigabytes of data in the form of music playback, route navigation, traffic control and so on. All this data comes together to form a big sea of data that can
be managed only through the concept of big data.
Our life revolves around data, a huge amount of data known as BIG DATA. The big question is how to understand and use this data for decision making.
Should we trust all of it? As every single activity of society and business is linked to big data, the field is developing at a massive rate. The next section
provides detailed insight into big data, its effects and its handling.
In simple words big data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time. In today’s digital scenario,
commercial and social interactions among people, between people and machines, and among machines produce a constant stream of data to monitor and
analyze. Social interactions, mobile devices, facilities, equipment, R&D, simulations, and physical infrastructure all contribute to this endless flow. In aggregate, this
explosion in data sources is at the heart of big data. The Big Data technology could include a range of appropriate or fit-for-purpose software and hardware that
are able to address the scenarios depicted in figure 1 below.
In short, such data is so large and complex that none of the traditional data management tools are able to store it or process it efficiently.
Structured
Unstructured
Semi-Structured
Structured Data:
Data that is stored, accessed and processed in a fixed format is termed structured data. Various techniques exist to handle this kind of data, but as big data
came into the picture these techniques fell short of the volumes involved. A relational database is one example of structured data; as data grows towards
zettabytes, even established techniques such as SQL face difficulties handling it.
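To make the idea of a fixed format concrete, here is a minimal sketch of structured data in a relational table, using Python's built-in sqlite3 module. The table name, columns and rows are hypothetical and chosen only for illustration; they do not come from the article.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Asha", 52000.0), (2, "Ravi", 61000.0), (3, "Meena", 58000.0)],
)

# Because every row follows the same schema, the data can be queried with SQL.
rows = conn.execute(
    "SELECT name FROM employees WHERE salary > ? ORDER BY name", (55000,)
).fetchall()
high_earners = [name for (name,) in rows]
print(high_earners)
conn.close()
```

Every row has the same columns and types, which is what lets a query language like SQL work on it directly.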
Unstructured Data:
Any data whose form or structure is unknown is termed unstructured data. Big data today consists mainly of unstructured data: not only is it huge in size,
but handling it poses various challenges in extracting the desired value from it. Combinations of music, videos, text messages and so on are all unstructured
data. The biggest example of unstructured data is the data produced by social media such as WhatsApp, Instagram and Facebook. Even the queries we
search for on Google are a type of unstructured data.
Semi-Structured Data:
This type of data contains both forms of data (i.e. structured and unstructured). It is information that does not reside in a relational database but has
some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database, but in some cases that is hard
to do. One example of semi-structured data is an XML page, which uses tags to carry the database information within it.
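As a minimal sketch of how tagged, semi-structured data can be turned into regular records, the example below parses a small XML document with Python's standard xml.etree.ElementTree module. The document, its tag names and the parse_customers helper are hypothetical. Note that the second customer has no email tag, which is the kind of irregularity a fixed relational schema does not allow.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tags and values are illustrative only.
XML_DOC = """
<customers>
  <customer id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </customer>
  <customer id="2">
    <name>Ravi</name>
  </customer>
</customers>
"""

def parse_customers(xml_text):
    """Turn tagged records into a list of dictionaries."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for node in root.findall("customer"):
        records.append({
            "id": int(node.get("id")),
            "name": node.findtext("name"),
            # findtext returns None when the tag is absent.
            "email": node.findtext("email"),
        })
    return records

records = parse_customers(XML_DOC)
print(records)
```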
Big data first came into the picture around the 1960s and 1970s, when the world of data was just getting started with the first data centers and the
development of the relational database. But at that time no one paid attention to the concept.
After that, big data existed in various forms. In 1992, Teradata systems were the first to store and analyze 1 terabyte of data, at a time when hard disks were
typically 2.25 GB. Similarly, in 2007 they installed the first petabyte-scale relational database system. Since then volumes have kept increasing, the growing
data has needed ever larger storage devices, and people have finally come to know about big data. In terms of size, the smallest unit of data is the bit:
1 Byte = 8 bits
1 Kilobyte = 1024 Bytes
1 Megabyte = 1024 KB
1 Gigabyte = 1024 MB
1 Terabyte = 1024 GB
1 Petabyte = 1024 TB
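The unit ladder above can be sketched as a pair of small conversion helpers. The function names to_bytes and human_readable are hypothetical, and the sketch assumes the 1024-based convention used in the list.

```python
# Units in ascending order; each step is a factor of 1024, as in the list above.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    exponent = UNITS.index(unit)
    return value * 1024 ** exponent

def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1024."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024

print(to_bytes(1, "TB"))     # number of bytes in one terabyte
print(human_readable(1536))  # 1536 bytes expressed in kilobytes
```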
But big data really became the talk of the town around 2005, after Mark Zuckerberg had launched Facebook and people realized how much data it was
creating, followed later by YouTube and other online services.
To handle this amount of data, various techniques and frameworks were developed around 2004-2005: Google introduced the MapReduce technique to
process huge amounts of data, and Hadoop (an open-source framework created specifically to store and analyze big data sets) was developed around the
same time. NoSQL also began to gain popularity during this period.
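The map and reduce idea can be sketched in a few lines of plain Python with a word count, the usual introductory example. This is a single-process illustration of the concept only; it is not Google's implementation or the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    pairs = []
    for doc in documents:  # in a real cluster, each document could be mapped on a different node
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))

counts = word_count(["big data is big", "data is growing"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle be spread across many machines, which is what makes the approach suitable for very large data sets.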
The development of open-source frameworks, such as Hadoop (and more recently, Spark) was essential for the growth of big data because they make big data
easier to work with and cheaper to store. In the years since then, the volume of big data has skyrocketed. Users are still generating huge amounts of data—but it’s
not just humans who are doing it.
While big data has come far, its usefulness is only just beginning. Cloud computing has expanded big data possibilities even further. The cloud offers truly elastic
scalability, where developers can simply spin up ad hoc clusters to test a subset of data.
To understand it better, we can say that big data is data that contains greater variety, arriving in increasing volumes and with ever-higher velocity. These
are known as the three Vs.
Volume: The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data (data in the form of photos, videos,
messages etc.). This can be data of unknown value, such as Twitter data feeds, Facebook activity, daily music streams etc. For some organizations, this might
be tens of terabytes of data. For others, it may be hundreds of petabytes.
Velocity: It is the fast rate at which data is received and (perhaps) acted on. Normally, the highest velocity of data streams directly into memory versus being
written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action.
Variety: This refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big
data, data comes in new unstructured data types. Unstructured and semi structured data types, such as text, audio, and video, require additional preprocessing to
derive meaning and support metadata.
Due to the massive increase in data, two additional Vs have emerged: value and veracity.
Value: Every piece of data has value, but it is of no use until that value is discovered and we understand why the data is there. That is why value has been
added to the characteristics of big data: value determines which data to prefer and which not. Think of some of the world’s biggest tech
companies. A large part of the value they offer comes from their data, which they’re constantly analyzing to produce more efficiency and develop new products.
Finding value in big data isn’t only about analyzing it (which is a whole other benefit). It’s an entire discovery process that requires insightful analysts, business
users, and executives, who ask the right questions, recognize patterns, make informed assumptions, and predict behavior.
Veracity: Just as value is important, so is how truthful your data is and whether you can rely on it; this is the veracity of big data. With big data now
available in larger volumes, cheaper and more accessible, users can make more accurate and precise business decisions, provided the data can be trusted,
and as a result sound planning and success become easier to achieve.
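One simple way to put a number on veracity is to check records for missing fields and duplicates before relying on them. The sketch below is a hedged illustration, not a standard metric: the veracity_report function, the field names and the "trust ratio" are all assumptions made for this example.

```python
def veracity_report(records, required_fields):
    """Count duplicate and incomplete records and compute the share that is usable."""
    seen_ids = set()
    report = {"total": 0, "missing_fields": 0, "duplicates": 0, "valid": 0}
    for rec in records:
        report["total"] += 1
        if rec.get("id") in seen_ids:
            report["duplicates"] += 1  # same id seen before
            continue
        seen_ids.add(rec.get("id"))
        if any(rec.get(field) in (None, "") for field in required_fields):
            report["missing_fields"] += 1  # a required field is empty or absent
            continue
        report["valid"] += 1
    # Hypothetical summary figure: fraction of records that passed both checks.
    report["trust_ratio"] = report["valid"] / report["total"] if report["total"] else 0.0
    return report

# Hypothetical sample records.
sample = [
    {"id": 1, "name": "Asha", "city": "Hyderabad"},
    {"id": 2, "name": "", "city": "Bangalore"},
    {"id": 1, "name": "Asha", "city": "Hyderabad"},
    {"id": 3, "name": "Meena", "city": "Chennai"},
]
report = veracity_report(sample, ["name", "city"])
print(report)
```

Real data quality checks go much further (ranges, formats, cross-source consistency), but even a count like this gives decision makers a first indication of how far a data set can be trusted.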
Big Data Use cases
Big data, together with the techniques to store, process and handle it, can help you address a range of business activities, from customer service to
analytics, and from producing a product to earning maximum profit from it. Some use cases are mentioned below:
Product Development: Companies like Netflix and Amazon Prime use big data to anticipate customer demand. This helps both end users and product
owners, as this kind of business is growing very fast. People browse many things daily, such as movies, TV shows and music. Big data techniques analyze
user interests and build a picture of user demand, which in turn lets customers browse according to their interests rather than having to think and search
for content.
Predictive Maintenance: Factors that can predict mechanical failures may be deeply buried in structured data, such as the year, make, and model of equipment,
as well as in unstructured data that covers millions of log entries, sensor data, error messages, and engine temperature. By analyzing these indications of potential
issues before the problems happen, organizations can deploy maintenance more cost effectively and maximize parts and equipment uptime so that the
efficiency can be increased with fewer failures.
Customer Experience: A clearer view of customer experience is more possible now than ever before. Big data enables you to gather data from social media, web
visits, call logs, and other sources to improve the interaction experience and maximize the value delivered. Start delivering personalized offers, reduce customer
effort, and handle issues more accurately.
Similarly, many other use cases show that big data and its techniques make tasks much easier and greatly improve the end user's experience, while owners
can handle huge sets of information precisely and earn more profit.
The first challenge is the data itself. Although many big data technologies such as Hadoop have been developed to handle and store it, data keeps growing
on such a massive scale that even these face issues, and organizations still struggle to keep pace with their data and find effective ways to store it.
Today it is also not enough just to store the data. Data must be used to be valuable, and that depends on curation (selecting the relevant data out of massive
sets). Cleaning data, so that it is relevant to the end user and organized in a way that enables meaningful analysis, requires a lot of work. Engineers spend
around 70 percent of their time curating and preparing data according to their clients' needs.
Hadoop was initially developed to handle big data, and Apache Spark was introduced later to process it, but both still face challenges. Nowadays the two
frameworks, Hadoop and Apache Spark, are used in combination and are so far the most effective technologies for working with big data. Keeping up
with big data technology is an ongoing challenge.
Now that we understand what big data is and why it needs to be understood, let us look at how big data works:
Integrate
Big data brings together data from many discrete sources and applications. One mechanism is ETL (extract, transform and load): first the data is extracted,
then it is transformed according to the requirements of the work the analyst needs to do on it, and finally it is loaded. This is one strategy and technology;
there are various others for analyzing big data sets at terabyte, or even petabyte, scale.
Integration is basically about bringing data in, processing it, and making it available in a form that a business analyst can start working with.
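The three ETL steps can be sketched with Python's standard csv and sqlite3 modules. This is a toy, single-machine illustration under stated assumptions: the CSV content, the orders table and the function names are hypothetical, and a real big data pipeline would use distributed tooling instead.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with untidy names and one missing amount.
RAW_CSV = """order_id,customer,amount
1, asha ,120.50
2,RAVI,
3,meena,75.00
"""

def extract(csv_text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: drop rows without an amount and normalize names and types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(csv_text, conn):
    load(transform(extract(csv_text)), conn)
    # Return the row count and total amount so an analyst can sanity-check the load.
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()

conn = sqlite3.connect(":memory:")
summary = run_etl(RAW_CSV, conn)
print(summary)
```

Keeping extract, transform and load as separate functions mirrors the description above and makes each stage easy to test or replace on its own.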
Manage
This is the second stage: after loading, we need to manage the data so that we can work on it. Big data requires storage. Storage can take any form, and the
desired processing requirements and necessary process engines can be brought to those data sets on an on-demand basis. Nowadays the most popular
storage solution is the cloud, as it supports current compute requirements and lets you use resources as needed, for example Google Cloud Platform and Azure.
Analyze
This is the final stage: your investment in big data pays off when you analyze and act on the data, working on it according to user demands and
requirements. It involves making new discoveries, examining what the data is and what new research it can support through visual analysis of various data
sets, and even building something new out of it. This is basically how big data works and delivers numerous outstanding results.
Conclusion:
The big data framework is becoming the need of the day for every organization. It is not only a big opportunity for institutions but will also put pressure on
chief information officers. Big data provides a complete framework for making IT a more valued asset to businesses. Big data implementation projects are at
the frontier of the business, where many of the most significant business expansion or cost reduction opportunities lie. Taking a lead in big data
implementation gives a business high strategic and competitive importance through its data management and IT infrastructure strategy, which requires
out-of-the-box thinking as well as moving outside the traditional IT comfort zone.
References:
https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data (retrieved on 12/06/2019 at 10:15 PM)
https://www.upgrad.com/blog/what-is-big-data-types-characteristics-benefits-and-examples/ (retrieved on 13/06/2019 at 11:15 PM)
https://intellipaat.com/blog/7-big-data-examples-application-of-big-data-in-real-life/ (retrieved on 15/06/2019 at 10:15 PM)
https://www.cisco.com/c/en_in/solutions/data-center-virtualization/big-data/index.html (retrieved on 15/06/2019 at 08:15 PM)
Comments:
Abhishek Mishra
I agree big data implementation projects are at the frontier of the business where many of the most significant business expansion or cost reduction
opportunities lie
Sureshkumar Sundaram
Big data is a term that describes the large volume of data – both structured and unstructured that inundates a business on a day-to-day basis. .It's what
organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.The above article
clearly explain the Big Data concepts.Very well written.
© 2022 Data science Foundation. All rights reserved. Data S.F. Limited 09624670