
The Data Truthfulness: A Big Data understanding

The document discusses the importance and trustworthiness of data in the context of big data, highlighting its role as a critical resource for businesses. It covers the history, types, and challenges of big data, as well as its applications in various fields such as product development and customer experience. The paper emphasizes the need for effective data management and the significance of data veracity in making informed business decisions.


11/1/22 12:49 The Data Truthfulness: A Big Data understanding


The Data Truthfulness: A Big Data understanding


 A DSF Whitepaper  14 September 2019  Santosh Kumar


The authors of this paper are:

Shubham Gagneja
System Engineer, Infosys Ltd., Hyderabad Campus, India

Santosh Kumar
Assistant Professor, School of Management Presidency University, Bangalore, India

As the smart information age matures, data has become the most powerful resource enterprises have at their disposal. Data is considered the most valuable
commodity on the globe, far ahead of crude oil. This paper attempts to provide a complete understanding of the trustworthiness of data in the big data
framework, along with insight into the meaning, importance and structure of big data. The first part of the paper establishes the need for data and its
truthfulness, while the latter part explains big data's history, framework, challenges and use cases.

Introduction

As the smart information age matures, data has become the most powerful resource enterprises have at their disposal. Data is considered the most valuable
commodity on the globe, far ahead of crude oil. Data is the new oil of the digital era, and the companies that deal in it are titans: Alphabet (Google's
parent company), Amazon, Apple, Facebook and Microsoft look unstoppable. They are the five most valuable listed firms on earth (Economist, 6 May 2017).
All industries have welcomed this digital transformation, often betting their worth on insights mined from collected data. Data that is not properly used,
or that sits unstructured on an inaccessible IT atoll, can prove detrimental to the reliability of business processes. This raises the question of the
truthfulness of data: if the key people inside a business cannot trust their company's data, how can external stakeholders know they are in good hands?
More than 80% of a data professional's time is spent finding, cleansing and understanding data and integrating it with the business problem.

Businesses want to use data as a resource at the epicenter of their decision-making processes, to reduce mistakes and take full advantage of their core
competence. The volume of data is growing faster than data-handling technology is advancing. Inequalities between organizations in how well they use data
will become more glaring, moving from competitive edges to critical business advantages. We are talking about managing the huge flow of data in and around
businesses: big data. In today's business cycle the impact of big data is not only evolving but increasing at rocket speed, as every facet of a business,
and every machine in it, runs on data. Managing data is crucial for business success in the digital era, so the job of a Chief Data Officer is not merely
to manage data but to maintain its trustworthiness in a changing and evolving space.

What is data and why are we talking about it?




In specific terms, data is information for businesses, but in general data is part of daily life. A few decades ago, data referred to facts drawn from an
existing knowledge base to solve critical business problems. Now we live in an ocean of data. As digital life advances, people prefer to do things
digitally rather than manually, whether at business places, in household work, or even while sleeping at night. The whole of day-to-day life is covered by
data, and the amount is not stopping but growing higher and higher. Even our bodies produce millions of data points every day in the form of heart rhythm,
blood flow, calories burned, nerve activity in the brain and so on. To understand the importance, consider the life of an ordinary college student: the day
starts with a wake-up alarm from a smartphone, followed by a peek at social media accounts, a check of the day's class schedule through a digital
interaction, and so on.

Even washrooms are full of data-producing machines: smart shower systems that work on daily information, storing the speed and kind of water we want each
day, toilet seats with automatic temperature control, smart geysers and many more. Similarly, while travelling to the workplace we generate gigabytes of
data in the form of music playback, route navigation, traffic control and so on. All this data comes together to form a big sea of data that can be managed
only through the concept of big data.

Our life revolves around data, a huge amount of data known as BIG DATA. The big question is how to understand and use this data for decision making. Should
we trust all of this data? As every single activity of society and business is linked to big data, the field is developing at a massive rate. The next
section provides detailed insight into big data, its effects and its handling.

Big Data Understanding

In simple words, big data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time. In today's digital
scenario, commercial and social interactions among people, between people and machines, and among machines produce a constant stream of data to monitor and
analyze. Social interactions, mobile devices, facilities, equipment, R&D, simulations, and physical infrastructure all contribute to this endless flow. In
aggregate, this explosion in data sources is at the heart of big data. Big data technology can include a range of fit-for-purpose software and hardware
able to address the scenarios depicted in Figure 1 below.

Figure 1: Big Data Framework

In short, such data is so large and complex that none of the traditional data management tools are able to store it or process it efficiently.

Types of Big Data:

Structured

Unstructured

Semi-Structured

Structured Data:

Data that is stored, accessed and processed in a fixed format is termed structured data. Various techniques exist to handle this kind of data, but as big
data came into the picture these techniques fell short against the sheer volume. A relational database is one example of structured data; as its size has
grown enormously, established techniques such as SQL also face difficulty handling it once volumes reach the zettabyte range.

A company employment database:

Employee ID   Employee Name    Gender   Job Level/Designation        Annual Salary (LPA)
125250        Raghav Grover    Male     JL3/System Engineer          3.9
75580         Sagar Dingerja   Male     JL4/Senior System Engineer   4.4
74963         Sakshi Mittal    Female   JL5/Technology Analyst       5.2
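Because structured data has a fixed format, it can be loaded into a relational table and queried with SQL. The following is a minimal sketch using Python's built-in sqlite3 module and the three rows of the employment table above; the table name, column names and helper functions are assumptions made for illustration.

```python
import sqlite3

# The three rows from the employment table above.
EMPLOYEES = [
    (125250, "Raghav Grover", "Male", "JL3/System Engineer", 3.9),
    (75580, "Sagar Dingerja", "Male", "JL4/Senior System Engineer", 4.4),
    (74963, "Sakshi Mittal", "Female", "JL5/Technology Analyst", 5.2),
]


def build_db():
    # Hypothetical schema: every row must fit these fixed, typed columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        "employee_id INTEGER PRIMARY KEY, name TEXT, gender TEXT, "
        "designation TEXT, salary_lpa REAL)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
    return conn


def names_earning_above(conn, threshold_lpa):
    # A fixed format is what makes a declarative query like this possible.
    rows = conn.execute(
        "SELECT name FROM employee WHERE salary_lpa > ? ORDER BY salary_lpa",
        (threshold_lpa,),
    )
    return [name for (name,) in rows]


conn = build_db()
print(names_earning_above(conn, 4.0))  # ['Sagar Dingerja', 'Sakshi Mittal']
```

A single in-memory database like this is only an illustration; the scaling limits described above appear when the same approach is stretched to far larger volumes.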

Unstructured Data:

Any data with an unknown form or structure is termed unstructured data. Today big data consists mainly of unstructured data: not only is it huge, but
handling it poses various challenges in extracting the desired value from it. Combinations of music, videos, text messages and so on are all unstructured
data. The biggest example is social media data from services such as WhatsApp, Instagram and Facebook. Even the searches we run on Google are a kind of
unstructured data.

Semi-Structured Data:

This type of data contains both forms of data (i.e. structured and unstructured). It is information that does not reside in a relational database but has
some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database, but in some cases that is
hard to do. One example of semi-structured data is an XML page.

Shubham Gagneja   Male     24

Aarzoo Mohshin    Female   26

This is a type of semi-structured data: it carries tags that label each field along with the record information itself.
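To make the idea of tagged records concrete, here is a small sketch that parses an XML page with Python's standard xml.etree.ElementTree module. The tag names (people, person, name, gender, age) are assumptions chosen for illustration; only the field values are taken from the two records above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names are assumptions, the values come from the text.
XML_PAGE = """
<people>
  <person><name>Shubham Gagneja</name><gender>Male</gender><age>24</age></person>
  <person><name>Aarzoo Mohshin</name><gender>Female</gender><age>26</age></person>
</people>
"""


def parse_people(xml_text):
    # The tags give each value a label, so records can be read without a fixed table schema.
    root = ET.fromstring(xml_text)
    records = []
    for person in root.findall("person"):
        records.append({
            "name": person.findtext("name"),
            "gender": person.findtext("gender"),
            "age": int(person.findtext("age")),
        })
    return records


for record in parse_people(XML_PAGE):
    print(record)
```

Once parsed into dictionaries like these, the records could be written to a relational table, which is the "some processing" step that turns semi-structured data into structured data.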

History of Big Data:

Big data first came into the picture in the 1960s and 1970s, when the world of data was just getting started with the first data centers and the
development of the relational database. At that time, however, few people paid attention to the concept.

After that, big data existed in various forms. In 1992 Teradata systems were the first to store and analyze 1 terabyte of data, at a time when hard disks
typically held 2.25 GB. Similarly, in 2007 Teradata installed the first petabyte-class relational database system. Since then capacities have kept
increasing as the data that needs to be stored grows, requiring ever larger storage devices, and people finally came to know about big data. On sizing: the
smallest unit of data is the bit.

1 bit = Single 1 or 0, binary unit

1 Byte = 8 bits

1 Kilobyte = 1024 bytes

1 Megabyte = 1024 KB

1 Gigabyte = 1024 MB

1 Terabyte = 1024 GB

1 Petabyte = 1024 TB
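The unit ladder above can be checked with a few lines of arithmetic. The sketch below follows the binary convention used in the list (each unit is 1024 times the previous one); the function names are illustrative.

```python
# Binary multiples as listed above: each unit is 1024 times the previous one.
UNITS = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte", "Petabyte"]


def bytes_in(unit):
    # Number of bytes in one of the listed units.
    return 1024 ** UNITS.index(unit)


def humanize(num_bytes):
    # Express a byte count in the largest listed unit that keeps the value at 1 or above.
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(bytes_in("Terabyte"))             # 1099511627776
print(humanize(bytes_in("Petabyte")))   # 1.00 Petabyte

# How many 2.25 GB hard disks would it take to hold 1 terabyte?
print(bytes_in("Terabyte") / (2.25 * bytes_in("Gigabyte")))
```

The last line comes to roughly 455 disks, which gives a sense of why a 1-terabyte system was notable in 1992.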

Big data really became the talk of the town around 2005, once people realized how much data was being generated through Facebook (launched by Mark
Zuckerberg in 2004) and later through YouTube and other online services.

To handle this amount of data, various techniques and frameworks also emerged in 2004-2005: Google introduced the MapReduce technique for processing huge
amounts of data, and Hadoop (an open-source framework created specifically to store and analyze big data sets) was developed in the same period. NoSQL also
began to gain popularity during this time.
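The MapReduce idea can be illustrated on a single machine. The following is only a sketch of the programming model (map, group by key, reduce) applied to word counting; it is not Google's or Hadoop's implementation, which distribute these phases across many machines, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as a framework would between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Reduce: combine the values for each key into a single result.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data is big", "data is the new oil"])
print(counts["big"], counts["data"], counts["oil"])  # 2 2 1
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many workers, which is what makes the model suitable for very large data sets.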

The development of open-source frameworks, such as Hadoop (and more recently, Spark) was essential for the growth of big data because they make big data
easier to work with and cheaper to store. In the years since then, the volume of big data has skyrocketed. Users are still generating huge amounts of data—but it’s
not just humans who are doing it.

While big data has come far, its usefulness is only just beginning. Cloud computing has expanded big data possibilities even further. The cloud offers truly elastic
scalability, where developers can simply spin up ad hoc clusters to test a subset of data.

Three Vs of Big Data:

To have a better understanding we can say that Big data is data that contains greater variety arriving in increasing volumes and with ever-higher velocity. This is
known as the three Vs.

Volume: The amount of data matters. With big data, you'll have to process high volumes of low-density, unstructured data (data in the form of photos,
videos, messages etc). This can be data of unknown value, such as Twitter data feeds, Facebook, daily music etc. For some organizations, this might be tens
of terabytes of data. For others, it may be hundreds of petabytes.

Velocity: It is the fast rate at which data is received and (perhaps) acted on. Normally, the highest velocity of data streams directly into memory versus being
written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action.

Variety: This refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big
data, data comes in new unstructured data types. Unstructured and semi structured data types, such as text, audio, and video, require additional preprocessing to
derive meaning and support metadata.

Because of the massive increase in data, two additional Vs have emerged: value and veracity.

Value: Every piece of data has intrinsic value, but it is of no use until that value is discovered. That is why value has been added as a characteristic of
big data: it determines which data to prioritize and which to set aside. Think of some of the world's biggest tech companies. A large part of the value
they offer comes from their data, which they're constantly analyzing to produce more efficiency and develop new products. Finding value in big data isn't
only about analyzing it (which is a whole other benefit). It's an entire discovery process that requires insightful analysts, business users, and
executives who ask the right questions, recognize patterns, make informed assumptions, and predict behavior.

Veracity: Alongside value, a further aspect is how truthful the data is and whether it can be relied upon; this is the veracity of big data. With big data
now cheaper and more accessible at increased volumes, users who can rely on their data can make more accurate and precise business decisions, and as a
result sound planning and success become easier to achieve.

Big Data Use cases

Big data, together with the techniques to store and handle it, can help you address a range of business activities, from customer service to analytics, and
from producing a product to earning the maximum profit from it. Some use cases are described below:

Product Development: Companies like Netflix and Amazon Prime use big data to anticipate customer demand. This helps both end users and product owners, as
this kind of business is growing very fast. People browse many things every day, such as movies, TV shows and music. Big data techniques analyze each
user's interests, build a cache from them and form an idea of user demand, which in turn lets customers browse according to their interests rather than
having to think and search for what they want.

Predictive Maintenance: Factors that can predict mechanical failures may be deeply buried in structured data, such as the year, make, and model of equipment,
as well as in unstructured data that covers millions of log entries, sensor data, error messages, and engine temperature. By analyzing these indications of potential
issues before the problems happen, organizations can deploy maintenance more cost effectively and maximize parts and equipment uptime so that the
efficiency can be increased with fewer failures.

Customer Experience: A clearer view of customer experience is more possible now than ever before. Big data enables you to gather data from social media, web
visits, call logs, and other sources to improve the interaction experience and maximize the value delivered. Start delivering personalized offers, reduce customer
effort, and handle issues more accurately.

There are many other use cases showing that big data and its techniques make tasks much easier: the end user's experience improves drastically, and owners
can handle huge sets of information precisely and earn more profit.

Big Data Challenges:

The first challenge is the data itself. Although big data technologies such as Hadoop have been developed to handle and store it, data is growing on such a
massive scale that even these face issues, and organizations still struggle to keep pace with their data and find an effective way to store it.

Nor is it enough today merely to store the data: data must be used to be valuable, and that depends on curation (selecting the relevant data out of massive
sets). Cleaning the data, so that it is relevant to the end user and organized for meaningful analysis, requires a great deal of work. Engineers spend
around 70 percent of their time curating and preparing data according to their clients' needs.
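As a rough illustration of what curation involves, the sketch below trims stray whitespace, rejects records with missing required fields, and drops exact duplicates. The record layout, field names and the curate helper are hypothetical; real pipelines apply many more checks than these.

```python
def curate(records, required_fields):
    """Keep records with every required field filled in, dropping exact duplicates."""
    seen = set()
    clean = []
    for record in records:
        # Normalise: strip surrounding whitespace from string values.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        # Reject rows with a missing or empty required field.
        if any(row.get(field) in (None, "") for field in required_fields):
            continue
        # Skip rows that are identical to one already kept.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean


# Hypothetical raw records, invented for this example.
raw = [
    {"customer": "A101", "city": "Bangalore ", "spend": 1200},
    {"customer": "A101", "city": "Bangalore", "spend": 1200},   # duplicate after trimming
    {"customer": "", "city": "Hyderabad", "spend": 300},        # missing customer id
    {"customer": "B202", "city": "Hyderabad", "spend": None},   # missing spend
    {"customer": "C303", "city": "Hyderabad", "spend": 450},
]
clean = curate(raw, ["customer", "city", "spend"])
print(len(raw), "->", len(clean))  # 5 -> 2
```

Even in this tiny sample, more than half of the rows are unusable as they arrive, which hints at why data preparation consumes so much of an engineer's time.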

Hadoop was initially developed to handle big data, and Apache Spark was later introduced to process it, but each still faces challenges and issues.
Nowadays the two frameworks, Hadoop and Apache Spark, are used in combination and are so far the most effective technologies for working with big data.
Keeping up with big data technology is an ongoing challenge.

How Big Data Works:

Now that we understand what big data is and why it needs to be understood, let us look at how big data works.

Working with big data basically involves three stages:

Integrate

Big data brings together data from many discrete sources and applications. One mechanism is ETL (extract, transform and load): first the data is extracted,
then the extracted data is transformed to meet the requirements of the work the analyst needs to do on it, and finally the data is loaded. This is one
strategy and technology; there are various others for analyzing big data sets at terabyte, or even petabyte, scale.

In short, the integration stage means bringing in the data, processing it and making it available in a form that a business analyst can start working with.
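The extract, transform and load steps can be sketched in a few lines using only the Python standard library. The sample data, column names and target table below are hypothetical; a production pipeline would read from real sources and load into a data warehouse or cloud store rather than an in-memory database.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from files, APIs or logs.
SOURCE_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,not-a-number
3,alice,79.50
"""


def extract(text):
    # Extract: read raw rows from the source as dictionaries.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transform: convert types and drop rows whose amount cannot be parsed.
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        cleaned.append((int(row["order_id"]), row["customer"].title(), amount))
    return cleaned


def load(rows, conn):
    # Load: write the transformed rows into the target table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = 'Alice'"
).fetchone()[0]
print(total)  # 200.0
```

Keeping the three steps as separate functions mirrors the description above: each stage can be replaced or scaled independently, for example by swapping the load target without touching the transformation logic.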

Manage

This is the second stage. After loading, the data must be managed so that we can work on it. Big data requires storage, which can take any form, and the
necessary processing engines must be brought to those data sets on an on-demand basis. Today the most popular storage solution is the cloud, as it supports
current compute requirements and lets you use resources as needed; examples include Google Cloud Platform and Azure.

Analyze

This is the final stage: your investment in big data pays off when you analyze the data and act on it according to user demands and requirements. This
means making new discoveries, examining the data as it is, exploring what new findings it can yield through visual analysis of various data sets, and even
building something new from it. This is basically how big data works and delivers numerous outstanding results.

Conclusion:

A big data framework is becoming a necessity for every organization. It is not only a big opportunity for institutions but will also put pressure on chief
information officers. Big data provides a complete framework for making IT a more valued asset to the business. Big data implementation projects are at the
frontier of the business, where many of the most significant business expansion and cost reduction opportunities lie. Taking the lead in big data
implementation gives a business strategically important competitive strength through its data management and IT infrastructure strategy, which requires
out-of-the-box thinking as well as moving outside the traditional IT comfort zone.

References:

https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data (retrieved on 12/06/2019 at 10:15 PM)

https://www.upgrad.com/blog/what-is-big-data-types-characteristics-benefits-and-examples/ (retrieved on 13/06/2019 at 11:15 PM)

https://www.sciencedirect.com/topics/computer-science/big-data-system (retrieved on 13/06/2019 at 12:15 PM)

https://intellipaat.com/blog/7-big-data-examples-application-of-big-data-in-real-life/ (retrieved on 15/06/2019 at 10:15 PM)

https://www.cisco.com/c/en_in/solutions/data-center-virtualization/big-data/index.html (retrieved on 15/06/2019 at 08:15 PM)


