Big Data

Big data refers to large volumes of both structured and unstructured data that is difficult to process using traditional methods due to its size and complexity. It is characterized by the 3Vs - volume, velocity and variety. Volume refers to the large amount of data collected from various sources, velocity refers to the speed at which data streams in, and variety refers to the different data formats. Big data is important because analyzing it can provide insights to improve business decisions and strategies.


Big data is a term that describes large, hard-to-manage volumes of data – both
structured and unstructured – that inundate businesses on a day-to-day basis. But it’s
not just the type or amount of data that’s important, it’s what organizations do with
the data that matters. Big data can be analyzed for insights that improve decisions and
give confidence for making strategic business moves.

History of Big Data

Big data refers to data that is so large, fast or complex that it’s difficult or impossible
to process using traditional methods. The act of accessing and storing large amounts
of information for analytics has been around for a long time. But the concept of big
data gained momentum in the early 2000s when industry analyst Doug Laney
articulated the now-mainstream definition of big data as the three V’s:

Volume. Organizations collect data from a variety of sources, including transactions,
smart (IoT) devices, industrial equipment, videos, images, audio, social media and
more. In the past, storing all that data would have been too costly – but cheaper
storage using data lakes, Hadoop and the cloud have eased the burden.

Velocity. With the growth in the Internet of Things, data streams into businesses at
an unprecedented speed and must be handled in a timely manner. RFID tags, sensors
and smart meters are driving the need to deal with these torrents of data in near-real
time.

Variety. Data comes in all types of formats – from structured, numeric data in
traditional databases to unstructured text documents, emails, videos, audio, stock
ticker data and financial transactions.

At SAS, we consider two additional dimensions when it comes to big data:

Variability: In addition to the increasing velocities and varieties of data, data flows
are unpredictable – changing often and varying greatly. It’s challenging, but businesses
need to know when something is trending in social media, and how to manage daily,
seasonal and event-triggered peak data loads.

Veracity/Accuracy: Veracity refers to the quality of data. Because data comes from
so many different sources, it’s difficult to link, match, cleanse and transform data
across systems. Businesses need to connect and correlate relationships, hierarchies
and multiple data linkages. Otherwise, their data can quickly spiral out of control.
Why Is Big Data Important?

The importance of big data doesn’t simply revolve around how much data you have.
The value lies in how you use it. By taking data from any source and analyzing it, you
can find answers that 1) streamline resource management, 2) improve operational
efficiencies, 3) optimize product development, 4) drive new revenue and growth
opportunities and 5) enable smart decision making. When you combine big data with
high-performance analytics, you can accomplish business-related tasks such as:

 Determining root causes of failures, issues and defects in near-real time.
 Spotting anomalies faster and more accurately than the human eye.
 Improving patient outcomes by rapidly converting medical image data into
insights.
 Recalculating entire risk portfolios in minutes.
 Sharpening deep learning models' ability to accurately classify and react to
changing variables.
 Detecting fraudulent behavior before it affects your organization.
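One of the tasks above, spotting anomalies, can be illustrated with a simple statistical rule. This is only a sketch: the readings and the two-standard-deviation threshold are invented for the example, and real systems use far more sophisticated models.

```python
# Illustrative sketch: flag readings that deviate sharply from the rest.
# Data and threshold are invented, not from any production system.
import statistics

def find_anomalies(readings, threshold=2.0):
    """Return readings more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(readings)
    stdev = statistics.stdev(readings)
    return [x for x in readings if abs(x - mean) > threshold * stdev]

readings = [20.1, 19.8, 20.3, 20.0, 35.7, 19.9, 20.2]
print(find_anomalies(readings))  # [35.7] – the spike stands out
```

At big data scale the same idea is applied per sensor, per account or per transaction stream, usually with learned rather than fixed thresholds.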

How Big Data Works

Before businesses can put big data to work for them, they should consider how it
flows among a multitude of locations, sources, systems, owners and users. There are
five key steps to taking charge of this "big data fabric" that includes traditional,
structured data along with unstructured and semistructured data:

 Set a big data strategy.
 Identify big data sources.
 Access, manage and store the data.
 Analyze the data.
 Make intelligent, data-driven decisions.

1) Set a big data strategy

At a high level, a big data strategy is a plan designed to help you oversee and improve
the way you acquire, store, manage, share and use data within and outside of your
organization. A big data strategy sets the stage for business success amid an
abundance of data. When developing a strategy, it’s important to consider existing –
and future – business and technology goals and initiatives. This calls for treating big
data like any other valuable business asset rather than just a byproduct of applications.
2) Identify big data sources

 Streaming data comes from the Internet of Things (IoT) and other connected
devices and flows into IT systems from wearables, smart cars, medical devices,
industrial equipment and more. You can analyze this big data as it arrives,
deciding which data to keep, which to discard and which needs further analysis.
 Social media data stems from interactions on Facebook, YouTube, Instagram,
etc. This includes vast amounts of big data in the form of images, videos, voice,
text and sound – useful for marketing, sales and support functions. This data is
often in unstructured or semistructured forms, so it poses a unique challenge
for consumption and analysis.
 Publicly available data comes from massive amounts of open data sources
like the US government’s data.gov, the CIA World Factbook or the European
Union Open Data Portal.
 Other big data may come from data lakes, cloud data sources, suppliers and
customers.
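The keep-or-discard decision on streaming data described above can be sketched in a few lines. The device names, fields and threshold here are invented purely for illustration.

```python
# Illustrative sketch of deciding which streamed events to keep for
# further analysis. All names and values are made up for the example.
def filter_stream(events, keep_if):
    """Yield only the events worth storing for further analysis."""
    for event in events:
        if keep_if(event):
            yield event

events = [
    {"device": "wearable-1", "heart_rate": 72},
    {"device": "wearable-2", "heart_rate": 131},
    {"device": "wearable-3", "heart_rate": 68},
]
# Keep only readings that merit a closer look, e.g. elevated values.
flagged = list(filter_stream(events, lambda e: e["heart_rate"] > 120))
print(flagged)  # [{'device': 'wearable-2', 'heart_rate': 131}]
```

In a real pipeline the same filter would sit on a streaming platform rather than an in-memory list, but the decision logic is the same.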

3) Access, manage and store big data

Modern computing systems provide the speed, power and flexibility needed to quickly
access massive amounts and types of big data. Along with reliable access, companies
also need methods for integrating the data, building data pipelines, ensuring data
quality, providing data governance and storage, and preparing the data for analysis.
Some big data may be stored on-site in a traditional data warehouse – but there are
also flexible, low-cost options for storing and handling big data via cloud solutions,
data lakes, data pipelines and Hadoop.

4) Analyze the data

With high-performance technologies like grid computing or in-memory analytics,
organizations can choose to use all their big data for analyses. Another approach is to
determine upfront which data is relevant before analyzing it. Either way, big data
analytics is how companies gain value and insights from data. Increasingly, big data
feeds today’s advanced analytics endeavors such as artificial intelligence (AI) and
machine learning.

5) Make intelligent, data-driven decisions

Well-managed, trusted data leads to trusted analytics and trusted decisions. To stay
competitive, businesses need to seize the full value of big data and operate in a data-
driven way – making decisions based on the evidence presented by big data rather
than gut instinct. The benefits of being data driven are clear. Data-driven
organizations perform better, are operationally more predictable and are more
profitable.

Examples Of Big Data


Following are some of the Big Data examples:

The New York Stock Exchange is an example of Big Data that generates about one terabyte of
new trade data per day.

Social Media

Statistics show that 500+ terabytes of new data are ingested into the databases of the
social media site Facebook every day. This data is mainly generated from photo and
video uploads, message exchanges, comments, etc.
A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With
many thousand flights per day, data generation reaches many petabytes.

Types Of Big Data


Following are the types of Big Data:

1. Structured
2. Unstructured
3. Semi-structured

Structured

Any data that can be stored, accessed and processed in a fixed format is termed
‘structured’ data. Over time, computer science has achieved great success in developing
techniques for working with such data (where the format is well known in advance) and
in deriving value from it. However, we now foresee issues as the size of such data grows
to a huge extent – typical sizes are in the range of multiple zettabytes.

Do you know? 10²¹ bytes (that is, one billion terabytes) equal one zettabyte.

Looking at these figures one can easily understand why the name Big Data is given and imagine
the challenges involved in its storage and processing.

Do you know? Data stored in a relational database management system is one example
of ‘structured’ data.
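As a quick sanity check of the unit arithmetic above, dividing one zettabyte (10²¹ bytes) by one terabyte (10¹² bytes) does give a billion:

```python
# One zettabyte is 10**21 bytes; one terabyte is 10**12 bytes.
zettabyte = 10**21
terabyte = 10**12
print(zettabyte // terabyte)  # 1000000000 terabytes per zettabyte
```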

Examples Of Structured Data

An ‘Employee’ table in a database is an example of Structured Data

Employee_ID  Employee_Name    Gender  Department  Salary_In_lacs
2365         Rajesh Kulkarni  Male    Finance     650000
3398         Pratibha Joshi   Female  Admin       650000
7465         Shushil Roy      Male    Admin       500000
7500         Shubhojit Das    Male    Finance     500000
7699         Priya Sane       Female  Finance     550000
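As an illustration, the ‘Employee’ table above can be stored and queried as structured data using Python’s built-in sqlite3 module. This is a minimal sketch, not a production setup; the fixed schema is what makes the query trivial.

```python
# Minimal sketch: the 'Employee' table above as structured data in a
# relational store, using Python's standard-library sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    Employee_ID INTEGER PRIMARY KEY, Employee_Name TEXT,
    Gender TEXT, Department TEXT, Salary_In_lacs INTEGER)""")
rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", rows)

# A fixed, well-known format makes querying straightforward – the
# hallmark of structured data.
finance = conn.execute(
    "SELECT Employee_Name FROM Employee WHERE Department = 'Finance'"
).fetchall()
print(finance)
```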

Unstructured

Any data with unknown form or structure is classified as unstructured data. In addition to
its huge size, unstructured data poses multiple challenges when it comes to processing it
to derive value. A typical example of unstructured data is a heterogeneous data source
containing a combination of simple text files, images, videos and so on. Organizations
today have a wealth of data available to them but, unfortunately, don’t know how to
derive value from it because the data is in its raw, unstructured form.

Examples Of Un-structured Data

The output returned by ‘Google Search’



Semi-structured

Semi-structured data can contain both forms of data. Semi-structured data may look
structured, but it is not defined by, for example, a table definition in a relational DBMS.
A typical example of semi-structured data is data represented in an XML file.

Examples Of Semi-structured Data

Personal data stored in an XML file-

<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
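Such XML records can be parsed with standard tools precisely because of their partial structure. The sketch below uses Python’s standard-library ElementTree; a wrapping root element is added so the records form a single well-formed document.

```python
# Parsing semi-structured XML records like those above with the
# standard-library ElementTree module.
import xml.etree.ElementTree as ET

xml_data = """<records>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
</records>"""

root = ET.fromstring(xml_data)
people = [(r.findtext("name"), int(r.findtext("age"))) for r in root]
print(people)  # [('Prashant Rao', 35), ('Seema R.', 41), ('Satish Mane', 29)]
```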

Data Growth over the years


Please note that web application data, which is unstructured, consists of log files, transaction
history files etc. OLTP systems are built to work with structured data wherein data is stored in
relations (tables).
Characteristics Of Big Data
Big data can be described by the following characteristics:

 Volume
 Variety
 Velocity
 Variability

(i) Volume – The name Big Data itself is related to an enormous size. The size of data
plays a crucial role in determining its value. Whether particular data can actually be
considered Big Data or not also depends upon its volume. Hence, ‘Volume’ is one
characteristic which needs to be considered while dealing with Big Data solutions.

(ii) Variety – The next aspect of Big Data is its variety.

Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
During earlier days, spreadsheets and databases were the only sources of data considered by
most of the applications. Nowadays, data in the form of emails, photos, videos, monitoring
devices, PDFs, audio, etc. are also being considered in the analysis applications. This variety of
unstructured data poses certain issues for storage, mining and analyzing data.

(iii) Velocity – The term ‘velocity’ refers to the speed of generation of data. How fast
the data is generated and processed to meet demands determines the real potential in
the data.

Big Data Velocity deals with the speed at which data flows in from sources like business
processes, application logs, networks, social media sites, sensors, mobile devices, etc.
The flow of data is massive and continuous.

(iv) Variability – This refers to the inconsistency the data can show at times, which
hampers the ability to handle and manage the data effectively.

Advantages Of Big Data Processing


The ability to process Big Data in a DBMS brings multiple benefits, such as:

 Businesses can utilize outside intelligence while making decisions

Access to social data from search engines and sites like Facebook and Twitter enables
organizations to fine-tune their business strategies.

 Improved customer service


Traditional customer feedback systems are getting replaced by new systems designed with Big
Data technologies. In these new systems, Big Data and natural language processing technologies
are being used to read and evaluate consumer responses.

 Early identification of risk to the product/services, if any


 Better operational efficiency

Big Data technologies can be used for creating a staging area or landing zone for new data before
identifying what data should be moved to the data warehouse. In addition, such integration of
Big Data technologies and data warehouse helps an organization to offload infrequently accessed
data.

Summary
 Big Data definition: Big Data is a term used to describe a collection of data that is huge
in size and yet growing exponentially with time.
 Big Data analytics examples include stock exchanges, social media sites, jet engines,
etc.
 Big Data may be 1) structured, 2) unstructured or 3) semi-structured.
 Volume, Variety, Velocity and Variability are a few Big Data characteristics.
 Improved customer service, better operational efficiency and better decision making
are a few advantages of Big Data.
Big Data for Performance Management

Big Data refers to the large collections of data that may be analysed to reveal patterns,
trends and associations, especially relating to human behaviour and interactions. This
section describes some real-life examples of the use of Big Data for performance
management and measurement purposes.

Performance management involves managing the organisation in order to ensure that it
meets its objectives. Broadly, Big Data is relevant to performance management in the
following ways:

 Gaining insights (eg about customers’ preferences) which can then be used to improve
marketing and sales, thus increasing profits and shareholders’ wealth.
 Forecasting better (eg customers’ future spending patterns, when machines will need
replacing) so that more appropriate decisions can be made.
 Automating high-level business processes (eg lawyers scanning documents), which can
lead to organisations becoming more efficient.
 Providing more detailed and up-to-date performance measurement.

This article demonstrates some practical applications of Big Data.

Walmart’s Polaris search engine

Walmart is an American retailer that operates in 28 countries around the world. It is the
world’s largest company based on revenues. Many of Walmart’s customers buy online
through the company’s website. Walmart wanted to make sure that customers can find
what they are looking for on its website, so it developed its Polaris search engine. If
customers are looking for a particular product, they enter the description in a search
box, and the website displays products which meet that description.

What is unusual about Polaris is the way it ranks the search results. It attempts to show
the products that the customer is most likely to buy towards the top of the list. The
algorithm takes into account many factors, including the number of likes that the product
has on social media networks and how many favourable reviews it has.

The system also uses artificial intelligence to learn so that it can continually provide
better search results. If a phrase has been entered that the engine did not initially
understand, for example, the engine can ‘learn’ what that phrase meant based on what
the customer actually bought. Thus the system was soon able to figure out that when a
user entered ‘House’ into the search box, they were probably looking for merchandise
connected with the TV series of that name, not furniture or other items for their house. If
someone searches for ‘Flats’, the engine has learned that they probably want to buy
shoes, not apartments or flat screen TVs.

The metric that is used to measure the success of the website is customer conversion
rate – the number of customers that actually buy a product after a search. It is estimated
that the Polaris search engine has increased the conversion rate by between 10% and
15%. That is worth billions of dollars in extra revenue.
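The conversion-rate arithmetic behind such a claim is straightforward. The search and purchase counts below are invented purely to illustrate a 15% relative uplift; they are not Walmart figures.

```python
# Illustrative conversion-rate arithmetic with made-up counts.
def conversion_rate(purchases, searches):
    """Fraction of searches that end in a purchase."""
    return purchases / searches

before = conversion_rate(600, 10_000)   # 6.0% before the new engine
after = conversion_rate(690, 10_000)    # 6.9% after
uplift = (after - before) / before
print(f"{uplift:.0%}")  # 15% relative improvement
```

On revenues the size of Walmart’s, even a single-digit improvement in this metric translates into billions of dollars.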

Beyerdynamic

Beyerdynamic is a manufacturer of high quality audio products such as microphones and
headphones. The company is based in Germany, but has a wide international sales and
distribution network. The company wanted to improve its analysis of sales. Most ad hoc
reports required data to be extracted from its legacy systems into a spreadsheet where
the reports would then be manually compiled. This was time consuming, leading to
delays in producing the reports. The reports themselves were not always accurate
either.

The company developed a data warehouse that automatically extracts transactions from
its existing ERP and financial accounting systems. The structure of this warehouse was
carefully designed so that standard information is stored for each transaction such as
product codes, country code, customer and region. This is supplemented by a web
based reporting solution that enables managers to create their own reports, both
standard and ad hoc, based on the data held in the warehouse.

The system allows the company to perform detailed analysis of sales, which helps it to
identify trends in different products or markets. This leads to two business advantages.
The first is that the sales and distribution strategy can be changed when demand
changes in certain markets – for example, when sales of gaming headphones began to
increase in Japan, the company introduced promotions for all its gaming products in that
country, including a large advertising campaign and introduction of product bundles
specially for the Japanese market. The second advantage is that production plans can
quickly be changed as demand changes. If demand is falling, production is slowed to
ensure that the company is not left with excessive inventory. If demand is expanding,
production is increased to take advantage of higher sales.

The ability to provide more detailed analysis quickly can also be used for performance
measurement and appraisal, for example, comparing actual sales with targets by
region, assessing whether a promotion achieved the expected increase in profits. Such
reports can be produced quickly based on real time data, meaning that management
can respond quickly to any adverse variances.
The success of the new system is measured in terms of the growth in revenues and
profits. While this seems simple, it has to be recognised that some growth would have
been expected even if the system had not been implemented, so determining how much
revenue growth has resulted from the greater analysis can be difficult. Assumptions
need to be made.

Tesco

British supermarket group Tesco has operations in several countries around the world.
In Ireland, the company developed a system to analyse the temperature of its in-store
refrigerators. Sensors were placed in the fridges that measured the temperature every
three seconds and sent the information over the internet to a central data warehouse.
Analysis of this data allowed the company to identify units that were operating at
incorrect temperatures. The company discovered that a number of fridges were
operating at temperatures below the recommended -21°C to -23°C range. This was clearly
costing the company in terms of wasted energy. Having this information allowed the
company to correct the temperature of the fridges. Given that the company was
spending €10 million per year on fridge cooling costs in Ireland, an expected 20%
reduction in these costs was a significant saving.
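The core check described above – flagging fridges whose readings fall outside the recommended band – can be sketched as follows. The fridge IDs and readings are invented for illustration.

```python
# Sketch of a temperature check in the spirit of the Tesco example:
# flag fridges averaging outside the recommended -23 to -21 band.
# All readings and IDs are invented.
RECOMMENDED = (-23.0, -21.0)

def out_of_range(readings, low=RECOMMENDED[0], high=RECOMMENDED[1]):
    """Return fridge IDs whose average temperature falls outside the band."""
    flagged = []
    for fridge_id, temps in readings.items():
        avg = sum(temps) / len(temps)
        if not (low <= avg <= high):
            flagged.append(fridge_id)
    return flagged

readings = {
    "fridge-01": [-22.1, -21.9, -22.0],
    "fridge-02": [-26.3, -26.1, -26.4],  # overcooling: wasted energy
    "fridge-03": [-21.5, -21.7, -21.6],
}
print(out_of_range(readings))  # ['fridge-02']
```

A 20% cut in a €10 million annual cooling cost corresponds to roughly €2 million per year, which is why a check this simple was worth instrumenting at scale.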

The system also allowed the engineers to monitor the performance of the fridges
remotely. When they identified that a particular unit was malfunctioning, they could
analyse the problem then visit the store with the right parts and replace them.
Previously the fridges would only be fixed when a problem had been discovered by the
store manager, which would usually be when the problem had developed into
something more major. The engineers would have to visit the store, identify the
problem, and then make a second visit to the store with the required parts.

Morton’s Steak House

A customer jokingly tweeted US chain Morton’s and requested that dinner be sent to the
Newark airport where he was due to arrive late. Morton’s saw the tweet, realised he was
a regular customer, pulled up information on what he typically ordered, figured out
which flight he was on and then sent a waiter to meet him at the airport and serve him
dinner.

Clearly this action was a publicity stunt which the restaurant hoped the customer
would publicise in future tweets. What it demonstrates is how easy it was for Morton’s to
identify the customer who sent the tweet, and to ascertain what his favourite meal was.
It also shows how companies like to influence social media users who have a large
following as a means of increasing their own publicity.
It is difficult to measure the impact of interventions into social media. No doubt the
happy customer would have communicated this story, and this may have improved the
reputation of the restaurant, but it is very difficult to measure the impact of this on sales.

Conclusion

The cases above have shown how detailed analysis of data can be used in a number of
different ways to improve the performance of an organisation. Big data can be used to
understand customers and trends better, to provide insights into costs, and to make it
easier for customers to find what they want on the website. Companies are likely to
continue to identify innovative uses of the increasing volumes of data available to them,
and analysis of Big Data is likely to grow in importance as a strategic tool for many
businesses.
