Big Data
Big data is a term that describes the large volumes of data – both structured and unstructured – that inundate businesses on a day-to-day basis. But it’s not just the type or amount of data that’s important; it’s what organizations do with the data that matters. Big data can be analyzed for insights that improve decisions and give confidence for making strategic business moves.
Big data refers to data that is so large, fast or complex that it’s difficult or impossible
to process using traditional methods. The act of accessing and storing large amounts
of information for analytics has been around for a long time. But the concept of big
data gained momentum in the early 2000s when industry analyst Doug Laney
articulated the now-mainstream definition of big data as the three V’s – Volume, Velocity and Variety – to which Variability and Veracity are often added:
Volume. Organizations collect ever-larger amounts of data from many sources; sheer size is the first thing that makes data “big”, and it plays a crucial role in determining the value that can be derived from it.
Velocity. With the growth in the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner. RFID tags, sensors and smart meters are driving the need to deal with these torrents of data in near-real time.
Variety. Data comes in all types of formats – from structured, numeric data in
traditional databases to unstructured text documents, emails, videos, audios, stock
ticker data and financial transactions.
Variability. In addition to the increasing velocities and varieties of data, data flows are unpredictable – changing often and varying greatly. It’s challenging, but businesses need to know when something is trending in social media, and how to manage daily, seasonal and event-triggered peak data loads.
Veracity. Veracity refers to the quality of data. Because data comes from so many different sources, it’s difficult to link, match, cleanse and transform data across systems. Businesses need to connect and correlate relationships, hierarchies and multiple data linkages; otherwise, their data can quickly spiral out of control.
Why Is Big Data Important?
The importance of big data doesn’t simply revolve around how much data you have.
The value lies in how you use it. By taking data from any source and analyzing it, you
can find answers that 1) streamline resource management, 2) improve operational
efficiencies, 3) optimize product development, 4) drive new revenue and growth
opportunities and 5) enable smart decision making. When you combine big data with high-performance analytics, you can accomplish a wide range of such business-related tasks.
Before businesses can put big data to work for them, they should consider how it
flows among a multitude of locations, sources, systems, owners and users. There are
five key steps to taking charge of this "big data fabric" that includes traditional,
structured data along with unstructured and semistructured data:
1) Set a big data strategy
At a high level, a big data strategy is a plan designed to help you oversee and improve
the way you acquire, store, manage, share and use data within and outside of your
organization. A big data strategy sets the stage for business success amid an
abundance of data. When developing a strategy, it’s important to consider existing –
and future – business and technology goals and initiatives. This calls for treating big
data like any other valuable business asset rather than just a byproduct of applications.
2) Identify big data sources
Streaming data comes from the Internet of Things (IoT) and other connected
devices – wearables, smart cars, medical devices, industrial equipment and
more – and flows into IT systems. You can analyze this big data as it arrives,
deciding which data to keep or discard, and which needs further analysis.
Social media data stems from interactions on Facebook, YouTube, Instagram,
etc. This includes vast amounts of big data in the form of images, videos, voice,
text and sound – useful for marketing, sales and support functions. This data is
often in unstructured or semistructured forms, so it poses a unique challenge
for consumption and analysis.
Publicly available data comes from massive amounts of open data sources
like the US government’s data.gov, the CIA World Factbook or the European
Union Open Data Portal.
Other big data may come from data lakes, cloud data sources, suppliers and
customers.
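The “keep or discard” decision described for streaming data can be sketched as a simple filter applied to readings as they arrive. This is only an illustration; the reading format, sensor names and threshold below are invented.

```python
# Minimal sketch of filtering a stream of IoT readings as they arrive.
# The reading format and the "worth keeping" threshold are invented.

def filter_stream(readings, threshold=75.0):
    """Yield only the readings worth keeping for further analysis."""
    for reading in readings:
        if reading["value"] >= threshold:  # keep anomalous readings
            yield reading
        # everything else is dropped (or could be archived cheaply)

incoming = [
    {"sensor": "s1", "value": 20.5},
    {"sensor": "s2", "value": 81.2},
    {"sensor": "s3", "value": 95.0},
]
kept = list(filter_stream(incoming))
```

A generator like this processes each reading as it arrives rather than buffering the whole stream, which is the essence of near-real-time handling.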
Modern computing systems provide the speed, power and flexibility needed to quickly
access massive amounts and types of big data. Along with reliable access, companies
also need methods for integrating the data, building data pipelines, ensuring data
quality, providing data governance and storage, and preparing the data for analysis.
Some big data may be stored on-site in a traditional data warehouse – but there are
also flexible, low-cost options for storing and handling big data via cloud solutions,
data lakes, data pipelines and Hadoop.
Well-managed, trusted data leads to trusted analytics and trusted decisions. To stay
competitive, businesses need to seize the full value of big data and operate in a data-
driven way – making decisions based on the evidence presented by big data rather
than gut instinct. The benefits of being data driven are clear. Data-driven
organizations perform better, are operationally more predictable and are more
profitable.
Examples of Big Data
Stock Exchange
The New York Stock Exchange is an example of Big Data generation: it produces about one terabyte of new trade data per day.
Social Media
The statistic shows that 500+terabytes of new data get ingested into the databases of social
media site Facebook, every day. This data is mainly generated in terms of photo and video
uploads, message exchanges, putting comments etc.
Jet Engines
A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.
Types of Big Data
Big data can be found in three forms:
1. Structured
2. Unstructured
3. Semi-structured
Structured
Any data that can be stored, accessed and processed in a fixed format is termed ‘structured’ data. Over time, computer science has achieved great success in developing techniques for working with this kind of data (where the format is well known in advance) and in deriving value from it. However, issues now arise when the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.
Do you know? 10²¹ bytes, or one billion terabytes, equal one zettabyte.
Looking at these figures one can easily understand why the name Big Data is given and imagine
the challenges involved in its storage and processing.
Do you know? Data stored in a relational database management system is one example of structured data.
Unstructured
Any data with an unknown form or structure is classified as unstructured data. In addition to being huge in size, unstructured data poses multiple challenges when it comes to processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays, organizations have a wealth of data available to them but, unfortunately, they don’t know how to derive value from it, since this data is in its raw, unstructured form.
Semi-structured
Semi-structured data can contain both forms of data. Semi-structured data appears structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file:
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
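Because each `<rec>` above carries its own self-describing tags rather than a predefined table definition, it can be parsed generically. A minimal sketch using Python’s standard XML parser (the records are wrapped in a root element to make the fragment well-formed):

```python
import xml.etree.ElementTree as ET

# The <rec> records have tags but no relational schema; wrap them in a
# root element and read each field generically from its tag name.
xml_data = """<recs>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
</recs>"""

root = ET.fromstring(xml_data)
people = [
    {child.tag: child.text for child in rec}
    for rec in root.findall("rec")
]
```

Note that no schema was declared anywhere; the tags themselves describe the data, which is the defining trait of semi-structured formats.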
Please note that web application data, which is unstructured, consists of log files, transaction history files, etc. OLTP systems are built to work with structured data, wherein data is stored in relations (tables).
Characteristics Of Big Data
Big data can be described by the following characteristics:
Volume
Variety
Velocity
Variability
(i) Volume – The name Big Data itself relates to enormous size. The size of data plays a very crucial role in determining the value that can be derived from it. Whether particular data can actually be considered Big Data or not also depends upon its volume. Hence, ‘Volume’ is one characteristic which needs to be considered when dealing with Big Data solutions.
(ii) Variety – Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
During earlier days, spreadsheets and databases were the only sources of data considered by
most of the applications. Nowadays, data in the form of emails, photos, videos, monitoring
devices, PDFs, audio, etc. are also being considered in the analysis applications. This variety of
unstructured data poses certain issues for storage, mining and analyzing data.
(iii) Velocity – The term ‘velocity’ refers to the speed of generation of data. How fast the data is
generated and processed to meet the demands, determines real potential in the data.
Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency which can be shown by the data at times, thus
hampering the process of being able to handle and manage the data effectively.
Access to social data from search engines and sites like Facebook and Twitter is enabling organizations to fine-tune their business strategies.
Big Data technologies can be used for creating a staging area or landing zone for new data before identifying what data should be moved to the data warehouse. In addition, such integration of Big Data technologies with a data warehouse helps an organization offload infrequently accessed data.
Summary
Big Data definition: Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time.
Big Data analytics examples include stock exchanges, social media sites, jet engines, etc.
Big Data can be 1) structured, 2) unstructured or 3) semi-structured.
Volume, Variety, Velocity and Variability are a few Big Data characteristics.
Improved customer service, better operational efficiency and better decision making are a few advantages of Big Data.
Big Data refers to the large collections of data that may be analysed to reveal patterns, trends and associations, especially relating to human behaviour and interactions. This article will describe some real-life examples of the use of Big Data for performance management and measurement purposes.
Big Data can help an organisation in several ways:
Gaining insights (eg about customers’ preferences) which can then be used to improve marketing and sales, thus increasing profits and shareholders’ wealth.
Forecasting better (eg customers’ future spending patterns, when machines will need replacing) so that more appropriate decisions can be made.
Automating high-level business processes (eg lawyers scanning documents), which can lead to organisations becoming more efficient.
Providing more detailed and up-to-date performance measurement.
Walmart
Walmart is an American retailer that operates in 28 countries around the world. It is the
world’s largest company based on revenues. Many of Walmart’s customers buy online
through the company’s website. Walmart wanted to make sure that customers can find
what they are looking for on its website, so it developed its Polaris search engine. If
customers are looking for a particular product, they enter the description in a search
box, and the website displays products which meet that description.
What is unusual about Polaris is the way it ranks the search results. It attempts to show
the products that the customer is most likely to buy towards the top of the list. The
algorithm takes into account many factors, including the number of likes that the product
has on social media networks and how many favourable reviews it has.
The system also uses artificial intelligence to learn so that it can continually provide
better search results. If a phrase has been entered that the engine did not initially
understand, for example, the engine can ‘learn’ what that phrase meant based on what
the customer actually bought. Thus the system was soon able to figure out that when a
user entered ‘House’ into the search box, they were probably looking for merchandise
connected with the TV series of that name, not furniture or other items for their house. If
someone searches for ‘Flats’, the engine has learned that they probably want to buy
shoes, not apartments or flat screen TVs.
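Walmart has not published how Polaris works, so purely as a hypothetical illustration: a ranking that weights signals such as social-media likes and favourable reviews could look like the sketch below. The signal names, weights and products are all invented.

```python
# Hypothetical sketch of ranking search results by likelihood to buy.
# The real Polaris algorithm is proprietary; the signals and weights
# here are invented for illustration only.

def score(product):
    """Combine popularity signals into a single ranking score."""
    return 0.6 * product["likes"] + 0.4 * product["good_reviews"]

results = [
    {"name": "house-themed mug", "likes": 120, "good_reviews": 40},
    {"name": "sofa", "likes": 10, "good_reviews": 80},
]

# Show the products customers are most likely to buy first.
ranked = sorted(results, key=score, reverse=True)
```

In a real system the weights would themselves be learned from what customers actually bought, which is the feedback loop the article describes.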
The metric that is used to measure the success of the website is customer conversion
rate – the number of customers that actually buy a product after a search. It is estimated
that the Polaris search engine has increased the conversion rate by between 10% and
15%. That is worth billions of dollars in extra revenue.
Beyerdynamic
Beyerdynamic is a German manufacturer of audio equipment such as headphones and microphones. The company developed a data warehouse that automatically extracts transactions from
its existing ERP and financial accounting systems. The structure of this warehouse was
carefully designed so that standard information is stored for each transaction such as
product codes, country code, customer and region. This is supplemented by a web
based reporting solution that enables managers to create their own reports, both
standard and ad hoc, based on the data held in the warehouse.
The system allows the company to perform detailed analysis of sales, which helps it to
identify trends in different products or markets. This leads to two business advantages.
The first is that the sales and distribution strategy can be changed when demand
changes in certain markets – for example, when sales of gaming headphones began to
increase in Japan, the company introduced promotions for all its gaming products in that
country, including a large advertising campaign and introduction of product bundles
specially for the Japanese market. The second advantage is that production plans can
quickly be changed as demand changes. If demand is falling, production is slowed to
ensure that the company is not left with excessive inventory. If demand is expanding,
production is increased to take advantage of higher sales.
The ability to provide more detailed analysis quickly can also be used for performance
measurement and appraisal, for example, comparing actual sales with targets by
region, assessing whether a promotion achieved the expected increase in profits. Such
reports can be produced quickly based on real time data, meaning that management
can respond quickly to any adverse variances.
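The comparison of actual sales with targets by region described above amounts to a simple grouped variance report. A minimal sketch, with invented regions and figures:

```python
# Sketch of a region-level variance report (actual vs target sales).
# Regions and figures are invented for illustration.
actual = {"Japan": 125_000, "Germany": 90_000, "US": 200_000}
target = {"Japan": 100_000, "Germany": 95_000, "US": 210_000}

# Positive variance = ahead of target; negative = adverse.
variance = {region: actual[region] - target[region] for region in target}

# Management attention goes first to the adverse variances.
adverse = [region for region, v in variance.items() if v < 0]
```

Produced from real-time data, a report like this is what lets management respond quickly to adverse variances rather than waiting for a period-end review.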
The success of the new system is measured in terms of the growth in revenues and
profits. While this seems simple, it has to be recognised that some growth would have
been expected even if the system had not been implemented, so determining how much
revenue growth has resulted from the greater analysis can be difficult. Assumptions
need to be made.
Tesco
British supermarket group Tesco has operations in several countries around the world.
In Ireland, the company developed a system to analyse the temperature of its in-store
refrigerators. Sensors were placed in the fridges that measured the temperature every
three seconds and sent the information over the internet to a central data warehouse.
Analysis of this data allowed the company to identify units that were operating at
incorrect temperatures. The company discovered that a number of fridges were
operating at temperatures below the recommended range of -23°C to -21°C. This was clearly
costing the company in terms of wasted energy. Having this information allowed the
company to correct the temperature of the fridges. Given that the company was
spending €10 million per year on fridge cooling costs in Ireland, an expected 20%
reduction in these costs was a significant saving.
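Identifying fridges operating outside the recommended band, as the Tesco system did, reduces to a range check over the sensor readings. A minimal sketch, with invented fridge IDs and temperatures:

```python
# Flag fridge readings outside the recommended -23°C to -21°C band.
# Fridge IDs and temperatures are invented for illustration.
RECOMMENDED = (-23.0, -21.0)  # (coldest, warmest) in °C

readings = {"fridge_a": -22.0, "fridge_b": -26.5, "fridge_c": -20.1}

def out_of_range(temp, band=RECOMMENDED):
    low, high = band
    return temp < low or temp > high

flagged = [fid for fid, temp in readings.items() if out_of_range(temp)]
```

Fridges running colder than the band (like the hypothetical fridge_b) waste energy; those running warmer risk the stock, so both directions are flagged.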
The system also allowed the engineers to monitor the performance of the fridges
remotely. When they identified that a particular unit was malfunctioning, they could
analyse the problem then visit the store with the right parts and replace them.
Previously the fridges would only be fixed when a problem had been discovered by the
store manager, which would usually be when the problem had developed into
something more major. The engineers would have to visit the store, identify the
problem, and then make a second visit to the store with the required parts.
Morton’s
A customer jokingly tweeted US chain Morton’s and requested that dinner be sent to the
Newark airport where he was due to arrive late. Morton’s saw the tweet, realised he was
a regular customer, pulled up information on what he typically ordered, figured out
which flight he was on and then sent a waiter to meet him at the airport and serve him
dinner.
Clearly this action was a publicity stunt which the restaurant hoped that their customer
would publicise in future tweets. What it demonstrates is how easy it was for Morton’s to
identify the customer who sent the tweet, and to ascertain what his favourite meal was.
It also shows how companies like to influence social media users who have a large
following as a means of increasing their own publicity.
It is difficult to measure the impact of interventions into social media. No doubt the
happy customer would have communicated this story, and this may have improved the
reputation of the restaurant, but it is very difficult to measure the impact of this on sales.
Conclusion
The cases above have shown how detailed analysis of data can be used in a number of
different ways to improve the performance of an organisation. Big data can be used to
understand customers and trends better, to provide insights into costs, and to make it
easier for customers to find what they want on a website. Companies are likely to continue to identify innovative uses of the increasing volumes of data available to them, and analysis of Big Data is likely to grow in importance as a strategic tool for many businesses.