BigData_BCom-Unit-1

The document provides an overview of Big Data, defining it as high-volume, high-velocity, and high-variety information assets that require innovative processing techniques for better decision-making. It discusses the evolution, types, characteristics, benefits, and challenges of Big Data, emphasizing its importance in modern business analytics. Additionally, it contrasts Big Data with traditional Business Intelligence, highlighting differences in data handling, processing, and objectives.

Unit-I : INTRODUCTION TO BIG DATA

Big Data
Gartner defines Big Data as follows:
"Big data is high-volume, high-velocity, and high-variety information
assets that demand cost-effective, innovative forms of information
processing for enhanced insight and decision making."
Big Data refers to complex and large data sets that have to be processed
and analysed to uncover valuable information that can benefit businesses
and organizations. A few basic tenets make the idea easier to grasp:
• Big Data refers to a massive amount of data that keeps growing
exponentially with time.
• Big Data is so voluminous that it cannot be processed or analyzed
using conventional data processing techniques.
• It includes data mining, data storage, data analysis, data sharing, and
data visualization.
• The term is all-encompassing: it covers the data, the data frameworks,
and the tools and techniques used to process and analyze the data.

Evolution of Big Data


The 1970s and earlier were the era of mainframes, when data was
essentially primitive and structured. Relational databases evolved in the
1980s and 1990s, the era of data-intensive applications. Since then, the
World Wide Web (WWW) and the Internet of Things (IoT) have led to an
onslaught of structured, unstructured, and multimedia data.

The History of Big Data


Although the concept of big data itself is relatively new, the origins of
large data sets go back to the 1960s and '70s when the world of data was
just getting started with the first data centres and the development of the
relational database.
[Figure: Three Phases of Big Data]
Around 2005, people began to realize just how much data users generated
through Facebook, YouTube, and other online services. Hadoop (an
open-source framework created specifically to store and analyze big data
sets) was developed that same year. NoSQL also began to gain popularity
during this time.
Open-source frameworks such as Hadoop (and, more recently, Spark) were
essential for the growth of big data because they make big data easier to
work with and cheaper to store. In the years since then, the volume of big
data has skyrocketed. Users are still generating huge amounts of data, but
it is not just humans who are doing it.
With the advent of the Internet of Things (IoT), more objects and devices
are connected to the internet, gathering data on customer usage patterns
and product performance. The emergence of machine learning has
produced still more data.
While big data has come far, its usefulness is only just beginning. Cloud
computing has expanded big data possibilities even further. The cloud
offers truly elastic scalability, where developers can simply spin up ad hoc
clusters to test a subset of data.

Benefits of Big Data and Data Analytics

Big data makes it possible for you to gain more complete answers because
you have more information.
More complete answers mean more confidence in the data, which in turn
means a completely different approach to tackling problems.

Types of Big Data


a) Structured: Structured data is data that can be processed, stored, and
retrieved in a fixed format. It refers to highly organized information that
can be readily stored in and accessed from a database by simple search
algorithms. For instance, the employee table in a company database is
structured: the employee details, job positions, salaries, etc. are all
present in an organized manner.
b) Unstructured: Unstructured data is data that lacks any specific form or
structure. This makes it difficult and time-consuming to process and
analyze. Email is an example of unstructured data.
c) Semi-structured: Semi-structured data combines features of both
formats mentioned above. To be precise, it refers to data that has not
been classified under a particular repository (database), yet contains
vital information or tags that segregate individual elements within the
data. XML and JSON documents are common examples.
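
The three types can be illustrated with a small Python sketch. All sample values (employee names, salaries, the email text) are made up for illustration and do not come from the notes; the sketch only shows how differently each type has to be handled.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields
# (a hypothetical employee table, as in the example above).
structured_csv = (
    "emp_id,name,position,salary\n"
    "101,Asha,Analyst,52000\n"
    "102,Ravi,Manager,78000\n"
)
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total_salary = sum(int(row["salary"]) for row in rows)  # easy: fixed schema

# Semi-structured: no fixed table schema, but keys (tags) label each element.
semi_structured_json = (
    '{"emp_id": 101, "skills": ["SQL", "Excel"],'
    ' "contact": {"email": "asha@example.com"}}'
)
record = json.loads(semi_structured_json)
email = record["contact"]["email"]  # reachable through the tags

# Unstructured: free text with no predefined fields; only text search
# or more advanced text analysis can extract meaning from it.
unstructured_email = (
    "Hi team, Asha will join the analytics project from Monday. Regards, Ravi"
)
mentions_asha = "Asha" in unstructured_email

print(total_salary, email, mentions_asha)
```

Note how the structured rows can be aggregated directly, the semi-structured record is navigated through its tags, and the unstructured text offers nothing but its raw characters to work with.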

Characteristics of Big Data


Big data has three key characteristics:
1. Composition: The composition of data deals with the structure of data,
that is, the sources of data, the granularity, the types, and the nature
of data as to whether it is static or real-time streaming.
2. Condition: The condition of data deals with the state of data, that is,
"Can one use this data as is for analysis?" or "Does it require cleansing
for further enhancement and enrichment?"
3. Context: The context of data deals with "Where has this data been
generated?" "Why was this data generated?" "How sensitive is this data?"

Definition of Big Data


• Big data is high-volume, high-velocity and high-variety information
assets that demand cost-effective, innovative forms of information
processing for enhanced insight and decision making.
• Big data refers to datasets whose size is typically beyond the storage
capacity of, and too complex for, traditional database software tools.
• Big data is anything beyond the human and technical infrastructure
needed to support storage, processing and analysis.
• It is data that is big in volume, velocity and variety.
In 2001, analyst Doug Laney (then at META Group, which Gartner later
acquired) listed the 3 'V's of Big Data: Variety, Velocity, and Volume.
Let us look at them in depth:
a) Variety: Variety of Big Data refers to structured, unstructured, and
semi-structured data that is gathered from multiple sources. While in the
past data could only be collected from spreadsheets and databases, today
data comes in an array of forms such as emails, PDFs, photos, videos,
audio, and much more. Variety is one of the important characteristics of
big data.
b) Velocity: Velocity essentially refers to the speed at which data is
being created in real time. In a broader perspective, it comprises the
rate of change, the linking of incoming data sets at varying speeds, and
activity bursts.
c) Volume: Big Data indicates huge 'volumes' of data that are being
generated on a daily basis from various sources like social media
platforms, business processes, machines, networks, human interactions,
etc. Such a large amount of data is stored in data warehouses.

Part I of the definition: "Big data is high-volume, high-velocity, and
high-variety information assets" talks about voluminous data (humongous
data) that may have great variety (a good mix of structured, semi-
structured and unstructured data) and will require a good speed/pace for
storage, preparation, processing and analysis.
Part II of the definition: "cost effective, innovative forms of
information processing" talks about embracing new techniques and
technologies to capture (ingest), store, process, persist, integrate and
visualize the high-volume, high-velocity, and high-variety data.
Part III of the definition: "enhanced insight and decision making" talks
about deriving deeper, richer and meaningful insights and then using
these insights to make faster and better decisions to gain business value
and thus a competitive edge.
Data -> Information -> Actionable intelligence -> Better decisions ->
Enhanced business value

More characteristics of big data


Looking beyond the original three V's, here are details on some of the
other ones that are now often associated with big data:
• Veracity refers to the degree of accuracy in data sets and how
trustworthy they are. Raw data collected from various sources can
cause data quality issues that may be difficult to pinpoint. If they aren't
fixed through data cleansing processes, bad data leads to analysis errors
that can undermine the value of business analytics initiatives. Data
management and analytics teams also need to ensure that they have
enough accurate data available to produce valid results.
• Value is also added to the list of big data's characteristics by some
data scientists and consultants. Not all the data that's collected has real
business value or benefits. As a result, organizations need to confirm
that data relates to relevant business issues before it's used in big data
analytics projects.
• Variability also often applies to sets of big data, which may have
multiple meanings or be formatted differently in separate data sources,
factors that further complicate big data management and analytics.
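
The data cleansing step mentioned under veracity can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the customer records, the validity rule (age between 1 and 119) and the helper names `is_valid` and `cleanse` are all assumptions made for the example.

```python
# Hypothetical raw records gathered from several sources.
raw_records = [
    {"customer": "C001", "age": "34", "city": "Pune"},
    {"customer": "C002", "age": "", "city": "Delhi"},      # missing age
    {"customer": "C003", "age": "-5", "city": "Mumbai"},   # impossible age
    {"customer": "C001", "age": "34", "city": "Pune"},     # duplicate
    {"customer": "C004", "age": "41", "city": "Chennai"},
]

def is_valid(record):
    """Assumed rule: age must be a whole number between 1 and 119."""
    age = record["age"]
    return age.isdigit() and 0 < int(age) < 120

def cleanse(records):
    """Drop invalid records and keep only the first copy of each customer."""
    seen = set()
    clean = []
    for record in records:
        key = record["customer"]
        if key in seen or not is_valid(record):
            continue
        seen.add(key)
        clean.append(record)
    return clean

clean_records = cleanse(raw_records)
# Share of raw records that survived cleansing: a rough veracity indicator.
veracity_ratio = len(clean_records) / len(raw_records)
print([r["customer"] for r in clean_records], veracity_ratio)
```

A low ratio would signal that the analytics team may not yet have enough accurate data to produce valid results.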

Challenges with Big Data


Data volume: Data today is growing at an exponential rate, and this high
tide of data will continue to rise. The key questions are:
"Will all this data be useful for analysis?",
"Do we work with all this data or a subset of it?",
"How will we separate the knowledge from the noise?" etc.
Storage: Cloud computing is the answer to managing infrastructure for
big data as far as cost-efficiency, elasticity and easy upgrading /
downgrading are concerned. This further complicates the decision to host
big data solutions outside the enterprise.
Data retention: How long should one retain this data? Some data may be
required for long-term decisions, but some data may quickly become
irrelevant and obsolete.
Skilled professionals: In order to develop, manage and run those
applications that generate insights, organizations need professionals who
possess a high level of proficiency in data science.
Other challenges: Other challenges of big data are with respect to
capture, storage, search, analysis, transfer and security of big data.
Visualization: Big data refers to datasets whose size is typically beyond
the storage capacity of traditional database software tools. There is no
explicit definition of how big a data set should be for it to be considered
big data. Data visualization (computer graphics) is becoming popular as a
separate discipline, and there are very few data visualization experts.

Why is Big Data Important?


The importance of big data does not revolve around how much data a
company has but how a company utilizes the collected data. Every
company uses data in its own way; the more efficiently a company uses
its data, the more potential it has to grow. The company can take data
from any source and analyze it to find answers which will enable:
1. Cost Savings: Some tools of Big Data like Hadoop and Cloud-Based
Analytics can bring cost advantages to business when large amounts of
data are to be stored and these tools also help in identifying more
efficient ways of doing business.
2. Time Reductions: The high speed of tools like Hadoop and in-memory
analytics makes it easy to identify new sources of data, which helps
businesses analyze data immediately and make quick decisions based on
what they learn.
3. Understand the market conditions: By analyzing big data you can
get a better understanding of current market conditions. For example,
by analyzing customers’ purchasing behaviors, a company can find out
the products that are sold the most and produce products according to
this trend. By this, it can get ahead of its competitors.
4. Control online reputation: Big data tools can do sentiment analysis.
Therefore, you can get feedback about who is saying what about your
company. If you want to monitor and improve the online presence of
your business, then, big data tools can help in all this.
5. Using Big Data Analytics to Boost Customer Acquisition and
Retention: The customer is the most important asset any business
depends on. There is no single business that can claim success without
first having to establish a solid customer base. However, even with a
customer base, a business cannot afford to disregard the high
competition it faces. If a business is slow to learn what customers are
looking for, then it is very easy to begin offering poor quality products.
In the end, loss of clientele will result, and this creates an adverse
overall effect on business success. The use of big data allows
businesses to observe various customer related patterns and trends.
Observing customer behavior is important to trigger loyalty.
6. Using Big Data Analytics to Solve Advertisers' Problems and
Offer Marketing Insights: Big data analytics can help change all
business operations. This includes the ability to match customer
expectations, change the company's product line and, of course, ensure
that marketing campaigns are powerful.
7. Big Data Analytics as a Driver of Innovation and Product
Development: Another huge advantage of big data is its ability to
help companies innovate and redevelop their products.
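
Two of the points above, finding the best-selling products from purchasing behaviour (point 3) and sentiment analysis (point 4), can be sketched in plain Python. This is a toy illustration under stated assumptions: the purchase list, the reviews and the tiny word lists are invented, and real big data tools use far richer models than a word-count score.

```python
from collections import Counter

# Hypothetical purchase log: one product name per sale.
purchases = ["tea", "coffee", "tea", "biscuits", "tea", "coffee"]
top_products = Counter(purchases).most_common(2)  # best sellers first

# Assumed mini-lexicons for a very rough sentiment score.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "slow", "terrible"}

def sentiment_score(text):
    """Positive words minus negative words; above 0 suggests praise."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

reviews = [
    "Great service and excellent tea!",
    "Delivery was slow and the packaging was poor.",
    "It was okay.",
]
scores = [sentiment_score(review) for review in reviews]

print(top_products)
print(scores)
```

The same idea scales up in practice: the counting and scoring are simply applied to millions of transactions or posts instead of a handful.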

Business Intelligence vs Big Data


Although Big Data and Business Intelligence are both technologies used to
analyze data to help companies in the decision-making process, there are
differences between the two. They differ in the way they work as much as
in the type of data they analyze.
Traditional BI methodology is based on the principle of grouping all
business data into a central server. Typically, this data is analyzed in
offline mode, after storing the information in an environment called a
data warehouse. The data is structured in a conventional relational
database with an additional set of indexes and forms of access to the
tables (multidimensional cubes).
A Big Data solution differs from traditional BI in several respects:
1. In a Big Data environment, information is stored on a distributed file
system, rather than on a central server. It is a much safer and more
flexible space.
2. Big Data solutions carry the processing functions to the data, rather
than the data to the functions. Because the analysis is centered on the
information, it is easier to handle larger amounts of information in a
more agile way.
3. Big Data can analyze data in different formats, both structured and
unstructured. The volume of unstructured data (data not stored in a
traditional database) is growing much faster than that of structured
data, and its analysis carries different challenges. Big Data solutions
address them by allowing a global analysis of various sources of
information.
4. Data processed by Big Data solutions can be historical or come from
real-time sources. Thus, companies can make decisions that affect
their business in an agile and efficient way.
5. Big Data technology uses massively parallel processing (MPP) concepts,
which improve the speed of analysis. With MPP, a job is divided into
several parts that are executed simultaneously, and at the end the
partial results are combined and presented. This allows you to analyze
large volumes of information quickly.
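
The split, process in parallel and recombine pattern described in point 5 can be sketched with Python's standard library. This is only a single-machine analogy, not how an MPP system is built: the function names `count_words` and `parallel_word_count` and the sample lines are assumptions, and Python threads illustrate the structure of the work rather than delivering real MPP speed-ups.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Process one partition: count the words in its lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def parallel_word_count(lines, workers=3):
    """Divide the job, run the parts concurrently, then merge the results."""
    # Partition the data: every worker gets roughly an equal share of lines.
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Reunite the partial results into one overall answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

lines = [
    "big data needs new tools",
    "big data is growing",
    "data drives decisions",
    "new tools process big data",
]
word_counts = parallel_word_count(lines)
print(word_counts.most_common(3))
```

In a distributed system each partition would live on a different node of the distributed file system, so the counting function travels to the data (point 2) instead of the data travelling to a central server.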

Comparison of Objectives: Business Intelligence vs Big Data

Purpose
Business Intelligence: The purpose of Business Intelligence is to help the
business to make better decisions. Business Intelligence helps in
delivering accurate reports by extracting information directly from the
data source.
Big Data: The main purpose of Big Data is to capture, process, and analyze
the data, both structured and unstructured, to improve customer outcomes.

EcoSystem / Components
Business Intelligence: Operation systems, ERP databases, Data Warehouse,
Dashboard etc.
Big Data: Hadoop, Spark, R Server, Hive, HDFS etc.

Tools
Business Intelligence: Below is the list of tools used for business
intelligence. These tools enable a business to collate, analyze and
visualize data, which can be used in making better business decisions and
to come up with good strategic plans.
• Online analytical processing (OLAP)
• Data Warehousing
• Digital Dashboards & Data mining
• Microsoft Power BI
• Google Analytics etc.
Big Data: Below is the list of tools used in Big Data. These tools or
frameworks store a large amount of data and process it to get insights
from data to make good decisions for the business.
• Hadoop
• Spark
• Hive
• Polybase
• Presto
• Storm etc.

Characteristics / Properties
Business Intelligence: Below are the six features of Business
Intelligence: Location intelligence, Executive Dashboards, "what if"
analysis, Interactive reports, Metadata layer, and Ranking reports.
Big Data: Big data can be described by some characteristics such as
Volume, Variety, Variability, Velocity, and Veracity.

Benefits
Business Intelligence: Below is the list of benefits of Business
Intelligence.
• Helps in making better business decisions
• Faster and more accurate reporting and analysis
• Improved data quality
• Reduced costs
• Increased revenues
• Improved operational efficiency etc.
Big Data: Below is the list of benefits of Big Data.
• Better decision making
• Fraud detection
• Storage, mining, and analysis of data
• Market prediction & forecasting
• Improves the service
• Helps in implementing the new strategies
• Keep up with customer trends
• Cost savings
• Better sales insights, which helps in increasing revenues etc.

Applied Fields
Business Intelligence: Social media, Healthcare, Gaming Industry, Food
Industry etc.
Big Data: The banking sector, Entertainment, Social media, Healthcare,
Retail and wholesale etc.
