BigData_BCom-Unit-2

Big Data Analytics provides organizations with opportunities to enhance efficiency, quality, and customer satisfaction through the analysis of large datasets. It involves various technologies and methodologies, including operational and analytical big data technologies, to derive insights and support decision-making. The document also discusses the classification of analytics, challenges, and the importance of big data technologies in various sectors such as retail, IT infrastructure, and social media.

Unit-II : BIG DATA ANALYTICS

Big Data is creating significant new opportunities for organizations to
derive new value and create competitive advantage from their most
valuable asset: information. For businesses, Big Data helps drive
efficiency, quality, and personalized products and services, producing
improved levels of customer satisfaction and profit. For scientific efforts,
Big Data analytics enables new avenues of investigation with potentially
richer results and deeper insights than previously available. In many
cases, Big Data analytics integrates structured and unstructured data with
real-time feeds and queries, opening new paths to innovation and insight.

Introduction to Big Data Analytics


Big Data Analytics is...
1. Technology-enabled analytics: Many data analytics and visualization
tools are available in the market today from leading vendors such as
IBM, Tableau, SAS, R Analytics, Statistica, World Programming Systems
(WPS), etc. to help process and analyze your big data.
2. About gaining a meaningful, deeper, and richer insight into your
business to steer it in the right direction: for example, understanding
the customer's demographics to cross-sell and up-sell to them, better
leveraging the services of your vendors and suppliers, etc.
3. About a competitive edge over your competitors by enabling you with
findings that allow quicker and better decision-making.
4. A tight handshake between three communities: IT, business users, and
data analysts.
5. Working with datasets whose volume and variety exceed the current
storage and processing capabilities and infrastructure of your
enterprise.
6. About moving code to data. This makes perfect sense as the program for
distributed processing is tiny (just a few KBs) compared to the data
(terabytes or petabytes today and likely to be exabytes or zettabytes in
the near future).
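
The point about moving code to data can be checked with simple arithmetic.
The sketch below compares how long it would take to send a small program
versus a large dataset over the same network link. The sizes and the link
speed are hypothetical figures chosen only for illustration.

```python
# Hypothetical figures, chosen only to illustrate the size gap.
PROGRAM_BYTES = 50 * 1024           # a 50 KB processing job
DATA_BYTES = 10 * 1024**4           # a 10 TB dataset
LINK_BYTES_PER_SEC = 125_000_000    # roughly a 1 Gbit/s network link


def transfer_seconds(num_bytes, bytes_per_sec=LINK_BYTES_PER_SEC):
    """Time needed to push num_bytes over the link, ignoring overheads."""
    return num_bytes / bytes_per_sec


code_time = transfer_seconds(PROGRAM_BYTES)
data_time = transfer_seconds(DATA_BYTES)

print(f"Shipping the code takes about {code_time:.4f} seconds")
print(f"Shipping the data takes about {data_time / 3600:.1f} hours")
```

Under these assumptions the program arrives in well under a second, while
the dataset would need about a day, which is why distributed frameworks
send the program to the nodes that already hold the data.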

Examples of Big Data Analytics


There are three examples of Big Data Analytics in different areas: retail, IT
infrastructure, and social media.
1. Retail: As mentioned earlier, Big Data presents many opportunities to
improve sales and marketing analytics.
An example of this is the U.S. retailer Target. After analyzing consumer
purchasing behavior, Target's statisticians determined that the retailer
made a great deal of money from three main life-event situations.
• Marriage, when people tend to buy many new products.
• Divorce, when people buy new products and change their spending habits.
• Pregnancy, when people have many new things to buy and have an urgency
to buy them.
The analysis helped Target manage its inventory, knowing that there
would be demand for specific products and that it would likely vary by
month over the coming nine- to ten-month cycles.
2. IT infrastructure: The MapReduce paradigm, as implemented by Apache
Hadoop, is an ideal technical framework for many Big Data projects, which
rely on large data sets with unusual data structures. One of the main
benefits of Hadoop is that it
employs a distributed file system, meaning it can use a distributed cluster
of servers and commodity hardware to process large amounts of data.
Some of the most common examples of Hadoop implementations are in
the social media space, where Hadoop can manage transactions, give
textual updates, and develop social graphs among millions of users.
Twitter and Facebook generate massive amounts of unstructured data
and use Hadoop and its ecosystem of tools to manage this high volume.
3. Social media: It represents a tremendous opportunity to leverage
social and professional interactions to derive new insights.
LinkedIn represents a company in which data itself is the product. Early
on, LinkedIn founder Reid Hoffman saw the opportunity to create a social
network for working professionals.
LinkedIn has more than 250 million user accounts and has added many
additional features and data-related products, such as recruiting, job
seeker tools, advertising, and InMaps, which show a social graph of a
user's professional network.
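
The MapReduce paradigm mentioned under IT infrastructure above can be
sketched on a single machine. The example below is a minimal word count
in plain Python, not Hadoop code: the function names map_phase,
shuffle_phase, and reduce_phase are illustrative, and a real cluster
would run the map and reduce steps in parallel on many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data is big", "data drives insight"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```

Because each document is mapped independently and each word is reduced
independently, both steps can be spread across a cluster of commodity
servers.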

Classification of Analytics
There are basically two schools of thought:
1. Those that classify analytics into basic, operationalized, advanced, and
monetized.
2. Those that classify analytics into analytics 1.0, analytics 2.0, and
analytics 3.0.

First School of Thought


It includes basic analytics, operationalized analytics, advanced analytics,
and monetized analytics.
Basic analytics: This is primarily slicing and dicing of data to help with
basic business insights. It is about reporting on historical data, basic
visualization, etc.
Operationalized analytics: Analytics is operationalized when it is woven
into the enterprise's business processes.
Advanced analytics: This is largely about forecasting the future by
way of predictive and prescriptive modelling.
Monetized analytics: This is analytics used to derive direct business
revenue.
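
As a small illustration of the slicing and dicing described under basic
analytics, the sketch below totals a handful of made-up sales records by
one dimension and then filters them on two dimensions. The records and
the helper names total_by and dice are hypothetical.

```python
from collections import defaultdict

# Made-up historical sales records used only for illustration.
sales = [
    {"region": "North", "product": "Laptop", "amount": 1200},
    {"region": "North", "product": "Phone", "amount": 800},
    {"region": "South", "product": "Laptop", "amount": 1500},
    {"region": "South", "product": "Phone", "amount": 700},
]


def total_by(records, dimension):
    """Slice: total the amount for each value of one dimension."""
    totals = defaultdict(int)
    for record in records:
        totals[record[dimension]] += record["amount"]
    return dict(totals)


def dice(records, **filters):
    """Dice: keep only the records that match every given dimension value."""
    return [r for r in records
            if all(r[key] == value for key, value in filters.items())]


by_region = total_by(sales, "region")
by_product = total_by(sales, "product")
north_laptops = dice(sales, region="North", product="Laptop")
print(by_region, by_product, north_laptops)
```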

Second School of Thought:


Let us take a closer look at analytics 1.0, analytics 2.0, and analytics 3.0.
Refer Table 2.1. Figure 2.1 shows the subtle growth of analytics from
Descriptive → Diagnostic → Predictive → Prescriptive analytics.
| Analytics 1.0 | Analytics 2.0 | Analytics 3.0 |
|---|---|---|
| Era: Mid 1990s to 2009 | Era: 2005 to 2012 | Era: 2012 to present |
| Descriptive statistics (report on events, occurrences, etc. of the past) | Descriptive statistics + predictive statistics (use data from the past to make predictions for the future) | Descriptive + predictive + prescriptive statistics (use data from the past to make prophecies for the future and at the same time make recommendations to leverage the situation to one's advantage) |
| Key questions asked: What happened? Why did it happen? | Key questions asked: What happened? Why will it happen? | Key questions asked: What will happen? When will it happen? What should be the action taken to take advantage of what will happen? |
| Data from legacy systems, ERP, CRM and 3rd party applications. | Big Data | A blend of Big Data and data from legacy systems, ERP, CRM and 3rd party applications. |
| Small and structured data sources. Data stored in enterprise data warehouses or data marts. | Big data is being taken up seriously. Data is mainly unstructured, arriving at a much higher pace. This fast flow of data entailed that the influx of big volume data had to be stored and processed rapidly, often on massive parallel servers running Hadoop. | A blend of Big Data and traditional analytics to yield insights and offerings with speed and impact. |
| Data was internally sourced. | Data was often externally sourced. | Data is being sourced both internally and externally. |
| Relational databases | Database appliances, Hadoop clusters, SQL to Hadoop environments, etc. | In-memory analytics, in-database processing, agile analytical methods, machine learning techniques, etc. |
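
The difference between descriptive, predictive, and prescriptive
statistics can be sketched on a tiny series. In the example below the
monthly sales figures, the stock level, and the reorder rule are all
hypothetical; the prediction uses an ordinary least-squares trend line,
which is only one of many possible forecasting methods.

```python
# Hypothetical monthly unit sales for months 0..3.
monthly_sales = [100, 110, 120, 130]
units_in_stock = 125  # hypothetical current inventory

# Descriptive: report on what happened.
average_sales = sum(monthly_sales) / len(monthly_sales)

# Predictive: fit a least-squares trend line and extend it by one month.
months = list(range(len(monthly_sales)))
mean_x = sum(months) / len(months)
mean_y = average_sales
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(months, monthly_sales))
         / sum((x - mean_x) ** 2 for x in months))
intercept = mean_y - slope * mean_x
forecast_next_month = intercept + slope * len(monthly_sales)

# Prescriptive: recommend an action that takes advantage of the forecast.
reorder_quantity = max(0.0, forecast_next_month - units_in_stock)

print(f"Average so far: {average_sales}")
print(f"Forecast for next month: {forecast_next_month}")
print(f"Recommended reorder quantity: {reorder_quantity}")
```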

Challenges of Big Data Analytics


There are mainly seven challenges of big data: scale, security, schema,
continuous availability, consistency, partition tolerance, and data quality.
Scale: Storage (RDBMS (Relational Database Management System) or
NoSQL (Not only SQL)) is one major concern that needs to be addressed to
handle the need for scaling rapidly and elastically. The need of the hour is
storage that can best withstand the onslaught of the large volume, velocity,
and variety of big data. Should you scale vertically or should you scale
horizontally?
Security: Most of the NoSQL big data platforms have poor security
mechanisms (lack of proper authentication and authorization
mechanisms) when it comes to safeguarding big data. This is a gap that
cannot be ignored, given that big data carries credit card information,
personal information, and other sensitive data.
Schema: Rigid schemas have no place. We want the technology to be
able to fit our big data and not the other way around. The need of the
hour is dynamic schema. Static (pre-defined) schemas are obsolete.
Continuous availability: The big question here is how to provide 24/7
support, because almost all RDBMS and NoSQL big data platforms have a
certain amount of downtime built in.
Consistency: Should one opt for consistency or eventual consistency?
Partition tolerance: How to build partition-tolerant systems that can take
care of both hardware and software failures?
Data quality: How to maintain data quality (data accuracy,
completeness, timeliness, etc.)? Do we have appropriate metadata in
place?
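
Two of the data quality measures named above, completeness and
timeliness, can be computed directly. The sketch below uses made-up
customer records; the field names, the reference date, and the 90-day
freshness threshold are assumptions for illustration only.

```python
from datetime import date

# Made-up customer records used only for illustration.
records = [
    {"customer_id": 1, "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"customer_id": 2, "email": None, "updated": date(2023, 6, 1)},
    {"customer_id": 3, "email": "c@example.com", "updated": date(2024, 1, 5)},
    {"customer_id": 4, "email": "", "updated": date(2022, 12, 31)},
]


def completeness(rows, field):
    """Fraction of rows in which the field is present and non-empty."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def timeliness(rows, field, as_of, max_age_days):
    """Fraction of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for row in rows if (as_of - row[field]).days <= max_age_days)
    return fresh / len(rows)


email_completeness = completeness(records, "email")
update_timeliness = timeliness(records, "updated", date(2024, 1, 31), 90)
print(email_completeness, update_timeliness)
```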

Importance of Big Data Analytics


Let us study the various approaches to analysis of data and what each
leads to.
Reactive - Business Intelligence: What does Business Intelligence (BI)
help us with? It allows businesses to make faster and better decisions
by providing the right information to the right person at the right time in
the right format. It is about analysis of past or historical data and
then displaying the findings of the analysis or reports in the form of
enterprise dashboards, alerts, notifications, etc. It has support for both
pre-specified reports and ad hoc querying.
Reactive - Big Data Analytics: Here the analysis is done on huge
datasets, but the approach is still reactive as it is still based on static
data.
Proactive - Analytics: This supports forward-looking decision making
through data mining, predictive modelling, text mining, and statistical
analysis. This analysis is not on big data, as it still relies on
traditional database management practices and therefore has severe
limitations on storage capacity and processing capability.
Proactive - Big Data Analytics: This is sifting through terabytes,
petabytes, and exabytes of information to filter out the relevant data to
analyze. This also includes high-performance analytics to gain rapid
insights from big data and the ability to solve complex problems using
more data.

Big Data Technologies


Big Data technology is primarily classified into the following two types:
Operational Big Data Technologies
This type of big data technology mainly covers the basic day-to-day data
that people generate and process. Typically, operational big data includes
daily data such as online transactions, social media activity, and
the data from any particular organization or firm, which is usually
needed for analysis using software based on big data technologies.
The data can also be referred to as raw data used as the input for several
Analytical Big Data Technologies.
Some specific examples of Operational Big Data Technologies are listed
below:
o Online ticket booking systems, e.g., for buses, trains, flights, and
movies.
o Online trading or shopping on e-commerce websites like Amazon,
Flipkart, Walmart, etc.
o Online data on social media sites, such as Facebook, Instagram,
WhatsApp, etc.
o The employees' data or executives' particulars in multinational
companies.

Analytical Big Data Technologies


Analytical Big Data is commonly referred to as an improved version of Big
Data Technologies. This type of big data technology is a bit more
complicated than operational big data. Analytical big data is mainly
used when performance criteria are in use, and important real-time
business decisions are made based on reports created by analyzing
operational data. This means that the actual investigation of big data
that is important for business decisions falls under this type of big data
technology.
Some common examples of Analytical Big Data Technologies are listed
below:
o Stock market data
o Weather forecasting data and time series analysis
o Medical health records, where doctors can personally monitor the
health status of an individual
o Space mission databases, where every piece of information about a
mission is very important

Top Big Data Technologies


We can categorize the leading big data technologies into the following
four sections:
o Data Storage
o Data Mining
o Data Analytics
o Data Visualization
Data Storage
Let us first discuss leading Big Data Technologies that come under Data
Storage:

Hadoop: When it comes to handling big data, Hadoop is one of the
leading technologies. Also, it is capable enough to process tasks in
batches. The Hadoop framework was mainly introduced to store and
process data in a distributed data processing environment. The Apache
Software Foundation introduced Hadoop, which is written in the Java
programming language.

MongoDB: MongoDB is another important component of big data
technologies in terms of storage. Relational and RDBMS properties do not
apply to MongoDB because it is a NoSQL database. MongoDB stores data as
flexible-schema, JSON-like documents rather than rows in fixed tables.
This enables MongoDB to hold massive amounts of varied data. It is based
on a simple cross-platform document-oriented design. MongoDB Inc.
introduced MongoDB, which is written with
a combination of C++, Python, JavaScript, and Go language.
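
The idea of a document-oriented store without a fixed schema can be
sketched with plain Python dictionaries. This is not MongoDB code and does
not use any MongoDB driver: the collection list and the find helper below
are hypothetical stand-ins that only show documents with different fields
living side by side.

```python
import json

# A stand-in "collection": each document may carry different fields.
collection = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
    {"_id": 3, "name": "Mei", "address": {"city": "Pune", "pin": "411001"}},
]


def find(docs, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [doc for doc in docs
            if all(doc.get(key) == value for key, value in criteria.items())]


matches = find(collection, name="Mei")
fields_per_doc = [sorted(doc.keys()) for doc in collection]

print(json.dumps(matches, indent=2))
print(fields_per_doc)
```

No table definition had to be changed to add the phones or address
fields, which is the practical benefit of a dynamic schema.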

RainStor: RainStor is a popular database management system designed
to manage and analyze organizations' Big Data requirements. It uses
deduplication strategies that help manage storing and handling vast
amounts of data. RainStor was designed in 2004 by the RainStor Software
company. It is queried much like a SQL database.

Hunk: Hunk is mainly helpful when data needs to be accessed in remote
Hadoop clusters. Hunk allows us to report and visualize vast amounts of
data from Hadoop and NoSQL data sources. Hunk was introduced in 2013
by Splunk Inc. It is based on the Java programming language.

Cassandra: Cassandra is one of the leading big data technologies among
the list of top NoSQL databases. It is open-source, distributed, and has
extensive column storage options. Cassandra was originally developed at
Facebook for its inbox search feature, was released as open source in
2008, and later became an Apache Software Foundation project. It
is based on the Java programming language.

Data Mining
Let us now discuss leading Big Data Technologies that come under Data
Mining:
Presto: Presto is an open-source, distributed SQL query engine
developed to run interactive analytical queries against very large data
sources. The size of data sources can vary from gigabytes to petabytes.
Presto was originally developed at Facebook and released as open source
in 2013.
Companies like Repro, Netflix, Airbnb, Facebook, and Checkr are using this
big data technology.
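
To give a feel for the kind of interactive analytical query such an
engine runs, the sketch below executes a small aggregate query. It uses
Python's built-in sqlite3 module as a single-machine stand-in, not
Presto itself, and the orders table and its rows are made up.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0)],
)

# An analytical query: total order amount per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

print(rows)
```

A distributed engine would accept a similar SQL statement but split the
scan and aggregation across many worker nodes.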

RapidMiner: RapidMiner is data science software that
offers a robust and powerful graphical user interface to create,
deliver, manage, and maintain predictive analytics. RapidMiner was
developed in 2001 by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer
at the Technical University of Dortmund's AI unit. It was initially named
YALE (Yet Another Learning Environment). Companies that are making
good use of the RapidMiner tool include Boston Consulting Group, InFocus,
and Domino's.

Elasticsearch: When it comes to finding information, Elasticsearch is
known as an essential tool. It provides a distributed, full-text search
engine. Elasticsearch is primarily written in the Java
programming language and was developed in 2010 by Shay Banon. It has
been maintained by Elastic NV since 2012. Elasticsearch is used by
many top companies, such as LinkedIn, Netflix, Facebook, Google,
Accenture, StackOverflow, etc.
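
Full-text search engines are commonly built around an inverted index,
which maps each term to the documents that contain it. The sketch below
is a minimal single-machine version in plain Python; it is not the
Elasticsearch API, and the documents and the build_index and search
helpers are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map every lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "big data analytics",
    2: "data visualization tools",
    3: "big data storage",
}
index = build_index(docs)
big_data_hits = search(index, "big data")
print(big_data_hits)
```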

Data Analytics
Now, let us discuss leading Big Data Technologies that come under Data
Analytics:
Apache Kafka: Apache Kafka is a popular streaming platform. This
streaming platform is primarily known for three core capabilities:
publishing and subscribing to streams of records, storing those streams
durably, and processing them as they occur. It is written in Java and
Scala. It was originally developed at LinkedIn, was open-sourced in 2011,
and is now an Apache Software Foundation project. Some top
companies using the Apache Kafka platform include Twitter, Spotify,
Netflix, Yahoo, LinkedIn etc.
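
The publish and subscribe model behind a streaming platform can be
sketched with an append-only log per topic and a read position per
consumer. The MiniBroker class below is a hypothetical in-memory toy, not
the Kafka client API, and it leaves out partitions, durability, and
networking.

```python
from collections import defaultdict


class MiniBroker:
    """Toy broker: one append-only log per topic, one offset per consumer."""

    def __init__(self):
        self.logs = defaultdict(list)     # topic -> list of messages
        self.offsets = defaultdict(int)   # (consumer, topic) -> next index

    def publish(self, topic, message):
        """Append a message to the end of the topic's log."""
        self.logs[topic].append(message)

    def poll(self, consumer, topic):
        """Return messages this consumer has not yet read, then advance it."""
        log = self.logs[topic]
        start = self.offsets[(consumer, topic)]
        self.offsets[(consumer, topic)] = len(log)
        return log[start:]


broker = MiniBroker()
broker.publish("clicks", "user1:home")
broker.publish("clicks", "user2:cart")
first_batch = broker.poll("dashboard", "clicks")
broker.publish("clicks", "user1:checkout")
second_batch = broker.poll("dashboard", "clicks")
late_reader_batch = broker.poll("audit", "clicks")
print(first_batch, second_batch, late_reader_batch)
```

Because messages stay in the log after being read, a consumer that joins
later can still read the full history from the beginning.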

Splunk: Splunk is known as one of the popular software platforms for
capturing, correlating, and indexing real-time streaming data. Splunk can
also produce graphs, alerts, summarized reports, data visualizations, and
dashboards, etc., using related data. Splunk is developed by Splunk Inc.,
which was founded in 2003. It is written in a combination of AJAX, Python,
C++, and XML.
Companies such as Trustwave, QRadar are making good use of Splunk for
their analytical and security needs.

KNIME: KNIME is used to draw visual data flows, execute specific steps,
and analyze the obtained models, results, and interactive views. It also
allows us to execute all the analysis steps together. It has an
extension mechanism for adding plugins that provide additional
features and functionality. KNIME is based on Eclipse and written in the
Java programming language. It was developed in 2008 by KNIME
Company. A list of companies that are making use of KNIME includes
Harnham, Tyler, and Paloalto.

Spark: Apache Spark is known for offering in-memory computing
capabilities that help enhance the overall speed of processing.
It also provides a generalized execution model to support more
applications. Spark is written mainly in Scala and offers APIs in Java,
Scala, Python, and R. It was started in 2009 at the University of
California, Berkeley, and later became an Apache Software Foundation
project. Companies like
Amazon, ORACLE, CISCO, VerizonWireless and Hortonworks are using this
big data technology and making good use of it.

R-Language: R is a programming language mainly used for
statistical computing and graphics. It is a free software environment used
by leading data miners, practitioners, and statisticians. The language is
primarily beneficial in the development of statistical software and
data analytics.
R was created by Ross Ihaka and Robert Gentleman; version 1.0.0 was
released in February 2000, and the project is now supported by the R
Foundation. It is written primarily in C, Fortran, and R itself.
Companies like Barclays, American Express, and Bank of America
use R-Language for their data analytics needs.
Blockchain: Blockchain is a technology that can be used in several
applications related to different industries, such as finance, supply chain,
manufacturing, etc. Additionally, it is used to fulfill the needs of
shared ledger, smart contract, privacy, and consensus in any business
network environment.
The underlying idea was first described in 1991 by two researchers,
Stuart Haber and W. Scott Stornetta. However, blockchain had its first
real-world application in January 2009, when Bitcoin was launched. It is a
specific type of database, with implementations commonly written in
languages such as Python, C++, and JavaScript. ORACLE,
Facebook, and MetLife are a few of the top companies using Blockchain
technology.
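
The shared-ledger idea rests on each block storing the hash of the block
before it, so that changing an old entry breaks the chain. The sketch
below is a minimal hash-chained ledger in plain Python. It leaves out
consensus, networking, and smart contracts, and the function names and
sample transactions are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, previous_hash):
    """SHA-256 over the block's contents, serialized in a stable order."""
    payload = json.dumps(
        {"index": index, "data": data, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a block that points at the hash of the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "previous_hash": previous_hash,
        "hash": block_hash(index, data, previous_hash),
    })


def is_valid(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i else GENESIS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        recomputed = block_hash(block["index"], block["data"],
                                block["previous_hash"])
        if block["hash"] != recomputed:
            return False
    return True


ledger = []
for entry in ["A pays B 5", "B pays C 2", "C pays A 1"]:
    add_block(ledger, entry)

valid_before_tampering = is_valid(ledger)
ledger[1]["data"] = "B pays C 200"   # tamper with an old entry
valid_after_tampering = is_valid(ledger)
print(valid_before_tampering, valid_after_tampering)
```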

Data Visualization
Let us discuss leading Big Data Technologies that come under Data
Visualization:
Tableau: Tableau is one of the fastest and most powerful data
visualization tools used in the business intelligence industry.
Tableau helps in creating visualizations and insights in the form of
dashboards and worksheets.
Tableau is developed and maintained by Tableau Software, which was
founded in 2003 and went public in May 2013. It is written using multiple
languages, such as Python, C, C++, and Java. Comparable business
intelligence tools include Cognos, Qlik, and Oracle Hyperion.

Plotly: As the name suggests, Plotly is best suited for plotting or creating
graphs and related components quickly and efficiently.
It supports interactive, styled graphs in environments such as Jupyter
Notebook and PyCharm.
Plotly was introduced in 2012 by the Plotly company. It is based on
JavaScript. Paladins and Bitbank are some of the companies that are
making good use of Plotly.
