Unit-1 BDA
What is Data?
Data refers to the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
What is Big Data?
Big data is a large collection of structured, semi-structured, and unstructured data. It is data that arrives in much higher volumes, at a much faster rate, in a wider variety of file formats, and from a wider variety of sources. Sources that generate big data include:
o Social networking sites: Facebook, Google, and LinkedIn generate huge amounts of data on a day-to-day basis, as they have billions of users worldwide. Statistics show that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, etc.
o E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge amounts of logs from which users' buying trends can be traced.
o Weather stations: Weather stations and satellites produce very large amounts of data, which are stored and processed to forecast the weather.
o Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their plans accordingly; for this, they store the data of their millions of users.
o Share market: Stock exchanges across the world generate huge amounts of data through their daily transactions. The New York Stock Exchange, for example, generates about one terabyte of new trade data per day.
Types of Big Data
a) Structured
b) Unstructured
c) Semi-structured
a) Structured
Any data that can be stored, accessed, and processed in a fixed format is termed 'structured' data. It is stored and displayed in a fixed format of rows and columns, such as a table in a relational database.
b) Unstructured
Any data with an unknown form or structure is classified as unstructured data. Examples include text documents, images, audio, and video files.
c) Semi-structured
Semi-structured data can contain both forms of data. It appears structured in form, but it is not actually defined with, for example, a table definition in a relational DBMS. Semi-structured data has some structure but does not conform to a data model; it is also known as partially structured or self-describing data. An example of semi-structured data is data represented in an XML file, such as the records below:
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
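For illustration, records like those above can be parsed with Python's standard xml.etree.ElementTree module and turned into fixed-format (structured) rows; the <recs> wrapper below is added only because the fragments have no single root element.

import xml.etree.ElementTree as ET

# The <rec> fragments lack a single root element, so wrap them first.
fragments = (
    "<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>"
    "<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>"
)
root = ET.fromstring("<recs>" + fragments + "</recs>")

# Convert each self-describing record into a fixed-format (structured) row.
rows = [
    (rec.findtext("name"), rec.findtext("sex"), int(rec.findtext("age")))
    for rec in root.findall("rec")
]
print(rows)  # [('Prashant Rao', 'Male', 35), ('Seema R.', 'Female', 41)]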
Characteristics of Big Data (the V's)
(1) Volume – The name Big Data itself is related to an enormous size. Big Data is the vast 'volume' of data generated daily from many sources, such as business processes, machines, social media platforms, networks, human interactions, and many more.
Facebook, for example, generates approximately a billion messages, records about 4.5 billion clicks of the "Like" button, and receives more than 350 million new posts each day. Big data technologies can handle such large amounts of data.
(2) Velocity
Velocity plays an important role compared to the other characteristics. Velocity refers to the speed at which data is created in real time. It covers the speed of incoming data sets, the rate of change, and bursts of activity. A primary aspect of Big Data is to provide the demanded data rapidly.
Big data velocity deals with the speed at which data flows in from sources like application logs, business processes, networks, social media sites, sensors, mobile devices, etc.
(3) Variety
Variety refers to the many forms the data takes: structured, semi-structured, and unstructured, arriving in a wide variety of file formats and from a wide variety of sources.
(4) Veracity:
Veracity refers to inconsistencies and uncertainty in data: the data that is available can sometimes be messy, and its quality and accuracy are difficult to control.
Big Data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
Example: data in bulk can create confusion, whereas a smaller amount of data may convey only half or incomplete information.
(5) Value
Value is an essential characteristic of big data. It is not the amount of data we store or process that matters; it is the valuable and reliable data that we store, process, and analyze.
(6) Variability:
How fast, and to what extent, is the structure of your data changing? How often does the meaning or shape of your data change?
Example: it is as if you were eating the same ice cream daily but the taste kept changing.
Phases of a Big Data Process
1. Ingestion
Ingestion refers to the process of gathering and preparing the data. You’d use the ETL (extract, transform, and load) process to prepare your data. In this phase, you have to identify your data sources, determine whether you’ll gather the data in batches or stream it, and prepare it through cleansing, massaging, and organization. You perform the extract step while gathering the data and the transform step while optimizing it.
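A minimal sketch of the extract and transform steps described above, assuming a hypothetical sales.csv source file with customer and amount columns, using only Python's standard library:

import csv

# Extract: read raw records from a hypothetical source file (sales.csv).
with open("sales.csv", newline="") as src:
    raw_rows = list(csv.DictReader(src))

# Transform: cleanse and organize the data (drop incomplete rows, fix types).
clean_rows = [
    {"customer": r["customer"].strip().title(), "amount": float(r["amount"])}
    for r in raw_rows
    if r.get("customer") and r.get("amount")
]

# The cleaned rows are now ready for the "load" step, which is performed
# in the Storage phase described next.
print(len(clean_rows), "rows ready to load")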
2. Storage
Once you have gathered the necessary data, you’d need to store it. Here, you’ll perform the final
step of the ETL, the load process. You’d store your data in a data warehouse or a data lake,
depending on your requirements. This is why it’s crucial to understand your organization’s goals
while performing any big data process.
3. Processing
In this phase, the stored data is processed so that it is ready for analysis: it is filtered, transformed, and aggregated, often using distributed processing frameworks (as described in the platform workflow below).
4. Analysis
In this phase of your big data process, you’d analyze the data to generate valuable insights for
your organization. There are four kinds of big data analytics: prescriptive, predictive, descriptive,
and diagnostic. You’d use artificial intelligence and machine learning algorithms in this phase to
analyze the data.
5. Consumption
This is the final phase of a big data process. Once you have analyzed the data and have found the
insights, you have to share them with others. Here, you’d have to utilize data visualization and
data storytelling to share your insights effectively with a non-technical audience such as
stakeholders and project managers.
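A minimal sketch of this consumption step, assuming the matplotlib plotting library and made-up regional sales figures produced by the analysis phase:

import matplotlib.pyplot as plt

# Hypothetical insight produced by the analysis phase: sales by region.
regions = ["North", "South", "East", "West"]
sales = [120, 95, 143, 87]

plt.bar(regions, sales)                 # simple bar chart for stakeholders
plt.title("Quarterly sales by region")  # plain-language title, no jargon
plt.ylabel("Sales (units)")
plt.show()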
Big Data Platform
A big data platform acts as an organized storage medium for large amounts of data. Big data platforms utilize a combination of data management hardware and software tools to store aggregated data sets, usually in the cloud.
Big Data platform workflow can be divided into the following stages:
1. Data Collection
Big Data platforms collect data from various sources, such as sensors, weblogs, social
media, and other databases.
2. Data Storage
Once the data is collected, it is stored in a repository, such as Hadoop Distributed File
System (HDFS), Amazon S3, or Google Cloud Storage.
3. Data Processing
Data Processing involves tasks such as filtering, transforming, and aggregating the data. This can be done using distributed processing frameworks such as Apache Spark, Apache Flink, or Apache Storm (a minimal Spark sketch follows this list).
4. Data Analytics
After data is processed, it is then analyzed with analytics tools and techniques, such as
machine learning algorithms, predictive analytics, and data visualization.
5. Data Governance
Data Governance (data cataloging, data quality management, and data lineage tracking)
ensures the accuracy, completeness, and security of the data.
6. Data Management
Big data platforms provide management capabilities that enable organizations to back up, recover, and archive their data.
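The sketch below, referenced in stage 3 above, shows what the filtering and aggregating step might look like with PySpark; the file path, column names, and threshold are illustrative assumptions.

from pyspark.sql import SparkSession

# Start a local Spark session (assumes the pyspark package is installed).
spark = SparkSession.builder.appName("processing-demo").getOrCreate()

# Hypothetical raw data collected in the earlier stages.
events = spark.read.csv("events.csv", header=True, inferSchema=True)

# Filter and aggregate: keep large transactions and count them per country.
summary = (
    events.filter(events.amount > 100)
          .groupBy("country")
          .count()
)
summary.show()
spark.stop()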
Big data analytics is the process of collecting, examining, and analyzing large amounts of
data to discover market trends, insights, and patterns that can help companies make better
business decisions.
Advantages of Big Data
There are numerous advantages of Big Data for organizations. Some of the key ones are as follows:
1. Enhanced Decision-making
Big data implementations can help businesses and organizations make better-informed decisions in less time. They allow them to use outside intelligence, such as search engines and social media platforms, to fine-tune their strategies. Big data can identify trends and patterns that would otherwise have been invisible, helping companies avoid errors.
2. Improved Customer Service
Another huge impact big data can have across industries is in the customer service department. Companies are replacing the traditional customer feedback system with data-driven solutions. Such solutions can analyze customer feedback more efficiently and help companies offer better service to their consumers.
3. Efficiency Optimization
Organizations use big data to identify their weak areas. They then use these findings to resolve those issues and enhance their operations substantially. For example, Big Data has substantially helped the manufacturing sector improve its efficiency through IoT and robotics.
4. Real-time Decision Making
Big Data has transformed several areas by enabling real-time tracking, such as inventory management, supply chain optimization, anti-money laundering, and fraud detection in banking & finance.
5. Reduced Costs
Surveys conducted by New Vantage and Syncsort (now Precisely) reveal that big data analytics has helped businesses reduce their expenses significantly. 66.7% of survey respondents from New Vantage claimed that they had started using big data to reduce expenses. Furthermore, 59.4% of survey respondents from Syncsort claimed that big data tools helped them reduce costs and increase operational efficiency.
6. Fraud Detection
Financial companies, in particular, use big data to detect fraud. Data analysts use machine learning algorithms and artificial intelligence to detect anomalies in transaction patterns. Such anomalies indicate that something is out of order or mismatched, giving clues about possible fraud.
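A minimal sketch of this anomaly-detection idea, assuming scikit-learn and a handful of made-up transaction amounts; real fraud systems use far richer features.

from sklearn.ensemble import IsolationForest

# Made-up transaction amounts; the last value is an obvious outlier.
amounts = [[25.0], [31.5], [28.0], [30.2], [27.8], [29.1], [5000.0]]

model = IsolationForest(contamination=0.15, random_state=0)
model.fit(amounts)

# predict() returns -1 for anomalies and 1 for normal-looking transactions.
for amount, label in zip(amounts, model.predict(amounts)):
    if label == -1:
        print("Possible fraud:", amount[0])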
7. Increased productivity
According to a survey from Syncsort, 59.9% of survey respondents have claimed that they were
using big data analytics tools like Spark and Hadoop to increase productivity. This increase in
productivity has, in turn, helped them to improve customer retention and boost sales.
Modern big data tools help data scientists and analysts to analyze a large amount of data
efficiently, enabling them to have a quick overview of more information. This also increases
their productivity levels.
8. Targeted Marketing
Since big data analytics provides businesses with more information, they can use that data to create more targeted marketing campaigns and special, highly personalized offers for each individual client.
Disadvantages:
1. Lack of talent
According to a survey by AtScale, the lack of big data experts and data scientists has been the
biggest challenge in this field for the past three years. Currently, many IT professionals don’t
know how to carry out big data analytics as it requires a different skill set. Thus, finding data
scientists who are also experts in big data can be challenging.
Big data experts and data scientists are two highly paid careers in the data science field.
Therefore, hiring big data analysts can be very expensive for companies, especially for startups.
Some companies have to wait for a long time to hire the required staff to continue their big data
analytics tasks.
2. Security risks
Most of the time, companies collect sensitive information for big data analytics. Such data needs protection, and a lack of proper maintenance creates security risks.
Besides, holding huge data sets can attract unwanted attention from hackers, and your business may become the target of a potential cyber-attack. As you know, data breaches have become the biggest threat to many companies today.
Another risk with big data is that unless you take all necessary precautions, important
information can be leaked to competitors.
3. Compliance
The need to comply with government legislation is also a drawback of big data. If big data contains personal or confidential information, the company must make sure that it follows government requirements and industry standards for storing, handling, maintaining, and processing that data.
So, data governance tasks, transmission, and storage will become more difficult to manage as the
big data volumes increase.
History of Big Data
The emergence of data, and big data, has a long and storied history. There were many advancements in technology during World War II, which were primarily made to serve military purposes. Over time, though, those advancements became useful to the commercial sector and eventually to the general public, with personal computing becoming a viable option for the everyday consumer.
The first personal desktop computer to feature a Graphical User Interface (GUI) was Lisa,
released by Apple Computers in 1983. Throughout the 1980s, companies like Apple, Microsoft,
and IBM would release a wide range of personal desktop computers, which led to a surge in
people buying their own personal computers and being able to use them at home for the first time
ever. Thus, electronic storage was finally available to the masses.
2000s to 2010s – Controlling Data Volume, Social Media and Cloud Computing
During the early 2000s, companies such as Amazon, eBay, and Google helped generate large
amounts of web traffic, as well as a combination of structured and unstructured data. Amazon
also launched a beta version of AWS (Amazon Web Services) in 2002, which opened
the Amazon.com platform to all developers. By 2004, over 100 applications were built for it.
AWS then relaunched in 2006, offering a wide range of cloud infrastructure services, including
Simple Storage Service (S3) and Elastic Compute Cloud (EC2). The public launch of AWS attracted a wide range of customers, such as Dropbox, Netflix, and Reddit, who were eager to become cloud-enabled; all of them would partner with AWS before 2010.
Evolution of Big Data Technologies
1. Data Warehousing:
In the 1990s, data warehousing emerged as a solution for storing and analyzing large volumes of structured data.
2. Hadoop:
Hadoop was introduced in 2006 by Doug Cutting and Mike Cafarella. It is an open-source framework that provides distributed storage and large-scale data processing.
3. NoSQL Databases:
In 2009, NoSQL databases were introduced, which provide a flexible way to store and
retrieve unstructured data.
4. Cloud Computing:
Cloud Computing technology helps companies store their important data in remote data centers, saving infrastructure and maintenance costs.
5. Machine Learning:
Machine Learning algorithms work on large data sets, analyzing huge amounts of data to extract meaningful insights. This has led to the development of artificial intelligence (AI) applications.
6. Data Streaming:
Data Streaming technology has emerged as a solution to process large volumes of data in
real time.
7. Edge Computing:
Edge Computing is a distributed computing paradigm that allows data processing to be done at the edge of the network, closer to the source of the data.
Challenges of Big Data
Many companies get stuck at the initial stage of their Big Data projects because they are neither aware of the challenges of Big Data nor equipped to tackle them. The challenges that conventional systems face with Big Data need to be addressed. Below are some of the major Big Data challenges and their solutions.
1. Insufficient understanding of Big Data
Companies fail in their Big Data initiatives due to insufficient understanding. Employees may not know what data is, how it is stored and processed, why it matters, or where it comes from. Data professionals may know what is going on, but others may not have a clear picture.
For example, if employees do not understand the importance of data storage, they might not keep
the backup of sensitive data. They might not use databases properly for storage. As a result,
when this important data is required, it cannot be retrieved easily.
Solution
Big Data workshops and seminars must be held at companies for everyone. Basic training
programs must be arranged for all the employees who are handling data regularly and are a part
of the Big Data projects. A basic understanding of data concepts must be instilled at all levels of the organization.
2. Data growth and storage issues
One of the most pressing challenges of Big Data is storing all these huge sets of data properly. The amount of data stored in the data centers and databases of companies is increasing rapidly. As these data sets grow exponentially with time, they become extremely difficult to handle.
Most of the data is unstructured and comes from documents, videos, audio, text files and other sources. This means it cannot be kept in traditional relational databases. This poses a huge Big Data analytics challenge that must be resolved as soon as possible, or it can delay the growth of the company.
Solution
In order to handle these large data sets, companies are opting for modern techniques, such
as compression, tiering, and deduplication. Compression is used for reducing the number of bits
in the data, thus reducing its overall size. Deduplication is the process of removing duplicate and
unwanted data from a data set.
Data tiering allows companies to store data in different storage tiers. It ensures that the data is
residing in the most appropriate storage space. Data tiers can be public cloud, private cloud, and
flash storage, depending on the data size and importance.
Companies are also opting for Big Data tools, such as Hadoop, NoSQL and other technologies.
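A small sketch of the deduplication and compression techniques mentioned above, using only Python's standard library and made-up records:

import gzip
import hashlib

records = [b"order:1001;amount:250", b"order:1002;amount:180",
           b"order:1001;amount:250"]          # the last record is a duplicate

# Deduplication: keep only records whose content hash has not been seen before.
seen, unique_records = set(), []
for rec in records:
    digest = hashlib.sha256(rec).hexdigest()
    if digest not in seen:
        seen.add(digest)
        unique_records.append(rec)

# Compression: reduce the number of bits needed to store the remaining data.
compressed = gzip.compress(b"\n".join(unique_records))
print(len(unique_records), "unique records,", len(compressed), "compressed bytes")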
3. Confusion while Big Data tool selection
Companies often get confused while selecting the best tool for Big Data analysis and storage.
Is HBase or Cassandra the best technology for data storage? Is Hadoop MapReduce good enough
or will Spark be a better option for data analytics and storage?
These questions bother companies and sometimes they are unable to find the answers. They end
up making poor decisions and selecting inappropriate technology. As a result, money, time,
efforts and work hours are wasted.
Solution
The best way to go about it is to seek professional help. You can hire experienced professionals who know much more about these tools. Another way is to go for Big Data consulting, where consultants recommend the best tools for your company’s scenario. Based on their advice, you can work out a strategy and then select the best tool for you.
4. Lack of data professionals
To run these modern technologies and Big Data tools, companies need skilled data professionals. These professionals include data scientists, data analysts and data engineers who are experienced in working with the tools and making sense of huge data sets.
Companies face a problem of lack of Big Data professionals. This is because data handling tools
have evolved rapidly, but in most cases, the professionals have not. Actionable steps need to be
taken in order to bridge this gap.
Solution
Companies are investing more money in the recruitment of skilled professionals. They also have
to offer training programs to the existing staff to get the most out of them.
Another important step taken by organizations is the purchase of data analytics solutions that are
powered by artificial intelligence/machine learning. These tools can be run by professionals who
are not data science experts but have basic knowledge. This step helps companies to save a lot of
money for recruitment.
5. Securing data
Securing these huge sets of data is one of the daunting challenges of Big Data. Companies are often so busy understanding, storing and analyzing their data sets that they push data security to later stages. But this is not a smart move, as unprotected data repositories can become breeding grounds for malicious hackers.
Companies can lose up to $3.7 million for a stolen record or a data breach.
Solution
Companies are recruiting more cybersecurity professionals to protect their data. Other steps
taken for securing data include:
Data encryption (a minimal sketch follows this list)
Data segregation
Identity and access control
Implementation of endpoint security
Real-time security monitoring
Use of Big Data security tools, such as IBM Guardium
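As referenced in the list above, a minimal data-encryption sketch, assuming the third-party cryptography package; in practice, keys would be held in a key-management service rather than generated in application code.

from cryptography.fernet import Fernet

# In practice the key would live in a key-management service, not in code.
key = Fernet.generate_key()
cipher = Fernet(key)

sensitive = b"customer: Seema R.; card: **** **** **** 1234"
token = cipher.encrypt(sensitive)          # store/transmit only the ciphertext
print(cipher.decrypt(token) == sensitive)  # True: original data is recoverable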
6. Integrating data from a variety of sources
Data in an organization comes from a variety of sources, such as social media pages, ERP applications, customer logs, financial reports, e-mails, presentations and reports created by employees. Combining all this data to prepare reports is a challenging task.
This is an area often neglected by firms. But, data integration is crucial for analysis, reporting
and business intelligence, so it has to be perfect.
Solution
Companies have to solve their data integration problems by purchasing the right tools. Some of
the best data integration tools are mentioned below:
Talend Data Integration
Centerprise Data Integrator
ArcESB
IBM InfoSphere
Xplenty
Informatica PowerCenter
CloverDX
Microsoft SQL
QlikView
Why is Big Data Analytics Important?
Big data analytics helps organizations harness their data and use it to identify new
opportunities. That, in turn, leads to smarter business moves, more efficient operations,
higher profits and happier customers. Businesses that use big data with advanced
analytics gain value in many ways, such as:
1. Reducing cost. Big data technologies like cloud-based analytics can significantly reduce costs
when it comes to storing large amounts of data (for example, a data lake). Plus, big data analytics
helps organizations find more efficient ways of doing business.
2. Making faster, better decisions. The speed of in-memory analytics – combined with the ability
to analyze new sources of data, such as streaming data from IoT – helps businesses analyze
information immediately and make fast, informed decisions.
3. Developing and marketing new products and services. Being able to gauge customer needs
and customer satisfaction through analytics empowers businesses to give customers what they
want, when they want it. With big data analytics, more companies have an opportunity to
develop innovative new products to meet customers’ changing needs.
4. Risk Management: More informed risk management techniques based on large data sample
sizes.
5. Increased Efficiency: Savings due to the increased efficiency and optimization of business
processes.
Types of Big Data Analytics
i. Prescriptive Analytics
This type of analytics is based on rules and recommendations, prescribing a certain analytical path for an enterprise. At this level, prescriptive analytics automates decisions and actions: how can we make a desired outcome happen?
Building on the previous types of analytics, neural networks and heuristics are applied to the data to recommend the best possible actions to achieve the desired outcomes.
ii. Diagnostic Analytics
In diagnostic analytics, most enterprises start to apply big data analytics to answer diagnostic
questions such as how and why something happened. Some may also call this behavioral
analytics.
Diagnostic analytics is about looking into the past and determining why a certain thing
happened. This type of analytics usually revolves around working on a dashboard.
Diagnostic analytics with big data helps in two ways: (a) the additional data brought by the
digital age eliminates analytic blind spots, and (b) the how and why questions deliver insights
that pinpoint the actions that need to be taken.
iii. Predictive Analytics
This type of analytics predicts the likely future course of events. Answering the how and why questions reveals specific patterns that can be used to detect when outcomes are about to occur.
Predictive analytics builds on diagnostic analytics to look for these patterns and see what is going to happen. Machine learning is also applied to continuously learn as new patterns emerge (a minimal sketch follows this list).
iv. Descriptive Analytics
In this type of analytics, work is done on incoming data. We apply analytics to mine this data and come up with a description based on it.
Many enterprises have spent years generating descriptive analytics, answering the "what happened" questions. This information is valuable but only provides a high-level, rearview-mirror view of business performance.
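The minimal sketch referenced under predictive analytics above, assuming scikit-learn and made-up monthly sales figures; it learns a pattern from past data and predicts the next value.

from sklearn.linear_model import LinearRegression

# Made-up historical data: month number vs. sales.
months = [[1], [2], [3], [4], [5], [6]]
sales = [100, 110, 121, 133, 146, 160]

model = LinearRegression().fit(months, sales)

# Predict what is likely to happen in month 7 based on the learned pattern.
print(round(model.predict([[7]])[0]))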
Popular Big Data Tools
Apache Hadoop: Hadoop helps in storing and processing large data sets.
Apache Spark: Spark enables in-memory computation.
Apache Flink
Apache Storm: Storm helps in faster processing of unbounded data streams.
Apache Cassandra: Cassandra provides high availability and scalability as a database.
MongoDB: MongoDB provides cross-platform capabilities.
Tableau
RapidMiner
R Programming
Qubole
SAS
Datapine
Hadoop:
An open-source framework that stores and processes big data sets. Hadoop can handle and
analyse structured and unstructured data.
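A minimal sketch of the MapReduce idea behind Hadoop, written in plain Python and simulated locally on a tiny in-memory data set; on a real cluster, the map and reduce steps would run distributed across many nodes (for example via Hadoop Streaming).

from itertools import groupby

def map_line(line):
    # Mapper logic: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield word.lower(), 1

def reduce_pairs(pairs):
    # Reducer logic: sum the counts for each word.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Local simulation of the MapReduce flow on a tiny "data set".
lines = ["big data is big", "data is everywhere"]
pairs = [kv for line in lines for kv in map_line(line)]
print(dict(reduce_pairs(pairs)))  # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}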
Spark:
An open-source cluster computing framework for real-time processing and data analysis.
APACHE Cassandra:
APACHE Cassandra is an open-source NoSQL distributed database used to manage large amounts of data. It is one of the most popular tools for data analytics and has been praised by many tech companies for its high scalability and availability without compromising speed and performance. It is capable of delivering thousands of operations every second and can handle petabytes of data with almost zero downtime. It was created at Facebook and released publicly in 2008.
Qubole
Qubole is an open-source big data tool that helps in fetching data across a value chain using ad-hoc analysis and machine learning. Qubole is a data lake platform that offers end-to-end service, reducing the time and effort required to move data pipelines. It can be configured across multi-cloud services such as AWS, Azure, and Google Cloud, and it is also claimed to lower cloud computing costs by 50%.
MongoDB
MongoDB is a cross-platform, document-oriented NoSQL database that stores data in flexible, JSON-like documents rather than fixed tables, which makes it well suited to large volumes of semi-structured data.
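A brief usage sketch, assuming the pymongo driver and a MongoDB server running locally; the database and collection names are illustrative.

from pymongo import MongoClient

# Connect to a locally running MongoDB instance (assumed to exist).
client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# Documents are flexible, JSON-like records; no fixed table schema is required.
orders.insert_one({"customer": "Prashant Rao", "amount": 250, "items": 3})
for doc in orders.find({"amount": {"$gt": 100}}):
    print(doc["customer"], doc["amount"])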
Apache Storm
Storm is a robust, user-friendly tool used for data analytics, especially in small companies. The best part about Storm is that it has no programming language barrier and can work with any language. It was designed to handle large pools of data in a fault-tolerant, horizontally scalable manner. When it comes to real-time data processing, Storm leads the chart because of its distributed real-time big data processing system, which is why many tech giants use APACHE Storm in their systems. Some of the most notable names are Twitter, Zendesk, and NaviSite.
SAS
Today, SAS is one of the best statistical modeling tools used by data analysts. Using SAS, a data scientist can mine, manage, extract or update data in different variants from different sources. SAS (Statistical Analysis System) allows a user to access data in any format (SAS tables or Excel worksheets). Besides that, it offers a cloud platform for business analytics called SAS Viya, and to strengthen its grip on AI & ML, it has introduced new tools and products.
Datapine
Datapine is an analytics tool used for BI, founded in 2012 in Berlin, Germany. In a short period of time, it has gained much popularity in a number of countries, and it is mainly used for data extraction (for small-to-medium companies fetching data for close monitoring). With the help of its enhanced UI design, anyone can visit and check the data as per their requirements. It is offered in 4 different price brackets, starting from $249 per month, and provides dashboards by function, industry, and platform.
RapidMiner
RapidMiner is a fully automated visual workflow design tool used for data analytics. It is a no-code platform, so users are not required to write code to segregate data. Today, it is heavily used in many industries such as ed-tech, training, and research. Though it is an open-source platform, it has a limitation of 10,000 data rows and a single logical processor. With the help of RapidMiner, one can easily deploy ML models to the web or mobile (once the user interface is ready to collect real-time figures).