What Is Big Data? Characteristics of Big Data and Significance

1. What is big data? Characteristics of big data and significance.

Ans. “Big data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
Characteristics of Big Data

Back in 2001, Gartner analyst Doug Laney listed the 3 ‘V’s of Big Data – Variety, Velocity, and Volume.

Together, these three characteristics capture what big data is. Let’s look at each of them in depth:

1) Variety

Variety refers to the structured, unstructured, and semi-structured data that is gathered from multiple sources. While in the past data was collected mainly from spreadsheets and databases, today it arrives in an array of forms such as emails, PDFs, photos, videos, audio files, social media posts, and much more. Variety is one of the important characteristics of big data.

2) Velocity

Velocity refers to the speed at which data is being created in real time. In a broader perspective, it comprises the rate of change, the linking of incoming data sets arriving at varying speeds, and bursts of activity.

3) Volume
Volume is one of the characteristics of big data. Big Data indicates the huge ‘volumes’ of data being generated on a daily basis from sources such as social media platforms, business processes, machines, networks, and human interactions. Such large amounts of data are stored in data warehouses. This concludes the characteristics of big data.
The importance of big data does not revolve around how much data a company has but around how the company utilises the collected data. Every company uses data in its own way; the more efficiently a company uses its data, the more potential it has to grow. A company can take data from any source and analyse it to realise the benefits described below.

Significance of big data

1. Cost Savings: Big Data tools such as Hadoop and cloud-based analytics can bring cost advantages to a business when large amounts of data are to be stored, and these tools also help in identifying more efficient ways of doing business.
2. Time Reductions: The high speed of tools like Hadoop and in-memory analytics makes it easy to identify new sources of data, which helps businesses analyze data immediately and make quick decisions based on the learnings.
3. Understand market conditions: By analyzing big data you can get a better understanding of current market conditions. For example, by analyzing customers’ purchasing behavior, a company can find out which products sell the most and produce products according to this trend. By doing this, it can get ahead of its competitors.
4. Control online reputation: Big data tools can do sentiment analysis. Therefore, you can get feedback about who is saying what about your company. If you want to monitor and improve the online presence of your business, big data tools can help with all of this.
5. Using Big Data Analytics to Boost Customer Acquisition and
Retention

The customer is the most important asset any business depends on. No business can claim success without first establishing a solid customer base. However, even with a customer base, a business cannot afford to ignore the intense competition it faces. If a business is slow to learn what customers are looking for, it is very easy to end up offering poor-quality products. The eventual result is loss of clientele, which has an adverse effect on overall business success. The use of big data allows businesses to observe various customer-related patterns and trends, and observing customer behaviour is important for triggering loyalty.

6. Using Big Data Analytics to Solve Advertisers’ Problems and Offer Marketing Insights

Big data analytics can help change all business operations. This includes the ability to match customer expectations, change the company’s product line, and ensure that marketing campaigns are powerful.

7. Big Data Analytics as a Driver of Innovation and Product Development

Another huge advantage of big data is its ability to help companies innovate and redevelop their products.
2. What is data analytics? What are the different types of analytics? Explain in brief with examples (advantages and disadvantages).
Ans. The four types of analytics are:
1. Descriptive analytics
2. Diagnostic analytics
3. Predictive analytics
4. Prescriptive analytics

1. Descriptive analytics: Descriptive analytics is introductory, retrospective, and answers the question “What happened?” It accounts for roughly 80 percent of business analytics today, making it the most common type of data analysis.

Example of descriptive analytics

Let’s say website traffic numbers fell just short of their goal in 2018. That’s reason enough to run a descriptive analysis to see what went wrong.
The analysis tells us:

● Website traffic fell drastically in Q3.
● It picked back up in early Q4.
● It remained steady through the rest of the year.
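
As a concrete illustration of answering “what happened?”, here is a minimal descriptive-analytics sketch in Python using pandas; the monthly session counts are invented for the example.

```python
# A minimal descriptive-analytics sketch: summarise hypothetical 2018
# monthly website sessions by quarter. All figures are invented.
import pandas as pd

traffic = pd.DataFrame({
    "month": pd.date_range("2018-01-01", periods=12, freq="MS"),
    "sessions": [52_000, 54_500, 53_800, 55_200, 56_100, 55_700,
                 41_300, 39_800, 40_200, 49_500, 51_000, 51_400],
})

# "What happened?" -- aggregate sessions by quarter.
by_quarter = (traffic
              .assign(quarter=traffic["month"].dt.quarter)
              .groupby("quarter")["sessions"].sum())
print(by_quarter)

# The drop in Q3 relative to Q2 is exactly the kind of fact a
# descriptive analysis surfaces.
print("Q3 vs Q2 change: {:.1%}".format(by_quarter[3] / by_quarter[2] - 1))
```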

2. Diagnostic analytics: Diagnostic analytics is retrospective as well, but instead it seeks to explain why the problem identified in the descriptive analysis occurred.

Example of diagnostic analytics

Using our previous example, we now understand where the problem occurred, but exactly why did website traffic plummet so sharply?

The analysis tells us:

● Website traffic fell during a search engine algorithm update.
● There was a 25 percent decrease in published web content.
● A record number of backlinks was lost in Q3.
3. Predictive analytics: Predictive analytics, unlike the previous two analyses, looks ahead to the future and is a bit more proactive with its findings. It attempts to forecast what is likely to happen next, and is one half of what is considered “advanced analytics.”

Example of predictive analytics

The diagnostic analysis showed us a variety of issues; now it’s time to predict next steps so that an accurate website traffic estimate can be generated for the next few quarters. A sketch of how such an estimate might be produced is shown below.
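
A minimal sketch: fit a linear trend with NumPy to hypothetical quarterly figures and extrapolate. A real forecast would use richer models and more features.

```python
# A minimal predictive-analytics sketch: fit a linear trend to past
# quarterly traffic and extrapolate the next two quarters. The history
# is hypothetical.
import numpy as np

quarters = np.arange(1, 9)                      # Q1 2017 .. Q4 2018
sessions = np.array([148, 151, 139, 155,
                     160, 167, 121, 152], dtype=float)  # in thousands

slope, intercept = np.polyfit(quarters, sessions, deg=1)
for q in (9, 10):                               # Q1 and Q2 2019
    print(f"forecast for quarter {q}: {slope * q + intercept:.0f}k sessions")
```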

4. Prescriptive analytics: Prescriptive analytics is the final type of advanced analysis. It takes the information that has been predicted and prescribes calculated next steps to take.
Example of prescriptive analytics

Now that we have an idea where website traffic should be headed, what
are some actionable items to get it there? Prescriptive models should
unveil a variety of answers.

The analysis tells us:

● Publish double the amount of web content to reach traffic goals.
● Sales content will generate the highest amount of traffic.
● Email marketing content is the easiest backlink win.

3. Why not big data analytics?
Ans:
4. What are the digital classifications of big data?
Ans:
5. Applications of big data analytics.
Ans: Big data has found many applications in various fields today. The
major fields where big data is being used are as follows.

● Government

Big data analytics has proven to be very useful in the government sector. Big data analysis played a large role in Barack Obama’s successful 2012 re-election campaign, and more recently it was a major factor in the victory of the BJP and its allies in the 2014 Indian General Election. The Indian Government utilizes numerous techniques to ascertain how the Indian electorate is responding to government action, as well as to gather ideas for policy augmentation.

● Social Media Analytics

The advent of social media has led to an outburst of big data. Various solutions have been built to analyze social media activity; for example, IBM’s Cognos Consumer Insights, a point solution running on IBM’s BigInsights Big Data platform, can make sense of the chatter. Social media can provide valuable real-time insights into how the market is responding to products and campaigns. With the help of these insights, companies can adjust their pricing, promotions, and campaign placements accordingly. Before big data can be utilized, some preprocessing needs to be done on it in order to derive intelligent and valuable results. Thus, to know the consumer mindset, applying intelligent decisions derived from big data is necessary.

● Technology

The technological applications of big data involve companies that deal with huge amounts of data every day and put them to use for business decisions as well. For example, eBay.com uses two data warehouses, at 7.5 petabytes and 40 PB, as well as a 40 PB Hadoop cluster for search, consumer recommendations, and merchandising, roughly 90 PB of data in all. Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 they had the world’s three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB. Facebook handles 50 billion photos from its user base. Windermere Real Estate uses anonymous GPS signals from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work at various times of the day.

● Fraud detection

For businesses whose operations involve any type of claims or transaction processing, fraud detection is one of the most compelling Big Data application examples. Historically, fraud detection on the fly has proven an elusive goal. In most cases, fraud is discovered long after the fact, at which point the damage has been done and all that’s left is to minimize the harm and adjust policies to prevent it from happening again. Big Data platforms that can analyze claims and transactions in real time, identifying large-scale patterns across many transactions or detecting anomalous behavior from an individual user, can change the fraud detection game.
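
To make the idea concrete, here is a toy sketch of per-user anomaly detection on a transaction stream, flagging amounts far above a user’s running mean. The threshold and data are illustrative, not a production fraud model.

```python
# Toy real-time anomaly check: flag any transaction more than 3 standard
# deviations above a user's running mean. Thresholds are illustrative.
from collections import defaultdict
import math

class RunningStats:
    """Welford's online algorithm: mean/variance without storing history."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

stats = defaultdict(RunningStats)

def check(user, amount, threshold=3.0):
    s = stats[user]
    anomalous = s.n > 10 and s.std() > 0 and \
        (amount - s.mean) / s.std() > threshold
    s.update(amount)          # learn from every transaction
    return anomalous

# Usage: stream transactions through check() as they arrive.
for amt in [20, 25, 22, 19, 24, 21, 23, 20, 22, 25, 24, 900]:
    if check("user-42", amt):
        print(f"flag transaction of {amt} for review")
```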

● Call Center Analytics

Now we turn to the customer-facing Big Data application examples, of which call center analytics are particularly powerful. What’s going on in a customer’s call center is often a great barometer and influencer of market sentiment, but without a Big Data solution, much of the insight that a call center can provide will be overlooked or discovered too late. Big Data solutions can help identify recurring problems or customer and staff behavior patterns on the fly, not only by making sense of time/quality resolution metrics but also by capturing and processing call content itself.

● Banking

The use of customer data invariably raises privacy issues. By uncovering hidden connections between seemingly unrelated pieces of data, big data analytics could potentially reveal sensitive personal information. Research indicates that 62% of bankers are cautious in their use of big data due to privacy issues. Further, outsourcing of data analysis activities or distribution of customer data across departments for the generation of richer insights also amplifies security risks; there have been cases where customers’ earnings, savings, mortgages, and insurance policies ended up in the wrong hands. Such incidents reinforce concerns about data privacy and discourage customers from sharing personal information in exchange for customized offers.

● Agriculture

A biotechnology firm uses sensor data to optimize crop efficiency. It plants test crops and runs simulations to measure how plants react to various changes in conditions. Its data environment constantly adjusts to changes in the attributes of the various data it collects, including temperature, water levels, soil composition, growth, output, and gene sequencing of each plant in the test bed. These simulations allow it to discover the optimal environmental conditions for specific gene types.

● Marketing

Marketers have begun to use facial recognition software to learn how well
their advertising succeeds or fails at stimulating interest in their products. A
recent study published in the Harvard Business Review looked at what
kinds of advertisements compelled viewers to continue watching and what
turned viewers off. Among their tools was “a system that analyses facial
expressions to reveal what viewers are feeling.” The research was
designed to discover what kinds of promotions induced watchers to share
the ads with their social network, helping marketers create ads most likely
to “go viral” and improve sales.

● Smart Phones

Perhaps more impressive, people now carry facial recognition technology in their pockets. Users of iPhone and Android smartphones have applications at their fingertips that use facial recognition technology for various tasks. For example, Android users with the ‘remember’ app can snap a photo of someone, then bring up stored information about that person based on their image when their own memory lets them down, a potential boon for salespeople.

● Telecom

Nowadays big data is used in many different fields, and in telecom it also plays an important role. Operators face an uphill challenge when they need to deliver new, compelling, revenue-generating services without overloading their networks, while keeping their running costs under control. The market demands a new set of data management and analysis capabilities that can help service providers make accurate decisions by taking into account the customer, the network context, and other critical aspects of their businesses. Most of these decisions must be made in real time, placing additional pressure on the operators. Real-time predictive analytics can help operators leverage the data that resides in their multitude of systems, make it immediately accessible, and correlate it to generate insights that help them drive their business forward.
● Healthcare

Traditionally, the healthcare industry has lagged behind other industries in the use of big data. Part of the problem stems from resistance to change: providers are accustomed to making treatment decisions independently, using their own clinical judgment, rather than relying on protocols based on big data. Other obstacles are more structural in nature, and this makes healthcare one of the best places to set an example for big data applications. Even within a single hospital, payer, or pharmaceutical company, important information often remains siloed within one group or department because organizations lack procedures for integrating data and communicating findings.

Health care stakeholders now have access to promising new threads of knowledge. This information is a form of “big data,” so called not only for its sheer volume but for its complexity, diversity, and timeliness. Pharmaceutical industry experts, payers, and providers are now beginning to analyze big data to obtain insights. Recent technological advances in the industry have improved their ability to work with such data, even though the files are enormous and often have different database structures and technical characteristics.

6. Applications of unstructured data.
Ans:
7. MPP versus SMP. What is the CAP theorem?
Ans: MPP Databases

MPP (massively parallel processing) database searches are performed by each processor on the computers where segments of the database are stored. MPP databases can be expanded by adding new CPUs. MPP databases are a form of linearly scalable database, or parallel database. Spreading data across more systems in thinner slices results in faster database searches. Performance of an MPP system is linear, increasing roughly in proportion to the number of nodes. MPP nodes are managed as a single computer. SQL is commonly used as the means of processing data across MPP databases. Cognos Business Intelligence and Teradata software run on MPP databases.

SMP Databases

SMP (symmetric multiprocessing) databases share software, input/output resources, and memory disks. Symmetric multiprocessor databases generally use one CPU to perform database searches. While symmetric multiprocessors can have hundreds of CPUs, they are most commonly configured with 2, 4, 8, or 16. Memory is the primary constraint on SMP databases. SMP databases can run on more than one server, though they will share other resources; this is known as a clustered configuration. SMP databases assign tasks to a single CPU, regardless of how many are in the database. SMP databases have lower fault tolerance and efficiency due to their reliance on shared resources, but they have lower administrative costs than MPP. Oracle and Sybase run on SMP databases.

MPP vs SMP Databases

An MPP database sends the same query to each CPU in the MPP system, where it searches the data. When two MPP databases are connected, the search time will be almost half that of a similarly sized SMP database. The search time is not exactly half, since there are delays as data travels between the MPP nodes. High-speed processors used in an SMP database can make it cost-competitive with MPP systems.
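
The divide-the-work idea behind MPP can be sketched in a few lines of Python: the same query is fanned out to every partition, executed in parallel, and the partial results are combined. This is an illustration only, not a real database engine; the data and "query" are invented.

```python
# Toy illustration of the MPP idea: the same query ("count matching
# rows") runs on every partition in parallel, then partial results
# are combined, like a fan-out query across MPP nodes.
from multiprocessing import Pool

def scan_partition(rows):
    # Each "node" scans only its own slice of the table.
    return sum(1 for r in rows if r % 7 == 0)

if __name__ == "__main__":
    table = list(range(10_000_000))
    n_nodes = 4
    chunk = len(table) // n_nodes
    partitions = [table[i * chunk:(i + 1) * chunk] for i in range(n_nodes)]

    with Pool(n_nodes) as pool:
        partials = pool.map(scan_partition, partitions)   # fan out
    print("total matches:", sum(partials))                # combine
```

Ignoring the cost of shipping the partitions to the workers, the scan time drops roughly in proportion to the number of workers, mirroring the near-linear scaling described above.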

Uses

When a company runs its payroll, records labor time card entries or saves
product data in a drawing database on a single server, it is using an SMP
database. SMP databases are used for hosting small Web sites and email
servers. MPP databases are commonly used for data warehousing. MPP
databases are also used for large scale data processing and data mining.

CAP theorem

The CAP theorem, also known as Brewer’s theorem, states that it is impossible for any distributed database system to provide more than two of the following properties together:

● Consistency
● Availability
● Partition tolerance

With the advances in parallel processing and distributed systems, it is more common to expand horizontally, i.e. to add more machines, and the CAP theorem is the backbone of such architectures. Let’s explore the characteristics of the CAP theorem in detail.

Consistency

A consistent system is one in which all nodes see the same data at the
same time. In other words, if we perform read operations after multiple
write operations, then a consistent system should return the same value for
all the read operations and the most recent write operation.

Note that consistency, as defined in the CAP theorem, is quite different from the consistency guaranteed in ACID database transactions.
Availability

A highly available distributed system is one that remains operational 100% of the time. Every request made should be accepted and receive a (non-error) response. Note: It is not necessary for the response to contain the most recent write value (i.e. the system does not need to be consistent, but it should be available all the time).

Partition Tolerance

Partition tolerance states that a system should continue to run even if the connection between nodes is delayed or broken. Note: This doesn’t mean the nodes have gone down; the nodes are up but can’t communicate.

Let’s say that we have two nodes (N1 and N2), both connected. Now assume that the network connecting the two nodes goes down (the network gets partitioned). Both nodes N1 and N2 are up and running fine, but updates happening at node N1 can no longer reach node N2, and vice versa.

Partition tolerance is more of a necessity than an option in modern distributed systems, hence we cannot avoid the “P” in CAP. So we have to choose either consistency or availability.
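
A toy sketch of that trade-off, assuming a two-replica store where each replica is configured as either “CP” (refuse requests during a partition, staying consistent) or “AP” (keep answering, possibly with stale data):

```python
# Toy two-replica store showing the CAP trade-off during a partition.
# "CP": a cut-off replica refuses reads/writes (consistent, unavailable).
# "AP": it keeps answering, possibly with stale data. Illustrative only.
class Replica:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = None
        self.partitioned = False  # True = can't reach the other replica

    def write(self, value, peer=None):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm with peer")
        self.value = value
        if peer is not None and not self.partitioned:
            peer.value = value    # replicate while the link is up

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: value may be stale")
        return self.value         # in AP mode this may be stale

n1, n2 = Replica("AP"), Replica("AP")
n1.write("v1", peer=n2)
n1.partitioned = n2.partitioned = True   # the network splits
n1.write("v2")                           # accepted locally (available)
print(n2.read())                         # prints "v1": stale but available
```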
8. Advantages and disadvantages of SMP over MPP.
Ans:

9. Hadoop architecture with read and write anatomy.

Ans: Anatomy of File Read in HDFS

Let’s get an idea of how data flows between the client interacting with
HDFS, the name node, and the data nodes with the help of a diagram.
Consider the figure:

Step 1: The client opens the file it wishes to read by calling open() on the FileSystem object, which for HDFS is an instance of DistributedFileSystem.

Step 2: DistributedFileSystem (DFS) calls the name node, using remote procedure calls (RPCs), to determine the locations of the first few blocks in the file. For each block, the name node returns the addresses of the data nodes that have a copy of that block. The DFS returns an FSDataInputStream to the client for it to read data from. FSDataInputStream in turn wraps a DFSInputStream, which manages the data node and name node I/O.

Step 3: The client then calls read() on the stream. DFSInputStream, which has stored the data node addresses for the first few blocks in the file, connects to the first (closest) data node for the first block in the file.

Step 4: Data is streamed from the data node back to the client, which calls
read() repeatedly on the stream.

Step 5: When the end of a block is reached, DFSInputStream closes the connection to the data node and then finds the best data node for the next block. This happens transparently to the client, which from its point of view is simply reading a continuous stream. Blocks are read in order, with the DFSInputStream opening new connections to data nodes as the client reads through the stream. It will also call the name node to retrieve the data node locations for the next batch of blocks as needed.

Step 6: When the client has finished reading the file, it calls close() on the FSDataInputStream.
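
From application code, all of these steps are hidden behind a single open-and-read call. A minimal sketch using the third-party HdfsCLI Python package (pip install hdfs) over WebHDFS; the name node URL, user, and file path are assumptions:

```python
# Minimal HDFS read sketch via the third-party "hdfs" (WebHDFS) client.
# The URL, user, and path are assumptions for the example.
from hdfs import InsecureClient

client = InsecureClient("http://namenode:9870", user="hadoop")

# The client library handles locating blocks and streaming the data;
# the read anatomy above happens behind this call.
with client.read("/data/input/sample.txt", encoding="utf-8") as reader:
    content = reader.read()
print(content[:200])
```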

Anatomy of File Write in HDFS

Next, we’ll look at how files are written to HDFS. Consider figure 1.2 to get a better understanding of the concept.
Step 1: The client creates the file by calling create() on DistributedFileSystem (DFS).

Step 2: DFS makes an RPC call to the name node to create a new file in the file system’s namespace, with no blocks associated with it. The name node performs various checks to make sure the file doesn’t already exist and that the client has the right permissions to create the file. If these checks pass, the name node makes a record of the new file; otherwise, the file can’t be created and the client is thrown an error, i.e. an IOException. The DFS returns an FSDataOutputStream for the client to start writing data to.

Step 3: As the client writes data, the DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the name node to allocate new blocks by picking a list of suitable data nodes to store the replicas. The list of data nodes forms a pipeline, and here we’ll assume the replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first data node in the pipeline, which stores each packet and forwards it to the second data node in the pipeline.

Step 4: Similarly, the second data node stores the packet and forwards it to
the third (and last) data node in the pipeline.

Step 5: The DFSOutputStream maintains an internal queue of packets that are waiting to be acknowledged by data nodes, called the “ack queue”.

Step 6: When the client has finished writing data, it calls close() on the stream. This flushes all the remaining packets to the data node pipeline and waits for acknowledgments before contacting the name node to signal that the file is complete.

HDFS follows a write-once, read-many model. So we can’t edit files that are already stored in HDFS, but we can append to them by reopening the file. This design allows HDFS to scale to a large number of concurrent clients because the data traffic is spread across all the data nodes in the cluster. Thus, it increases the availability, scalability, and throughput of the system.
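
The write path is likewise wrapped in a single call from application code. Again a sketch with the HdfsCLI package; the URL, user, and paths are assumptions. Note how append is a separate, explicit operation, matching the write-once-read-many model described above:

```python
# Minimal HDFS write sketch via the third-party "hdfs" (WebHDFS) client.
# URL, user, and paths are assumptions for the example.
from hdfs import InsecureClient

client = InsecureClient("http://namenode:9870", user="hadoop")

# overwrite=False mirrors HDFS's write-once behaviour: writing to an
# existing path raises an error instead of editing it in place.
with client.write("/data/output/report.txt", encoding="utf-8",
                  overwrite=False) as writer:
    writer.write("line 1\nline 2\n")

# Appending (reopening the file) is a separate operation:
client.write("/data/output/report.txt", data="line 3\n",
             encoding="utf-8", append=True)
```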

10. Processing data with Hadoop.

Ans:
11. Analysing Hadoop MapReduce with a weather data example.
Ans:
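A minimal sketch of the classic weather example: a Hadoop Streaming mapper and reducer that find the maximum temperature per year. The input format ("year<TAB>temperature" per line) is an assumption; real NCDC weather records need positional parsing and quality-code filtering.

```python
# mapper.py -- emit (year, temperature) pairs, skipping malformed lines.
import sys

for line in sys.stdin:
    parts = line.strip().split("\t")
    if len(parts) != 2:
        continue                      # skip malformed lines
    year, temp = parts
    try:
        float(temp)                   # skip unreadable readings
    except ValueError:
        continue
    print(f"{year}\t{temp}")
```

```python
# reducer.py -- Hadoop Streaming sorts mapper output by key, so all
# readings for a year arrive together; keep a running maximum per year.
import sys

current_year, max_temp = None, None
for line in sys.stdin:
    year, temp = line.strip().split("\t")
    temp = float(temp)
    if year != current_year:
        if current_year is not None:
            print(f"{current_year}\t{max_temp}")
        current_year, max_temp = year, temp
    else:
        max_temp = max(max_temp, temp)
if current_year is not None:
    print(f"{current_year}\t{max_temp}")
```

The job would then be launched with the Hadoop Streaming jar, passing the two scripts as the mapper and reducer along with the input and output paths (exact jar location and paths depend on the installation).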
12. Hadoop ecosystem with neat diagram.
Ans:
Hadoop Ecosystem

Overview: Apache Hadoop is an open source framework intended to make interaction with big data easier. However, for those who are not acquainted with this technology, one question arises: what is big data? Big data is a term given to data sets that can’t be processed in an efficient manner with the help of traditional methodologies such as RDBMS. Hadoop has made its place in the industries and companies that need to work on large data sets which are sensitive and need efficient handling. Hadoop is a framework that enables the processing of large data sets which reside in the form of clusters. Being a framework, Hadoop is made up of several modules that are supported by a large ecosystem of technologies.

Introduction: The Hadoop Ecosystem is a platform, or a suite, which provides various services to solve big data problems. It includes Apache projects and various commercial tools and solutions. There are four major elements of Hadoop: HDFS, MapReduce, YARN, and Hadoop Common. Most of the other tools or solutions are used to supplement or support these major elements. All these tools work collectively to provide services such as the absorption, analysis, storage, and maintenance of data.

Following are the components that collectively form a Hadoop ecosystem:

● HDFS: Hadoop Distributed File System
● YARN: Yet Another Resource Negotiator
● MapReduce: Programming-based data processing
● Spark: In-memory data processing
● PIG, HIVE: Query-based processing of data services
● HBase: NoSQL database
● Mahout, Spark MLlib: Machine learning algorithm libraries
● Solr, Lucene: Searching and indexing
● Zookeeper: Managing the cluster
● Oozie: Job scheduling

13. Flow of big data analytics.

Ans:
