
MASTER OF COMPUTER APPLICATIONS BIG DATA ANALYTICS

BIG DATA ANALYTICS
UNIT-I

INTRODUCTION TO BIG DATA
1. INTRODUCTION TO BIG DATA PLATFORM
2. CHALLENGES OF CONVENTIONAL SYSTEMS
3. INTELLIGENT DATA ANALYSIS
4. NATURE OF DATA
5. ANALYTIC PROCESSES AND TOOLS
6. ANALYSIS Vs REPORTING
7. MODERN DATA ANALYTICS TOOLS
Technical Terms

Term | Literal meaning | Technical meaning
Velocity | A particular direction | The speed at which data is created
Variability | To vary or change | Changes in the data at different levels
Veracity | Accuracy | The quality of the data
Prescriptive | A way of thinking about and understanding something | Recommending the optimal course of action or strategy moving forward
Ad-hoc | For a special and immediate purpose | Reporting to identify patterns or trends in data
Web logs | The activity of having a website containing a diary or journal | Essentially an online journal whose entries are added and viewed via the web
Value | What something is worth | Extracting useful data
Correlation | A statistical measure that expresses the extent to which two variables are linearly related | Quantifies the strength and direction of that relationship
Canned reports | Preserved things | Predefined reports with fixed metrics and dimensions


1. INTRODUCTION TO BIG DATA PLATFORM

Definition:
Big Data:
 Big Data consists of very large volumes of heterogeneous data that is being generated,
often at high speed.
 It cannot be managed and processed using traditional data management tools and
applications at hand.
Characteristics of Big Data:
1. Volume
2. Velocity
3. Variety
4. Variability and value
5. Veracity

1. Volume
 Volume is the huge amount of data.
 It refers to the size of the data we are working with.
 This data is spread across different places, in different formats, in large volumes
ranging from gigabytes to terabytes and petabytes, up to yottabytes and even more.
 The data is not only generated by humans; a large amount of data is being generated
by machines, and it surpasses human-generated data.


Example: In the year 2016, the estimated global mobile traffic was 6.2 Exabytes (6.2
billion GB) per month. It was also estimated that by the year 2020 there would be almost
40,000 Exabytes of data.
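The scale of these units can be sketched in Python. The helper below uses the decimal convention (each unit is 1000 of the next smaller one), matching the "6.2 billion GB" figure above:

```python
# Each step up the ladder multiplies the byte count by 1000
# (decimal convention: 1 TB = 1000 GB, 1 EB = one billion GB, ...).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to raw bytes."""
    return value * 1000 ** UNITS.index(unit)

# 6.2 EB of monthly mobile traffic, expressed in gigabytes:
print(to_bytes(6.2, "EB") / to_bytes(1, "GB"))  # about 6.2 billion GB
```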

2. Velocity:
 Velocity refers to the high speed of accumulation of data.
 In Big Data, data flows in at high velocity from sources like machines, networks, social
media, mobile phones etc.

 There is a massive and continuous flow of data.


 This determines the potential of the data: how fast the data is generated and
processed to meet the demands.
 In different fields and different areas of technology, data arrives at different speeds.
 Velocity ranges from batch (monthly, weekly, daily, hourly) to real time.
Example: More than 3.5 billion searches are made on Google per day. Also,
Facebook users are increasing by 22% (approx.) year by year.

3. Variety:
 It refers to the nature of data: structured, semi-structured and unstructured data.
 It also refers to heterogeneous sources.
 Variety is basically the arrival of data from new sources that are both inside and
outside of an enterprise.
 It can be structured, semi-structured and unstructured.
o Structured data: This is basically organized data. It generally refers
to data whose length and format are defined.
o Semi-structured data: This is basically semi-organized data. It is
generally a form of data that does not conform to the formal structure of data.
Log files are examples of this type of data.
o Unstructured data: This basically refers to unorganized data. It
generally refers to data that doesn’t fit neatly into the traditional row and
column structure of a relational database. Text, pictures, videos etc. are
examples of unstructured data, which can’t be stored in the form of rows
and columns.
 Variety may be text, web logs, sensor data, legacy docs, images, audio, video.
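As a sketch of the semi-structured case, a web-server log line has some regularity but no fixed relational schema. Python's `re` module can pull structured fields out of it (the log line and its format here are hypothetical examples):

```python
import re

# A hypothetical web-server log line: regular enough to parse,
# but not stored as rows and columns like structured data.
line = '192.168.1.5 - [12/Mar/2024:10:15:32] "GET /index.html" 200'

pattern = r'(?P<ip>\S+) - \[(?P<time>[^\]]+)\] "(?P<request>[^"]+)" (?P<status>\d+)'
m = re.match(pattern, line)

# Once parsed, the named fields can be treated as structured data.
print(m.group("ip"), m.group("status"))  # 192.168.1.5 200
```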


4. Variability and value:


 This is a factor which can be a problem for those who analyze the data.
 It refers to the inconsistency which can be shown by the data at times, thus
hampering the process of being able to handle and manage the data effectively.
 Value represents the ability to extract useful data.

5. Veracity:
 The quality of the data being captured can vary greatly.
 Accuracy of analysis depends on the veracity of the source data.

6. Complexity:
 Data management becomes a very complex process, especially when large volumes
of data come from multiple sources.
 These data need to be linked, connected and correlated in order to be able to grasp
the information that is supposed to be conveyed by them. This is termed the
“complexity” of big data.

Sources of Big Data


1. Enterprise Data:
 There are large volumes of data in enterprises in different formats.
 Common formats include flat files, emails, word documents, spreadsheets,
presentations, Google forms, PDF documents etc.
 This data that is spread across the organization in different formats is referred to
as Enterprise data.

2. Transactional Data:
 Every enterprise has some kind of applications which involve performing
different kinds of transactions like web applications, Mobile Applications, CRM
Systems and many more.


 To support the transactions in these applications, there are usually one or more
relational databases as a backend infrastructure.
 This is mostly structured data and is referred to as Transactional data.

3. Social Media:

 There is a large amount of data getting generated on social networks like
Twitter, Facebook etc.
 The social networks usually involve mostly unstructured data formats which
includes text, images, audio, videos etc.,
 This category of data source is referred to as social media.

4. Activity Generated:
 There is a large amount of data being generated by machines which surpasses
the data volume generated by humans.
 These include data from medical devices, sensor data, surveillance
videos, satellites, cell phone towers, industrial machinery, and other data
generated mostly by machines.

5. Public data:
 Data published by governments, research data published by research institutes,
data from weather and meteorological departments, and Wikipedia are available to
the public.

6. Archives:
 Archives include scanned documents, scanned copies of agreements, records of
ex-employees/completed projects, and banking transactions older than the
compliance regulations.
 This type of data, which is less frequently accessed, is referred to as Archive
Data.
 Organizations archive a lot of data which is either not required anymore or is
very rarely required.

Definition:
Big Data Analytics:
 Big Data Analytics is the process of examining large data sets containing a variety of
data types, i.e., Big Data, to uncover hidden patterns, unknown correlations, market
trends, customer preferences and other useful business information.
 The analytical findings can lead to more effective marketing, new revenue
opportunities, better customer service, improved operational efficiency, competitive
advantages and other business benefits.


Goal of Big Data Analytics:


 To help companies make more informed business decisions by enabling data
scientists, predictive modellers and other analytics professionals to analyse large
volumes of transaction data as well as other forms of data that may be untapped by
conventional business intelligence (BI) programs.

Methods (or) Types of big data analytics:


1. Descriptive analytics
 The "what happened" stage of data analysis. Here, the focus is on summarizing and
describing past data to understand its basic characteristics.
2. Diagnostic analytics
 The “why it happened” stage. By delving deep into the data, diagnostic analysis
identifies the root causes of the patterns and trends observed in descriptive analytics.
3. Predictive analytics
 The “what will happen” stage. It uses historical data, statistical modeling and machine
learning to forecast trends.
4. Prescriptive analytics
 The “what to do” stage, which goes beyond prediction to provide
recommendations for optimizing future actions based on insights derived from all the
previous stages.
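The descriptive and predictive stages above can be sketched with a toy example. The monthly sales figures and the naive trend forecast below are made up purely for illustration:

```python
from statistics import mean

# Hypothetical monthly sales figures
sales = [100, 110, 125, 135, 150]

# Descriptive: "what happened" - summarize the past data
print("average sales:", mean(sales))

# Predictive: "what will happen" - a naive forecast that projects
# the average month-over-month change one step forward
changes = [b - a for a, b in zip(sales, sales[1:])]
forecast = sales[-1] + mean(changes)
print("next month forecast:", forecast)  # 162.5
```

Real predictive analytics uses statistical modeling and machine learning rather than this one-line extrapolation, but the "summarize the past / project the future" distinction is the same.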

Referential model of Big Data Analytics:


2. CHALLENGES OF CONVENTIONAL SYSTEMS

1. Data challenges
2. Process challenges
3. Management challenges

1. Data Challenges:
Volume:
 The volume of data, especially machine-generated data, is exploding, and it
grows faster every year as new sources of data emerge.
 The challenge is: how to deal with the large size of data?
Ex: According to the latest estimates, 402.74 million terabytes of data are created each day.
Variety:
 More than 80% of today’s information is unstructured and it is typically too big to
manage effectively. What does this mean?
 A lot of data is unstructured, or has a complex structure that is hard to represent in
rows and columns.
 Organizations want to be able to combine all this data and analyse it together in new
ways.
Ex: More than one customer in different industries runs applications that combine
geospatial vessel location data with weather and news data to make real-time,
mission-critical decisions.
 Data come from sensors, smart devices and social collaboration technologies.
 Data are not only structured, but also raw, semi-structured and unstructured data from
web pages, web log files, search indexes, e-mails, documents, sensor data etc.
 Semi-structured web data tasks such as A/B testing, sessionization, bot detection and
pathing analysis all require powerful analytics on many petabytes of semi-structured
web data.
The challenge is: how to handle multiple types, sources and formats?
Velocity:
 How to react to the flood of information in the time required by the application?
Veracity:
 If data is of high quality in one country and poor in another, does the aid response
skew ‘unfairly’ toward the well-surveyed country, or toward the educated guesses
being made for the poorly surveyed one?
Several challenges:
1. How can we cope with uncertainty, imprecision, missing values, misstatements or
untruths?
2. How good is the data? How broad is the coverage?
3. How fine is the sampling resolution? How timely are the readings?
4. How well understood are the sampling biases?
5. Is there data available, at all?
Data comprehensiveness:
 Are there areas without coverage? What are the implications?


Scalability:
 Techniques like social graph analysis, for instance leveraging the influencers in a
social network to create a better user experience, are hard problems to solve at scale.
 All of these problems combined create a perfect storm of challenges and
opportunities to create faster, cheaper and better big data analytics solutions than
traditional approaches can provide.
2. Process Challenges:
 Capturing data.
 Aligning data from different sources.
 Transforming the data into a form suitable for analysis.
 Modeling it, whether mathematically or through some form of simulation.
 Understanding the output, visualizing and sharing the results, and how to display
complex analytics on an iPhone or a mobile device.
3. Management challenges:
Main challenges are:
 Data privacy
 Security
 Governance
 Ethical
The challenges are: Ensuring that data are used correctly.
 It is another most important challenge with Big Data. This challenge includes
sensitive, conceptual, technical as well as legal significance.
 Most organizations are unable to maintain regular checks due to the large
amounts of data generated. However, security checks and observation should be
performed in real time, because that is most beneficial.
 Some information about a person, when combined with external large data sets,
may reveal facts that the person considers secret and would not want the data
owner to know.
 Some organizations collect information about people in order to add value to
their business. This is done by gaining insights into people’s lives that they are
unaware of.

3. INTELLIGENT DATA ANALYSIS


Data:

 A collection of numerical values recording the magnitudes of various attributes of


the objects.
 Analysis is the breaking up of data into parts, i.e., the examination of these parts to
know about their nature, proportion, function, interrelationship, etc.
Definition:
Data analysis:
 Data analysis is the process of computing various summaries and derived values from
the given collection of data.
 It describes the processing of those data.


 Data analysis is a process that combines extracting data from a data set,
analyzing, classifying, organizing, reasoning, and so on.
A process in which the analyst moves laterally and recursively between three modes:

1. Describing data (profiling, correlation, summarizing)


2. Assembling data (scrubbing, translating, synthesizing, filtering)
3. Creating data (deriving, formulating, simulating).
 It is a way of making sense of data: the process of finding and identifying the
meaning of data.
Types of analysis:
Descriptive analysis:
 It is aimed at making a statement about the data set at hand.
Ex: answering questions about a population:
What is the proportion of females?
What is the proportion of males?
Inferential analysis:
 It is aimed at trying to draw conclusions which have more general validity.
Ex: What will the proportion of females be next year?
The Goal of Data Analysis: The aim of data analysis is to make better decisions.
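The descriptive/inferential distinction can be sketched in a few lines. The sample below is hypothetical, and the confidence interval uses the standard normal approximation for a proportion:

```python
from math import sqrt

# Hypothetical sample: 1 = female, 0 = male
sample = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

# Descriptive: a statement about the data set at hand
p = sum(sample) / len(sample)
print("proportion of females in sample:", p)  # 0.6

# Inferential: a statement with more general validity -
# a rough 95% confidence interval for the population proportion
se = sqrt(p * (1 - p) / len(sample))
low, high = round(p - 1.96 * se, 2), round(p + 1.96 * se, 2)
print("population proportion likely in:", (low, high))
```

The wide interval reflects the tiny sample: descriptive statements are exact for the sample, while inferential statements carry uncertainty that shrinks as the sample grows.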

INTELLIGENT DATA ANALYSIS (IDA)


 IDA is the repeated application of methods, as one attempts to tease out the
structure, to understand what is going on, and to refine the questions that researchers
are seeking to answer. It requires painstaking care and, above all, intelligence.

Importance of IDA:

 Intelligent Data Analysis (IDA) is one of the major topics in artificial intelligence
and information science.
 Intelligent data analysis discloses hidden facts that are not known previously and
provides potentially important information or facts from large quantities of data
(White, 2008).
 It also helps in making a decision.
 Based mainly on statistics, machine learning, artificial intelligence, pattern
recognition, and database and visualization technology, IDA helps to obtain useful
information, necessary data and interesting models from the large amounts of data
available online in order to make the right choices.
 Intelligent data analysis helps to solve problems that are already solved as a matter of
routine. If data is collected for past cases together with the results that were
finally achieved, such data can be used to revise and optimize the presently used
strategy to arrive at a conclusion.
 In certain cases, if some questions arise for the first time, and have only a little
knowledge about it, data from the related situations helps us to solve the new problem


or any unknown relationships can be discovered from the data to gain knowledge in
an unfamiliar area.

Steps Involved In IDA:

IDA, in general, includes three stages:

(1) Preparation of data;

(2) Data mining;

(3) Data validation and explanation

 The preparation of data involves selecting the required data from the related data
source and incorporating it into a data set that can be used for data mining.
 The main goal of intelligent data analysis is to obtain knowledge.
 It is challenging to choose suitable methods to resolve the complexity of the process.
 Regarding the term visualization, we have moved away from visualization to use the
term charting. The term analysis is used for the method of incorporating, influencing,
filtering and scrubbing the data, which certainly includes, but is not limited to,
interacting with the data through charts.

4. NATURE OF DATA

 Data should have specific items (values or facts), which must be identified.
 Specific items of data must be organized into a meaningful form.
 Data should have the functions to perform.
 The nature of data can be understood on the basis of the class to which it belongs.
 There is a large measure of cross-classification, e.g., all quantitative data are
numerical data, and most data are quantitative data.
With reference to the types of data; their nature in sciences is as follows:
1. Numerical data:
 All data in sciences are derived by measurement and stated in numerical values.
 Most of the time their nature is numerical. Even in semi quantitative data,
affirmative and negative answers are coded as '1' and '0' for obtaining numerical
data.
 Thus, except in the three cases of qualitative, graphic and symbolic data, the
remaining yield numerical data.
2. Descriptive data:
 Sciences are not known for descriptive data.
 However, qualitative data in sciences are expressed in terms of definitive
statements concerning objects.
 These may be viewed as descriptive data.
 Here, the nature of data is descriptive.
Graphic and symbolic data:
 Graphic and symbolic data are modes of presentation.
 They enable users to grasp data by visual perception.
 All qualitative data in social sciences can be descriptive in nature.
 These can be in the form of definitive statements.
 However, if necessary, numerical values can be assigned to descriptive statements,
which may be reduced to numerical data.
3. Enumerative data:
 Most data in social sciences are enumerative in nature.
 However, they are refined with the help of statistical techniques to make them more
meaningful. They are known as statistical data.
 This explains the use of different scales of measurement whereby they are graded.
Properties of Data:
1) Amenability of use
2) Clarity
3) Accuracy
4) The quality of being the essence of the matter

5. ANALYTIC PROCESSES AND TOOLS


Big data analytics involves making “sense” out of large volumes of varied data that, in its
raw form, lacks a data model to define what each element means in the context of the others.

New issues on this new type of analysis:


1. Discovery:
 In many cases you don’t really know what you have or how different data sets relate
to each other.

 You must figure it out through a process of exploration and discovery.

2. Iteration:
 The nature of iteration is that it sometimes leads you down a path that turns out to be a
dead end.
 Many analysts and industry experts suggest that you start with small, well-defined
projects, learn from each iteration, and gradually move on to the next idea or field of
inquiry.

3. Flexible Capacity:
 Because of the iterative nature of big data analysis, be prepared to spend more time
and utilize more resources to solve problems.

4. Mining and Predicting:


 Big data analysis is not black and white.

 As you mine the data to discover patterns and relationships, predictive analytics can
yield the insights that you seek.


5. Decision Management:
 Consider the transaction volume and velocity.

 If you are using big data analytics to drive many operational decisions, then you need
to consider how to automate and optimise the implementation of all those actions.

Five key approaches to analysing big data and generating insight:

1. Discovery tools:
 These are useful throughout the information lifecycle for rapid, intuitive exploration
and analysis of information from any combination of structured and unstructured
sources.
 These tools permit analysis alongside traditional BI source systems.

 Because there is no need for up-front modelling, users can draw new insights, come to
meaningful conclusions and make informed decisions quickly.

2. BI tools:
 These are important for reporting, analysis and performance management, primarily
with transactional data from data warehouses and production information systems.
 BI tools provide comprehensive capabilities for business intelligence and performance
management, including enterprise reporting, dashboards, ad-hoc analysis, scorecards,
and what-if scenario analysis on an integrated, enterprise scale platform.

3. In-database analytics:


 These include a variety of techniques for finding patterns and relationships in your data.

 Because these techniques are applied directly within the database, you eliminate data
movement to and from other analytical servers, which accelerates information cycle
times and reduces total cost of ownership.

4. Hadoop is useful for pre-processing data:


 To identify macro trends or find nuggets of information, such as out-of-range values.
It enables businesses to unlock potential value from new data using inexpensive
commodity servers.

5. Decision management:
 It includes predictive modelling, business rules, and self-learning to take informed
action based on the current context.


Tools:

Data Storage Tools:

HDFS (Hadoop Distributed File System):


 It is the primary storage system used by Hadoop applications.
 This open source framework works by rapidly transferring data between nodes.

 It quickly replicates data onto several nodes in a cluster in order to provide reliable,
fast performance.

HBase:
 HBase is the non-relational data store for Hadoop.
 HBase is a data model that is similar to Google’s Bigtable. It is an open source,
distributed database developed by the Apache Software Foundation, written in Java.

 HBase is an essential part of our Hadoop ecosystem.


 Operating system: OS independent.

Hive:
 Hive is a data warehouse system which is used to analyze structured data.
 It is built on the top of Hadoop.

 It was developed by Facebook.


 Hive provides the functionality of reading, writing, and managing large datasets
residing in distributed storage.

 It runs SQL-like queries called HQL (Hive Query Language), which get internally
converted to MapReduce jobs.
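The MapReduce model that Hive compiles queries down to can be sketched in a few lines of pure Python: a map step emits (key, 1) pairs, a shuffle step groups them by key, and a reduce step sums per key. This shows only the programming model, not Hadoop's distributed implementation:

```python
from itertools import groupby
from operator import itemgetter

lines = ["big data analytics", "big data platform"]

# Map: emit a (word, 1) pair for every word in every line
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle/sort: group pairs by key, as Hadoop does between phases
mapped.sort(key=itemgetter(0))

# Reduce: sum the counts for each word
counts = {word: sum(n for _, n in group)
          for word, group in groupby(mapped, key=itemgetter(0))}
print(counts)  # {'analytics': 1, 'big': 2, 'data': 2, 'platform': 1}
```

In Hadoop, the map and reduce functions run in parallel across the cluster, with the framework handling the shuffle, sort, and fault tolerance.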

Sqoop:
 Tool designed for efficiently transferring bulk data between Apache Hadoop and
structured data stores such as relational databases

Flume:
 Distributed, reliable, and available service for efficiently collecting, aggregating, and
moving large amounts of log data.
 It has a simple and very flexible architecture based on streaming data flows.

 It's quite robust and fault tolerant, and it's really tunable to enhance the reliability
mechanisms, failover, recovery, and all the other mechanisms that keep the cluster
safe and reliable.


 It uses a simple, extensible data model that allows us to apply all kinds of online
analytic applications.

Oozie:
 Workflow scheduler system to manage Apache Hadoop jobs.
 Oozie Coordinator jobs allow workflows to be run on a recurring, scheduled basis.

 Supports MapReduce, Pig, Apache Hive, Sqoop, etc.

ZooKeeper:
 ZooKeeper is a highly reliable distributed coordination kernel, which can be used for
distributed locking, configuration management, leadership election, and work queues.
 Zookeeper is a replicated service that holds the metadata of distributed applications.

Pig:
 High-level programming on top of Hadoop MapReduce.
 The language, Pig Latin, expresses data analysis problems as data flows.
 Originally developed at Yahoo! in 2006.


6. ANALYSIS Vs REPORTING
(From Web Analytics Action Hero)
Analysis:
 Analysis means to translate information into insights.
 The process of exploring data and reports in order to extract meaningful insights,
which can be used to better understand and improve business information.
Insight:
 Insight refers to an analyst or business user discovering a pattern in data or a
relationship between variables that they didn't previously know existed.
Reporting:
 Reporting means to translate raw data into information.
 The process of organizing data into informational summaries in order to monitor
how different areas of a business are performing.
 Reporting and analysis differ in terms of their purpose, tasks, outputs, delivery
and value.
 The common goal of both reporting and analysis is to increase sales and reduce costs.
 Both reporting and analysis play roles in influencing and driving the actions which
lead to greater value in organizations.

Purpose:

Analysis | Reporting
Provides answers | Provides data
Provides what is needed | Provides what is asked for
Is typically customized | Is typically standardized
Involves a person | Does not involve a person
Is extremely flexible | Is fairly inflexible

Tasks:
 Reporting involves activities such as building, configuring, consolidating, organizing,
formatting, and summarizing.
 Analysis tasks include questioning, examining, interpreting, comparing and confirming.


Outputs:
 On the surface, reporting and analysis deliverables may look similar with lots of
charts, graphs, trend lines, tables, and stats.
 The first difference is the overall approach. Reporting generally follows a push
approach, where reports are passively pushed to users, who are then expected to extract
meaningful insights and take appropriate actions for themselves (think self-serve).

The three main types of reporting are:


1. Canned reports
 These are the out-of-the-box and custom reports that you can access within the
analytics tool or which can also be delivered on a recurring basis to a group of end
users.
 Canned reports are fairly static with fixed metrics and dimensions.
 Some canned reports are more valuable than others, and a report's value may
depend on how relevant it is to an individual's role (SEO specialist versus web
producer).
2. Dashboards:
 These custom-made reports combine different KPIs (Key Performance Indicator)
and reports to provide a comprehensive, high-level view of business performance
for specific audiences.
 Dashboards may include data from various data sources and are also usually fairly
static.
3. Alerts:
 These conditional reports are triggered when data falls outside of expected ranges
or some other predefined criteria are met.
 Once people are notified of what happened, they can take appropriate action as
necessary.
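The alert idea above can be sketched as a simple threshold check. The metric names and bounds below are illustrative, not from any particular analytics tool:

```python
def check_alert(metric, value, low, high):
    """Return an alert message if the value falls outside the expected range."""
    if value < low or value > high:
        return f"ALERT: {metric} = {value} outside expected range [{low}, {high}]"
    return None  # within the expected range: no report is triggered

# Hypothetical daily KPI readings
print(check_alert("page_views", 5200, 4000, 8000))  # None - no alert
print(check_alert("error_rate", 0.09, 0.0, 0.05))   # triggers an alert
```

In a real analytics tool the same conditional logic runs automatically against incoming data, and the resulting notification is what prompts someone to take action.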
 Analysis follows a pull approach, where the analyst actively pulls particular data to
answer specific business questions and provide recommended next steps with possible
outcomes.
 Informal analysis can occur whenever someone simply performs a mental assessment
of a report and makes a decision to act or not act based on the data.


In the case of analysis with actual deliverables, there are two main types:

1. Ad-hoc responses:
 Analysts receive requests to answer a variety of business questions, which may be
spurred by questions the reporting raised.
 Typically, these urgent requests are time sensitive and demand a quick turnaround.
 The analytics team may have to juggle multiple requests at the same time.
 As a result, the analyses cannot go as deep or wide as the analysts may like, and the
deliverable is a short and concise report, which may or may not include any specific
recommendations.
2. Analysis presentations:
 Some business questions are more complex in nature and require more time to
perform a comprehensive, deep-dive analysis.

These analysis projects result in a more formal deliverable, which includes two
important sections:
1. Key findings: The key findings highlight the most meaningful and actionable insights
gleaned from the analyses performed.
2. Recommendations: The recommendations provide guidance on what actions to take based
on the analysis findings.

Delivery:
 Through the push model of reporting, recipients can access reports through an
analytics tool, intranet site, Microsoft Excel® spreadsheet, or mobile app.
 They can also have them scheduled for delivery into their mailbox, mobile device
(SMS), or FTP site.
 Because of the demands of having to provide data to multiple individuals and groups
at regular intervals, the building, refreshing, and delivering of reports is often
automated. It's a job for robots or computers, not human beings.
 On the other hand, analysis is all about human beings using their superior reasoning
and analytical skills to extract key insights from the data and form actionable
recommendations for their organizations.
 Although analysis can be "submitted" to decision makers, it is more effectively
presented person-to-person. In their book Competing on Analytics (Harvard Business
School Press, 2007), Thomas Davenport and Jeanne Harris emphasize the importance
of trust and credibility between the analyst and decision maker.
 Decision makers typically don't have the time or ability to perform analyses
themselves. With a "close, trusting relationship" in place, the executives will frame
their needs correctly, the analysts will ask the right questions, and the executives will
be more likely to take action on analysis they trust.
Value:
 Finally, you need to keep in mind the relationship between reporting and analysis in
driving value. Think of the data-driven decision-making stages (data > reporting >
analysis > decision > action > value)


 Think of these stages as a series of dominoes: if you remove one, it can be more
difficult or impossible to achieve the desired value.
 As you can see in Figure 2.5, the path starts with having the right data that is complete
and accurate. It doesn't matter how advanced your reporting or analysis is if you don't
have good, reliable data. While most companies have an abundance of reports, the
quality of those reports can still be an issue. Effective reporting gives a broad
audience of business users an important lens into the performance of the online
business. Reporting will rarely initiate action on its own as analysis is required to help
bridge the gap between data and action. With decision acting as the gatekeeper to
action, you usually need analysis to knock it over.

Figure 2.5 If you remove one of these dominoes, you won't be able to achieve the desired value.

Reporting and Analysis Comparison

Reporting:
 Purpose: Monitor and alert
 Tasks: Build, configure, consolidate, organize, format, summarize
 Outputs: Canned reports, dashboards, alerts
 Delivery: Accessed via tool; scheduled delivery
 Value: Distills data into information for further analysis; alerts the company to
exceptions in the data

Analysis:
 Purpose: Interpret and recommend actions
 Tasks: Question, examine, interpret, compare, confirm
 Outputs: Ad hoc responses; analysis presentations (findings + recommendations)
 Delivery: Prepared and shared by the analyst
 Value: Provides deeper insights into the business; offers recommendations to drive
action

7. MODERN DATA ANALYTIC TOOLS


 There is a wide range of tools available to the modern data analyst.
 These cover both statistical concepts, such as what ‘probability’ means, the notions
of sampling and estimates based on samples, and elements of inference, as well as
more recently developed intelligent data analysis tools such as cross-validation
and bootstrapping.
 The models are closely related to methods for rule induction.
 A rule is a substructure of a model which recognizes a specific pattern in the
database and takes some action.
 From this perspective, such tools for data analysis are very much machine learning
tools.
Analytics tools:
 ThoughtSpot
 Mode
 Power BI
 Qlik Sense
 Tableau
 Apache Hadoop
 Apache SPARK
Important Questions:
1. Define Big Data Analytics.
2. What are the types of Big Data Analytics?
3. Explain Big Data Analytics tools.
4. Give short notes on reporting and analysis.
5. Write in detail about Intelligent Data Analysis.
6. Elaborate on the nature of data.
7. Elucidate the characteristics of Big Data.
8. Explain modern analytics tools.

UNIT-I COMPLETED
Reference Book: “Web Analytics Action Hero”
Reference Links: Tutorialspoint and GeeksforGeeks
