• Data Analysis:
• Breaking data up into parts and examining those parts to understand their nature, proportion, function, interrelationships, etc.
• A process in which the analyst moves laterally and recursively between three modes: describing
data (profiling, correlation, summarizing), assembling data (scrubbing, translating, synthesizing,
filtering) and creating data (deriving, formulating, simulating).
• It is the process of finding and identifying the meaning of data.
• Importance of IDA:
• Intelligent Data Analysis (IDA) is one of the major topics in artificial intelligence and information science.
• Intelligent data analysis discloses hidden facts that are not known previously and provides
potentially important information or facts from large quantities of data (White, 2008).
• It also helps in making decisions. Drawing mainly on machine learning, artificial intelligence, pattern recognition, records management and visualization technology, IDA helps to obtain useful information, necessary data and interesting models from the large amounts of data available online in order to make the right choices.
• Intelligent data analysis helps to solve problems that are already solved as a matter of routine. If data has been collected for past cases together with the results that were finally achieved, such data can be used to revise and optimize the currently used strategy for arriving at a conclusion.
• In other cases, when a question arises for the first time and only a little knowledge about it is available, data from related situations can help us solve the new problem, or previously unknown relationships can be discovered from the data to gain knowledge in an unfamiliar area.
• Steps Involved In IDA:
• Data analysis is a process that combines extracting data from a data set, analyzing it, classifying it, organizing it, reasoning about it, and so on.
• Data analysis need not necessarily involve arithmetic or statistics. While it is true that analysis often involves one or both, and that many analytical pursuits cannot be handled without them, much of the data analysis that people perform in the course of their work involves mathematics no more complicated than calculating the mean of a set of values.
• The essential activity of analysis is a comparison (of values, patterns, etc.), which can often be done by
simply using our eyes.
• The aim of the analysis is not merely to find interesting information in the data; that is only one vital part of the process (Berthold & Hand, 2003). The aim is to make sense of the data (i.e., to understand what it means) and then to make decisions based on the understanding that is achieved.
• Information in and of itself is not useful; even understanding information in and of itself is not useful. The aim of data analysis is to make better decisions.
• The process of data analysis starts with the collection of data that can add to the solution of any
given problem, and with the organization of that data in some regular form.
• It involves identifying and applying a statistical or deterministic schema or model of the data that
can be manipulated for explanatory or predictive purposes.
• It then involves an interactive or automated solution that explores the structured data in order to
extract information – a solution to the business problem – from the data.
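• To make the collect, organize, model, extract flow above concrete, here is a minimal sketch in Python. The monthly sales figures and the simple trend rule are purely hypothetical assumptions for illustration, not part of the source material.

```python
# A minimal, self-contained sketch of the collect -> organize -> model -> extract
# flow described above. The monthly sales figures below are hypothetical.
from statistics import mean

# 1. Collect: raw observations gathered for the problem at hand
raw = [("2023-01", 120.0), ("2023-02", 135.0), ("2023-03", 128.0),
       ("2023-04", 150.0), ("2023-05", 161.0), ("2023-06", 158.0)]

# 2. Organize: put the data in a regular form (month -> sales)
sales = {month: value for month, value in raw}

# 3. Model: a simple descriptive schema -- average level and overall trend
average_sales = mean(sales.values())
trend = (raw[-1][1] - raw[0][1]) / (len(raw) - 1)   # average change per month

# 4. Extract information that answers the business question
print(f"Average monthly sales: {average_sales:.1f}")
print(f"Average month-over-month change: {trend:+.1f}")
if trend > 0:
    print("Sales are trending upward; plan inventory accordingly.")
```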
Big Data Analytics (Three Types)
• With the flood of data available to businesses regarding their supply chain these days, companies
are turning to analytics solutions to extract meaning from the huge volumes of data to help
improve decision making.
• Big data analytics has reformed the way business is conducted in many ways: it improves decision making, business process management, and so on.
• Business analytics uses data together with other techniques such as information technology, statistics, quantitative methods and various models to provide results.
• Descriptive analytics analyses a database to provide information on the trends of past or current
business events that can help the organization to develop a road map for future actions.
• Descriptive analytics are useful because they allow us to learn from past behaviors, help in determining what is happening at the present time, and show how these might influence future outcomes.
• The vast majority of the statistics we use fall into this category (Think basic arithmetic like sums,
averages, percent changes). Usually, the underlying data is a count, or aggregate of a filtered
column of data to which basic math is applied.
• Descriptive statistics are useful to show things like total stock in inventory, average dollars spent per customer and year-over-year change in sales (see the sketch after this list).
• Common examples of descriptive analytics are reports that provide historical insights regarding the company’s production, financials, operations, sales, inventory and customers.
• You should use Descriptive Analytics when you need to understand at an aggregate level what is
going on in your company, and when you want to summarize and describe different aspects of
your business.
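• The sketch below illustrates the kind of basic descriptive arithmetic mentioned above (sums, averages, year-over-year change) on a handful of made-up transactions; the figures and field names are assumptions for illustration only.

```python
# A minimal sketch of descriptive analytics on hypothetical sales records:
# counts, sums, averages, and a year-over-year change.
from statistics import mean

# Hypothetical (customer_id, year, amount) transactions
transactions = [
    ("c1", 2022, 250.0), ("c2", 2022, 410.0), ("c1", 2022, 90.0),
    ("c1", 2023, 300.0), ("c3", 2023, 520.0), ("c2", 2023, 150.0),
]

# Total sales per year
totals = {}
for _, year, amount in transactions:
    totals[year] = totals.get(year, 0.0) + amount

# Average dollars spent per customer (across all years)
per_customer = {}
for customer, _, amount in transactions:
    per_customer[customer] = per_customer.get(customer, 0.0) + amount
avg_per_customer = mean(per_customer.values())

# Year-over-year change in sales
yoy_change = (totals[2023] - totals[2022]) / totals[2022] * 100

print(f"Total sales by year: {totals}")
print(f"Average spend per customer: {avg_per_customer:.2f}")
print(f"Year-over-year change: {yoy_change:+.1f}%")
```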
• Predictive Analytics: Understanding the future
• Predictive analytics has its roots in the ability to “predict” what might happen.
• Predictive analytics provides companies with actionable insights based on data. Predictive
analytics provides estimates about the likelihood of a future outcome.
• It is important to remember that no statistical algorithm can “predict” the future with
100% certainty. Companies use these statistics to forecast what might happen in the
future. This is because the foundation of predictive analytics is based on probabilities.
• These statistics try to take the data that you have, and fill in the missing data with best guesses.
They combine historical data found in ERP, CRM, HR and POS systems to identify patterns in the
data and apply statistical models and algorithms to capture relationships between various data
sets.
• Companies use predictive statistics and analytics any time they want to look into the future.
• Predictive analytics can be used throughout the organization, from forecasting customer behavior
and purchasing patterns to identifying trends in sales activities. They also help forecast demand
for inputs from the supply chain, operations and inventory.
• One common application most people are familiar with is the use of predictive analytics to
produce a credit score. These scores are used by financial services to determine the probability of
customers making future credit payments on time.
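• As a hedged illustration of the credit-scoring idea above, the sketch below fits a logistic regression to synthetic data and estimates the probability that a hypothetical applicant pays on time. The features, figures and model choice are assumptions, not a real scoring methodology.

```python
# Estimating the probability that a customer pays on time from two made-up
# features (income and credit utilization), using synthetic historical data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic historical data: [annual_income_k, credit_utilization]
X = rng.uniform([20, 0.0], [150, 1.0], size=(500, 2))
# Synthetic label: higher income and lower utilization -> more likely to pay on time
p_on_time = 1 / (1 + np.exp(-(0.03 * X[:, 0] - 4.0 * X[:, 1])))
y = rng.random(500) < p_on_time

model = LogisticRegression(max_iter=1000).fit(X, y)

# Score a new (hypothetical) applicant: income 60k, 70% utilization
applicant = np.array([[60, 0.7]])
print(f"Estimated probability of on-time payment: {model.predict_proba(applicant)[0, 1]:.2f}")
```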
• Prescriptive Analytics: Advise on possible outcomes
• The relatively new field of prescriptive analytics allows users to “prescribe” a number of different
possible actions and guide them towards a solution.
• Prescriptive analytics attempts to quantify the effect of future decisions in order to advise on
possible outcomes before the decisions are actually made.
• At their best, prescriptive analytics predicts not only what will happen, but also why it will
happen, providing recommendations regarding actions that will take advantage of the
predictions.
• These analytics go beyond descriptive and predictive analytics by recommending one or more
possible courses of action. Essentially they predict multiple futures and allow companies to assess
a number of possible outcomes based upon their actions.
• Prescriptive analytics use a combination of techniques and tools such as business rules,
algorithms, machine learning and computational modelling procedures. These techniques are
applied against input from many different data sets including historical and transactional data,
real-time data feeds, and big data.
• Prescriptive analytics are relatively complex to administer, and most companies are not yet using
them in their daily course of business. Larger companies are successfully using prescriptive
analytics to optimize production, scheduling and inventory in the supply chain to make sure they
are delivering the right products at the right time and optimizing the customer experience.
• Prescriptive Analytics should be used any time you need to provide users with advice on what
action to take.
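• The toy sketch below captures the core idea of prescriptive analytics: evaluate a set of candidate actions against a model and recommend the best one. The candidate actions, demand multipliers, prices and costs are invented assumptions; real prescriptive systems combine business rules, optimization and machine learning.

```python
# Score a handful of candidate actions against a simple (made-up) demand model
# and recommend the one with the highest expected profit.

# Hypothetical candidate actions: (discount_rate, expected_demand_multiplier)
candidate_actions = {
    "no_discount":    (0.00, 1.00),
    "small_discount": (0.05, 1.15),
    "big_discount":   (0.15, 1.40),
}

BASE_DEMAND = 1000      # units (assumed)
UNIT_PRICE = 20.0       # dollars (assumed)
UNIT_COST = 12.0        # dollars (assumed)

def expected_profit(discount, demand_multiplier):
    """Profit under one candidate action, using the toy demand model above."""
    price = UNIT_PRICE * (1 - discount)
    demand = BASE_DEMAND * demand_multiplier
    return (price - UNIT_COST) * demand

scores = {name: expected_profit(*params) for name, params in candidate_actions.items()}
best = max(scores, key=scores.get)

for name, profit in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} expected profit = {profit:8.0f}")
print(f"Recommended action: {best}")
```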
• There is another type, called Diagnostic analytics, that takes descriptive data a step further and provides deeper analysis to answer the question: "Why did this happen?"
• Often, diagnostic analysis is referred to as root cause analysis.
• This includes using processes such as data discovery, data mining, and drill down and drill
through.
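• The small pandas sketch below shows the drill-down idea: starting from a month-level total that dropped, group by region and product to see where the drop is concentrated. The column names and figures are invented for illustration.

```python
# Drill down from an aggregate (revenue per month) to finer groupings
# (region and product) to locate the source of a drop.
import pandas as pd

sales = pd.DataFrame({
    "month":   ["Jan", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "region":  ["North", "South", "North", "North", "South", "North"],
    "product": ["A", "A", "B", "A", "A", "B"],
    "revenue": [100, 120, 80, 60, 118, 82],
})

# Step 1: the descriptive view -- total revenue per month (shows a drop in Feb)
print(sales.groupby("month")["revenue"].sum())

# Step 2: drill down -- which region/product combination explains the drop?
print(sales.groupby(["month", "region", "product"])["revenue"].sum().unstack("month"))
```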
• When analyzing data, we can also categorize Big Data analytics as follows:
✔ Basic analytics
✔ Advanced analytics
✔ Operational analytics
✔ Monetized analytics
• Summary of the four approaches:
Advantages of Big Data Analytics
• Cost Savings: Some big data tools, such as Hadoop and cloud-based analytics, can bring a cost advantage to a company when large amounts of data need to be stored. These tools also help you identify more efficient ways to do business.
• Time Reductions: The rapid speed of tools such as Hadoop and in-memory analysis makes it easy
to identify new data sources, helping businesses analyze data instantly and make quick decisions
based on what they learn.
• New Product Development: By knowing the trends in customer needs and satisfaction through
analytics, you can design products according to customer needs.
• Understand the market conditions: Analyzing big data gives you a better understanding of current market conditions. For example, by analyzing customers’ buying behavior, a company can identify its best-selling products and manufacture products according to that trend, allowing it to outperform its competitors.
• Control online reputation: Big data tools can perform sentiment analysis, so you get feedback about who is talking about your business. If you want to monitor and improve your business’s online presence, big data tools can help.
Challenges of Conventional Systems
• Three major challenges that Big Data faces are as follows:
1. Data or Volume
2. Process
3. Management
• Data or Volume:
• The volume of data, especially machine-generated data, is exploding, and it is growing faster every year as new sources of data emerge.
• For example, in the year 2000, 800,000 petabytes (PB) of data were stored in the world, and this was expected to reach 35 zettabytes (ZB) by 2020 (according to IBM).
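• As a quick back-of-the-envelope check of those figures, the sketch below computes the overall growth factor and the implied compound annual growth rate (taking 1 ZB = 1,000,000 PB); it is only illustrative arithmetic.

```python
# Growth factor and implied compound annual growth rate for the figures above:
# 800,000 PB (0.8 ZB) in 2000 vs. 35 ZB projected for 2020.
ZB_PER_PB = 1 / 1_000_000           # 1 ZB = 1,000,000 PB

data_2000_zb = 800_000 * ZB_PER_PB  # 0.8 ZB
data_2020_zb = 35.0
years = 2020 - 2000

growth_factor = data_2020_zb / data_2000_zb
cagr = growth_factor ** (1 / years) - 1

print(f"Growth factor over {years} years: {growth_factor:.1f}x")
print(f"Implied compound annual growth rate: {cagr:.1%}")
```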
• Processing:
• More than 80% of today’s information is unstructured and it is typically too big to manage
effectively.
• Today, companies are looking to leverage a lot more data from a wider variety of sources both
inside and outside the organization.
• Things like documents, contracts, machine data, sensor data, social media, health records, emails,
etc. The list is endless really.
• Management:
• A lot of this data is unstructured, or has a complex structure that’s hard to represent in rows and
columns.
Relational Database Management Systems - Why can’t we use databases with lots of disks to do large-scale analysis? Why is Hadoop needed?
• Although Hadoop isn’t the first distributed system for data storage and analysis, it has some unique properties that set it apart from other systems that may seem similar.
• Let's find the answer to the above questions by exploring how Hadoop differs from traditional systems like RDBMSs (for example, in seek time, normalization, scaling, etc.).
• First, here are some differences between the two:
• Seek Time:
• A trend today in disk drives is that seek time is improving more slowly than transfer rate.
• Seeking is the process of moving the disk’s head to a particular place on the disk to read or
write data. It characterizes the latency of a disk operation, whereas the transfer rate
corresponds to a disk’s bandwidth.
• If the data access pattern is dominated by seeks, it will take longer to read or write large portions of the dataset than to stream through it, which operates at the transfer rate (a rough comparison is sketched at the end of this section).
• On the other hand, for updating a small proportion of records in a database, a traditional B-Tree
(the data structure used in relational databases, which is limited by the rate at which it can
perform seeks) works well. For updating the majority of a database, a B-Tree is less efficient than
MapReduce, which uses Sort/Merge to rebuild the database.
• An RDBMS is good for point queries or updates, where the dataset has been indexed to deliver
low-latency retrieval and update times of a relatively small amount of data.
• MapReduce suits applications where the data is written once and read many times, whereas a
relational database is good for datasets that are continually updated.
• However, the differences between relational databases and Hadoop systems are blurring.
• Relational databases have started incorporating some of the ideas from Hadoop, and from the
other direction, Hadoop systems such as Hive are becoming more interactive (by moving away
from MapReduce) and adding features like indexes and transactions that make them look more
and more like traditional RDBMSs.
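• The sketch below puts rough numbers on the seek-time vs. transfer-rate point above, comparing a sequential streaming read of a dataset with reading it as many small random accesses. The disk parameters and record size are typical-looking assumptions, not measurements.

```python
# Streaming a 1 TB dataset (transfer-rate-bound) vs. reading it as many small
# random reads (seek-bound). All parameters are assumed, order-of-magnitude values.
SEEK_TIME_S = 0.010          # 10 ms per seek (assumed)
TRANSFER_RATE_BPS = 100e6    # 100 MB/s sustained transfer (assumed)
DATASET_BYTES = 1e12         # 1 TB
RECORD_BYTES = 100e3         # 100 KB per random read (assumed)

# Streaming: limited only by the transfer rate
streaming_s = DATASET_BYTES / TRANSFER_RATE_BPS

# Random access: one seek per record plus the transfer of each record
num_records = DATASET_BYTES / RECORD_BYTES
random_s = num_records * (SEEK_TIME_S + RECORD_BYTES / TRANSFER_RATE_BPS)

print(f"Streaming the whole dataset: ~{streaming_s / 3600:.1f} hours")
print(f"Reading it via random seeks: ~{random_s / 3600:.1f} hours")
```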
• Structure:
• Another difference between Hadoop and an RDBMS is the amount of structure in the datasets on
which they operate.
• Structured data is the realm of the RDBMS.
• Semi-structured data, on the other hand, is looser: for example, a spreadsheet, in which the
structure is the grid of cells, although the cells themselves may hold any form of data.
• Unstructured data does not have any particular internal structure: for example, plain text or
image data.
• Hadoop works well on unstructured or semi-structured data because it is designed to interpret the data at processing time (so-called schema-on-read; a small example follows at the end of this section).
• This provides flexibility and avoids the costly data loading phase of an RDBMS, since in Hadoop it
is just a file copy.
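• Below is a small illustration of schema-on-read: the raw log lines are stored as plain text, and a structure is imposed only when they are processed. The log format and field names are hypothetical.

```python
# Schema-on-read: no structure is enforced when the data is stored; the fields
# are interpreted only at processing time.
raw_log = """\
192.168.0.7 GET /index.html 200 1043
192.168.0.9 GET /missing.png 404 512
192.168.0.7 POST /api/order 200 2301
"""

def parse(line):
    """Interpret one line at processing time (the 'read schema')."""
    host, method, path, status, size = line.split()
    return {"host": host, "method": method, "path": path,
            "status": int(status), "bytes": int(size)}

records = [parse(line) for line in raw_log.splitlines()]

# Now the structured view can be queried, e.g. total bytes served per host
bytes_per_host = {}
for r in records:
    bytes_per_host[r["host"]] = bytes_per_host.get(r["host"], 0) + r["bytes"]
print(bytes_per_host)
```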
• Normalization:
• Relational data is often normalized to retain its integrity and remove redundancy.
• Normalization poses problems for Hadoop processing because it makes reading a record a
nonlocal operation, and one of the central assumptions that Hadoop makes is that it is possible to
perform (high-speed) streaming reads and writes.
• A web server log is a good example of a set of records that is not normalized (for example, the
client hostnames are specified in full each time, even though the same client may appear many
times), and this is one reason that logfiles of all kinds are particularly well suited to analysis with
Hadoop.
(Note that Hadoop can perform joins; it’s just that they are not used as much as in the relational
world.)
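• The sketch below contrasts a denormalized log (the hostname repeated in every record, so a single streaming pass can aggregate with no join) with a normalized version that needs a lookup against a separate host table. All records here are invented.

```python
# Denormalized: hostname repeated in every record -- aggregate in one local pass
denormalized = [
    {"host": "alpha.example.com", "bytes": 1043},
    {"host": "beta.example.com",  "bytes": 512},
    {"host": "alpha.example.com", "bytes": 2301},
]
totals = {}
for rec in denormalized:
    totals[rec["host"]] = totals.get(rec["host"], 0) + rec["bytes"]
print(totals)

# Normalized: records reference a host_id, so resolving names requires a join
hosts = {1: "alpha.example.com", 2: "beta.example.com"}
normalized = [{"host_id": 1, "bytes": 1043},
              {"host_id": 2, "bytes": 512},
              {"host_id": 1, "bytes": 2301}]
joined_totals = {}
for rec in normalized:
    name = hosts[rec["host_id"]]           # the "join" step: a non-local lookup
    joined_totals[name] = joined_totals.get(name, 0) + rec["bytes"]
print(joined_totals)
```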
• Scaling:
• MapReduce—and the other processing models in Hadoop—scales linearly with the size of the
data.
• Data is partitioned, and the functional primitives (like map and reduce) can work in parallel on
separate partitions.
• This means that if you double the size of the input data, a job will run twice as slowly.
• But if you also double the size of the cluster, a job will run as fast as the original one.
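• The sketch below shows the partition, map-in-parallel, reduce shape of this argument with a toy word count over in-memory partitions; it is not Hadoop MapReduce itself, just an illustration of why adding workers in proportion to the data keeps runtime roughly constant.

```python
# Partition the input, run the map function on partitions in parallel, then
# merge the partial results in a reduce step.
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from functools import reduce

partitions = [
    "big data analytics improves decision making",
    "hadoop scales linearly with the size of the data",
    "map and reduce work in parallel on separate partitions",
]

def map_partition(text):
    """Map step: count words within one partition, independently of the others."""
    return Counter(text.split())

def reduce_counts(a, b):
    """Reduce step: merge the partial counts from two partitions."""
    return a + b

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        partial_counts = list(pool.map(map_partition, partitions))
    total = reduce(reduce_counts, partial_counts, Counter())
    print(total.most_common(5))
```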