Module 2 BDA

This document provides an introduction to big data analytics. It discusses the need for big data due to the large volume, variety and velocity of data being produced. Big data is defined as data sets that are too large or complex for traditional data processing methods. The document covers different types of big data, characteristics of big data, and classifications of data including structured, semi-structured and unstructured data. It also discusses outliers, missing values, relationships between variables, probability and probability distributions which are important concepts in big data analytics.

Uploaded by

ARYA MURALI ECE-2020-24


Module-1

Introduction to Big Data Analytics

Need of Big Data

 The rise in technology has led to the production and storage of voluminous amounts of data.
 The large volume and variety of this data, in various forms and formats, pose challenges to
conventional systems for storage, processing and analysis.
 Increasing complexity calls for a means of quickly processing, analyzing and using data.
Big Data
 Definition: Big Data is a high-volume, high-velocity and high-variety information asset that requires new
forms of processing for enhanced decision making, insight discovery and process optimization.

 “A collection of data sets so large or complex that traditional data processing applications are inadequate”-
Wikipedia

 “Data of very large size, typically to the extent that its manipulation and management present significant
logistical challenges” - Oxford

 “Big data refers to data sets whose size is beyond the ability of typical database software tools to capture,
store, manage and analyze” – McKinsey Global Institute

 Data: information, usually in the form of facts or statistics, that can be analyzed or used.

 Web Data: Data present on web servers in the form of text, images, videos, audios and multimedia files for
web users.

 Examples: Wikipedia, Google Maps, McGraw-Hill Connect, etc.

Classification of Data

Structured data
• Possible to insert, delete, update and append
• Indexing
• Scalability
• Transaction processing
• Encryption and decryption

Semi-structured data
• Example: XML, JSON document
• Contains tags/markers to separate elements and records
• Data does not associate with any formal data model

Unstructured data
• Does not conform to or associate with any data model
• File types: .TXT, .CSV
• Data may have internal structure, but does not reveal any relationship

Multi-structured data: consists of multiple formats of data, e.g. structured, semi-structured and/or unstructured.
Example: streaming data on customer interactions, data from multiple sensors, data at a web/enterprise server, etc.
Big Data Characteristics
Big Data Types
 Social networks and web data: Facebook, Twitter, e-mails, blogs, etc.

 Transaction data and Business Processes (BPs): credit card transactions, flight bookings, etc.

 Customer master data: data for facial recognition and for the name, date of birth, location and income category, etc.

 Machine-generated data: machine-to-machine, IoT, web logs, computer system logs, etc.

 Human-generated data: biometrics data, human-machine interaction data, e-mail records with a mail server, MySQL database of student grades
TIME SERIES DATA
Outliers
 An outlier is literally a value or an entire observation (row) that lies well outside of the norm.
 Outliers are often caused by data entry errors. Suppose a data set includes a Height variable, a person’s height measured in inches, and you see a value of 720. This is certainly an outlier, and it is certainly an error. Once you spot it, you can go back and check this observation to see what the person’s height should be. Maybe an extra 0 was accidentally appended and the true value is 72. In any case, this type of outlier is usually easy to discover and fix.
 An outlier can also be an unusual combination of values. For example, it would be strange to find a person with Age equal to 10 and Height equal to 72. Neither of these values is unusual by itself, but the combination is certainly unusual.
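A simple range check can flag the kind of data-entry outlier described above. The sketch below is only illustrative: the plausible range of 20 to 96 inches and the sample heights are assumptions, not values from these notes.

```python
# Flag heights (in inches) that fall outside a plausible range.
# The bounds and the sample data are illustrative assumptions.
PLAUSIBLE_MIN = 20.0
PLAUSIBLE_MAX = 96.0


def find_outliers(heights, low=PLAUSIBLE_MIN, high=PLAUSIBLE_MAX):
    """Return (index, value) pairs for values outside [low, high]."""
    return [(i, h) for i, h in enumerate(heights) if not low <= h <= high]


heights = [68, 72, 65, 720, 70]  # 720 is probably 72 with an extra 0
print(find_outliers(heights))  # [(3, 720)]
```

A range check like this only catches values that are unusual on their own. An unusual combination, such as Age 10 with Height 72, needs a check that looks at both variables together.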
Missing Values
 For an Excel data set, you might expect missing data to be obvious from blank cells. This is certainly one possibility, but there are others. Missing data are coded in a variety of strange ways. One common method is to code missing values with an unusual number such as −9999 or 9999. Another method is to code missing values with a symbol such as − or *. If you know the code (and it is often supplied in a footnote), then it is usually a good idea, at least in Excel, to perform a global search and replace, replacing all of the missing value codes with blanks.
 The more important issue is what to do about missing values. One option is to ignore them. If you use Excel’s AVERAGE function on a column of data with missing values, it reacts the way you would hope and expect: it adds all the nonmissing values and divides by the number of nonmissing values.
 Similarly for rows and columns.
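The two steps above (recode the missing-value codes, then average only the nonmissing values) can be sketched outside Excel as well. This is a minimal sketch; the set of codes and the sample column are assumptions chosen for illustration.

```python
# Treat common missing-value codes as missing, then average the rest.
# The codes and the sample column are illustrative assumptions.
MISSING_CODES = {"", "-", "*", "-9999", "9999"}


def clean(cells):
    """Convert raw text cells to floats, using None for missing values."""
    return [None if c.strip() in MISSING_CODES else float(c) for c in cells]


def mean_ignoring_missing(values):
    """Add the nonmissing values and divide by how many there are."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no nonmissing values to average")
    return sum(present) / len(present)


column = ["10", "-9999", "20", "*", "", "30"]
print(mean_ignoring_missing(clean(column)))  # 20.0
```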
Finding Relationships among Variables
 The primary interest is usually in relationships between variables. For example, it is natural to ask what drives baseball salaries. Do they depend on qualitative factors, such as the player’s team or position? Do they depend on quantitative factors, such as the number of hits the player gets or the number of strikeouts?
RELATIONSHIPS AMONG CATEGORICAL VARIABLES
 Consider a data set with at least two categorical variables, Smoking and Drinking. Each person is categorized into one of three smoking categories: nonsmoker (NS), occasional smoker (OS), and heavy smoker (HS). Similarly, each person is categorized into one of three drinking categories: nondrinker (ND), occasional drinker (OD), and heavy drinker (HD). Do the data indicate that smoking and drinking habits are related? For example, do nondrinkers tend to be nonsmokers? Do heavy smokers tend to be heavy drinkers?
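One common way to approach the smoking and drinking question is to count how many people fall into each combination of categories and then compare percentages within each smoking category. The sketch below assumes that approach; the observations are hypothetical and exist only to make the code runnable.

```python
from collections import Counter

DRINKING = ["ND", "OD", "HD"]


def crosstab(pairs):
    """Count how many people fall into each (smoking, drinking) cell."""
    return Counter(pairs)


def row_percentages(counts, smoking_level):
    """Share of each drinking category within one smoking category."""
    total = sum(counts[(smoking_level, d)] for d in DRINKING)
    if total == 0:
        raise ValueError(f"no observations for {smoking_level}")
    return {d: counts[(smoking_level, d)] / total for d in DRINKING}


# Hypothetical observations: one (smoking, drinking) pair per person.
people = [("NS", "ND"), ("NS", "ND"), ("NS", "OD"), ("OS", "OD"),
          ("OS", "OD"), ("HS", "HD"), ("HS", "HD"), ("HS", "OD")]
counts = crosstab(people)
print(row_percentages(counts, "NS"))
print(row_percentages(counts, "HS"))
```

If the row percentages differ noticeably from one smoking category to the next, that suggests the two variables are related.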
RELATIONSHIPS AMONG CATEGORICAL VARIABLES AND
A NUMERICAL VARIABLE
 This type of analysis applies whenever you want to compare a numerical measure across two or more subpopulations. Here are some examples:
 ■ The subpopulations are males and females, and the numerical measure is salary.
 ■ The subpopulations are different regions of the country, and the numerical measure is the cost of living.
 ■ The subpopulations are different days of the week, and the numerical measure is the number of customers going to a particular fast-food chain.
 ■ The subpopulations are different machines in a manufacturing plant, and the numerical measure is the number of defective parts produced per day.
 ■ The subpopulations are patients who have taken a new drug and those who have taken a placebo, and the numerical measure is the recovery rate from a particular disease.
 ■ The subpopulations are undergraduates with various majors (business, English, history, and so on), and the numerical measure is the starting salary after graduating.
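A comparison of this kind usually starts by summarizing the numerical measure separately for each subpopulation. The sketch below uses the mean as the summary, which is one possible choice; the machine names and defect counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_means(records):
    """Average the numerical measure within each subpopulation."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}


# Hypothetical data: defective parts per day for two machines.
records = [("machine_A", 3), ("machine_B", 7),
           ("machine_A", 5), ("machine_B", 9)]
print(group_means(records))
```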
RELATIONSHIPS AMONG NUMERICAL VARIABLES
 This section discusses methods for finding relationships among numerical variables. For example, we might want to examine the relationship between heights and weights of people, or between salary and years of experience of employees. To study such relationships, we introduce two new summary measures, correlation and covariance, and a new type of chart called a scatterplot.
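The notes name covariance and correlation without giving formulas. The sketch below uses the standard sample definitions (divisor n − 1), which is an assumption about the convention intended here; the heights and weights are hypothetical.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: summed products of deviations over n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance rescaled by both standard deviations; lies in [-1, 1]."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


heights = [60, 64, 68, 72]       # hypothetical heights in inches
weights = [110, 135, 150, 180]   # hypothetical weights in pounds
print(covariance(heights, weights), correlation(heights, weights))
```

Covariance depends on the units of measurement, whereas correlation is unitless, which makes correlation easier to interpret.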
Probability and Probability Distributions
 A key aspect of solving real business problems is dealing appropriately with uncertainty. This involves recognizing explicitly that uncertainty exists and using quantitative methods to model uncertainty. If you want to develop realistic business models, you cannot simply act as if uncertainty doesn’t exist.
 For example, if you don’t know next month’s demand, you shouldn’t build a model that assumes next month’s demand is a sure 1500 units. This is only wishful thinking. You should instead incorporate demand uncertainty explicitly into your model. To do this, you need to know how to deal quantitatively with uncertainty. This involves probability and probability distributions.
 There are many sources of uncertainty. Demands for products are uncertain, times between arrivals to a supermarket are uncertain, stock price returns are uncertain, changes in interest rates are uncertain, and so on.
 In many situations, the uncertain quantity (demand, time between arrivals, stock price return, change in interest rate) is a numerical quantity. In the language of probability, it is called a random variable. More formally, a random variable associates a numerical value with each possible random outcome.
PROBABILITY ESSENTIALS
 When a weather forecaster states that the chance of rain is 70%, he or she is making a probability statement. When a sports commentator states that the odds against the Miami Heat winning the NBA Championship are 3 to 1, he or she is also making a probability statement.

 A probability is a number between 0 and 1 that measures the likelihood that some event will occur. An event with probability 0 cannot occur, whereas an event with probability 1 is certain to occur. An event with probability greater than 0 and less than 1 involves uncertainty. The closer its probability is to 1, the more likely it is to occur.
Rule of Complements
 The simplest probability rule involves the complement of an event. If A is any event, then the complement of A, denoted by Ā (or in some books by Aᶜ), is the event that A does not occur.
 For example, if A is the event that the Dow will finish the year at or above the 14,000 mark, then the complement of A is that the Dow will finish the year below 14,000.
 If the probability of A is P(A), then the probability of its complement, P(Ā), is given by the equation P(Ā) = 1 − P(A). Equivalently, the probability of an event and the probability of its complement sum to 1. For example, if you believe that the probability of the Dow finishing the year at or above 14,000 is 0.25, then the probability that it will finish the year below 14,000 is 1 − 0.25 = 0.75.
Addition Rule
 Events are mutually exclusive if at most one of them can occur. That is, if one of them occurs, then none of the others can occur.
 Events are exhaustive if they exhaust all possibilities, so that one of them must occur.
 Let A1 through An be any n events. Then the addition rule of probability involves the probability that at least one of these events will occur. When the events are mutually exclusive, this probability is the sum of their individual probabilities.
 Addition Rule for Mutually Exclusive Events
 P(at least one of A1 through An) = P(A1) + P(A2) + … + P(An) (4.2)
 In addition, if the events A1 through An are exhaustive, then the probability is one because one of the events is certain to occur.
 For example, consider the following three events involving a company’s annual revenue for the coming year:
(1) revenue is less than $1 million,
(2) revenue is at least $1 million but less than $2 million, and
(3) revenue is at least $2 million.
 Call these events A1, A2 and A3. They are mutually exclusive and exhaustive. Therefore, their probabilities must sum to 1. Suppose these probabilities are P(A1) = 0.5, P(A2) = 0.3, and P(A3) = 0.2.
 For example, the event that revenue is at least $1 million is the event that either A2 or A3 occurs. From the addition rule, its probability is
P(revenue is at least $1 million) = P(A2) + P(A3) = 0.5
Similarly,
P(revenue is less than $2 million) = P(A1) + P(A2) = 0.8
P(revenue is less than $1 million or at least $2 million) = P(A1) + P(A3) = 0.7
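The revenue calculations above can be checked with a few lines of code. This sketch uses exact fractions to avoid floating-point rounding; the dictionary keys and the function name are illustrative choices.

```python
from fractions import Fraction

# Probabilities of the three revenue events from the example above.
p = {"A1": Fraction(5, 10), "A2": Fraction(3, 10), "A3": Fraction(2, 10)}


def prob_at_least_one(events):
    """Addition rule for mutually exclusive events: sum the probabilities."""
    return sum(p[e] for e in events)


# The three events are exhaustive, so together they have probability 1.
assert prob_at_least_one(["A1", "A2", "A3"]) == 1

print(float(prob_at_least_one(["A2", "A3"])))  # 0.5, revenue at least $1 million
print(float(prob_at_least_one(["A1", "A2"])))  # 0.8, revenue less than $2 million
print(float(prob_at_least_one(["A1", "A3"])))  # 0.7
```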
Conditional Probability and the Multiplication Rule
 Let A and B be any events with probabilities P(A) and P(B). Typically, the probability P(A) is assessed without knowledge of whether B occurs. However, if you are told that B has occurred, then the probability of A might change. The new probability of A is called the conditional probability of A given B, and it is denoted by P(A∣B).

 Conditional Probability
 P(A∣B) = P(A and B) / P(B) (4.3)
 The numerator in this formula is the probability that both A and B occur. This probability must be known to find P(A∣B). However, in some applications P(A∣B) and P(B) are known. Then you can multiply both sides of Equation (4.3) by P(B) to obtain the following multiplication rule for P(A and B).
 Multiplication Rule
 P(A and B) = P(A∣B) P(B)
 Let A be the event that Bender meets its end-of-July deadline, and let B be the event that Bender receives the materials from its supplier by the middle of July. The probabilities Bender is best able to assess on July 1 are probably P(B) and P(A∣B). At the beginning of July, Bender might estimate that the chances of getting the materials on time from its supplier are 2 out of 3, so that P(B) = 2/3. Also, thinking ahead, Bender estimates that if it receives the required materials on time, the chances of meeting the end-of-July deadline are 3 out of 4. This is a conditional probability statement, namely, that P(A∣B) = 3/4. Then the multiplication rule implies that
 P(A and B) = P(A∣B) P(B) = (3/4)(2/3) = 0.5
 That is, there is a fifty-fifty chance that Bender will get its materials on time and meet its end-of-July deadline.
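The Bender calculation can be reproduced directly from the two rules. This is a small sketch using exact fractions; the function names are illustrative.

```python
from fractions import Fraction


def joint_probability(p_a_given_b, p_b):
    """Multiplication rule: P(A and B) = P(A|B) * P(B)."""
    return p_a_given_b * p_b


def conditional_probability(p_a_and_b, p_b):
    """Conditional probability: P(A|B) = P(A and B) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b


p_b = Fraction(2, 3)          # materials arrive on time
p_a_given_b = Fraction(3, 4)  # deadline met, given materials on time
p_a_and_b = joint_probability(p_a_given_b, p_b)
print(p_a_and_b)  # 1/2
```

Dividing the joint probability by P(B) recovers the conditional probability of 3/4, which shows that the two rules are the same equation rearranged.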
Equally Likely Events
 Much of what you know about probability is probably based on situations where outcomes are equally likely. These include flipping coins, throwing dice, drawing balls from urns, and other random mechanisms.
 For example, suppose an urn contains 20 red marbles and 10 blue marbles. You plan to randomly select five marbles from the urn, and you are interested, say, in the probability of selecting at least three red marbles.
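The urn probability can be computed by counting equally likely selections. The sketch below assumes the five marbles are drawn without replacement and that every set of five marbles is equally likely; the notes do not state this explicitly.

```python
from math import comb


def prob_at_least_red(k_min, draws, red, blue):
    """P(at least k_min red) when drawing without replacement.

    Counts the favorable selections and divides by the number of
    equally likely selections of `draws` marbles.
    """
    total = comb(red + blue, draws)
    favorable = sum(comb(red, k) * comb(blue, draws - k)
                    for k in range(k_min, draws + 1))
    return favorable / total


# 20 red and 10 blue marbles, five drawn, at least three red.
print(round(prob_at_least_red(3, 5, 20, 10), 4))  # 0.8088
```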
PROBABILITY DISTRIBUTION OF A SINGLE RANDOM VARIABLE

 In this section we examine the probability distribution of a single random variable.

 There are really two types of random variables: discrete and continuous. A discrete random variable has only a finite (or countably infinite) number of possible values, whereas a continuous random variable has a continuum of possible values.
 For example, the number of children in a family is clearly discrete, whereas the amount of rain this year in San Francisco is clearly continuous.
 The essential properties of a discrete random variable and its associated probability distribution are quite simple.
 Let X be a random variable. To specify the probability distribution of X, we need to specify its possible values and their probabilities. We assume that there are k possible values, denoted v1, v2, . . . , vk. The probability of a typical value vi is denoted in one of two ways, either P(X = vi) or p(vi).
 The first is a reminder that this is a probability involving the random variable X, whereas the second is a shorthand notation. Probability distributions must satisfy two criteria: (1) the probabilities must be nonnegative, and (2) they must sum to 1. In symbols, we must have
 $\sum_{i=1}^{k} p(v_i) = 1, \quad p(v_i) \ge 0$
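The two criteria can be checked mechanically for any proposed discrete distribution. In this sketch the distribution is stored as a dictionary mapping each value vi to p(vi); the probabilities for the number of children are hypothetical.

```python
from math import isclose


def is_valid_distribution(dist, tol=1e-9):
    """Check that all probabilities are nonnegative and sum to 1."""
    probabilities = list(dist.values())
    nonnegative = all(p >= 0 for p in probabilities)
    sums_to_one = isclose(sum(probabilities), 1.0, abs_tol=tol)
    return nonnegative and sums_to_one


# Hypothetical distribution of X = number of children in a family.
children = {0: 0.30, 1: 0.30, 2: 0.25, 3: 0.15}
print(is_valid_distribution(children))  # True
print(children[2])                      # p(2), i.e. P(X = 2)
```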
Normal, Binomial, Poisson and Exponential distribution

 Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.
 In graphical form, the normal distribution appears as a "bell curve".
 What is Binomial Distribution?
 Binomial distribution is a statistical distribution that summarizes the probability that a value will take one of two independent values under a given set of parameters or assumptions.
 Binomial distribution is a common discrete distribution used in statistics, as opposed to a continuous distribution, such as normal distribution. This is because binomial distribution only counts two states, typically represented as 1 (for a success) or 0 (for a failure), given a number of trials in the data.
 A Poisson distribution is a discrete probability distribution. It gives the probability of an event happening a certain number of times (k) within a given interval of time or space.
 In general, Poisson distributions are often appropriate for count data. Count data is composed of observations that are non-negative integers.
 What is Exponential Distribution?
 In probability theory and statistics, the exponential distribution is a continuous probability distribution that often concerns the amount of time until some specific event happens. For example, the amount of time (beginning now) until an earthquake occurs has an exponential distribution. Other examples include the length, in minutes, of long distance business telephone calls, and the amount of time, in months, a car battery lasts.
 As another example, the amount of money customers spend in one trip to the supermarket follows an exponential distribution. There are more people who spend small amounts of money and fewer people who spend large amounts of money.
NORMAL DISTRIBUTION

 Any particular normal distribution is specified by its mean and standard deviation. By changing the mean, the normal curve shifts to the right or left. By changing the standard deviation, the curve becomes more or less spread out.
 Continuous Distributions and Density Functions
 For continuous distributions such as the normal distribution, instead of a list of possible values, there is a continuum of possible values, such as all values between 0 and 100 or all values greater than 0. Instead of assigning probabilities to each individual value in the continuum, the total probability of 1 is spread over this continuum.
 The key to this spreading is called a density function, which acts like a histogram. The higher the value of the density function, the more likely this region of the continuum is.
 A density function, usually denoted by f(x), specifies the probability distribution of a continuous random variable X. The higher f(x) is, the more likely values near x are. Also, the total area between the graph of f(x) and the horizontal axis, which represents the total probability, is equal to 1. Finally, f(x) is nonnegative for all possible values of X.
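The properties of a density function can be checked numerically for the normal distribution. The notes do not give the normal density formula, so the sketch below uses the standard one, and approximates the area with the trapezoidal rule over 8 standard deviations on each side of the mean (an arbitrary but generous range).

```python
from math import exp, pi, sqrt


def normal_density(x, mean, std_dev):
    """Standard normal density formula for the given mean and std dev."""
    z = (x - mean) / std_dev
    return exp(-0.5 * z * z) / (std_dev * sqrt(2 * pi))


def area_under_density(mean, std_dev, steps=10_000):
    """Approximate the total area with the trapezoidal rule."""
    low, high = mean - 8 * std_dev, mean + 8 * std_dev
    width = (high - low) / steps
    ys = [normal_density(low + i * width, mean, std_dev)
          for i in range(steps + 1)]
    return width * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


# Illustrative parameters: mean 100, standard deviation 15.
print(area_under_density(100.0, 15.0))  # very close to 1
```

Changing the mean moves the peak of the curve, and changing the standard deviation widens or narrows it, but the total area stays at 1.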

APPLICATIONS OF THE NORMAL DISTRIBUTION

(Refer Text book)
BINOMIAL DISTRIBUTION

 The binomial distribution is a discrete distribution that can occur in two situations: (1) when sampling from a population with only two types of members (males and females, for example), and (2) when performing a sequence of identical experiments, each of which has only two possible outcomes.
 Consider a situation where there are n independent, identical trials, where the probability of a success on each trial is p and the probability of a failure is 1 − p. Define X to be the random number of successes in the n trials. Then X has a binomial distribution with parameters n and p.
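The probability of exactly k successes in n trials follows the standard binomial formula, C(n, k) p^k (1 − p)^(n − k). The notes do not write this formula out, so the sketch below should be read as the textbook definition rather than something taken from the slides; the values n = 10 and p = 0.3 are arbitrary.

```python
from math import comb


def binomial_pmf(n, k, p):
    """P(X = k) for n independent trials with success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Arbitrary illustration: 10 trials, success probability 0.3.
n, p = 10, 0.3
distribution = [binomial_pmf(n, k, p) for k in range(n + 1)]
print(sum(distribution))  # the probabilities sum to 1 (up to rounding)
```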

Application (Refer Text Book)

Poisson Distribution

 The Poisson distribution is a discrete distribution. It usually applies to the number of events occurring within a specified period of time or space. Its possible values are all of the nonnegative integers: 0, 1, 2, and so on; there is no upper limit. Even though there is an infinite number of possible values, this causes no real problems.
Typical Examples of the Poisson Distribution
 A bank manager is studying the arrival pattern to the bank. The events are customer arrivals, the number of arrivals in an hour is Poisson distributed, and λ represents the expected number of arrivals per hour.
 An engineer is interested in the lifetime of a type of battery. A device that uses this type of battery is operated continuously. When the first battery fails, it is replaced by a second; when the second fails, it is replaced by a third, and so on. The events are battery failures, the number of failures that occur in a month is Poisson distributed, and λ represents the expected number of failures per month.
 A retailer is interested in the number of customers who order a particular product in a week. Then the events are customer orders for the product, the number of customer orders in a week is Poisson distributed, and λ is the expected number of orders per week.
 In a quality control setting, the Poisson distribution is often relevant for describing the number of defects in some unit of space. For example, when paint is applied to the body of a new car, any minor blemish is considered a defect. Then the number of defects on the hood, say, might be Poisson distributed. In this case, λ is the expected number of defects per hood.
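In each of these examples, the probability of seeing exactly k events follows the standard Poisson formula, e^(−λ) λ^k / k!. The notes do not state the formula, so this sketch relies on the textbook definition; the rate of 3 arrivals per hour is a hypothetical value.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) when the expected number of events is lam."""
    if k < 0:
        return 0.0
    return exp(-lam) * lam ** k / factorial(k)


lam = 3.0  # hypothetical: 3 expected customer arrivals per hour
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))

# The values 0, 1, 2, ... have no upper limit, but the probabilities
# shrink quickly, so a modest range already captures almost all of the total.
print(sum(poisson_pmf(k, lam) for k in range(60)))
```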
 Suppose that a bank manager is studying the pattern of customer arrivals at her branch location. As indicated previously in this section, the number of arrivals in an hour at a facility such as a bank is often well described by a Poisson distribution with parameter λ, where λ represents the expected number of arrivals per hour. An alternative way to view the uncertainty in the arrival process is to consider the times between customer arrivals. The most common probability distribution used to model these times, often called interarrival times, is the exponential distribution.
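The link between the two views can be made concrete. If arrivals occur at a rate of λ per hour, the standard exponential result is that the time T until the next arrival satisfies P(T ≤ t) = 1 − e^(−λt), with mean 1/λ. The notes do not state these formulas, and the rate of 12 arrivals per hour below is hypothetical.

```python
from math import exp


def prob_arrival_within(t_hours, rate_per_hour):
    """P(T <= t) for an exponential interarrival time T."""
    return 1.0 - exp(-rate_per_hour * t_hours)


def prob_no_arrival_within(t_hours, rate_per_hour):
    """P(T > t), which equals the Poisson probability of zero
    arrivals in an interval of length t (expected count rate * t)."""
    return exp(-rate_per_hour * t_hours)


rate = 12.0                # hypothetical: 12 arrivals per hour
mean_gap_hours = 1 / rate  # expected time between arrivals
print(mean_gap_hours * 60)                  # mean gap in minutes
print(prob_arrival_within(5 / 60, rate))    # next arrival within 5 minutes
```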
