
Probability, Statistics & Hypothesis
Understanding Data

• All facts are data.


• Human interpretable
• Diffused data.
• Operational data
• Non-operational data
Big Data

• Big data is data whose volume is much larger than that of 'small data'
and is characterized as follows:
• Volume
• Velocity
• Variety
• Veracity
• Validity
• Value
Types of Data

• In Big Data, there are three kinds of data.


• Structured data – data organized in a fixed schema, such as rows and columns of a relational table.
• Unstructured data – data with no predefined format, such as text, images, audio and video.
• Semi-structured data – data that is partially organized, such as JSON or XML files.
Big Data Architecture Layers

• There are four main Big Data architecture layers:


• Data Ingestion
• Data Processing
• Data Storage
• Data Visualization
• The Big data processing cycle involves data management that consists
of the following steps:
• Data collection
• Data preprocessing
• Application of machine learning algorithms
• Interpretation and visualization of the results of the machine learning algorithm
Process of Machine Learning
• ML is a process that starts with defining the data and ends with a
model that achieves some defined level of accuracy.
• Define the problem (Problem Domain)
• Data collection
• Data preparation
• Splitting data in training and testing
• Algorithm selection
• Performance of ML algorithm
Define the problem (Problem Domain)
• Problem: Find out whether the input image is of a human or not.
• To define a problem, it is divided into
• Task (T),
• Experience (E)
• Performance (P)
• Task (T): Classify an image to determine whether it contains a human or not.
• Experience (E): Images labeled as containing a human or not.
• Performance (P): Error rate, i.e., out of all classified images, the
percentage that are wrong predictions.
• Lower error rate leads to higher accuracy.
Data collection
• Data can be collected from
• Open/public data source,
• Social media
• Academic research.
• Government or institutional data
• Good quality data yields better results.
• Good data is one that has the following properties:
• Timeliness
• Relevancy
• Accuracy
• Reliability
• Knowledge about the data
Data preparation
• Preprocessing of data involves the following processes:
• Cleaning: Involves identifying and rectifying errors or inconsistencies in the
dataset.
• Handling missing values
• Removing duplicates
• Correcting inconsistent data
• Dealing with outliers
• Binning methods
• Smoothing by bin means
• Smoothing by bin medians
• Smoothing by bin boundaries

• Formatting:
• Converting categorical data
• Date and time handling
Data preparation
• Sampling:
• Random Sampling:
• Over-sampling and Under-sampling
• Bootstrapping:
• Decomposition:
• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)
• Scaling:
• Normalization (Min-Max Scaling)
• Standardization (Z-score Scaling)
• Robust Scaling:
Data preprocessing
• In the real world, the available data is often dirty. 'Dirty' means:
• Incomplete data
• Outlier data
• Data with inconsistent values
• Inaccurate data
• Data with missing values
• Duplicate data
Example
• The 'bad' or 'dirty' data can be observed in the following patient table.
Example
• Consider the set: S = (12, 14, 19, 22, 24, 26, 28, 31, 32). Apply various
binning techniques and show the result.
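A minimal Python sketch of one possible solution is shown below; the choice of equal-frequency bins of size 3 is an assumption made for illustration.

```python
# Equal-frequency binning of S into bins of size 3, followed by the
# three common smoothing methods.
S = [12, 14, 19, 22, 24, 26, 28, 31, 32]
bin_size = 3
bins = [S[i:i + bin_size] for i in range(0, len(S), bin_size)]

# Smoothing by bin means: every value in a bin is replaced by the bin mean
by_means = [[sum(b) / len(b)] * len(b) for b in bins]

# Smoothing by bin medians: every value is replaced by the bin median
by_medians = [[sorted(b)[len(b) // 2]] * len(b) for b in bins]

# Smoothing by bin boundaries: every value is replaced by the nearest
# bin boundary (ties are sent to the lower boundary here)
by_boundaries = [[min(b) if v - min(b) <= max(b) - v else max(b) for v in b]
                 for b in bins]

print(by_means)       # [[15.0, 15.0, 15.0], [24.0, 24.0, 24.0], [30.33.., ..]]
print(by_medians)     # [[14, 14, 14], [24, 24, 24], [31, 31, 31]]
print(by_boundaries)  # [[12, 12, 19], [22, 22, 26], [28, 32, 32]]
```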
Split data in training and testing

• For training and evaluating the model, the data is typically divided into three parts: a training set, a validation set, and a test set.
Algorithm Selection
• Selection of algorithm depends on the problem definition.
• For example, classifying emails as 'spam' or 'not spam'
requires an algorithm that takes the input variables and gives an
output of SPAM / NOT SPAM.
Data Transformations
• Data transformation performs operations such as normalization to improve
the performance of data mining algorithms.
• Normalization technique:
• Min-Max Procedure
• z-Score Normalization
Min-Max Procedure
• It is a normalization technique where each value v of a variable V is normalized by
subtracting the minimum value and dividing by the range, mapping the values to a new
range, say 0-1:
v' = (v − min(V)) / (max(V) − min(V)) * (new_max − new_min) + new_min
Example
• Consider the set: V= (88, 90, 92, 94). Apply Min-Max procedure and
map the marks to a new range 0-1.
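A minimal Python sketch of this calculation:

```python
# Min-Max normalization of V = (88, 90, 92, 94) to the new range [0, 1].
V = [88, 90, 92, 94]
v_min, v_max = min(V), max(V)
new_min, new_max = 0.0, 1.0

normalized = [(v - v_min) / (v_max - v_min) * (new_max - new_min) + new_min
              for v in V]
print(normalized)  # [0.0, 0.333..., 0.666..., 1.0]
```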
z-Score Normalization
• This procedure works by taking the difference between the field value
and the mean value, and scaling this difference by the standard deviation
of the attribute:
z = (v − µ) / σ
Example
• Consider the mark list V = {10, 20, 30} and convert the marks to z-scores.
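A minimal Python sketch of this conversion; whether the sample or population standard deviation is used depends on convention, and the sample version is assumed here.

```python
import statistics

V = [10, 20, 30]
mean = statistics.mean(V)   # 20
std = statistics.stdev(V)   # sample std = 10 (population std would be ~8.165)

z_scores = [(v - mean) / std for v in V]
print(z_scores)  # [-1.0, 0.0, 1.0]
```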
Data Types

• Data types are classified as:


Data Types
• Another way of classifying data is based on the number of variables used in the
dataset.
• The data can be classified as
• Univariate data
• Bivariate data
• Multivariate data
Univariate

• Univariate data involves only one variable.


Bivariate data
• This type of data involves two different variables.
• The analysis of this type of data deals with causes and relationships.
• Bivariate statistics
• Covariance

• Correlation
Example
• Find the covariance and correlation of data
X= (1, 2, 3, 4, 5) and Y= (1, 4, 9, 16, 25).
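A minimal NumPy sketch of this computation; note that np.cov divides by n − 1 by default, while bias=True gives the population (divide by n) version.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([1, 4, 9, 16, 25])

cov_sample = np.cov(X, Y)[0, 1]                  # 15.0 (divide by n-1)
cov_population = np.cov(X, Y, bias=True)[0, 1]   # 12.0 (divide by n)
corr = np.corrcoef(X, Y)[0, 1]                   # ~0.9811

print(cov_sample, cov_population, corr)
```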
Multivariate data
• When the data involves three or more variables, it is categorized
under multivariate.
• Analysis techniques are regression analysis, path analysis, factor
analysis, cluster analysis and multivariate analysis of variance
(MANOVA).
Central tendency

• Popular measures of central tendency are


• Mean: The sum of all values divided by the total number of values
• Median: The middle number in an ordered dataset.
• Mode: The most frequent value.
Dispersion
• The spread of a set of data around the central tendency is called
dispersion.
• Dispersion is represented by
• Variance
• Standard deviation
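A minimal Python sketch of these measures; the data values below are made up purely for illustration.

```python
import statistics

data = [12, 15, 15, 18, 20, 22, 22, 22, 30]   # illustrative values

mean = statistics.mean(data)       # sum of values / number of values
median = statistics.median(data)   # middle value of the ordered data
mode = statistics.mode(data)       # most frequent value

variance = statistics.pvariance(data)   # population variance
std_dev = statistics.pstdev(data)       # population standard deviation

print(mean, median, mode, variance, std_dev)
```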
Machine Learning and Importance of Probability and Statistics

• Statistics - In machine learning, statistics is used to analyze data. It
helps find unseen patterns.
• Probability distributions - In machine learning, they are used to calculate
confidence intervals for parameters and to calculate critical regions for
hypothesis tests.
Random variable

• A random variable ‘X’ is a process by which a (real) number
x(s) is assigned to each possible outcome s of a statistical
experiment.
• RVs are of two types:
• Discrete RVs
• Continuous RVs
• The nth moment of a random variable X is defined as E[Xⁿ], the expected value of the nth power of X.
Mean square value

• When n = 2, the second moment E[X²] is called the mean square value of the random variable X.
Central moments
• The central moments are the moments of the difference
between the random variable X and its mean m_x.
• Thus the nth central moment is defined as E[(X − m_x)ⁿ].
Variance of random variable
• The second central moment, i.e. n = 2, E[(X − m_x)²], is called the
variance of the random variable X.
Standard Deviation
• The square root of the variance is called the standard
deviation of the random variable X.
Standard deviation = √(variance)
Cumulative Distribution Function

• The CDF of a RV X is defined as the probability that the RV
X takes values less than or equal to x:
F_X(x) = P(X ≤ x)
• Properties of CDF: 0 ≤ F_X(x) ≤ 1; F_X is non-decreasing; F_X(−∞) = 0; F_X(+∞) = 1.
Probability Density Function

• The probability density function (PDF) is defined as the derivative of the CDF:
f_X(x) = dF_X(x) / dx
Gaussian or Normal Distribution
• Gaussian Distribution is also called Normal Distribution.
• The PDF for a Gaussian random variable with mean m and standard deviation σ is given as:
f_X(x) = (1 / (σ√(2π))) exp(−(x − m)² / (2σ²))
Properties of PDF
• It is a non-negative function for all values of x

• The area under the PDF curve is always unity

• The peak value occurs at x = m (i.e. mean value).

• The plot of Gaussian PDF has even symmetry around mean value
Probability

• Marginal probability is the probability of an event irrespective of the


outcome of another variable.
• Joint probability is the probability of two events occurring
simultaneously.
• Conditional probability is the probability of one event occurring in
the presence of a second event.
Marginal probability

• P (F) =?
Joint Probability of Two Variables

• Find the probability that an employee is a rank-1 officer and male.
• Joint probability refers to the occurrence of 2 or more events together.
Conditional probability

• Find the probability that an employee is a rank-1 officer given that he is
male.
Bayes Theorem in ML
• Bayes' theorem was given by Thomas Bayes in the 18th century.
• Bayes' Theorem is a method to determine conditional probabilities –
that is, the probability of one event occurring given that another event
has already occurred.
• Bayes Theorem is a mathematical formula that describes how to
update the probability of a hypothesis (or event) based on new
evidence.
• It allows the model to revise its predictions as new data becomes
available.
• Used widely in Machine Learning (ML), particularly in classification
problems and probabilistic models.
Bayes Theorem in ML

P(A | B) = P(B | A) ∗ P(A) / P(B)
• The above equation is called as Bayes Rule or Bayes Theorem.
• P(A|B) is called the posterior probability, which we need to calculate. It is
defined as the updated probability after considering the evidence.
• P(B|A) is called the likelihood. It is the probability of the evidence when the
hypothesis is true.
• P(A) is called the prior probability, the probability of the hypothesis before
considering the evidence.
• P(B) is called the marginal probability. It is defined as the total probability of
the evidence under all hypotheses.
• Hence, Bayes Theorem can be written as:
posterior = likelihood * prior / evidence
Example
• Diagnose whether someone has a certain disease, given the following
information:
• Prior Probability: Based on the general population, the prior
probability of someone having this disease is 1% (i.e., 1 out of 100
people has it).
• Evidence: A medical test is conducted, which is not perfect. The test has:
• A True Positive Rate of 90%, meaning if the person has the disease, the test
will correctly identify it 90% of the time.
• A False Positive Rate of 10%, meaning if the person does not have the disease,
the test will incorrectly say they have it 10% of the time.
Find out the updated probability (posterior probability) that the
person has the disease given that the test result is positive.
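A worked sketch of this example in Python:

```python
p_disease = 0.01              # prior P(Disease)
p_pos_given_disease = 0.90    # likelihood (true positive rate)
p_pos_given_healthy = 0.10    # false positive rate

# Marginal probability of a positive test (the evidence)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))   # 0.108

posterior = p_pos_given_disease * p_disease / p_pos
print(posterior)  # ~0.083, about an 8.3% chance of disease despite the positive test
```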
Bayes’ Theorem Examples

• A man is known to speak lies 1 out of 4 times. He throws a die and
reports that it is a six. Find the probability that it is actually a six.

• What is the probability that a patient has the disease meningitis given a
stiff neck? A doctor is aware that the disease meningitis causes a patient to
have a stiff neck 80% of the time. He is also aware of
some more facts, which are given as follows:
• The known probability that a patient has the meningitis disease is 1/30,000.
• The known probability that a patient has a stiff neck is 2%.
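Worked sketches of both problems using Bayes' rule; the die example assumes that when the man lies about a non-six he reports a six.

```python
# 1) Probability that the throw is actually a six, given he reports a six
p_six = 1 / 6
p_truth = 3 / 4                          # he lies 1 out of 4 times
p_report_given_six = p_truth             # he tells the truth about a six
p_report_given_not_six = 1 - p_truth     # he lies and claims a six
p_report = (p_report_given_six * p_six
            + p_report_given_not_six * (1 - p_six))
print(p_report_given_six * p_six / p_report)   # 3/8 = 0.375

# 2) Probability of meningitis given a stiff neck
p_stiff_given_men = 0.8
p_men = 1 / 30000
p_stiff = 0.02
print(p_stiff_given_men * p_men / p_stiff)     # ~0.00133
```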
What is a Bayesian Network?
• Bayesian Network (BN): A graphical model that represents
probabilistic relationships among variables using a directed acyclic
graph (DAG).

• Components:
• Nodes: Represent random variables (e.g., weather, disease, sensor readings).
• Edges: Represent dependencies or conditional relationships between
variables.
• Conditional Probability Tables (CPTs): Define the probability of a node
given its parent nodes.
Applications of Bayesian Networks

• Medical Diagnosis
• Decision Support Systems
• Risk Assessment
• Machine Learning
• Natural Language Processing
Advantages of Bayesian Networks

• Modeling Uncertainty
• Visual Representation
• Learning from Data
Example
Example
Bayesian Networks Example
• Example: Harry installed a new burglary alarm at his home to detect
burglary. The alarm responds reliably to a burglary but also
responds to minor earthquakes. Harry has two neighbors, David and
Sophia, who have taken the responsibility to inform Harry at work when
they hear the alarm. David always calls Harry when he hears the
alarm, but sometimes he gets confused with the phone ringing and calls
then too. On the other hand, Sophia likes to listen to loud
music, so sometimes she misses hearing the alarm. Here we would like
to compute the probability of the burglary alarm.
• Calculate the probability that the alarm has sounded, but neither a
burglary nor an earthquake has occurred, and both David and
Sophia have called Harry.
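A sketch of the joint-probability computation for this query. The conditional probability values below are assumptions taken from the commonly used textbook version of this example, since the slide's actual CPTs are not reproduced here.

```python
p_not_burglary = 1 - 0.002            # P(~B)  (assumed CPT value)
p_not_earthquake = 1 - 0.001          # P(~E)  (assumed CPT value)
p_alarm_given_not_b_not_e = 0.001     # P(A | ~B, ~E)
p_david_calls_given_alarm = 0.91      # P(D | A)
p_sophia_calls_given_alarm = 0.75     # P(S | A)

# Chain rule over the DAG:
# P(D, S, A, ~B, ~E) = P(~B) * P(~E) * P(A | ~B, ~E) * P(D | A) * P(S | A)
joint = (p_not_burglary * p_not_earthquake * p_alarm_given_not_b_not_e
         * p_david_calls_given_alarm * p_sophia_calls_given_alarm)
print(joint)  # ~0.00068 with these assumed values
```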
Bayesian Networks
Hypothesis
• A hypothesis is an assumption or prediction based on some evidence
that can be tested.
• It is specifically used in supervised machine learning, where an ML
model learns a function that best maps the inputs to the corresponding
outputs with the help of an available dataset.
• Types of Hypothesis
• Null Hypothesis (H0)
• Alternative Hypothesis (H1)
Hypothesis testing

• There are two types of hypothesis tests,


• Parametric
• Tests – Z, t, F
• Non-parametric
• Tests – chi-square
Significance level α
• The significance level, often denoted as alpha (α), is a threshold used in
hypothesis testing to determine whether the null hypothesis should be rejected.
• Typically, two values are considered for the level of significance: 5% and 1%.
Z-test
• It assumes a normal distribution of data whose population variance is known.
• The sample size is assumed to be large.
• The focus is to test the population mean.
• The z-statistic is given as:
z = (X̄ − µ) / (σ / √n)
• X̄ is the sample mean,
• n is the sample size,
• µ is the mean of the population,
• σ is the standard deviation of the population.
Examples
• A company has developed a vaccine that is supposed to increase immunity
level. The standard deviation of immunity level in the general population is
20. The vaccine is tested on 40 patients, and a mean immunity level of
96.25 is obtained. Using an alpha value of 0.05, is this immunity level significantly
different from the population mean of 100?
• The population of GATE scores is known to have a standard deviation of
9. The Engineering department hopes to receive applicants with GATE
scores over 300. This year, the mean GATE score for the 40 applicants was
303.8. Using a value of α = 0.05, is this new mean significantly greater than
the desired mean of 300?
• A Swami Vivekanand school science teacher claims that students in his
section will score higher marks than those in his colleague's section. The
mean science score for 60 students in his section is 22.1, and the standard
deviation is 4.8. The mean science score for 40 students in the colleague's section is
18.8, and the standard deviation is 8.1. At α = 0.05, can the teacher's claim
be supported?
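A sketch of the one-sample z-test applied to the first two problems (the third is a two-sample comparison and is not shown); SciPy is used only to look up the normal tail probability.

```python
import math
from scipy.stats import norm

def z_test(sample_mean, pop_mean, pop_std, n):
    return (sample_mean - pop_mean) / (pop_std / math.sqrt(n))

# Vaccine example: two-tailed test against a population mean of 100
z_vaccine = z_test(96.25, 100, 20, 40)    # ~ -1.19
p_vaccine = 2 * norm.sf(abs(z_vaccine))   # ~0.24 > 0.05 -> fail to reject H0

# GATE example: one-tailed (greater-than) test against 300
z_gate = z_test(303.8, 300, 9, 40)        # ~ 2.67
p_gate = norm.sf(z_gate)                  # ~0.004 < 0.05 -> reject H0

print(z_vaccine, p_vaccine, z_gate, p_gate)
```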
One Sample Test
• In this test, the mean of one group is checked against a set value
that can be either a theoretical value or the population mean.
t = (x̄ − µ) / (s_x / √n)
• Here, t is the t-statistic, x̄ is the mean of the sample, and µ is the theoretical
value or population mean.
• s_x is the standard deviation of the sample, and n is the sample size.
• The degree of freedom is n − 1.
Example
• The following data represents the marks of 10 students: 9.5, 10, 8, 7, 11, 7, 6.5, 8.5,
10.5, 12. Does the mean value for these students differ significantly from the mean value of the
general population (12)? Evaluate the role of chance. (α = 0.05)
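A sketch of this test using SciPy's one-sample t-test:

```python
from scipy.stats import ttest_1samp

marks = [9.5, 10, 8, 7, 11, 7, 6.5, 8.5, 10.5, 12]
pop_mean = 12

t_stat, p_value = ttest_1samp(marks, pop_mean)
print(t_stat, p_value)  # t ~ -5.03, p << 0.05 -> the mean differs significantly from 12
```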

Independent Two Sample Test
• The t-statistic for two groups A and B is computed as follows:
t = (mean(A) − mean(B)) / √( s² (1/N1 + 1/N2) )
• Here, mean(A) and mean(B) are the means of the two different samples.
• N1 and N2 are the sample sizes of the two groups A and B.
• s² is the pooled variance of the two samples, and the degree of freedom is
N1 + N2 − 2.
• Then, the t-statistic is compared with the t-critical value.
Chi- Square test
• The Chi-Square test is a non-parametric test.
• It measures the statistical significance of the difference between observed and expected
frequencies; each observation is independent of the others, and the test statistic
approximately follows a Chi-Square distribution.
• This comparison is used to calculate the value of the Chi-Square statistic as:
χ² = Σ (O − E)² / E
• E is the expected frequency, O is the observed frequency, and the degree of
freedom is C − 1, where C is the number of categories.
• The Chi-Square test also helps detect dependencies between attributes and
remove redundant values.
Example
• Consider the following Table, where the machine learning course registration is
done by both boys and girls. There are 50 boys and 50 girls in the class and the
registration of the course is given in the table. Apply Chi-Square test and find out
whether any differences exist between boys and girls for course registration.
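A sketch of the test using SciPy; the registration counts below are hypothetical placeholders, since the slide's table is not reproduced here, and chi2_contingency uses (rows − 1)(columns − 1) degrees of freedom for a contingency table.

```python
from scipy.stats import chi2_contingency

#           registered, not registered
observed = [[35, 15],    # boys  (50 total) - hypothetical counts
            [20, 30]]    # girls (50 total) - hypothetical counts

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value, dof)
print(expected)  # expected frequencies under independence
```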
Concept learning

• Concept learning is the process of acquiring knowledge about categories, ideas,
or things based on shared features.
• In the context of concept learning, "shared features" refer to the common
characteristics or attributes that are present in all instances of a category.
• Purpose: It helps to group similar objects, events, or ideas together and
distinguish them from other categories.
• Importance: It is essential for problem-solving, categorization, and language learning.
• Concept learning requires three things:
• Input -Training dataset which is a set of training instances, each labeled with
the name of a concept or category to which it belongs.
• Output - Target concept or target function f. It is a mapping function f(x) from
input x to output y. Its purpose is to determine the specific or common features
that identify an object; in other words, to find the hypothesis that determines
the target concept. For example, the specific set of features that identify an elephant
among all animals.
• Test - New instances to test the learned model.
• Formally, it is defined as: "Given a set of hypotheses, the learner
searches through the hypothesis space to identify the best hypothesis
that matches the target concept."
Hypothesis space (H):
• Hypothesis space is defined as a set of all possible legal hypotheses;
hence it is also known as a hypothesis set. It is used by supervised
machine learning algorithms to determine the best possible hypothesis
to describe the target function or best maps input to output.
Searching the Hypothesis Space
• There are two ways of learning the hypothesis,
• Specialization - General to Specific learning
• This learning methodology will search through the hypothesis space for an
approximate hypothesis by specializing the most general hypothesis.
• Generalization - Specific to General learning
• This learning methodology will search through the hypothesis space for an
approximate hypothesis by generalizing the most specific hypothesis.
General to Specific

• This approach starts with a general idea and then narrows down to
specific instances or observations.
• It is often associated with deductive reasoning, where you begin with
a general principle and apply it to a specific case.
• Example:
• General: "All birds have wings."
• Specific: "This animal is a bird, so it must have wings."
Specific to General

• This approach starts with specific observations or data and then makes
broader generalizations.
• It's associated with inductive reasoning, where conclusions are drawn
based on the analysis of specific instances or patterns.
• Example:
• Specific: "I have seen five different swans, and all of them are white."
• General: "All swans must be white."
Hypothesis Space Search by Find-S Algorithm
• The Find-S algorithm is a simple machine learning algorithm used for
concept learning.
• The Find-S algorithm starts with the most specific hypothesis.
• This algorithm considers only the positive instances and eliminates
negative instances while generating the hypothesis.
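A minimal sketch of Find-S in Python; the toy training data below is made up for illustration.

```python
def find_s(examples):
    """examples: list of (attribute_tuple, label) pairs."""
    hypothesis = None
    for attributes, label in examples:
        if label != "yes":            # Find-S ignores negative examples
            continue
        if hypothesis is None:        # start from the first positive example
            hypothesis = list(attributes)
        else:                         # generalize attributes that disagree
            hypothesis = [h if h == a else "?"
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis

training_data = [
    (("sunny", "warm", "normal", "strong"), "yes"),
    (("sunny", "warm", "high",   "strong"), "yes"),
    (("rainy", "cold", "high",   "strong"), "no"),
    (("sunny", "warm", "high",   "strong"), "yes"),
]

print(find_s(training_data))  # ['sunny', 'warm', '?', 'strong']
```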
Bias and Variance

• In Machine Learning (ML), bias and variance are two fundamental


concepts that help us understand the performance of a model.
• Bias
• Variance
Bias
• Bias refers to the difference between the model's expected prediction
and the true value.
• Types of Bias:
• High Bias: The model is too simple and fails to capture the underlying
patterns in the data. This results in poor performance on both training and test
data. (underfitting)
• Low Bias: The model is complex and able to capture the underlying patterns
in the data. However, it may also capture noise in the data, leading to
overfitting.
Variance
• Variance refers to the variability of the model's predictions. It
measures how much the model's predictions change when it is trained
on different subsets of the data.
• Types of Variance:
• High Variance: The model is too complex capturing not just the
patterns but also the noise. This results in poor performance on test
data.
• Low Variance: The model is simple and fails to capture the
underlying patterns in the data. This results in poor performance on
both training and test data.
Trade-off between Bias and Variance
• There is a fundamental trade-off between bias and variance.
• As we increase the complexity of the model:
• Bias decreases (the model becomes less biased)
• Variance increases (the model becomes more prone to overfitting)
• Conversely, as we decrease the complexity of the model:
• Bias increases (the model becomes more biased)
• Variance decreases (the model becomes less prone to overfitting)
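A small illustrative sketch (not from the slides): fitting polynomials of increasing degree to noisy data usually shows training error falling while test error eventually rises, which is the trade-off described above.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # degree 1 underfits (high bias); a high degree typically overfits
    # (high variance); a moderate degree balances the two.
    print(degree, round(train_err, 4), round(test_err, 4))
```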
