
Data Science Imp Questions and Answers


CO-3

1) What is joint probability? Give examples.


A) The joint probability mass function of the discrete random variables X and Y,
denoted f_XY(x, y), satisfies:
* f_XY(x, y) ≥ 0
* Σ_x Σ_y f_XY(x, y) = 1
* f_XY(x, y) = P(X = x, Y = y)
The joint probability distribution of two random variables is also called a
bivariate probability distribution.
The joint probability distribution of two discrete random variables is usually
written as P(X = x, Y = y).
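
The three properties above can be checked on a small example. The sketch below assumes a hypothetical pair of independent fair dice (not part of the original notes) and uses exact fractions so the sum comes out to exactly 1.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X and Y are the faces of two independent fair dice,
# so every (x, y) pair has probability 1/36.
faces = range(1, 7)
joint_pmf = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Property 1: every value of the joint PMF is non-negative.
all_non_negative = all(p >= 0 for p in joint_pmf.values())

# Property 2: the joint PMF sums to 1 over all (x, y) pairs.
total = sum(joint_pmf.values())

# Property 3: f_XY(x, y) is P(X = x, Y = y), e.g. both dice showing 3.
p_both_three = joint_pmf[(3, 3)]

print(all_non_negative, total, p_both_three)
```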

2) What are the PDF and the CDF?


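A) PDF (probability density function):
• A PDF describes the relative likelihood of a continuous random variable
taking a given value. Probabilities are obtained as the area under the PDF
over a range of values.
• Example: Let X be a continuous random variable giving the bus travel time
from Bangalore to Hyderabad, which takes between 10 and 15 hours.
1) What is the probability that the bus arrives the next day in exactly 13 hours?
0. For a continuous variable the probability of any single exact value is zero;
probabilities are defined over ranges of values.
2) What is the probability that the bus arrives the next day in 11 to 13 hours?
This is the area under the PDF between 11 and 13 hours.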
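
The notes do not say how the travel time is distributed. As a minimal sketch, assume (purely for illustration) that it is uniform on the 10 to 15 hour range; the function name and the uniform assumption are not from the original.

```python
def uniform_interval_probability(a, b, low=10.0, high=15.0):
    """P(a <= X <= b) when X is ASSUMED uniform on [low, high]."""
    # Clip the queried interval to the range where the density is non-zero.
    left = max(a, low)
    right = min(b, high)
    if right <= left:
        # A single point (or an empty interval) has zero probability.
        return 0.0
    # For a uniform density, the area is interval length / total length.
    return (right - left) / (high - low)

p_exactly_13 = uniform_interval_probability(13, 13)
p_11_to_13 = uniform_interval_probability(11, 13)

print(p_exactly_13, p_11_to_13)
```

Under this assumption the exact-value probability is 0 and the 11 to 13 hour probability is 2/5. A different distribution would change the second number but not the first.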
CDF (cumulative distribution function):
So far, distributions have been described separately for discrete and continuous
random variables. The CDF covers both cases.
The CDF describes the distribution of a random variable, whether it is continuous
or discrete. It gives the fraction of values that are less than or equal to a
particular value.
For example: take the age variable from the Haberman dataset and suppose
F(50) = P(age ≤ 50) = 0.60. This means that 60% of the patients in the dataset
are aged 50 or younger.
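
An empirical CDF can be sketched in a few lines. The ages below are made-up illustrative values, not the actual Haberman data; they are chosen so that the result matches the 0.60 used in the example above.

```python
def empirical_cdf(values, threshold):
    """Fraction of observations that are <= threshold."""
    return sum(1 for v in values if v <= threshold) / len(values)

# Hypothetical ages (NOT the real Haberman dataset).
ages = [30, 34, 38, 42, 45, 50, 52, 57, 61, 65]

cdf_at_50 = empirical_cdf(ages, 50)
print(cdf_at_50)
```

Six of the ten hypothetical ages are 50 or below, so the empirical CDF at 50 is 0.6.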
3) Functions of a random variable?

4) Expected value of a random variable?


• The expected value is the average or mean (µ) of a random variable X.
• It is sometimes called a “weighted average” because more frequent values
of X are weighted more heavily in the average.
• It is also how we expect X to behave on average over the long run
(the “frequentist” view again).
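
The “weighted average” idea can be made concrete for a discrete random variable, where E[X] = Σ x · P(X = x). The two dice below are hypothetical examples, not taken from the notes.

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X] = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical fair die: every face has probability 1/6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

# Hypothetical loaded die: a 6 comes up half the time.
loaded_die = {x: Fraction(1, 10) for x in range(1, 6)}
loaded_die[6] = Fraction(1, 2)

fair_mean = expected_value(fair_die)
loaded_mean = expected_value(loaded_die)
print(fair_mean, loaded_mean)
```

The loaded die has the larger expected value because its most frequent face, 6, carries the most weight in the average.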
5) What is variance and the sum of variances of random variables?

• The variance of a random variable X is a measure of how spread out it is.
Are the values of X clustered tightly around their mean, or can we commonly
observe values of X a long way from the mean value?
• The variance measures how far the values of X are from their mean, on
average (more precisely, it is the average squared distance from the mean).
• If X has high variance, we can observe values of X a long way from the mean.
• If X has low variance, the values of X tend to be clustered tightly around
the mean value.
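
The contrast between low and high variance can be sketched with two made-up data sets that share the same mean. The sketch uses the population form of the variance (dividing by n); the data and function name are illustrative assumptions.

```python
def population_variance(values):
    """Average squared distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Both hypothetical data sets have mean 10.
tight = [9, 10, 10, 11]    # values cluster near the mean
spread = [2, 6, 14, 18]    # values lie far from the mean

tight_var = population_variance(tight)
spread_var = population_variance(spread)
print(tight_var, spread_var)
```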
6) Properties of covariance?
7) What is covariance?
• Covariance signifies the direction of the linear relationship between two
variables. By direction we mean whether the variables tend to move together
(positive covariance) or in opposite directions (negative covariance): an
increase in one variable may go with an increase or a decrease in the other.
• The value of covariance can be any number from negative infinity to positive
infinity. Covariance only measures how two variables change together, not the
dependency of one variable on the other.
• The covariance between two variables is obtained by summing the products of
the deviations of each variable from its mean and dividing by n − 1:

Cov(X, Y) = Σ (Xᵢ − x̅)(Yᵢ − ȳ) / (n − 1)

• Xᵢ = Observation point of variable X
• x̅ = Mean of all observations (X)
• Yᵢ = Observation point of variable Y
• ȳ = Mean of all observations (Y)
• n = Number of observations
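Example:
• The following data show the number of customers and the corresponding
temperature (values as they appear in the mean calculations):
X: 97, 86, 89, 84, 94, 74
Y: 14, 11, 9, 9, 15, 7
• Mean of X, x̅ = (97+86+89+84+94+74)/6 = 524/6 = 87.333
• Mean of Y, ȳ = (14+11+9+9+15+7)/6 = 65/6 = 10.833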
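
The covariance for this example can be computed directly from the values in the two mean calculations. This is a sketch that assumes the sample form of covariance, dividing by n − 1, which matches the division by 5 in the notes' standard deviation figures; the function name is illustrative.

```python
# Observations taken from the mean calculations in the example.
x = [97, 86, 89, 84, 94, 74]
y = [14, 11, 9, 9, 15, 7]

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = ((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return sum(products) / (n - 1)

cov_xy = sample_covariance(x, y)
print(round(cov_xy, 2))
```

The result is about 22.47, in line with the 22.46 quoted in the notes; the small difference comes from rounding in the hand calculation. The positive sign indicates that the two variables tend to increase together.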

8) What is correlation? Give examples of correlation.


• Correlation analysis is a method of statistical evaluation used to study the
strength of the relationship between two numerically measured, continuous
variables.
• It shows not only the kind of relationship (in terms of direction) but also
how strong the relationship is. Correlation values are standardized, whereas
covariance values are not and cannot be used to compare how strong or weak a
relationship is, because their magnitude has no direct significance. The
correlation coefficient takes values from -1 to +1.
• To determine whether the covariance of two variables is large or small, we
need to assess it relative to the standard deviations of the two variables.
Dividing the covariance by the product of the standard deviations gives the
correlation: r = Cov(X, Y) / (σx × σy).
• For example: sales might increase if a lot of money is spent on product
marketing.
• Why is it useful?
1. If two variables are closely correlated, we can predict one variable from
the other.
2. Correlation plays a vital role in locating the important variables on which
other variables depend.
3. It is used as the foundation for various modeling techniques.
4. Proper correlation analysis leads to a better understanding of the data.
5. Correlation contributes to the understanding of causal relationships (if
any).
Example (temperature and number of customers data from the covariance question):
• Cov(X, Y) = 22.46
• σx = √(331.28/5) = √66.26 ≈ 8.14
• σy = √(48.78/5) = √9.76 ≈ 3.12
• Correlation = 22.46/(8.14 × 3.12) = 22.46/25.40 ≈ 0.88
• A value of 0.88 shows that the correlation between temperature and number
of customers is very strong.
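
The same correlation can be computed without intermediate rounding. This sketch assumes the Pearson correlation coefficient, the covariance divided by the product of the standard deviations, and reuses the observations from the covariance example; the function name is illustrative.

```python
import math

# Observations from the temperature and customers example.
x = [97, 86, 89, 84, 94, 74]
y = [14, 11, 9, 9, 15, 7]

def pearson_correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_xx = sum((a - mean_x) ** 2 for a in xs)
    sum_yy = sum((b - mean_y) ** 2 for b in ys)
    # The (n - 1) factors in the covariance and the standard deviations
    # cancel, so they are omitted here.
    return sum_xy / math.sqrt(sum_xx * sum_yy)

r = pearson_correlation(x, y)
print(round(r, 2))
```

Computed this way the coefficient is about 0.88, a strong positive correlation.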

CO-4
