Data Science imp questions and answers
CO-3
1) What is joint probability? Give examples.
A) The joint probability mass function of the discrete random variables X and Y, denoted $f_{XY}(x, y)$, satisfies:
* $f_{XY}(x, y) \ge 0$
* $\sum_{x} \sum_{y} f_{XY}(x, y) = 1$
* $f_{XY}(x, y) = P(X = x, Y = y)$
The joint probability distribution of two random variables is also called a bivariate probability distribution. For two discrete random variables it is usually written as P(X=x, Y=y).
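As an illustration, here is a minimal Python sketch. It assumes a hypothetical setup in which X and Y are the faces of two fair, independent six-sided dice. The joint PMF is stored as a dictionary keyed by (x, y). The sketch then checks the three properties listed above and sums $f_{XY}$ over outcome pairs to get the probability of a joint event.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X and Y are the faces of two fair, independent six-sided dice.
faces = range(1, 7)

# Joint PMF f_XY(x, y) = P(X = x, Y = y); all 36 (x, y) pairs are equally likely.
joint_pmf = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Property 1: every joint probability is non-negative.
all_non_negative = all(p >= 0 for p in joint_pmf.values())

# Property 2: summing f_XY over all x and all y gives 1.
total = sum(joint_pmf.values())

# Property 3: f_XY(2, 5) is the probability that X = 2 and Y = 5 at the same time.
p_x2_y5 = joint_pmf[(2, 5)]

# The probability of a joint event is the sum of f_XY over the pairs in that event.
p_sum_is_7 = sum(p for (x, y), p in joint_pmf.items() if x + y == 7)

print(all_non_negative, total, p_x2_y5, p_sum_is_7)  # True 1 1/36 1/6
```

`Fraction` keeps the probabilities exact, so the sum-to-one check is not affected by floating-point rounding.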
2) What are the PDF and CDF?
A) PDF: A probability density function describes the relative likelihood of a continuous random variable taking values near a given point. Probabilities come from the area under the density curve over an interval, so any single exact value has probability 0.
• Example: Suppose x is a continuous random variable giving the bus travel time from Bangalore to Hyderabad, which typically lies between 10 and 15 hours.
1) What is the probability that the bus arrives the next day in exactly 13 hours? 0 (only ranges of values have non-zero probability).
2) What is the probability that the bus arrives the next day in 11 to 13 hours?
CDF: We have seen how to describe distributions for discrete and continuous random variables separately. The cumulative distribution function (CDF) describes the distribution of a random variable, whether it is continuous or discrete. It gives the proportion of values that are less than or equal to a particular value: F(x) = P(X ≤ x).
For example: take the age variable from the Haberman dataset. Writing P(age ≤ 50) = 0.60 means that 60% of the patients in the dataset are aged 50 or younger.
3) Functions of a random variable?
4) What is the expected value of a random variable?
• Expected value is the average or mean (µ) of a random variable X. For a discrete random variable, $E[X] = \sum_{x} x \, P(X = x)$.
• It is sometimes called a "weighted average" because more probable values of X are weighted more heavily in the average.
• It also describes how X behaves on average over the long run (the "frequentist" view).
5) What is variance, and what is the sum of variances of random variables?
The variance of a random variable X is a measure of how spread out it is. Are the values of X clustered tightly around their mean, or can we commonly observe values of X far from the mean?
The variance measures how far the values of X are from their mean on average. More precisely, it is the expected squared deviation from the mean: $\operatorname{Var}(X) = E[(X - \mu)^2]$.
If X has high variance, we can observe values of X a long way from the mean.
If X has low variance, the values of X tend to be clustered tightly around the mean.
6) Properties of covariance?
7) What is covariance?
Covariance signifies the direction of the linear relationship between two variables. By direction we mean whether the variables tend to move in the same direction or in opposite directions: increasing one variable may be associated with an increase or a decrease in the other. Covariance can take any value between −∞ and +∞. Note that covariance only measures how two variables change together, not the dependency of one variable on the other. The (sample) covariance between two variables is obtained by summing the products of each variable's deviations from its mean and dividing by n − 1:
$$\operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{x})(Y_i - \bar{y})}{n - 1}$$
• Xᵢ = observation point of variable X
• x̄ = mean of all observations of X
• Yᵢ = observation point of variable Y
• ȳ = mean of all observations of Y
• n = number of observations
Example:
• The following data shows temperature (X) and the corresponding number of customers (Y): X = 97, 86, 89, 84, 94, 74; Y = 14, 11, 9, 9, 15, 7.
• Mean of X, x̄ = (97 + 86 + 89 + 84 + 94 + 74)/6 = 524/6 = 87.333
• Mean of Y, ȳ = (14 + 11 + 9 + 9 + 15 + 7)/6 = 65/6 = 10.833
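Here is a short Python sketch of the covariance formula applied to the example data. It assumes the X and Y values are paired in the order they appear in the mean calculations above, and it uses the n − 1 divisor (some texts divide by n for the population covariance). The helper names `mean` and `sample_covariance` are illustrative, not from any particular library.

```python
# Example data: temperature (X) and number of customers (Y),
# assumed to be paired in the order they appear in the mean calculations.
temperature = [97, 86, 89, 84, 94, 74]
customers = [14, 11, 9, 9, 15, 7]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two observations")
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


x_bar = mean(temperature)   # 524 / 6
y_bar = mean(customers)     # 65 / 6
cov_xy = sample_covariance(temperature, customers)

print(round(x_bar, 3), round(y_bar, 3), round(cov_xy, 2))  # 87.333 10.833 22.47
```

The positive result says that higher temperatures tend to go with more customers in this data. Its magnitude depends on the units of X and Y.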
8) What is correlation? Give examples of correlation.
Correlation analysis is a statistical method used to study the strength of the relationship between two numerically measured, continuous variables. It shows not only the direction of the relationship but also how strong it is. Correlation values are standardized, whereas covariance values are not: the magnitude of a covariance depends on the units of the variables, so it cannot be used to compare how strong or weak a relationship is. Correlation takes values from −1 to +1. To judge whether the covariance of two variables is large or small, we assess it relative to the standard deviations of the two variables, which is what the correlation coefficient does: r = Cov(X, Y)/(σx·σy).
For example: sales might increase if a lot of money is spent on product marketing.
Why is it useful?
1. If two variables are closely correlated, we can predict one variable from the other.
2. Correlation helps locate the important variables on which other variables depend.
3. It is used as the foundation for various modeling techniques.
4. Proper correlation analysis leads to a better understanding of the data.
5. Correlation contributes towards understanding causal relationships (if any), although correlation alone does not establish causation.
• Cov(X, Y) = 112.33/5 ≈ 22.47
• σx = √(331.33/5) = √66.27 ≈ 8.14
• σy = √(48.83/5) = √9.77 ≈ 3.13
• r = 22.47/(8.14 × 3.13) = 22.47/25.48 ≈ 0.88
• r ≈ 0.88 indicates a strong positive correlation between temperature and the number of customers.
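Here is a minimal Python sketch that computes Pearson's r for the same temperature and customer data. It again assumes the values are paired in the order of the mean calculations. To illustrate the standardization point made above, it also rescales X by a factor of 10 (an arbitrary choice): the covariance changes by that factor, while r does not. The helper names are illustrative only.

```python
import math

# Same example data: temperature (X) and number of customers (Y).
temperature = [97, 86, 89, 84, 94, 74]
customers = [14, 11, 9, 9, 15, 7]


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two observations")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))  # Cov(X, X) is the sample variance of X
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


r = pearson_r(temperature, customers)

# Multiplying X by a positive constant multiplies the covariance by that constant
# but leaves the correlation unchanged.
scaled = [10 * t for t in temperature]
cov_ratio = sample_covariance(scaled, customers) / sample_covariance(temperature, customers)
r_scaled = pearson_r(scaled, customers)

print(round(r, 2), round(cov_ratio, 6), round(r_scaled, 2))  # 0.88 10.0 0.88
```

Computing the standard deviations as `sqrt(Cov(X, X))` reuses the same n − 1 divisor. This keeps the numerator and denominator of r consistent.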