GRE Data Analysis | Distribution of Data, Random Variables, and Probability Distributions
Last Updated: 19 Jun, 2019
Distribution of Data:
The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur. We can think of a distribution as a function that describes how the observations are spread over the possible values in a sample space.
Example:
The lifetimes of 800 electric devices were measured. Because the lifetimes had many different values, the measurements were grouped into 50 intervals, or classes, of 10 hours each: 601 to 610 hours, 611 to 620 hours, and so on, up to 1,091 to 1,100 hours. The resulting relative frequency distribution, drawn as a histogram, has 50 thin bars and many different bar heights, as shown in the Data Analysis Figure below.
Relative frequency is the number of times a value occurs divided by the total number of observations. Here, for example, the relative frequency of the class 601 to 610 hours is the number of devices with a lifetime in that range divided by the total of 800 devices.
In the histogram, the median is represented by M, the mean is represented by m, and the standard deviation is represented by d.
- The median, represented by M, is between 730 and 740
- The mean, represented by m, is between 750 and 760
- The sum of areas of all 50 bars of relative frequency is 1
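The relative-frequency idea above can be sketched in Python. The lifetimes below are made up for illustration (the article's 800 measurements are not given), and the helper name `relative_frequencies` is hypothetical. The sketch groups values into 10-hour classes starting at 601 and confirms that the relative frequencies add up to 1.

```python
from collections import Counter
from fractions import Fraction

def relative_frequencies(values, start, width):
    """Group values into classes of the given width and return
    {class_lower_bound: relative frequency}."""
    # A value v falls in the class whose lower bound is
    # start + k * width, where k = (v - start) // width.
    counts = Counter(start + ((v - start) // width) * width for v in values)
    total = len(values)
    return {low: Fraction(n, total) for low, n in sorted(counts.items())}

# Hypothetical lifetimes in hours (NOT the article's data set)
lifetimes = [603, 608, 615, 619, 611, 627, 733, 735, 738, 752]
rel_freq = relative_frequencies(lifetimes, start=601, width=10)

for low, rf in rel_freq.items():
    print(f"{low} to {low + 9} hours: {rf}")

# Relative frequencies over all classes always add up to 1
print(sum(rel_freq.values()))
```

Exact fractions are used here so that the final sum is exactly 1 rather than a floating-point approximation.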
Histograms that represent very large data sets grouped into many classes have a relatively smooth appearance. Consequently, the distribution can be modeled by a smooth curve that is close to the tops of the bars. This curve is called a distribution curve.
The purpose of the distribution curve is to give a good illustration of a large distribution of numerical data that does not depend on specific classes. A key property of the distribution curve is that the area under the curve in any vertical slice, just like a histogram bar, represents the proportion of the data that lies in the corresponding interval on the horizontal axis.
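The area property can be illustrated with a short Python sketch. The article does not say what shape the curve has, so the normal curve and its parameters (mean 750, standard deviation 100) are assumptions chosen only for illustration, and `proportion_between` is a hypothetical helper name.

```python
from statistics import NormalDist

# Hypothetical model curve: the article does not state the curve's shape,
# so a normal curve with made-up parameters is used purely for illustration.
curve = NormalDist(mu=750, sigma=100)

def proportion_between(a, b):
    """Area under the curve between a and b, i.e. the proportion
    of the data that lies in the interval [a, b]."""
    return curve.cdf(b) - curve.cdf(a)

# Proportion of data within one standard deviation of the mean
print(round(proportion_between(650, 850), 4))

# The area under the whole curve is (numerically) 1
print(round(proportion_between(-1e9, 1e9), 4))
```

The same slice-by-slice reading applies to any distribution curve, not just a normal one; only the formula for the area changes.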
Random Variables:
A random variable maps each outcome in the sample space to a real number. The probabilities of all the values it can take always sum to 1.
Example:
In an experiment, three fair coins are tossed. The sample space is
S = { HHH, HHT, HTH, THH, HTT, TTH, THT, TTT}
Let the variable X count the number of times a head turns up. X is then a random variable. A random variable is generally denoted by X.
X can take the values 0, 1, 2, 3.
P(X = 1) is the probability that exactly one head occurs:
P(X = 1) = P(THT) + P(TTH) + P(HTT) = 3/8
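The coin example can be reproduced by enumerating the sample space. In this sketch the function names `X` and `prob` are illustrative choices, and the calculation relies on all 8 outcomes being equally likely, which follows from the coins being fair.

```python
from fractions import Fraction
from itertools import product

# Sample space for three fair coin tosses: 8 equally likely outcomes
sample_space = ["".join(t) for t in product("HT", repeat=3)]

def X(outcome):
    """Random variable: number of heads in an outcome such as 'HTH'."""
    return outcome.count("H")

def prob(value):
    """P(X = value), found by counting equally likely outcomes."""
    favourable = [o for o in sample_space if X(o) == value]
    return Fraction(len(favourable), len(sample_space))

print(sample_space)
print(prob(1))  # 3/8
```

Writing `X` as an ordinary function makes the "mapping from outcomes to real numbers" explicit.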
Types of random variables:
- Discrete Random Variable:
A variable that can take one value from a discrete set of values.
Example:
Let x denote the sum of two dice. Then x is a discrete random variable, since it can take only one of the values in the set { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }.
- Continuous Random Variable:
A variable that can take one value from a continuous range of values.
Example:
Let x denote the volume of water in a 500 ml cup. Then x can take any value from 0 to 500.
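The contrast between the two types can be sketched in Python. The dice part simply enumerates every possible sum. For the cup, the article does not say how the volume is distributed, so drawing it uniformly from [0, 500] is an assumption made only to produce a sample value.

```python
import random
from itertools import product

# Discrete: every possible sum of two six-sided dice
dice_sums = sorted({a + b for a, b in product(range(1, 7), repeat=2)})
print(dice_sums)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Continuous: volume (ml) in a 500 ml cup. Modelling it as uniform
# on [0, 500] is an assumption; any real number in the range is possible.
volume = random.uniform(0, 500)
print(volume)
```

The discrete variable has a finite list of possible values, while the continuous one can land anywhere in an interval.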
Probability Distribution:
A probability distribution gives the likelihood of each possible value of a random variable.
P(x) = the likelihood that the random variable takes the specific value x.
Example:
In an experiment, three fair coins are tossed. The sample space is
S = {HHH, HHT, HTH, THH, HTT, TTH, THT, TTT}
X is a random variable taking the values 0, 1, 2, 3. Then
P(X = 0) = P(TTT) = 1/8
P(X = 1) = P(HTT) + P(TTH) + P(THT) = 3/8
P(X = 2) = P(HHT) + P(HTH) + P(THH) = 3/8
P(X = 3) = P(HHH) = 1/8
Therefore,
X (random variable) | P(X)
0 | 1/8
1 | 3/8
2 | 3/8
3 | 1/8
This table is called the probability distribution of random variable X.
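The whole table can be built in one pass by tallying the number of heads over the sample space. This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of tossing three fair coins
outcomes = list(product("HT", repeat=3))

# Tally how many outcomes give each number of heads
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability distribution of X as {value: probability}
distribution = {x: Fraction(n, len(outcomes))
                for x, n in sorted(head_counts.items())}

print("X | P(X)")
for x, p in distribution.items():
    print(f"{x} | {p}")

# The probabilities over all values of X add up to 1
print(sum(distribution.values()))
```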
Distributions can be divided into 2 types:
- Discrete distribution:
Based on a discrete random variable. Examples are the Binomial Distribution and the Poisson Distribution.
- Continuous distribution:
Based on a continuous random variable. Examples are the Normal Distribution, the Uniform Distribution, and the Exponential Distribution.
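As an example of a discrete distribution, the three-coin table from earlier is a Binomial Distribution with n = 3 trials and success probability p = 1/2. The sketch below uses the standard binomial formula C(n, k) p^k (1 - p)^(n - k), which the article does not state; the function name `binomial_pmf` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials),
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = Fraction(1, 2)  # fair coin: probability of a head
table = [binomial_pmf(k, 3, p) for k in range(4)]

print([str(v) for v in table])  # ['1/8', '3/8', '3/8', '1/8']
```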
Probability Mass Function:
Let x be a discrete random variable. Then its Probability Mass Function p(x) is defined such that
1. $p(x) \geq 0$
2. $\sum_{x} p(x) = 1$
3. $p(x) = P(X = x)$
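The first two conditions (every p(x) is non-negative, and the values of p(x) sum to 1) are easy to check mechanically. The helper `is_valid_pmf` below is hypothetical and assumes the function is given as a dictionary from values to exact probabilities.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """pmf: dict mapping each value x to p(x).
    Returns True if every p(x) >= 0 and the p(x) sum to 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return non_negative and sums_to_one

# Number of heads in three fair coin tosses
coins = {0: Fraction(1, 8), 1: Fraction(3, 8),
         2: Fraction(3, 8), 3: Fraction(1, 8)}
print(is_valid_pmf(coins))  # True

# Probabilities that only add up to 3/4 do not form a valid function
broken = {0: Fraction(1, 2), 1: Fraction(1, 4)}
print(is_valid_pmf(broken))  # False
```

With floating-point probabilities the exact `== 1` comparison would need a tolerance instead.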
Probability Density Function:
Let x be a continuous random variable. Then its probability density function f(x) is defined such that
1. $f(x) \geq 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3. $P(a < x < b) = \int_{a}^{b} f(x)\,dx$
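For a continuous variable, probabilities come from areas under the density, which can be approximated numerically. The sketch below reuses the 500 ml cup and assumes, purely for illustration, that the volume is uniformly distributed on [0, 500]. `uniform_pdf` and `integrate` are hypothetical helpers, and the integration uses a simple midpoint rule.

```python
def uniform_pdf(x, low=0.0, high=500.0):
    """Density of a uniform distribution on [low, high] (assumed model)."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Total area under the density is 1 (up to rounding error)
total_area = integrate(uniform_pdf, 0, 500)
print(round(total_area, 6))

# P(100 < x < 200) is the area between 100 and 200
p_100_200 = integrate(uniform_pdf, 100, 200)
print(round(p_100_200, 6))
```

For this density the second area is 100/500 = 0.2, so the numerical result can be checked by hand.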
Properties of Discrete Distribution:
1. $\sum_{x} p(x) = 1$
2. $E(x) = \sum_{x} x\,p(x)$
3. $V(x) = E(x^2) - (E(x))^2$
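A short sketch of these quantities for the three-coin example, assuming the standard definitions: E(x) is the sum of x·p(x) over all values, and V(x) is E(x²) minus the square of E(x). The function names are illustrative.

```python
from fractions import Fraction

# Distribution of the number of heads in three fair coin tosses
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8),
       2: Fraction(3, 8), 3: Fraction(1, 8)}

def expected_value(pmf):
    """E(x) = sum of x * p(x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(x) = E(x^2) - (E(x))^2."""
    mean_of_squares = sum(x**2 * p for x, p in pmf.items())
    return mean_of_squares - expected_value(pmf) ** 2

print(expected_value(pmf))  # 3/2
print(variance(pmf))        # 3/4
```

On average 1.5 of the three coins show heads, which matches the intuition that each fair coin contributes 1/2.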
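Properties of Continuous Distribution:
1. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
2. $E(x) = \int_{-\infty}^{\infty} x\,f(x)\,dx$
3. $V(x) = E(x^2) - (E(x))^2$
4. $P(a < x < b) = \int_{a}^{b} f(x)\,dx$
Where,
f(x) denotes the probability density function of the random variable x,
E(x) denotes the expected value or average value of the random variable x,
V(x) denotes the variance of the random variable x.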
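As a worked illustration of the continuous case, take the 500 ml cup from earlier and assume (this is not stated in the article) that the volume x is uniformly distributed on [0, 500], so that its density is 1/500 on that interval and 0 elsewhere. Under that assumption, and with the standard integral definitions of E and V, the quantities work out as:

```latex
\begin{align*}
f(x) &= \frac{1}{500}, \quad 0 \le x \le 500 \\
\int_{0}^{500} f(x)\,dx &= 1 \\
E(x) &= \int_{0}^{500} \frac{x}{500}\,dx = 250 \\
E(x^2) &= \int_{0}^{500} \frac{x^2}{500}\,dx = \frac{250000}{3} \\
V(x) &= E(x^2) - (E(x))^2 = \frac{250000}{3} - 62500 = \frac{62500}{3} \approx 20833.33 \\
P(100 < x < 200) &= \int_{100}^{200} \frac{1}{500}\,dx = \frac{1}{5}
\end{align*}
```

The expected value of 250 ml sits at the midpoint of the range, as one would expect for a uniform distribution.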