Statistics 091147
Outline
• Statistics
• Types of statistics
• Population and sample
• Types of sampling
• Simple random sampling
• Stratified sampling
• Systematic sampling
• Convenience sampling
Statistics
• Hypothesis testing
• E.g., Test the claim that the population mean weight is 120 pounds
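As a rough sketch of how such a claim could be tested, the snippet below computes a one-sample t statistic. The weights are made-up illustrative values, not data from these slides, and the choice of a t-test is an assumption about which test is intended.

```python
import math
import statistics

# Hypothetical sample of weights in pounds (made up for illustration).
weights = [118, 122, 125, 119, 121, 124, 117, 123]
claimed_mean = 120  # null hypothesis H0: population mean = 120

n = len(weights)
sample_mean = statistics.mean(weights)
sample_sd = statistics.stdev(weights)  # sample standard deviation (divides by n - 1)

# t statistic: how many standard errors the sample mean lies from the claimed mean
t_stat = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))
print(f"sample mean = {sample_mean}, t = {t_stat:.3f}, df = {n - 1}")
```

The t statistic is then compared with a critical value from a t-table. With 7 degrees of freedom the two-sided 5% critical value is about 2.365, so a |t| below that would not reject the claim.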
Mean
Median
Mode
Central tendency
Mean: It is influenced by outliers (extreme values), which pull it
toward them.
Median: It is not affected by extreme values (outliers) and is a
useful measure for skewed distributions.
Mode: The mode is particularly useful for categorical or discrete
data.
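A small sketch of how the three measures react to an outlier, using Python's standard `statistics` module. The numbers are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical data; 100 is a deliberately extreme value (an outlier).
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]

for label, values in [("without outlier", data), ("with outlier", with_outlier)]:
    print(
        label,
        "mean =", statistics.mean(values),
        "median =", statistics.median(values),
        "mode =", statistics.mode(values),
    )
```

Adding the single value 100 moves the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5 and the mode stays at 3.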
Note: The range describes the spread of the whole set of data, whereas the
interquartile range describes the spread of the middle half of the data.
The range is greatly affected by outliers (extreme results in the data),
whereas the interquartile range is not.
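The note above can be checked numerically. This is a minimal sketch with hypothetical data. Quartiles are computed with `statistics.quantiles` and its default method, which is only one of several quartile conventions, so other tools may give slightly different values.

```python
import statistics

def spread(values):
    """Return (range, interquartile range) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return max(values) - min(values), q3 - q1

# Hypothetical data; 90 is a deliberately extreme value.
data = [5, 7, 8, 10, 12, 13, 15]
with_outlier = data + [90]

range_plain, iqr_plain = spread(data)
range_out, iqr_out = spread(with_outlier)
print("without outlier: range =", range_plain, "IQR =", iqr_plain)
print("with outlier:    range =", range_out, "IQR =", iqr_out)
```

The outlier stretches the range from 10 to 85, while the interquartile range changes only slightly.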
Boxplot
• The minimum is found at the position of the first line at 5
• The maximum is found at the position of the last line at 25
• The lower quartile (Q1) is found at the position of the start of the
box at 10
• The upper quartile (Q3) is found at the position of the end of the
box at 20
• The median (Q2) is found at the position of the line inside the box
at 18
Summary
• Step 1. Put the numbers in order from smallest to largest.
• Step 2. The minimum is the smallest number in the list.
• Step 3. The maximum is the largest number in the list.
• Step 4. The median is the middle value of the ordered list (the average
of the two middle values when the count is even).
• Step 5. The lower quartile is the median of the first half of the data.
• Step 6. The upper quartile is the median of the second half of the
data.
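The steps above can be sketched as a short function. The data set is hypothetical, and leaving the median out of both halves when the count is odd is an assumed convention (some textbooks include it).

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum, following Steps 1 to 6."""
    ordered = sorted(values)                      # Step 1
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]                   # values below the middle
    # For an odd count, skip the median itself (assumed convention).
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return {
        "min": ordered[0],                        # Step 2
        "q1": statistics.median(lower_half),      # Step 5
        "median": statistics.median(ordered),     # Step 4
        "q3": statistics.median(upper_half),      # Step 6
        "max": ordered[-1],                       # Step 3
    }

summary = five_number_summary([12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25])
print(summary)
```

For this list the summary is minimum 5, Q1 12, median 22, Q3 36 and maximum 53, which are the five values a boxplot draws.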
Outlier detection
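One widely used rule, assumed here because the slides do not state which rule they use, flags any value more than 1.5 times the interquartile range below Q1 or above Q3. A minimal sketch with hypothetical data:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is the usual choice)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # one of several quartile conventions
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data; 90 is a deliberately extreme value.
print(iqr_outliers([5, 7, 8, 10, 12, 13, 15, 90]))
```

On a boxplot these are the points usually drawn individually beyond the whiskers.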
Covariance
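• Covariance is a statistical concept that measures the degree to which
two random variables change together.
• It is often used to understand the direction of the linear relationship
between two variables: a positive covariance means they tend to move in
the same direction, a negative covariance in opposite directions.
• It is often denoted by Cov(X, Y). (The symbol r is reserved for the
Pearson correlation coefficient.)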
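A minimal sketch of the sample covariance, computed directly from its definition. The variable names and numbers are hypothetical, and dividing by n - 1 (the sample version) rather than n is an assumption.

```python
def sample_covariance(xs, ys):
    """Sample covariance: average product of paired deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired data.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

cov_same = sample_covariance(hours, scores)            # both rise together
cov_opposite = sample_covariance(hours, scores[::-1])  # one rises, one falls
print(cov_same, cov_opposite)
```

The sign gives the direction of the relationship. The magnitude depends on the units of the variables, so it is hard to interpret on its own.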
Pearson Correlation Coefficient
• Positive Correlation (r>0): A positive value of r indicates a positive
linear relationship between the variables. As one variable increases,
the other tends to increase as well. The closer r is to 1, the stronger
the positive correlation.
• Negative Correlation (r<0): A negative value of r indicates a negative
linear relationship between the variables. As one variable increases,
the other tends to decrease. The closer r is to -1, the stronger the
negative correlation.
• No Correlation (r≈0): A correlation coefficient close to 0 suggests little
to no linear relationship between the variables.
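The three cases can be illustrated with a short sketch that computes r from its definition: the sum of products of paired deviations, divided by the square root of the product of the two sums of squared deviations. The data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y rises with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
r_none = pearson_r(x, [2, 1, 3, 1, 2])       # no linear trend
print(r_positive, r_negative, r_none)
```

Unlike covariance, r has no units and always lies between -1 and 1, so values from different data sets can be compared directly.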
Z-score
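• A distribution curve extends on both sides of the mean, so a z-score
(the number of standard deviations a value lies from the mean) can be
negative or positive. The standard deviation itself is never negative.
• A negative z-score means the value is to the left of (below) the mean,
and a positive z-score means it is to the right of (above) the mean.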
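A short sketch of the sign convention, using z = (x - mean) / standard deviation. The scores are hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption.

```python
import statistics

# Hypothetical test scores.
scores = [60, 70, 80, 90, 100]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population standard deviation (assumed)

# z-score: signed distance from the mean in units of standard deviation
z_scores = [(x - mean) / sd for x in scores]
for x, z in zip(scores, z_scores):
    side = "left of" if z < 0 else "right of" if z > 0 else "at"
    print(f"{x}: z = {z:+.2f} ({side} the mean)")
```

Scores below the mean of 80 get negative z-scores and scores above it get positive ones. The score equal to the mean has z = 0.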