Angilan, Ef

The document provides an overview of statistical concepts, including measures of central tendency (mean, median, mode), point measures (variance, standard deviation, percentiles), and measures of dispersion (range, quartile deviation). It differentiates between population and sample, descriptive and inferential statistics, as well as parametric and nonparametric tests. Each concept is explained with definitions, formulas, and examples to illustrate their application in data analysis.

Uploaded by

angilaneung

Name: Eurelyn Fe B. Angilan

A. Identify and describe the following:

1. Measure of central tendency- Measures of central tendency are statistical values that
describe the center or typical value of a dataset. The most common measures are mean,
median, and mode:

1. Mean (Arithmetic Average):


o The mean is calculated by summing all the values in a dataset and then
dividing by the number of values.
o Formula: $\text{Mean} = \frac{\sum x}{n}$, where x is each value and n is the number of values.
o Example: In the dataset [3, 5, 7, 9], the mean is
$\frac{3+5+7+9}{4} = 6$.
o Advantages:
▪ Uses all data points.
▪ Well-suited for symmetrical distributions.
o Disadvantages:
▪ Sensitive to extreme values (outliers).
2. Median (Middle Value):
o The median is the middle value when the data is ordered from least to
greatest. If the dataset has an even number of values, the median is the
average of the two middle values.
o Example: In the dataset [3, 5, 7, 9], the median is
$\frac{5+7}{2} = 6$. For [1, 3, 5, 7, 9], the
median is 5.
o Advantages:
▪ Not affected by outliers or extreme values.
▪ Suitable for skewed distributions.
o Disadvantages:
▪ Does not use all data points.
3. Mode (Most Frequent Value):
o The mode is the value that occurs most frequently in a dataset. There can
be more than one mode (bimodal, multimodal) or no mode if no value
repeats.
o Example: In the dataset [2, 3, 3, 5, 7, 9], the mode
is 3.
o Advantages:
▪ Useful for categorical data.
▪ Identifies the most common value.
o Disadvantages:
▪ May not exist in all datasets or may not be unique.

Each measure of central tendency provides a different perspective on the data,
and their usefulness depends on the nature and distribution of the dataset.
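
The three measures above can be checked with a short Python sketch. It uses the standard-library `statistics` module on the datasets from the examples; the extra list containing 100 is a hypothetical addition, included only to illustrate how an outlier affects the mean but hardly the median.

```python
import statistics

data = [3, 5, 7, 9]
mean_value = statistics.mean(data)      # (3 + 5 + 7 + 9) / 4
median_value = statistics.median(data)  # average of the two middle values, 5 and 7

# multimode returns every most frequent value, so it also handles bimodal data
modes = statistics.multimode([2, 3, 3, 5, 7, 9])

# Hypothetical outlier: the mean shifts strongly, the median barely moves
with_outlier = data + [100]
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean_value, median_value, modes)
print(mean_outlier, median_outlier)
```
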
2. Point measures - Point measures refer to specific statistics or values that
summarize a dataset or provide information about a particular aspect of it. These
measures provide single, concise values that describe various properties of the
data. Here are some common point measures used in statistics:

1. Mean (Arithmetic Mean)


• A measure of central tendency that represents the average value of a dataset.
• Calculated by summing all the values and dividing by the number of values.

• Example: For the dataset [2, 4, 6, 8, 10], the mean is $\frac{2+4+6+8+10}{5} = 6$.


2. Median
• The middle value when a dataset is ordered. If there are an even number of values,
the median is the average of the two middle values.
• It indicates the central point and is less sensitive to outliers.
• Example: In the dataset [3,5,7,9] the median is 6.

3. Mode
• The value that occurs most frequently in a dataset. A dataset may have one mode,
more than one mode (bimodal or multimodal), or no mode if all values occur with
the same frequency.
• Example: In the dataset [1,2,2,3], the mode is 2.

4. Range
• A measure of dispersion that represents the difference between the maximum and
minimum values in a dataset.

• Formula: Range = Maximum value − Minimum value


• Example: For the dataset [2,4,6,8,10], the range is 10−2=8.
5. Variance

• A measure of how much the data points vary from the mean. Variance is the
average of the squared differences from the mean.
• Formula: $\sigma^2 = \frac{\sum (x - \mu)^2}{n}$

• Example: In the dataset [1, 2, 3], the variance is calculated as follows:
first, calculate the mean, which is 2. Then, the squared deviations from the mean
are $(1-2)^2 = 1$, $(2-2)^2 = 0$, and $(3-2)^2 = 1$. Summing these and dividing by 3
gives a variance of $\frac{2}{3} \approx 0.67$.

6. Standard Deviation
• The square root of the variance, representing how spread out the values are from
the mean. It is expressed in the same units as the original data, making it easier
to interpret.
• Formula: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$
• Example: If the variance of a dataset is 4, the standard deviation is $\sqrt{4} = 2$.
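
As a rough illustration of the variance and standard deviation formulas, the sketch below works through the dataset [1, 2, 3] step by step and compares the result with the standard library. It assumes the population form (dividing by n), as in the formulas above; note that `statistics.variance` and `statistics.stdev` divide by n − 1 instead.

```python
import statistics

data = [1, 2, 3]
mu = statistics.mean(data)                     # 2
squared_devs = [(x - mu) ** 2 for x in data]   # [1, 0, 1]
variance = sum(squared_devs) / len(data)       # 2 / 3
std_dev = variance ** 0.5                      # square root of the variance

# Standard-library equivalents of the population formulas
lib_variance = statistics.pvariance(data)
lib_std_dev = statistics.pstdev(data)

print(variance, std_dev)
```
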
7. Percentile
• The value below which a given percentage of observations in a dataset fall. For
example, the 25th percentile is the value below which 25% of the data fall.

• Example: If a test score of 75 is at the 90th percentile, it means that 90% of the
students scored below 75.
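
The percentile statement above can be sketched in Python as the percentage of observations that fall strictly below a given value. The helper name `percent_below` and the ten scores are hypothetical, and other percentile conventions (for example, counting ties or interpolating) would give slightly different numbers.

```python
def percent_below(values, score):
    """Return the percentage of values strictly below the given score."""
    if not values:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)

# Hypothetical test scores for ten students
scores = [50, 55, 60, 62, 65, 68, 70, 72, 74, 75]
rank_of_75 = percent_below(scores, 75)  # nine of the ten scores are below 75

print(rank_of_75)
```
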
8. Z-score (Standard Score)

• Represents how many standard deviations a data point is from the mean. It is used
to standardize data and identify outliers.
• Formula: $Z = \frac{x - \mu}{\sigma}$, where x is the data point, μ is the mean, and σ is the standard
deviation.
• Example: If a data point is 10 and the dataset has a mean of 5 and a standard deviation of
2, its Z-score is $\frac{10-5}{2} = 2.5$.
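
A minimal sketch of the Z-score formula follows. The function name `z_score` and the second dataset are illustrative assumptions; the first call reproduces the example above.

```python
import statistics

def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

z = z_score(10, 5, 2)

# Standardizing a whole (hypothetical) dataset with its own population parameters
data = [2, 4, 6, 8, 10]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)
z_scores = [z_score(x, mu, sigma) for x in data]

print(z, z_scores)
```

After standardizing, the Z-scores have a mean of 0 and a population standard deviation of 1, which is what makes values from different scales comparable.
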
9. Interquartile Range (IQR)
• A measure of statistical dispersion representing the difference between the 75th
and 25th percentiles (Q3 - Q1).

• Example: If the 25th percentile of a dataset is 30 and the 75th percentile is 70, the
IQR is $70 - 30 = 40$.

10. Skewness
• A measure of the asymmetry of the probability distribution of a real-valued random
variable. Positive skewness indicates a long right tail, while negative skewness
indicates a long left tail.
• Example: If a dataset has more small values with a few large values, it would have
positive skewness.
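
The text above does not give a formula for skewness. One common choice, assumed here, is the population moment coefficient $m_3 / m_2^{3/2}$, where $m_2$ and $m_3$ are the second and third central moments. The sketch below applies it to hypothetical datasets to show the sign convention.

```python
def skewness(values):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

right_skewed = [1, 1, 2, 2, 3, 10]         # many small values, one large value
left_skewed = [-x for x in right_skewed]   # mirror image of the data above
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed), skewness(left_skewed), skewness(symmetric))
```
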

These point measures summarize various aspects of a dataset, such as its central
location, spread, and shape.

3. Measures of dispersion- A measure of dispersion indicates how scattered the data are. It
describes how much the values differ from one another, giving a clearer view of their
distribution. It also gives an idea of how far individual items vary from the central
value.
Variation can be quantified by several numerical measures, namely:
(i) Range: It is the simplest measure of dispersion and is defined as the
difference between the largest and the smallest item in a given distribution. If $Y_{max}$ and
$Y_{min}$ are the two extreme items, then
Range = $Y_{max} - Y_{min}$

(ii) Quartile deviation: Also known as the semi-interquartile range, it is half of the difference
between the upper quartile and the lower quartile. The first quartile (Q1) is the middle
value between the smallest number and the median of the data. The median of the data set
is the second quartile (Q2). Lastly, the middle value between the median and the largest
number is the third quartile (Q3). Quartile deviation can be calculated by
Q = ½ × (Q3 – Q1)
(iii) Mean deviation: Mean deviation is the arithmetic mean (average) of the absolute
deviations |D| of observations from a central value A (mean or median).

Mean deviation can be evaluated by using the formula: $MD = \frac{1}{n} \sum_i |x_i - A|$
(iv) Standard deviation: Standard deviation is the square root of the arithmetic average
of the squares of the deviations measured from the mean. The standard deviation is given
as $\sigma = \left[\frac{\sum_i (y_i - \bar{y})^2}{n}\right]^{1/2} = \left[\frac{\sum_i y_i^2}{n} - \bar{y}^2\right]^{1/2}$.
Apart from numerical measures, graphical methods are also used to estimate dispersion.
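
The four measures above can be sketched in Python on a small hypothetical dataset. Two assumptions are worth flagging: quartile conventions differ between textbooks and tools, so the "inclusive" method of `statistics.quantiles` is only one possible choice, and the mean deviation here is taken about the mean rather than the median.

```python
import statistics

data = [2, 4, 6, 8, 10]  # hypothetical dataset
n = len(data)
mean = sum(data) / n

# (i) Range: largest item minus smallest item
value_range = max(data) - min(data)

# (ii) Quartile deviation: half the distance between Q3 and Q1
# (the "inclusive" quartile method is an assumption, not the only convention)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

# (iii) Mean deviation about the mean
mean_deviation = sum(abs(x - mean) for x in data) / n

# (iv) Standard deviation, from the definition and from the shortcut form
sigma_definition = (sum((y - mean) ** 2 for y in data) / n) ** 0.5
sigma_shortcut = (sum(y * y for y in data) / n - mean ** 2) ** 0.5

print(value_range, quartile_deviation, mean_deviation, sigma_definition)
```

Both standard deviation expressions give the same value, which is a quick way to confirm that the shortcut form matches the definition.
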
Types of Measures of Dispersion

(1) Absolute measures


• Absolute measures of dispersion are expressed in the unit of the variable itself, such as
kilograms, rupees, centimeters, or marks.

(2) Relative measures


• Relative measures of dispersion are obtained as ratios or percentages of the
average.

• These are also known as coefficients of dispersion.


• These are pure numbers or percentages that are totally independent of the units.
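
To illustrate the difference between absolute and relative measures, the sketch below uses the coefficient of variation (standard deviation as a percentage of the mean). The coefficient of variation is not named in the text above; it is chosen here as one common example of a relative measure, and the height data are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100 * statistics.pstdev(values) / mean

heights_cm = [150, 160, 170, 180, 190]      # hypothetical heights
heights_m = [h / 100 for h in heights_cm]   # the same heights in metres

# Absolute measure: changes with the unit of measurement
sd_cm = statistics.pstdev(heights_cm)
sd_m = statistics.pstdev(heights_m)

# Relative measure: a pure percentage, the same in either unit
cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(sd_cm, sd_m, cv_cm, cv_m)
```
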

B. Differentiate the following:


1. Population and sample - A population is the entire group that you want to draw
conclusions about. A sample is the specific group that you will collect data from. The size
of the sample is always less than the total size of the population. In research, a
population doesn’t always refer to people. It can mean a group containing elements of
anything you want to study, such as objects, events, organizations, countries, species,
organisms, etc. Populations are used when your research question requires, or when you
have access to, data from every member of the population. Usually, it is only
straightforward to collect data from a whole population when it is small, accessible and
cooperative. When your population is large in size, geographically dispersed, or difficult
to contact, it’s necessary to use a sample. With statistical analysis, you can use sample
data to make estimates or test hypotheses about population data.
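
The idea of estimating a population value from a sample can be sketched as follows. The population here is simulated (10,000 hypothetical values drawn from a normal distribution), and the sample size of 100 is an arbitrary choice for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated measurements
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Simple random sample of 100 members, drawn without replacement
sample = rng.sample(population, 100)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # estimate based on the sample

print(population_mean, sample_mean)
```

The sample mean will generally be close to, but not exactly equal to, the population mean; that gap is the sampling error that inferential methods try to quantify.
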
2. Descriptive and inferential statistics- The key difference between descriptive and
inferential statistics is that descriptive statistics aren’t used to make an inference about a
broader population, whereas inferential statistics are used for this purpose. Rather than
reporting on the data set itself, inferential statistics are used to generate insights about
data sets too vast to analyze in full. Essentially, descriptive statistics state facts about
the data that were actually collected, whereas inferential statistics analyze samples to
make predictions about larger populations.
In the example of a clinical drug trial, the percentage breakdown of side effect frequency
and the mean age are descriptive statistics that summarize frequency and central
tendency within that data set. However, inferential statistics methods could be applied
to draw conclusions about how such side effects occur among patients taking this
medication. The resulting inferential statistics can help doctors and patients understand
the likelihood of experiencing a negative side effect, based on how many members of the
sample population experienced it.
Since descriptive statistics focus on the characteristics of a data set, the certainty level is
very high. Outliers and other factors may be excluded from the overall findings to ensure
greater accuracy, but calculations are often much less complex and can result in solid
conclusions. However, inferential statistics are designed to test for a dependent variable
— namely, the population parameter or outcome being studied — and may involve
several variables. The calculations are more advanced, but the results are less certain.
There will be a margin of error as well. After all, inferential statistics are more like highly
educated guesses than assertions. A sampling error may skew the findings, although a
variety of statistical methods can be applied to minimize problematic results.
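
As a hedged illustration of the margin of error mentioned above, the sketch below estimates a side effect rate from a sample and attaches a 95% interval. The counts (30 of 200 patients) are hypothetical, and the normal-approximation (Wald) interval is only one of several possible methods; it is less reliable for small samples or for rates near 0 or 1.

```python
import math
from statistics import NormalDist

# Hypothetical trial data: 30 of 200 sampled patients reported a side effect
n_patients = 200
n_with_side_effect = 30
p_hat = n_with_side_effect / n_patients  # descriptive: rate in the sample

# Inferential: 95% interval for the rate in the wider patient population
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% level
margin = z * math.sqrt(p_hat * (1 - p_hat) / n_patients)
lower, upper = p_hat - margin, p_hat + margin

print(f"estimate {p_hat:.3f}, margin of error {margin:.3f}")
print(f"95% interval: {lower:.3f} to {upper:.3f}")
```
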
3. Parametric and nonparametric - Parametric tests are those that make assumptions
about the parameters of the population distribution from which the sample is drawn. This
is often the assumption that the population data are normally distributed. Non-parametric
tests are “distribution-free” and, as such, can be used for non-Normal variables.
