Week 5 - Result and Analysis 1 (UP)

The document provides an overview of statistical analysis and data presentation techniques. It discusses descriptive statistics such as measures of central tendency (mean, median, mode) and variation (range, variance, standard deviation). It also covers inferential statistics, graphical presentation methods like histograms and box plots, and key statistical concepts like the normal distribution and confidence intervals.

Uploaded by

eddy siregar
Copyright
© All Rights Reserved

RESULT AND ANALYSIS

INTRODUCTION

• Data analysis is about manipulating and presenting results
• Data need to be organised, summarised, and analysed in order to draw or infer
conclusions
• Commonly used approaches or tools
a. Statistics
b. Models
c. Standards

STATISTICAL ANALYSIS

Statistics is used in everyday life, e.g.,
a. A typical one-a-day vitamin pill boosted certain immune responses in older
people by 64%.
b. Rain covering 30 to 40% of Southern Johor in the late afternoon.
c. Of 1000 households polled nationwide, 40% said they owned at least one
cordless phone and 9% had two or more.

Statistics is the science of conducting studies to collect, organise, summarise,
and draw conclusions from data

Two main areas of statistics
a. Descriptive statistics
b. Inferential statistics

Descriptive Statistics - used to describe a situation, e.g., the results of a
national census on average age, income, employment, and level of education. It
involves
∇ data collection
∇ organisation
∇ summarisation

Inferential Statistics - used to make inferences from samples to populations
based on probability theory. It involves
∇ generalising from samples to populations
∇ performing hypothesis testing
∇ determining relationships among variables, and
∇ making predictions

A population consists of all subjects (human or otherwise) that are being studied

A sample is a subgroup of the population

GRAPHICAL PRESENTATION

• The three most commonly used graphs
i. Histogram
ii. Frequency polygon
iii. Cumulative frequency graph or ogive: plots the sum of the frequencies
accumulated up to the upper boundary of each class in the distribution. The
graph represents how many values are below a certain class boundary
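
The frequencies behind a histogram and the running totals behind an ogive can be sketched in a few lines of Python. The scores and the class limits below are hypothetical values chosen only for illustration; they do not come from these notes.

```python
from itertools import accumulate

# Hypothetical data: 20 exam scores (not from the notes).
scores = [52, 55, 58, 61, 63, 64, 66, 67, 68, 70,
          71, 72, 73, 75, 77, 78, 81, 84, 88, 93]

# Assumed class limits: 50-59, 60-69, ..., 90-99.
lower_limits = [50, 60, 70, 80, 90]
class_width = 10

# Frequency of each class (the heights of the histogram bars).
frequencies = [
    sum(1 for s in scores if low <= s < low + class_width)
    for low in lower_limits
]

# Cumulative frequency up to each class (the heights of the ogive points).
cumulative = list(accumulate(frequencies))

for low, f, cf in zip(lower_limits, frequencies, cumulative):
    print(f"{low}-{low + class_width - 1}: frequency = {f}, cumulative = {cf}")
```

The last cumulative frequency always equals the total number of values, which is a quick check that no value fell outside the chosen classes.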

Other Types of Graph

a. Pareto Charts
b. Time Series
c. Pie

DATA DESCRIPTION

Three aspects
1. Measures of Central Tendency

Measure     Definition                                          Symbol
Mean        Sum of values divided by total number of values     µ, X̄
Median      Middle point in the data set                        MD
Mode        Most frequent data value                            None
Midrange    (Lowest value plus highest value)/2                 MR
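
As a minimal sketch, the four measures in the table can be computed with Python's standard `statistics` module. The data set is hypothetical and only serves to show each definition at work.

```python
import statistics

# Hypothetical data set (not from the notes).
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)            # sum of values / number of values
median = statistics.median(data)        # middle point of the ordered data
mode = statistics.mode(data)            # most frequent data value
midrange = (min(data) + max(data)) / 2  # (lowest value + highest value) / 2

print(f"mean = {mean}, median = {median}, mode = {mode}, midrange = {midrange}")
```

With an even number of values, as here, the median is the average of the two middle values (3 and 5).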

2. Measures of Variation.

Sometimes the mean alone is not enough to describe a data set, as in the
following example.

Example: A testing lab wishes to test two experimental brands of outdoor paint to
see how long each would last before fading. Different chemical agents are added in
each group and only six cans are involved. These two groups constitute two small
populations. The results (in months) follow.

Brand A      Brand B
10           35
60           45
50           30
30           35
40           40
20           25
Mean = 35    Mean = 35

Note that Brands A and B have the same mean of 35. Thus one might conclude that
both brands of paint last equally well. But a different conclusion might be drawn
when the data sets are examined graphically.

The range

for Brand A: 60 - 10 = 50 months
for Brand B: 45 - 25 = 20 months
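
The comparison above can be reproduced with a short Python sketch using the paint data from the example. The helper name `value_range` is introduced here only for illustration.

```python
import statistics

# Paint durability in months, from the example in these notes.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

def value_range(values):
    """Range R: distance between the highest and lowest value."""
    return max(values) - min(values)

for name, months in (("Brand A", brand_a), ("Brand B", brand_b)):
    print(f"{name}: mean = {statistics.mean(months)}, "
          f"range = {value_range(months)} months")
```

Both brands share a mean of 35 months, but the ranges of 50 and 20 months show that Brand A is far less consistent.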

Measures indicating the degree of spread/variation

Measure      Definition                                          Symbols
Range        Distance between the highest and lowest values      R
Variance     Average of the squares of the distance each         σ², s²
             value is from the mean
Standard     Square root of the variance                         σ, s
Deviation

Population variance (σ²)

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Where
X = individual value
µ = population mean
N = population size

Population Std. Deviation (σ)

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
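
A minimal sketch of the two population formulas, applied to the paint data from the earlier example (which the notes treat as two small populations). The function names are illustrative, not part of any library.

```python
import math

# Paint durability in months, from the example in these notes.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

def population_variance(values):
    """Sum of squared distances from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

for name, months in (("Brand A", brand_a), ("Brand B", brand_b)):
    print(f"{name}: variance = {population_variance(months):.1f}, "
          f"std. deviation = {population_std(months):.1f}")
```

For Brand A the squared distances sum to 1750, giving a variance of about 291.7 and a standard deviation of about 17.1 months; for Brand B they sum to 250, giving about 41.7 and 6.5 months.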

The unbiased estimator of the population variance is a statistic whose expected
value equals the population variance. It is the sample variance, denoted by s².

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Sample standard deviation

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Where:
X = individual value
X̄ = sample mean
n = sample size
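
The sample formulas differ from the population ones only in the divisor n - 1. The sketch below treats the Brand A values as if they were a sample, which is an assumption made purely for illustration (the notes treat them as a population), and cross-checks the hand-written formula against the standard library.

```python
import math
import statistics

# Brand A values, treated here as a SAMPLE for illustration only.
sample = [10, 60, 50, 30, 40, 20]

def sample_variance(values):
    """Sum of squared distances from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

s2 = sample_variance(sample)
s = math.sqrt(s2)

print(f"s^2 = {s2:.1f}, s = {s:.2f}")
# statistics.variance and statistics.stdev also use the n - 1 divisor.
print(f"library check: {statistics.variance(sample):.1f}, {statistics.stdev(sample):.2f}")
```

The same six values give 1750 / 5 = 350 with the n - 1 divisor, compared with about 291.7 with the divisor N.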

Coefficient of variation is the standard deviation divided by the mean. The result is
expressed as a percentage.

For samples

$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\%$$

For populations

$$\text{CVar} = \frac{\sigma}{\mu} \cdot 100\%$$
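
As a rough sketch, the coefficient of variation can be computed for the two paint brands from the earlier example, using the population form because the notes treat them as populations. The function names are illustrative.

```python
import statistics

# Paint durability in months, from the example in these notes.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

def cvar_population(values):
    """Population coefficient of variation, as a percentage."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

def cvar_sample(values):
    """Sample coefficient of variation, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

print(f"Brand A: CVar = {cvar_population(brand_a):.1f}%")
print(f"Brand B: CVar = {cvar_population(brand_b):.1f}%")
```

Because the result is a percentage of the mean, it allows the variation of data sets measured in different units to be compared.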

3. Measures of Position

Measure              Definition                                      Symbol
Standard score or    Number of standard deviations a data value      z
z score              is above or below the mean
Percentile           Position in hundredths that a data value        Pn
                     holds in the distribution
Decile               Position in tenths that a data value holds      Dn
                     in the distribution
Quartile             Position in fourths that a data value holds     Qn
                     in the distribution
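
The z score follows directly from its definition: the distance from the mean divided by the standard deviation. The sketch below also includes a percentile rank computed with one common textbook convention, (number of values below X + 0.5) / total × 100; that convention is an assumption, since these notes do not state which formula they use.

```python
import statistics

# Brand B paint data from the earlier example, treated as a population.
brand_b = [35, 45, 30, 35, 40, 25]

def z_score(x, values):
    """Standard deviations x lies above (+) or below (-) the population mean."""
    return (x - statistics.mean(values)) / statistics.pstdev(values)

def percentile_rank(x, values):
    """Assumed convention: (count of values below x + 0.5) / total * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) / len(values) * 100

print(f"z score of 45 months: {z_score(45, brand_b):.2f}")
print(f"percentile rank of 45 months: {percentile_rank(45, brand_b):.0f}")
```

A z score of zero means the value sits exactly at the mean; deciles and quartiles are the same idea as percentiles, with the distribution cut into tenths or fourths.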

EXPLORATORY DATA ANALYSIS (EDA)

EDA is used to discover various aspects of the data. In EDA, data are organised
to facilitate further analysis

Common methods used

i. Stem and leaf plot
∇ A combination of sorting and graphing
∇ Retains the actual data while showing them in graphic form

ii. Box Plots
∇ Graphically present a data set
∇ Show five specific values
⇒ The lowest/minimum value
⇒ The lower hinge (LH)
⇒ The median
⇒ The upper hinge (UH)
⇒ The highest/maximum value
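
The stem and leaf plot listed above can be sketched in Python as follows. The sketch assumes non-negative whole numbers, takes the units digit as the leaf and the remaining digits as the stem, and uses hypothetical data; stems with no values are simply omitted.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted leaves (units digit)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting step
        stem, leaf = divmod(v, 10)    # e.g. 32 -> stem 3, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data (not from the notes).
data = [25, 31, 20, 32, 13, 14, 43, 23, 36, 32]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row works like a histogram bar laid on its side, while the individual data values can still be read back from the leaves.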

The lower hinge (LH) is the median of all values less than or equal to the median
when the data set has an odd number of values, or the median of all values less
than the median when the data set has an even number of values.

The upper hinge (UH) is the median of all values greater than or equal to the
median when the data set has an odd number of values, or the median of all values
greater than the median when the data set has an even number of values.
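
The hinge definitions above translate into a short Python sketch of the five values shown in a box plot. It splits the sorted data by position, which matches the definitions as long as no other values tie with the median; that simplification is an assumption of this sketch.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        # Odd count: the median itself belongs to both halves.
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        # Even count: the halves lie strictly below and above the median.
        lower, upper = ordered[:half], ordered[half:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

# Brand A paint data from the earlier example.
brand_a = [10, 60, 50, 30, 40, 20]
print(five_number_summary(brand_a))
```

Other texts and software compute quartiles with slightly different rules, so box plots drawn by a plotting library may not match these hinges exactly.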

Information Obtained from a Box Plot

a. If the median is near the center of the box, the distribution is
approximately symmetric
b. If the median falls to the left of the center of the box, the distribution is
positively skewed
c. If the median falls to the right of the center, the distribution is negatively
skewed
d. If the lines are about the same length, the distribution is approximately
symmetric
e. If the right line is longer than the left line, the distribution is positively
skewed
f. If the left line is longer than the right line, the distribution is negatively
skewed

The Normal Distribution

• Data values are evenly distributed about the mean, i.e., symmetrical

• Properties of the theoretical Normal Distribution
i. Bell-shaped
ii. The mean, median, and mode are equal and located at the center of the
distribution
iii. Unimodal: only one mode
iv. Symmetrical about the mean: the shape is the same on both sides
v. The curve is continuous
vi. The curve never touches the x axis but gets increasingly closer to it
vii. Total area under the curve = 1 or 100%
viii. The area under the curve within one standard deviation of the mean is
about 0.68 or 68%; within 2 standard deviations about 95%; and within 3
standard deviations about 99.7%
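
The areas quoted in the last property can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The computed areas are roughly 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% figures in the list.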

Confidence Intervals

• Inferential statistics involves estimation
• Estimation is the process of estimating the value of a parameter from
information obtained from a sample
• Question: how good is a point estimate (e.g., the sample mean)? There is no way
of knowing how close the point estimate is to the population mean, so there is
some doubt over the accuracy of point estimates.
• The problem can be addressed by introducing an interval estimate. It is an
interval or a range of values used to estimate the parameter. This estimate
may or may not contain the value of the parameter being estimated.

• The parameter is specified as falling between two values. Example: the average
age of all students

26.9 < µ < 27.7 or 27.3 ± 0.4

• A probability of being correct can be assigned. E.g., a 95% confidence interval
means one is 95% confident that the population mean is contained within
the range.

The confidence level of an interval estimate of a parameter is the probability
that the interval estimate will contain the parameter.

I am ----% confident that the interval ---- to ----- includes the population
mean, µ
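
These notes do not give a formula for the interval, so the sketch below assumes one standard case: a z-based interval for the mean when the population standard deviation σ is known, X̄ ± z·σ/√n. The values σ = 2.0 and n = 96 are hypothetical, chosen only so that the result resembles the 27.3 ± 0.4 example above.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """z-based interval for the mean, assuming sigma is known (an assumption)."""
    # Critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: sample mean age 27.3, sigma = 2.0, n = 96.
low, high = z_confidence_interval(27.3, sigma=2.0, n=96)
print(f"I am 95% confident that the interval {low:.1f} to {high:.1f} "
      f"includes the population mean")
```

Raising the confidence level widens the interval, while a larger sample size narrows it.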
