Introduction To Statistics

Uploaded by Dee soni
Copyright © All Rights Reserved

STATISTICS
• Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. In other words, it is a mathematical discipline for collecting and summarizing data.
APPLICATIONS OF STATISTICS
• Statistics in Economics
• Statistics in industry
• Statistics in insurance
• Statistics in astronomy
• Statistics in social sciences
• Statistics in Biology and Medical Science
• Statistics in Psychology and Education
• Statistics in Physical Science
• Statistics in war
LIMITATIONS OF STATISTICS
• It is not concerned with qualitative phenomena
• It does not recognize individual items
• It does not reveal the entire story of a phenomenon
• Its results are true only on average
• Its laws are not exact
• It is likely to be misused
STATISTIC VS. PARAMETER

A statistic is a characteristic of a sample.
• It is a numerical value that summarizes data obtained from a sample (for example, the sample mean X̄).
A parameter is a characteristic of a population.
• It is a numerical value that summarizes the entire population (for example, the population mean μ); it is usually unknown and is estimated from a sample statistic.
TYPES OF NUMERICAL DATA
There are two fundamental types of numerical data:
• Categorical data: obtained by determining the frequency of occurrences in each of several categories
• Quantitative data: obtained by determining placement on a scale that indicates amount or degree
TYPES OF ENQUIRY
• Official, semi-official or unofficial
• Initial or repetitive
• Confidential or non-confidential
• Direct or indirect
• Regular or ad-hoc
• Census or sample
• Primary or secondary
CLASSIFICATION OF DATA
Classification is the process of arranging data into
sequences and groups according to their common
characteristics or separating them into different but
related parts.
FUNCTIONS OF CLASSIFICATION
• It condenses the data
• It facilitates comparisons
• It helps to study relationships
• It facilitates the statistical treatment of the data


TECHNIQUES FOR SUMMARIZING QUANTITATIVE DATA
• Frequency Distributions
• Histograms
• Stem and Leaf Plots
• Distribution curves
• Averages
• Variability
DISCRETE VS CONTINUOUS FREQUENCY DISTRIBUTIONS
Raw data (marks): 76, 93, 39, 50, 76, 81, 66, 50, 39, 90, 32
Arranged data (marks): 32, 39, 39, 50, 50, 66, 76, 76, 81, 90, 93

Discrete frequency distribution
Marks     Frequency
32        1
39        2
50        2
66        1
76        2
81        1
90        1
93        1

Continuous frequency distribution, exclusive classes (the upper limit is excluded, so a mark of 50 falls in 50 - 75)
Marks     Frequency
0 - 25    0
25 - 50   3
50 - 75   3
75 - 100  5

Continuous frequency distribution, inclusive classes (both limits are included)
Marks     Frequency
0 - 25    0
26 - 50   5
51 - 75   1
76 - 100  5
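A minimal Python sketch, assuming the 11 raw marks above, that rebuilds the three tables: `collections.Counter` gives the discrete distribution, and two small helpers (hypothetical names `exclusive_classes` and `inclusive_classes`, written only for this illustration) count marks per class under each convention. One extra assumption: the last exclusive class also counts a mark equal to its upper limit, so a mark of 100 would not be lost.

```python
from collections import Counter

# Raw marks from the table above (11 students).
raw_marks = [76, 93, 39, 50, 76, 81, 66, 50, 39, 90, 32]

# Arranged data: the same marks sorted in ascending order.
arranged = sorted(raw_marks)

# Discrete frequency distribution: how often each distinct mark occurs.
discrete = dict(sorted(Counter(raw_marks).items()))


def exclusive_classes(data, edges):
    """Count values with lower <= x < upper for each pair of consecutive edges.

    Assumption for this sketch: the last class also counts x == its upper edge.
    """
    counts = {}
    last = len(edges) - 2
    for k, (lower, upper) in enumerate(zip(edges, edges[1:])):
        counts[f"{lower} - {upper}"] = sum(
            1 for x in data if lower <= x < upper or (k == last and x == upper)
        )
    return counts


def inclusive_classes(data, limits):
    """Count values with lower <= x <= upper (both limits included)."""
    return {f"{lo} - {hi}": sum(1 for x in data if lo <= x <= hi) for lo, hi in limits}


exclusive = exclusive_classes(raw_marks, [0, 25, 50, 75, 100])
inclusive = inclusive_classes(raw_marks, [(0, 25), (26, 50), (51, 75), (76, 100)])
print(discrete)
print(exclusive)
print(inclusive)
```

Both continuous tables account for all 11 marks; only the placement of the boundary marks (the two 50s) differs between the conventions.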
Histogram, Stem and Leaf Diagram
[Figure: Histogram]
Stem and Leaf Plot (excerpt; for example, 9 | 018 represents 90, 91 and 98)
Stem | Leaf
6    | 78
9    | 018
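One way to build such a plot, sketched in Python under the assumption that the data are non-negative two-digit integers, so the tens digit is the stem and the units digit is the leaf. The function names `stem_and_leaf` and `render` are illustrative only.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return {stem: [leaves]}, with stem = tens digit and leaf = units digit.

    Assumes non-negative two-digit integers; stems appear in increasing order.
    """
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)


def render(plot):
    """Format each stem as 'stem | leaves', e.g. '9 | 018'."""
    return "\n".join(f"{stem} | {''.join(map(str, leaves))}" for stem, leaves in plot.items())


marks = [76, 93, 39, 50, 76, 81, 66, 50, 39, 90, 32]
print(render(stem_and_leaf(marks)))
```

Run on the 11 marks from the frequency-distribution example, this prints the rows `3 | 299`, `5 | 00`, `6 | 6`, `7 | 66`, `8 | 1` and `9 | 03`. Stem 4 has no leaves and is simply skipped here; a hand-drawn plot usually shows it with an empty leaf.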
SUMMARY MEASURES
• Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
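MEASURES OF CENTRAL TENDENCY
Central Tendency
• Average (Mean):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \quad \text{(sample)}, \qquad \mu = \frac{\sum_{i=1}^{N} X_i}{N} \quad \text{(population)}$$
• Median
• Mode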
MEAN (ARITHMETIC MEAN)
• Mean (arithmetic mean) of data values
• Sample mean (n = sample size):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
• Population mean (N = population size):
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
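The two formulas differ only in what they average over. The Python sketch below makes that concrete under a hypothetical setup: the 11 marks from the frequency-distribution example are treated as the whole population, so μ is a fixed parameter, while X̄ is a statistic whose value depends on which random sample of n = 4 marks is drawn. The seed 0 is an arbitrary choice to make the run repeatable.

```python
import random


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Hypothetical setup: the 11 marks are treated as the complete population (N = 11).
population = [76, 93, 39, 50, 76, 81, 66, 50, 39, 90, 32]
mu = mean(population)               # population mean: a parameter (fixed)

rng = random.Random(0)              # fixed seed so the run is repeatable
sample = rng.sample(population, 4)  # sample of size n = 4, drawn without replacement
x_bar = mean(sample)                # sample mean: a statistic (varies by sample)

print(f"mu = {mu:.3f}, x_bar = {x_bar:.3f}")
```

Changing the seed changes `x_bar` but never `mu`, which is the practical difference between a statistic and a parameter.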
MEAN
• The most common measure of central tendency
• Affected by extreme values (outliers)
[Figure: two data sets on a number line; Mean = 5 for the first and Mean = 6 for the second, which includes an extreme value]
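To see the effect of an extreme value numerically, here is a tiny Python check on two hypothetical data sets (not the ones in the figure) that differ only in their largest value.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


# Hypothetical data sets: identical except that the largest value 5 becomes 10.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

# One extreme value pulls the mean up by a full unit.
print(mean(without_outlier), mean(with_outlier))  # 3.0 4.0
```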
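WEIGHTED MEAN
A form of mean obtained from groups of data in which the different sizes of the groups are accounted for, or weighted:
$$\bar{x}_w = \frac{\sum f x}{N_{\text{total}}}$$
where f is the frequency (weight) of each value x and N_total = Σf is the total number of observations.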
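Applied to the discrete frequency distribution of marks earlier in this section, the weighted mean uses each frequency f as the weight of its mark x. The Python helper below (a hypothetical name, written for this sketch) should reproduce the ordinary mean of the 11 raw marks, 692 / 11 ≈ 62.909, because weighting each distinct value by its frequency is the same as adding up every observation.

```python
def weighted_mean(values, weights):
    """x_w = sum(f * x) / N_total, where N_total = sum(f)."""
    n_total = sum(weights)
    return sum(f * x for x, f in zip(values, weights)) / n_total


# Discrete frequency distribution of the marks: each mark x with its frequency f.
marks = [32, 39, 50, 66, 76, 81, 90, 93]
freq = [1, 2, 2, 1, 2, 1, 1, 1]

print(round(weighted_mean(marks, freq), 3))  # 62.909
```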
MEDIAN
• Robust measure of central tendency
• Not affected by the magnitude of extreme values
[Figure: two data sets on a number line; Median = 5 in both]
• In an ordered array, the median is the “middle” number
• If n or N is odd, the median is the middle number
• If n or N is even, the median is the average of the two middle numbers
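The odd/even rule translates directly into code. This Python sketch sorts the data and applies the rule. The hypothetical data sets 1 2 3 4 5 and 1 2 3 4 10 show that replacing the largest value by an extreme one leaves the median unchanged. Python's built-in `statistics.median` follows the same rule and is the safer choice in practice.

```python
def median(data):
    """Middle value of the ordered data (n odd), or the average of
    the two middle values (n even)."""
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return x[mid]                  # 1-based position (n + 1) / 2
    return (x[mid - 1] + x[mid]) / 2   # average of the two middle values


print(median([1, 2, 3, 4, 5]), median([1, 2, 3, 4, 10]))  # 3 3
print(median([11, 12, 13, 16, 16, 17, 18, 21, 22]))       # 16
print(median([32, 39, 39, 50]))                           # 39.0
```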
MODE
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
[Figure: two data sets; in the first no value repeats (No Mode), in the second Mode = 9]
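Because a data set can have no mode or several, a helper that returns a list fits naturally. The Python sketch below (hypothetical function name `modes`) counts values with `collections.Counter` and returns all values tied for the highest count. It assumes one common convention, that there is no mode when no value repeats; textbooks differ on edge cases such as every value occurring equally often. It works for categorical data too.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs most often, in sorted order.

    Convention assumed here: if no value repeats, there is no mode ([]).
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([0, 1, 2, 3, 4, 5, 6]))                          # []
print(modes([76, 93, 39, 50, 76, 81, 66, 50, 39, 90, 32]))  # [39, 50, 76]
print(modes(["red", "blue", "red"]))                        # ['red']
```

The marks from the frequency-distribution example have three modes, since 39, 50 and 76 each occur twice.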
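QUARTILES
• Split ordered data into 4 quarters, each holding 25% of the data, separated by Q1, Q2 and Q3
• Position of the i-th quartile:
$$\text{Position of } Q_i = \frac{i(n+1)}{4}$$
Example. Data in ordered array (n = 9): 11 12 13 16 16 17 18 21 22
$$\text{Position of } Q_1 = \frac{1(9+1)}{4} = 2.5, \qquad Q_1 = \frac{12 + 13}{2} = 12.5$$
Position 2.5 lies halfway between the 2nd value (12) and the 3rd value (13).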
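A Python sketch of the positional rule above, assuming that a fractional position is resolved by linear interpolation between the two neighbouring ordered values (so position 2.5 gives the value halfway between the 2nd and 3rd). Other software uses different quartile conventions, so results can differ slightly; Python's `statistics.quantiles` with its default `method='exclusive'` uses the same i(n + 1)/4 positions. The function name `quartile` is hypothetical.

```python
import math


def quartile(data, i):
    """Q_i (i = 1, 2, 3) at 1-based position i(n + 1)/4 of the ordered data.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    x = sorted(data)
    n = len(x)
    pos = i * (n + 1) / 4
    if pos <= 1:          # position falls at or before the first value
        return float(x[0])
    if pos >= n:          # position falls at or after the last value
        return float(x[-1])
    k = math.floor(pos)   # 1-based index of the value just below the position
    frac = pos - k
    return x[k - 1] + frac * (x[k] - x[k - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 12.5 16.0 19.5
```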
DIFFERENCES IN MEASURES OF CENTRAL TENDENCY
• Mode, median and mean can be three different numbers in asymmetrical distributions of data.
• Any data set has exactly one mean and one median, but it may have no mode or several modes.
• The median is less influenced by extreme values than the mean.
• The mean is often not one of the observed values; the median is always an observed value when n is odd (when n is even it may fall between two values); the mode, when it exists, is always an observed value.
MEASURES OF VARIABILITY
• Measures of variability show how spread out the scores in a distribution are around the mean, that is, how much dispersion or scatter exists in the distribution. If there is a large degree of dispersion (the scores are very dissimilar), the distribution has a large or high variability, or variance. If the scores are very similar, there is a small degree of dispersion and a small variance.
MEASURES OF VARIATION
• Range
• Interquartile Range
• Variance: Population Variance (σ²), Sample Variance (S²)
• Standard Deviation: Population Standard Deviation (σ), Sample Standard Deviation (S)
• Coefficient of Variation
RANGE
The range is simply the numerical
difference between the highest and lowest
scores in the distribution.
INTERQUARTILE RANGE
• Using the interquartile range can eliminate some outlier problems
• It discards the high- and low-valued observations and calculates the range of the middle 50% of the data
• Interquartile range = 3rd quartile - 1st quartile
• IQR = Q3 - Q1
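Both measures take a line or two in Python. This sketch uses the ordered array from the quartile example and gets the quartiles from the standard-library `statistics.quantiles` (default `method='exclusive'`, which uses the i(n + 1)/4 positions). It then replaces the largest value 22 by a hypothetical outlier 90: the range jumps, while the IQR does not move. The helper names are illustrative.

```python
from statistics import quantiles


def data_range(data):
    """Range: highest value minus lowest value."""
    return max(data) - min(data)


def iqr(data):
    """Interquartile range Q3 - Q1, with quartiles at positions i(n + 1)/4."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
with_outlier = [11, 12, 13, 16, 16, 17, 18, 21, 90]  # hypothetical: 22 replaced by 90

print(data_range(data), iqr(data))                  # 11 7.0
print(data_range(with_outlier), iqr(with_outlier))  # 79 7.0
```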
STANDARD DEVIATION
• The measure of variability used most often in research is the standard deviation, a statistic that indicates the typical distance of the scores from the mean of the distribution (more precisely, the square root of the average squared deviation from the mean).
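COMPARING STANDARD DEVIATIONS
[Figure: dot plots of Data A, B and C on a common axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.9258
Data C: Mean = 15.5, S = 4.57
The three data sets share the same mean but differ in spread: Data B is the most tightly clustered around 15.5 and Data C is the most spread out.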
STANDARD DEVIATION
For an approximately normal (bell-shaped) distribution with a mean of 62 and an SD of 3, about 95% of scores should fall between 62 - 2*3 and 62 + 2*3, i.e., between 56 and 68.
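The "about 95%" figure comes from the normal distribution. A short Python check, assuming the scores are exactly normal with mean 62 and SD 3, uses the standard-library `statistics.NormalDist` to compute the interval and the share of scores inside it.

```python
from statistics import NormalDist

mu, sd = 62, 3
low, high = mu - 2 * sd, mu + 2 * sd  # 56, 68

# Share of a normal distribution lying within two SDs of its mean.
scores = NormalDist(mu=mu, sigma=sd)
share = scores.cdf(high) - scores.cdf(low)

print(low, high, round(share, 4))  # 56 68 0.9545
```

The exact share is slightly above 95%; the rule of thumb rounds it.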
VARIANCE
• The variance, s², represents the amount of variability of the data relative to their mean.
• As shown below, the variance is the "average" of the squared deviations of the observations about their mean:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
► The variance s² is the sample variance, and it is used to estimate the actual population variance, σ²:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Dividing by n - 1 instead of n makes s² an unbiased estimator of σ².
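The two formulas differ only in the mean they subtract and the divisor. The Python sketch below implements both literally (hypothetical helper names) for the ordered array from the quartile example; each standard deviation is the square root of the corresponding variance. The standard library's `statistics.variance`, `statistics.pvariance`, `statistics.stdev` and `statistics.pstdev` compute the same quantities.

```python
import math


def sample_variance(x):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1); needs at least two values."""
    n = len(x)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(x) / n
    return sum((xi - x_bar) ** 2 for xi in x) / (n - 1)


def population_variance(x):
    """sigma^2 = sum((x_i - mu)^2) / N."""
    N = len(x)
    mu = sum(x) / N
    return sum((xi - mu) ** 2 for xi in x) / N


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
s2, sigma2 = sample_variance(data), population_variance(data)
s, sigma = math.sqrt(s2), math.sqrt(sigma2)  # standard deviations
print(f"s^2 = {s2:.3f}, s = {s:.3f}, sigma^2 = {sigma2:.3f}, sigma = {sigma:.3f}")
```

Because n - 1 < N for the same data, s² is always a little larger than σ² computed on that data.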
COEFFICIENT OF VARIATION
• Measures relative variation
• Expressed as a percentage (%)
• Shows variation relative to the mean
• Used to compare the variability of two or more data sets, even when they are measured in different units
$$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$$
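A minimal Python sketch of the formula, using the standard-library `statistics.mean` and `statistics.stdev` (the sample standard deviation S). The two data sets are hypothetical measurements of the same five people in centimetres and kilograms; the CV makes their variability comparable despite the different units. The sketch assumes a positive mean, since the CV is not meaningful otherwise.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """CV = (S / X_bar) * 100%, with S the sample standard deviation.

    Only meaningful for data with a positive mean.
    """
    m = mean(data)
    if m <= 0:
        raise ValueError("CV assumes a positive mean")
    return stdev(data) / m * 100


# Hypothetical measurements of the same five people in two different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

print(round(coefficient_of_variation(heights_cm), 1),
      round(coefficient_of_variation(weights_kg), 1))  # 9.3 22.6
```

Both sets have the same S (about 15.8), yet relative to their means the weights vary more than twice as much as the heights.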
SKEWNESS
Skewness describes the asymmetry of a distribution. Roughly speaking, a distribution has positive skew if its right tail is longer and negative skew if its left tail is longer; the relative positions of the mean, median and mode reflect this.
SHAPE OF A DISTRIBUTION
• Describes how data are distributed
• Measures of shape: symmetric or skewed

Typical ordering of the measures:
Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
POSITIVELY SKEWED
• In a positively skewed distribution, the mean is typically larger than the median.
NEGATIVELY SKEWED
• In a negatively skewed distribution, the median is typically larger than the mean.
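The mean-versus-median comparison can be turned into a single number. One simple measure, chosen here for illustration, is Pearson's second skewness coefficient, 3(X̄ - median)/S: positive for a longer right tail, negative for a longer left tail. It is not the only skewness measure (moment-based coefficients are also common). The data sets in this Python sketch are hypothetical.

```python
from statistics import mean, median, stdev


def pearson_skewness(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / S."""
    return 3 * (mean(data) - median(data)) / stdev(data)


right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15]  # hypothetical: long right tail
left_tail = [-x for x in right_tail]          # mirror image: long left tail

print(mean(right_tail), median(right_tail))   # mean is larger than median
print(pearson_skewness(right_tail) > 0, pearson_skewness(left_tail) < 0)  # True True
```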
