Unit II: Basic Data Analytic Methods
What is statistics?
• Definition (Statistics).
Statistics consists of a body of methods for collecting and analyzing data.
It is the methodology which scientists and mathematicians have developed for interpreting and
drawing conclusions from collected data.
Statistical methods can be used to find answers to questions such as:
What kind and how much data need to be collected?
How can we analyse the data and draw conclusions from it?
How can we assess the strength of the conclusions and evaluate their
uncertainty?
Statistics provides methods for:
1. Design: Planning and carrying out research studies.
Political science: How accurate are Gallup and other opinion polls?
• The number of observations that fall into a particular class (or category)
of a qualitative variable is called the frequency (or count) of that
class.
• A table listing all classes and their frequencies is called a frequency
distribution.
• Qualitative data are presented graphically either as a pie chart or
as a horizontal or vertical bar graph.
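As a minimal sketch of how a frequency distribution can be built, the Python snippet below counts how many observations fall into each class of a qualitative variable. The blood-group observations are made-up illustration data, not taken from this unit.

```python
from collections import Counter

# Hypothetical observations of a qualitative variable (illustration only).
observations = ["A", "O", "B", "O", "AB", "A", "O", "O", "B", "A"]

# Frequency (count) of each class.
frequency_distribution = Counter(observations)

# Print the frequency distribution, most frequent class first.
for cls, count in frequency_distribution.most_common():
    print(f"{cls:<3} {count}")
```

The same counts could then be passed to a plotting library to draw the pie chart or bar graph.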
Measures of centre
• Mode: The mode identifies the most common value or values in the data
set. Depending on the data, there might be one or more modes, or no
mode at all.
Measures of variation
• Range: Range shows the mathematical distance between the lowest
and highest values in the data set.
• Range = ( Max value – Min value )
• Standard Deviation: Standard deviation measures the variability of
the data set. Like range, a smaller standard deviation indicates less
variability.
• Standard Deviation: $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$
• where ∑ means sum
• X represents each data set value
• X̄ represents the mean value
• N is the number of values in the data set
Example
• Dataset : 20, 24, 25, 36, 25, 22, 23
• Mean : (20+24+25+36+25+22+23) / 7 =175 /7 = 25
• Median : Sort data set :- 20, 22, 23, 24, 25, 25, 36
• Middle Value : 24
• Mode : Sort data set :- 20, 22, 23, 24, 25, 25, 36
• Repeated Value : 25
• Range : ( 36 -20 ) = 16
• Standard Deviation :
$= \sqrt{\dfrac{(20-25)^2 + (24-25)^2 + (25-25)^2 + (36-25)^2 + (25-25)^2 + (22-25)^2 + (23-25)^2}{7-1}}$
$= \sqrt{\dfrac{25 + 1 + 0 + 121 + 0 + 9 + 4}{6}} = \sqrt{\dfrac{160}{6}} = \sqrt{26.67} = 5.16$
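The worked example above can be checked with a short Python sketch. It uses the standard-library `statistics` module for the mean, median and mode, and writes the standard deviation out by hand with the N − 1 divisor used in the example; the variable names are illustrative.

```python
import math
import statistics

data = [20, 24, 25, 36, 25, 22, 23]

mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # middle value of the sorted data
modes = statistics.multimode(data)    # a list, since there may be several modes
data_range = max(data) - min(data)    # max value - min value

# Sample standard deviation from the formula (divisor N - 1).
squared_deviations = [(x - mean) ** 2 for x in data]
std_dev = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print(mean, median, modes, data_range, round(std_dev, 2))
```

`statistics.stdev(data)` applies the same N − 1 divisor and can serve as a cross-check on the hand-written formula.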
Example
• Dataset : 57, 64, 43, 67, 49, 59, 44, 47, 61, 59
• Mean
• Median
• Mode
• Range
• Standard Deviation
Measures of Centre
• Of the measures of centre and variation, the sample mean and the sample
standard deviation s are the most commonly reported. Since their
values depend on the sample selected, they vary in value from
sample to sample. In this sense, they are random variables.
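• If there is no link between the two sets of data, use the independent
samples t-test; if each value in one set is paired with a value in the
other, use the matched pairs t-test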
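• Mean of the differences: $\bar{D} = \dfrac{\sum D}{N}$
• N is the number of pairs, D is the difference between the two values in each pair
• Standard deviation of the differences: $s_D = \sqrt{\dfrac{\sum (D - \bar{D})^2}{N - 1}}$
• Standard error: $SE = \dfrac{s_D}{\sqrt{N}}$
• Value of t: $t = \dfrac{\bar{D}}{SE}$
• Degrees of freedom
• DoF = n – 1 (where “n” is the number of items in your set)
• DoF (two samples) = (N1 + N2) – 2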
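The quantities listed above (mean difference, standard deviation of the differences, standard error, t value and degrees of freedom) can be sketched in Python as follows, assuming the standard matched pairs formulas. The `before` and `after` values are hypothetical illustration data, not measurements from this unit.

```python
import math

# Hypothetical paired measurements (illustration only):
# the same items measured before and after a treatment.
before = [12, 15, 11, 14, 13, 16, 10, 12]
after = [14, 18, 11, 15, 16, 17, 12, 13]

# D: difference within each pair.
differences = [a - b for a, b in zip(after, before)]
n = len(differences)  # number of pairs

# Mean of the differences.
mean_diff = sum(differences) / n

# Standard deviation of the differences (divisor n - 1).
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in differences) / (n - 1))

# Standard error of the mean difference.
standard_error = sd_diff / math.sqrt(n)

# t value and degrees of freedom for matched pairs.
t_value = mean_diff / standard_error
degrees_of_freedom = n - 1

print(round(t_value, 3), degrees_of_freedom)
```

The resulting t value would then be compared with the critical value from a t table for the chosen significance level and degrees of freedom.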
• Question
• In an investigation to determine the effectiveness of sequential treatment of
fingerprints, 10 prints are taken and enhanced with DFO and then with ninhydrin.
The points of detail at each stage are recorded. Is there a difference at the
95% confidence level?
t-test for matched pairs
1. Set up the null and alternative hypotheses
• H0: there is no difference in the number of minutiae when using ninhydrin
• HA: more minutiae are observed after enhancement with ninhydrin
• This is a one-tailed test.
• We are testing at the 95% confidence level, i.e. the 5% (0.05) significance level
2. Calculate the difference between the pairs in the sample