Lecture 1
Not every statistical operation can be used with every variable. The type of statistical operations we employ will depend on how our variables are measured.
Levels of Measurement (contd.)
The critical difference between ordinal and interval data is that the intervals or differences between values of interval data are consistent and meaningful.
Summary: types of data
Interval: Values are real numbers; all calculations are valid; data may be treated as ordinal or nominal.
Ordinal: Values must represent the ranked order of the data; calculations based on an ordering process are valid; data may be treated as nominal but not as interval.
Nominal: Values are arbitrary numbers that represent categories; only calculations based on the frequencies of occurrence are valid; data may not be treated as ordinal or interval.
Continuous variables: variables that, in theory, can take on all possible numerical values in a given interval
Example: length, income
Analyzing Data:
Descriptive and Inferential Statistics
Population: The total set of individuals, objects, groups, or events in which the researcher is interested. A descriptive measure of a population is called a parameter.
Sample: A relatively small subset selected from a population. A descriptive measure of a sample is called a statistic.
Descriptive statistics: Procedures that help us organize, summarize and present the data collected from either a sample or a population, in a convenient and informative way.
Inferential statistics: The logic and procedures concerned with making conclusions or inferences about characteristics of populations based on sample data.
What is statistics?
Statistics is a science that consists of a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.
Cumulative Distributions
Pie chart
Pie chart: a graph showing the differences in frequencies or percentages among categories of a nominal or an ordinal variable. The categories are displayed as segments of a circle whose pieces add up to 100 percent of the total frequencies.
The Histogram
Histogram: a graph showing the differences in frequencies or percentages among categories of an interval-ratio variable. The categories are displayed as contiguous bars, with width proportional to the width of the category and height proportional to the frequency or percentage of that category.
[Figure: histogram of AGE OF RESPONDENT. Mean = 44.5, Std. Dev = 17.03, N = 1422]
Shapes of histograms
There are four typical shape characteristics
Modal classes
A modal class is the one with the largest number of observations. A unimodal histogram has a single modal class.
Weight Data Males: 140 145 160 190 155 165 150 190 195 138 160 155 153 145 170 175 175 170 180 135 170 157 130 185 190 155 170 155 215 150 145 155 155 150 155 150 180 160 135 160 130 155 150 148 155 150 140 180 190 145 150 164 140 142 136 123 155
Females: 140 120 130 138 121 125 116 145 150 112 125 130 120 130 131 120 118 125 135 125 118 122 115 102 115 150 110 116 108 95 125 133 110 150 108
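As a rough sketch of how a modal class might be found, the female weights above can be grouped into classes and counted. The class width of 10 and the helper name `class_frequencies` are assumptions made for illustration; the slides do not specify how the classes should be chosen.

```python
from collections import Counter

# Female weights copied from the slide above.
females = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130,
           120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102,
           115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108]

def class_frequencies(values, width):
    """Count observations per class [lower, lower + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

WIDTH = 10  # assumed class width, chosen only for illustration
freq = class_frequencies(females, WIDTH)

# The modal class is the class with the largest count.
modal_lower = max(freq, key=freq.get)

# A text histogram: one star per observation in each class.
for lower, count in freq.items():
    print(f"{lower:3d}-{lower + WIDTH - 1:3d}: {'*' * count}")
print("Modal class starts at", modal_lower)
```

A different class width can move the modal class, so the result describes this particular grouping rather than the data alone.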
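Anscombe's data (four data sets, 11 observations each)
X (data sets 1 to 3): 10 8 13 9 11 14 6 4 12 7 5
Y (data set 1): 8.04 6.95 7.58 8.81 8.33 9.96 7.24 4.26 10.84 4.82 5.68
Y (data set 2): 9.14 8.14 8.74 8.77 9.26 8.10 6.13 3.10 9.13 7.26 4.74
Y (data set 3): 7.46 6.77 12.74 7.11 7.81 8.84 6.08 5.39 8.15 6.42 5.73
X (data set 4): 8 8 8 8 8 8 8 19 8 8 8
Y (data set 4): 6.58 5.76 7.71 8.84 8.47 7.04 5.25 12.50 5.56 7.91 6.89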
Regression results (the same for all four data sets)
Number of observations = 11
Mean of Xs = 9.0
Mean of Ys = 7.5
Regression coefficient b of Y on X = 0.5
Equation of regression line: Y = 3 + 0.5X
Estimated standard error of b = 0.118
Multiple R square = 0.667
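The regression summary can be checked with a short least-squares calculation. This is a minimal sketch using only the Python standard library; the function name `fit_line` and the data set labels are assumptions introduced for illustration, and the data are the four Anscombe data sets (the first three share the same X values).

```python
# Anscombe's four data sets; sets 1 to 3 share the same X values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

def fit_line(xs, ys):
    """Least-squares fit of Y = a + b*X; returns means, a, b and R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return mean_x, mean_y, a, b, r_squared

datasets = {"set 1": (x123, y1), "set 2": (x123, y2),
            "set 3": (x123, y3), "set 4": (x4, y4)}
results = {name: fit_line(xs, ys) for name, (xs, ys) in datasets.items()}

for name, (mx, my, a, b, r2) in results.items():
    print(f"{name}: mean X={mx:.1f}, mean Y={my:.2f}, "
          f"Y = {a:.2f} + {b:.3f}X, R^2={r2:.3f}")
```

All four data sets give nearly the same numbers, which is the point of the example: the summary statistics alone cannot distinguish them.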
ANSCOMBE PLOTS
Anscombe's plots illustrate that pictures are particularly valuable in an exploratory setting: not only can they confirm or contradict what we thought in advance about the data, but they can also reveal, in a dramatic way, things that we did not even suspect.
What is the most likely outcome? What outcome do we expect? What is the outcome in the middle?
The Mode
The category or score with the largest frequency (or percentage) in the distribution. The mode can be calculated for variables at any level of measurement: nominal, ordinal, or interval-ratio.
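Because the mode depends only on frequencies, it can be found by counting. The sketch below uses made-up data (both lists are hypothetical and do not come from the lecture) and a helper named `modes`, also an assumption, which returns every value tied for the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Hypothetical nominal data: how six respondents commute.
commute = ["bus", "car", "car", "walk", "bus", "car"]
# Hypothetical ordinal data coded 1 (low) to 5 (high); two values tie.
rating = [1, 2, 2, 3, 4, 4, 5]

print(modes(commute))
print(modes(rating))
```

Returning a list rather than a single value keeps ties visible, as in the second data set, where two ratings share the highest frequency.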
The Median
The score that divides the distribution into two equal parts, so that half the cases are above it and half below it. The median is the middle score, or average of middle scores in a distribution.
Median Exercise
Calculate the median:
Given the ordered list of cases, the median is the value of the case in position (n+1)/2. When n is even, this position falls between two cases, and the median is the average of those two values.
Example: A sample of 10 adults was asked to report the number of hours they spent on the internet in the previous month. The results are listed below:
0, 7, 12, 5, 33, 14, 8, 0, 9, 22
Placed in ascending order:
0, 0, 5, 7, 8, 9, 12, 14, 22, 33
Here (n+1)/2 = 5.5, so the median is the average of the 5th and 6th observations (the middle two), which are 8 and 9. The median is 8.5.
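The position rule can be written out directly. This is a minimal sketch, with the function name `median_by_position` chosen for illustration; Python's built-in `statistics.median` gives the same result for this data.

```python
import math

# Hours on the internet reported by the 10 adults in the example.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

def median_by_position(values):
    """Median as the value at 1-based position (n + 1) / 2 of the sorted data."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    # For a whole-number position both indices coincide; for a position
    # ending in .5 they pick the two middle cases, which are averaged.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2

print(median_by_position(hours))
```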
Depth means a value's position relative to the nearest extreme.
Median depth = (n+1)/2.
Quartile depth = (tmd+1)/2, where tmd, the truncated median depth, is the integer part of the median depth.
Exercise: compute the first and third quartiles for the above example.
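One way to work the quartile exercise is to count in from each end of the ordered data by the quartile depth. The sketch below assumes that a fractional depth (one ending in .5) is handled by averaging the two neighbouring values, as is done for the median; the slides do not state this, and the function names are hypothetical.

```python
import math

# Hours on the internet from the median example.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

def value_at_depth(ordered, depth, from_top=False):
    """Value at a given depth, counted from the bottom or from the top."""
    seq = ordered[::-1] if from_top else ordered
    lower = seq[math.floor(depth) - 1]
    upper = seq[math.ceil(depth) - 1]
    return (lower + upper) / 2  # assumption: average when depth ends in .5

def quartiles_by_depth(values):
    """First and third quartiles using quartile depth = (tmd + 1) / 2."""
    ordered = sorted(values)
    median_depth = (len(ordered) + 1) / 2
    tmd = math.floor(median_depth)      # truncated median depth
    quartile_depth = (tmd + 1) / 2
    q1 = value_at_depth(ordered, quartile_depth)
    q3 = value_at_depth(ordered, quartile_depth, from_top=True)
    return q1, q3

print(quartiles_by_depth(hours))
```

For the 10 observations, the median depth is 5.5, the truncated median depth is 5, and the quartile depth is 3, so the quartiles are the third values from the bottom and from the top of the ordered list: 5 and 14.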
The Mean
The arithmetic average obtained by adding up all the scores and dividing by the total number of scores.
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
x bar equals the sum of all the scores, x, divided by the number of scores, n.
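As a small illustration, the mean can be computed for the internet-hours sample used in the median exercise. The function name `mean` is chosen here for illustration; Python's `statistics.mean` does the same job.

```python
# Hours on the internet from the median exercise.
hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

def mean(scores):
    """Sum of all the scores divided by the number of scores."""
    return sum(scores) / len(scores)

print(mean(hours))
```

The scores sum to 110, so the mean is 11 hours. This is larger than the median of 8.5 because the single large value, 33, pulls the mean upward.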