Statistics is the collection, analysis, and interpretation of data to make decisions. It has two aspects: theoretical, dealing with statistical formulas and proofs, and applied, using formulas to solve real-world problems. Applied statistics can be descriptive, summarizing data, or inferential, using samples to make predictions. Basic terms include population, sample, parameter, statistic, quantitative and qualitative variables, levels of measurement, and sampling methods. Data is organized using frequency distributions which group values into classes to show frequencies.
Statistics is the science of collecting, organizing, analyzing, presenting, and interpreting data for the purpose of making more effective decisions.

Statistics has two aspects:
Theoretical Statistics deals with the development, derivation, and proof of statistical theorems, formulas, rules, and laws.
Applied Statistics involves the application of those theorems, formulas, rules, and laws to solve real-world problems.

Applied statistics can be divided into two areas:
1. Descriptive Statistics consists of methods used to summarize and describe the important characteristics of data.
2. Inferential Statistics consists of methods that use sample results to help make decisions or predictions about a population.

Basic Terms
A population is a collection of all possible individuals, objects, or measurements of interest.
A sample is a subset of a population.
A parameter is a numerical measurement describing some characteristic of a population.
A statistic is a numerical measurement describing some characteristic of a sample. A statistic is used as an estimate of the corresponding parameter.
A quantitative variable is a variable that can be measured numerically.
A qualitative variable is a variable that cannot assume a numerical value but can be classified into categories.
A discrete variable can assume only a finite or countable number of values.
A continuous variable can assume the infinitely many values corresponding to the points on a line interval.

Levels of Measurement
Variables can also be classified according to how they are categorized, counted, or measured.
1. Nominal level of measurement refers to data that can only be counted and put into categories. There is no particular order for the groupings.
2. Ordinal level of measurement presumes that the categories can be ranked, so that one category is higher than another.
3. Interval level of measurement includes the ranking characteristics of the ordinal level, with the additional property that the distance between numbers is the same.
4. Ratio level of measurement has all the characteristics of the interval level of measurement. In addition, it has an inherent zero starting point, and the ratio between two numbers is meaningful.

Why Sample the Population?
Some of the major reasons why sampling is necessary are:
1. The destructive nature of certain tests.
2. The physical impossibility of checking all items in the population.
3. The cost of studying all the items in a population is often prohibitive.
4. The adequacy of sample results.
5. To contact the whole population would often be time-consuming.

Basic Methods of Sampling
1. Random Sampling – all members of the population have the same chance of being selected for the sample.
2. Systematic Sampling – a random starting point is selected, and then every kth item is selected for the sample.
3. Stratified Sampling – the population is divided into several groups, or strata, and then a sample is selected from each stratum.
4. Cluster Sampling – the population is divided into primary units, and then samples are drawn from the primary units.

When data are collected, the information obtained from a population or sample may be recorded in a sequence that is random or unranked. Such data are called raw data. For example, suppose we collect information on the ages of 50 students selected from a university:

21 19 24 25 29 18 20 19 22 19
25 19 31 19 23 22 28 21 20 22
25 23 18 37 27 34 26 27 37 33
19 25 22 25 23 18 23 19 23 26
22 21 20 19 21 23 21 25 21 24

When the amount of data is large, a good overall picture of the information can be presented by grouping the data into a number of categories.
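
Before moving on to grouping, the four basic sampling methods listed above can be illustrated with a short Python sketch. This is only one possible reading of each method: the function names, the population of ID numbers 1 to 100, the "first_half"/"second_half" strata, and the choice to pick clusters at random before sampling inside them are all assumptions made for illustration and do not come from the notes.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: every member has the same chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic sampling: random start among the first k items, then every kth item."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sampling: draw a random sample from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Cluster sampling (assumed two-stage form): choose some primary
    units at random, then sample within each chosen unit."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical population: ID numbers 1..100.
rng = random.Random(42)  # fixed seed so repeated runs give the same samples
population = list(range(1, 101))
strata = {"first_half": population[:50], "second_half": population[50:]}
clusters = {f"unit_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}

print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 2, 5, rng))
```

Passing the random number generator in explicitly keeps the sketch reproducible; in a real study the sampling frame and the stratum or cluster definitions would come from the problem itself.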

Frequency Distribution – a grouping of the data into categories showing the number of observations in each category. The data presented in a frequency distribution table are called grouped data.

Steps in Developing a Frequency Distribution
1. Determine the number of classes (avoid having fewer than 5 or more than 15 classes). To determine the suggested number of classes, find the smallest integer k such that 2^k ≥ n, where n is the total number of observations. Alternatively, use Sturges' formula:
   k = 1 + 3.3 log n
   where k is the number of classes, n is the total number of observations, and log is the base-10 logarithm. The result is rounded to a whole number.
2. Decide on the size of the class interval. A suggested class interval can be found by
   Class interval = (Highest value - Lowest value) / Number of classes
   The result is usually rounded up to a convenient number.
3. Tally the raw data into the classes to arrive at the frequency distribution.

Important Terms
Class – a grouping of values in a frequency distribution.
Class Interval – the range of values of a given class.
Class Limits – the smallest and largest values that can fall in a given class (the lower and upper limits).
Class Boundary – the midpoint between the upper limit of one class and the lower limit of the next class.
Class Frequency – the number of values in a data set that belong to a certain class.
Class Mark – the midpoint of a class, halfway between its lower and upper limits.
Class Width (or Size) – the difference between the two boundaries of a class.

Relative Frequency and Percentage Distribution
It may be desirable to convert class frequencies to relative class frequencies to show the proportion of the total number of observations in each class. To convert the class frequencies in a frequency distribution to relative frequencies, divide the frequency of each class by the sum of all frequencies. The percentage for a class is obtained by multiplying the relative frequency of that class by 100.

Cumulative Frequency Distribution
A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class. It is obtained by adding frequencies successively from the lowest to the highest class interval.

Graphic Presentation of a Frequency Distribution
1. Histogram – a graph in which classes are marked on the horizontal axis and frequencies are marked on the vertical axis.
2. Polygon – a graph formed by joining the midpoints of the tops of successive bars in a histogram by straight lines.
3. Ogive – a curve drawn for the cumulative frequency distribution.
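
As a worked illustration, the steps for developing a frequency distribution, together with the relative, percentage, and cumulative distributions, can be sketched in Python using the 50 student ages listed earlier. This is a minimal sketch under stated assumptions, not a prescribed procedure: the variable names are illustrative, the 2^k rule is used to pick the number of classes (Sturges' formula is printed only for comparison), the class interval is taken as the next whole number above the suggested quotient, and the first class is assumed to start at the lowest value.

```python
import math
from itertools import accumulate

# The 50 student ages given as raw data in the text.
ages = [
    21, 19, 24, 25, 29, 18, 20, 19, 22, 19,
    25, 19, 31, 19, 23, 22, 28, 21, 20, 22,
    25, 23, 18, 37, 27, 34, 26, 27, 37, 33,
    19, 25, 22, 25, 23, 18, 23, 19, 23, 26,
    22, 21, 20, 19, 21, 23, 21, 25, 21, 24,
]
n = len(ages)

# Step 1: smallest integer k such that 2**k >= n.
k = 1
while 2 ** k < n:
    k += 1
# Sturges' formula (base-10 logarithm), shown for comparison only.
sturges_k = 1 + 3.3 * math.log10(n)

# Step 2: suggested class interval. Assumption: take the next whole number
# above (high - low) / k so the highest value still falls in the last class.
low, high = min(ages), max(ages)
width = (high - low) // k + 1

# Step 3: tally each value into its class (first class starts at `low`).
frequencies = [0] * k
for age in ages:
    frequencies[(age - low) // width] += 1

# Relative frequency = class frequency / sum of all frequencies.
relative = [f / n for f in frequencies]
# Percentage = relative frequency * 100.
percent = [100 * r for r in relative]
# Cumulative frequency = running total from the lowest class upward.
cumulative = list(accumulate(frequencies))

print(f"n = {n}, k = {k}, Sturges = {sturges_k:.2f}, width = {width}")
print(f"{'Class':>7} {'Mark':>6} {'Freq':>5} {'Rel':>6} {'Pct':>6} {'Cum':>5}")
for i in range(k):
    lower = low + i * width          # lower class limit
    upper = lower + width - 1        # upper class limit (integer data)
    mark = (lower + upper) / 2       # class mark (midpoint of the limits)
    print(f"{lower:>3}-{upper:<3} {mark:>6.1f} {frequencies[i]:>5} "
          f"{relative[i]:>6.2f} {percent[i]:>6.1f} {cumulative[i]:>5}")
```

Because the ages are whole numbers, each class boundary lies half a unit outside the class limits, so the difference between the two boundaries of a class equals the class interval computed in Step 2. The frequencies column is what a histogram would plot, and the cumulative column is what an ogive would plot.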