Lesson 1 - Roles of Statistics and Data Analysis
Steps:
1. Acknowledging Variability – the differences in the data that we are dealing with.
2. Collecting Data Sensibly – gathering the correct information in the most efficient
way.
3. Describing Variability in the Data – describing the differences in the data gathered.
4. Descriptive Statistics – after we gather the correct information we have to present and
organize it in the most efficient way.
5. Drawing Conclusions in a Way that Recognizes Variability in the Data and Probability
Conclusion – we have to arrive at a result/output using the correct statistical tool,
while also taking into account the differences within the data.
• Statistics
➢ the scientific discipline that provides methods to help make sense of data.
➢ Suspicion: extreme skeptics, usually speaking out of ignorance, characterize this
discipline as a subcategory of lying.
➢ If used properly, statistical methods offer a set of POWERFUL tools for gaining
insight into the world around us.
➢ Often used in business, medicine, agriculture, social sciences, natural sciences and
applied sciences.
➢ Statistics teaches us how to make intelligent judgments and informed
decisions in the presence of uncertainty and variation.
1. Be Informed
➢ Understand news reports that make data-based claims.
➢ Extract information from tables and graphs.
➢ Understand the basics for valid research design.
➢ If all measurements were identical for every individual, this task would be easy.
➢ But populations without variability are virtually non-existent.
➢ In fact, variability is universal.
➢ We need to understand variability to be able to collect, analyze, and draw
conclusions from data in a sensible way.
➢ The branch called descriptive statistics helps to increase our understanding of the
nature of variability in a population.
➢ Conclusions based on data are seen regularly in popular media and in professional
and academic publications.
➢ Decisions are data-driven in business, industry and government.
Sample – a subset of the population, selected for study in some prescribed manner.
➢ Raw data without analysis are of little value; likewise, even a sophisticated analysis
cannot provide meaningful information from data that were not collected in a
sensible way.
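One "prescribed manner" of selecting a sample is simple random sampling, where every member of the population has the same chance of being chosen. The notes do not name a specific method, so the sketch below is only an illustration; the function name `simple_random_sample` and the population of ID numbers are hypothetical.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members from population, without replacement."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(population), sample_size)


# Hypothetical population: ID numbers 1 to 100 (not data from the notes).
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

The sample is a subset of the population: it has 10 distinct IDs, all taken from the original 100.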
The article “Brain Shunt Tested to Treat Alzheimer’s” (San Francisco Chronicle, October
23, 2002) summarizes the findings of a study that appeared in the journal Neurology. Doctors
at Stanford Medical Center were interested in determining whether a new surgical approach
to treating Alzheimer’s disease results in improved memory functioning.
The surgical procedure involves implanting a thin tube, called a shunt, which is designed
to drain toxins from the fluid-filled space that cushions the brain. Eleven patients had shunts
implanted and were followed for a year, receiving quarterly tests of memory function.
Another sample of Alzheimer’s patients was used as a comparison group. Those in the
comparison group received the standard care for Alzheimer’s disease.
After analyzing the data from this study, the investigators concluded that the “results
suggested the treated patients essentially held their own in the cognitive tests while the patients
in the control group steadily declined. However, the study was too small to produce conclusive
statistical evidence.”
Based on these results, a much larger 18-month study was planned. That study was to
include 256 patients at 25 medical centers around the country.
1. What were the researchers trying to learn? What question motivated their research?
5. Was an appropriate method of analysis used, given the type of data and how the data
were collected?
6. Are the conclusions drawn by the researchers supported by the data analysis?
• Describing Data
➢ Variable – any characteristic whose value may change from one individual or object
to another.
➢ Data – results from making observations either on a single variable or
simultaneously on two or more variables.
• Types of Data Set
2. Bivariate Data Set – when a data set consists of two attributes recorded
simultaneously for each individual.
3. Multivariate Data Set – results from obtaining a category or value for each of two
or more attributes.
$$\text{relative frequency} = \frac{\text{frequency}}{\text{total number of observations in the data set}}$$
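The relative frequency formula can be checked with a short computation. This is a minimal sketch, assuming the observations are stored in a Python list; the function name `relative_frequencies` and the survey responses are made up for illustration.

```python
from collections import Counter


def relative_frequencies(observations):
    """Return {category: frequency / total number of observations}."""
    total = len(observations)
    if total == 0:
        raise ValueError("need at least one observation")
    counts = Counter(observations)  # frequency of each category
    return {category: count / total for category, count in counts.items()}


# Hypothetical data: how 10 students travel to school.
responses = ["bus", "car", "bus", "walk", "bus",
             "car", "walk", "bus", "bike", "car"]
rel_freq = relative_frequencies(responses)
for category, value in rel_freq.items():
    print(f"{category}: {value:.2f}")
```

Because every observation falls into exactly one category, the relative frequencies always add up to 1.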
• Bar Charts – a graph of the frequency distribution of categorical data.
• How to construct:
➢ Draw a horizontal line and write the category names below it at regularly spaced
intervals.
➢ Draw a vertical line and label its scale using either frequency or relative frequency.
➢ Place a rectangular bar above each category name. All bars should have the same
width, and each bar’s height is determined by the category’s frequency.
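The construction steps above can be sketched in plain text. To keep the example free of plotting libraries, the bars below are drawn sideways: categories run down the page and each bar's length, rather than its height, equals the category's frequency. The function name `text_bar_chart` and the data are hypothetical.

```python
from collections import Counter


def text_bar_chart(observations, symbol="#"):
    """Return a sideways bar chart: one equal-thickness bar per category."""
    counts = Counter(observations)
    label_width = max(len(str(category)) for category in counts)
    lines = []
    for category in sorted(counts):
        bar = symbol * counts[category]  # bar length = frequency
        lines.append(f"{str(category):<{label_width}} | {bar} ({counts[category]})")
    return "\n".join(lines)


# Hypothetical categorical data (not taken from the notes).
responses = ["bus", "car", "bus", "walk", "bus",
             "car", "walk", "bus", "bike", "car"]
print(text_bar_chart(responses))
```

Every bar is one row thick, which plays the role of "same width" in the steps above; only the bar length changes from category to category.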
• Dotplots for Numerical Data – a dotplot is a simple way to display numerical data
when the data set is reasonably small.
• How to construct:
➢ Draw a horizontal line and mark with an appropriate measurement scale.
➢ Locate each value in the data set along the measurement scale and represent it by
a dot. If there are two or more observations with the same value, stack the dots
vertically.
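The two dotplot steps can also be sketched in plain text. This sketch assumes the values are integers, so that the measurement scale can have one tick per integer between the smallest and largest value; the function name `text_dotplot` and the quiz scores are hypothetical.

```python
from collections import Counter


def text_dotplot(values):
    """Return a text dotplot for a small data set of integers."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(values)
    scale = range(min(values), max(values) + 1)      # the measurement scale
    width = max(len(str(v)) for v in scale) + 1      # column width per tick
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("." if counts[v] >= level else " ").rjust(width)
                      for v in scale)
        rows.append(row.rstrip())
    rows.append("-" * (width * len(scale)))          # the horizontal line
    rows.append("".join(str(v).rjust(width) for v in scale))
    return "\n".join(rows)


# Hypothetical quiz scores (not data from the notes).
scores = [7, 8, 8, 9, 10, 8, 7, 9]
print(text_dotplot(scores))
```

Repeated values appear as dots stacked in the same column, so the tallest stack marks the most frequent value.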