Different Types of Data - BioStatistics
Introduction to Statistics
What is Statistics
Most of you have probably thought about statistics, and maybe even used informal
statistical methods to help explain something you’ve experienced, or perhaps to win an
argument. Out of the number of students in the class (call it n), I would guess we’d have
n different definitions of Statistics, all of them somewhat similar, but none exactly the
same.
I would guess that maybe 80% of you would define Statistics as a Mathematics course.
I would also venture to say that maybe 40% of you have some anxiety about taking this
course because math isn’t your sharpest skill. Let me tell you now that Statistics does
use math, but I don’t call it a math course. If you’ve ever taken an Analytical Chemistry
course, or a Physics course, did you label it as a math course? Probably not. It was a
Chemistry course, or Physics, or whatever. I ask you to view this course, and the field of
Statistics, similarly to how you view those courses. Do they use math? Yes. Is that the
main focus? No.
I think of Statistics as the science of collecting, explaining and interpreting data. There is
a fine distinction between “explaining” and “interpreting.” I can use math to explain data,
like calculating an average, but what does that average mean for the dataset? For the
population? Taking the numbers and interpreting them is, to me, the most important part
of the field of Statistics. In readings and elsewhere, this is called inferential statistics.
Making inferences from data about populations is going to be the main focus of this
course. We’re going to use a software package called R to do math for us, which will
allow us to think solely about what results mean.
Where is Statistics used
In short, everywhere! While our focus in this course will be Statistics in the Biological
sciences, the tools you’ll gain here will apply to your everyday life. By the end of this
course, I hope you are able to watch the news, or read social media and start asking
questions about the statements being made.
Now that we have a better understanding of what we’re getting into, let’s begin where
any statistical study begins: the data.
There are many ways to gather data, and some are better than others. In this module,
we’ll discuss different types of experiments, some sampling methods, and a brief
introduction to some popular Experimental Designs in Statistics.
Experimental
Experimental studies are most widely used in scientific fields, like medicine or biology.
That is because experimental studies provide the researchers with the
most control. Researchers determine what their independent variable(s) are, and what
and how they will measure the dependent variable. Most of the examples we’ve talked
about so far have been experimental studies: think the diet and exercise example from
the last learning module. Researchers decided how many different types of exercises
and diets to use, as well as which types of those two explanatory variables. Subjects
used in the study will be assigned to the different levels of diet and exercise by the
researchers, thus providing the researchers with the most control.
There are many popular experimental studies, especially in the field of medicine. The
most popular is a double-blind study, where neither the researcher nor the subject knows
which level of the independent variable(s) they’ve been assigned to.
Because of the amount of control granted in experimental studies, we are able to make
inferences about the data. For me, inference is the most important thing we can do with
data. It’s when we’re able to take observations from a select group of subjects (our
sample), and make generalizations about an entire group of subjects, be it people,
frogs, or whatever.
Observational
Observational studies don’t provide the same amount of control for the researchers that
experimental studies do. In observational studies, researchers can determine
independent variable(s), but they just have to observe the dependent variable, not
knowing if the independent variable(s) caused the change in the dependent variable, or
if it was some other unmeasured variable.
Consider you want to measure purchases at a mall based on what people are wearing.
You have two levels of your independent variable (clothes): fancy clothes, non-fancy
clothes. You track people who fall into those two categories and see what they buy. One
possible inference drawn from this study would be that people in fancy clothes are more
likely to buy big-screen televisions than people dressed in non-fancy clothes. So do you
think that wearing fancy clothes will generally lead people to buy big-screen televisions?
Probably not; more likely, there is another unmeasured variable that is the actual cause (like
income).
Since we don’t have as much control, observational studies don’t provide inferences as
powerful as those from experimental studies. That is why researchers generally prefer
experimental studies over observational ones.
Surveys are very popular in the world, and I’m sure that all of you have filled out a
survey at some time in your life. While researchers designed those surveys, they have
no control over how you answer (i.e. just observing your responses). That makes
surveys observational studies. We should be wary of inferences made from surveys
because of this reason. That being said, there are situations where a survey is the only
way we’ll be able to get any data on a subject. If we were interested in, say, how well
marijuana helps with pain relief, we can’t set up an experimental study where we hit
somebody’s toe with a hammer, hand them a joint, and ask how they feel. A survey will
be the only way we can learn that response. So inferences can be made from
observational studies, like surveys, but I wouldn’t buy into the inferences as much as I
would from an experimental study.
Throughout this course we’ll be talking about populations, samples, and the difference
between the two. It will be important that we understand what both are, and that those
differences exist.
Parameters are values of interest of the population. These are usually some calculated
value, like an average, or a maximum. In Statistics, we usually symbolize these
parameters with Greek letters. We’ll talk more about which letters are used to represent
different parameters later.
These parameters can only be known if we have measurements from all subjects of the
population. If the parameter is known (and that can be an impossible “if”), then there is
no reason to do statistical analysis on the data. Our inference methods in statistics help
us make generalizations about those population parameters. If we already know them,
why go through the work?
We can think of instances, however, where it won’t be possible to measure the entire
population. In the frog example, to know a given parameter, we would need to measure
all the frogs in that species. How difficult would it be to find every single frog of that
species? To use another example to get this point across, what if we’re doing a study
on leaf weight for trees in a local park? Would you want to go out and weigh every single
leaf for all the trees in the park? I wouldn’t. That’s why taking samples will be important.
Samples are subsets of the population of interest. There are multiple ways samples can
be collected, some of which we’ll talk about in a bit. Samples will be important because
they will allow us to understand an entire population better.
Values that we get from samples are called statistics. These statistics will always be
known, since our sample data will be at hand. You may not be able to tell me the
population parameter for the average height of Doane students, but you can easily gather a
sample of students and calculate that average.
When we take samples, and calculate sample statistics, we can (and will) then use
those sample statistics to make inferences about the population parameters.
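We’ll use R for this kind of work later in the course, but the idea is easy to sketch in any language. Here is a hypothetical Python example (the population values, mean, and sample size are all made up for illustration) showing a sample statistic estimating an unknown population parameter:

```python
import random

# Hypothetical population: heights (in inches) of 1,000 students.
# In practice we almost never get to see the full population.
random.seed(42)
population = [random.gauss(67, 3) for _ in range(1000)]

# The parameter (population mean) -- normally unknowable.
mu = sum(population) / len(population)

# Take a simple random sample of 30 students and compute the statistic.
sample = random.sample(population, 30)
x_bar = sum(sample) / len(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

The sample mean won’t equal the population mean exactly, but it will typically land close to it, which is what makes inference possible.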
Since we’re using sample statistics to make inferences about those population
parameters, we need to make sure those statistics will represent the population. One of
the ways that won’t happen is when there is bias in the sample. In general, bias occurs
when a sample does not represent the population of interest. When bias happens, our
inferences suffer. We’re going to talk about three main types of bias.
Sampling Bias
A sampling bias occurs when the sample was not random, or a population was
undercovered (meaning that not every subject of the population had a chance of being
selected). If I was trying to determine the average height of Doane students, but only
sampled students on one campus, I have created a sampling bias because I didn’t
include the other 3 campuses.
Response Bias
Response bias occurs when the subject is influenced to respond a certain way. This can
easily be thought about in a survey setting, but it doesn’t just exist in observational
studies. Suppose I was running the frog temperature study from the last learning module,
and I noticed that the frogs in the 60 °F tank were shivering. Let’s say that I felt bad, so I
kicked up the temp a little. I’m now influencing the response to be inaccurate because
I’ve changed the study.
Nonresponse Bias
Nonresponse bias simply occurs when subjects don’t respond. If you’ve ever hung up
on a telemarketer, or deleted any emails with surveys in them, you may have contributed
some nonresponse bias. Nonresponse bias will only be a concern if a large proportion
of those sampled don’t respond, and that percentage will be dependent on the sample
size. If I asked the class if you prefer cookies or brownies, and I only had two students
respond, that sample probably won’t represent the entire class (so nonresponse bias
would be a huge issue). Voter turnout for national elections is usually around 56%, but
we don’t usually consider the President-Elect to be bogus because of nonresponse bias,
since 56% of eligible US voters is still a large number of people.
There are many ways to collect samples. We’ll talk about some of the more popular ones.
Simple Random Sample
A simple random sample (SRS) is when each subject of a population has an equally likely
chance of being selected. Pulling names out of a hat, or using a random number generator to
select subjects are ways of getting a SRS. These types of samples are preferred to non-random
samples, but it doesn’t mean that we will avoid bias. Consider a population of 20 people, 10
men and 10 women. If I randomly select 5 people, I could still end up with all five of the sampled
people being women, or all five being men. If you find yourself in a situation where that may be a
concern, use a different method.
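A minimal sketch of an SRS, shown in Python for illustration (the 20-person population matches the hypothetical example above):

```python
import random

# Hypothetical population of 20 people: 10 men and 10 women.
population = ["M"] * 10 + ["F"] * 10

random.seed(1)
srs = random.sample(population, 5)  # each person equally likely to be chosen

# The selection is random, but nothing guarantees balance --
# by chance, the sample could be all men or all women.
print(srs, "women in sample:", srs.count("F"))
```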
Stratified
In a stratified sample, we subset the population into groups, called strata. These strata are
mutually exclusive, and are determined based on some quality that is easily defined. At the high
school level, we can set up strata based on class standing (first-year, sophomore, junior, senior).
Once we have strata defined, to obtain the sample, we’ll randomly select a number of subjects
from each stratum (say 10 from each class standing). This will ensure you have representation from
each stratum.
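The stratified procedure can be sketched as follows; this is an illustrative Python example, and the roster names and stratum sizes are invented:

```python
import random

# Hypothetical roster: each student is tagged with a class standing (the stratum).
roster = {
    "first-year": [f"fy{i}" for i in range(120)],
    "sophomore":  [f"so{i}" for i in range(110)],
    "junior":     [f"jr{i}" for i in range(100)],
    "senior":     [f"sr{i}" for i in range(90)],
}

random.seed(7)
# Randomly select 10 students from each stratum, guaranteeing
# representation from every class standing.
stratified_sample = {
    stratum: random.sample(students, 10)
    for stratum, students in roster.items()
}

for stratum, picks in stratified_sample.items():
    print(stratum, len(picks))
```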
Cluster
A cluster sample is similar to a stratified sample in that we divide the population into groups, this
time called clusters. Clusters are usually based more on convenience, and subjects shouldn’t be
all of the same type. If I wanted to determine heights of Doane students just on the Crete campus, I
can set up clusters based on the 6 different living options (5 dorms, and off campus). With the
way things are set up, there are different age groups living in all areas, and we have no single-
sex dorms, so the students are pretty well randomized within the living spaces. To gather data,
we’ll then randomly select clusters, and measure all subjects in the clusters selected.
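Note the contrast with stratified sampling: there we sample a few subjects from every group, while here we pick a few whole groups and measure everyone inside them. A hypothetical Python sketch (the cluster names and sizes are made up):

```python
import random

# Hypothetical clusters: 6 living options, each with a list of residents.
clusters = {f"dorm_{i}": [f"d{i}_s{j}" for j in range(40)] for i in range(5)}
clusters["off_campus"] = [f"oc_s{j}" for j in range(60)]

random.seed(3)
# Randomly pick 2 whole clusters, then measure EVERY subject inside them.
chosen = random.sample(sorted(clusters), 2)
measured = [subject for name in chosen for subject in clusters[name]]

print("clusters chosen:", chosen, "subjects measured:", len(measured))
```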
Convenience
Convenience samples are the easiest to gather, but they are the sampling method that shouldn’t
be used. I sometimes call this the lazy sample because observations are collected by convenience.
Asking friends, coworkers, family to participate because they are close to you/easy for you to
ask most likely will not provide you with a good representation of the population.
There are some experiments that have built in structure to them. There are many
different types, some more complex than others, and experimental design is an entire
field of Statistics. Most of the designs are outside the scope of this class, but we’ll briefly
talk about some of the more frequent ones.
A good experimental design will try to capture all known sources of variation. This will
help us in understanding how well independent variables predict dependent variables.
Terminology
There is terminology from experiments that will help us understand design setups, and
will be important to this class moving forward.
The first thing we’ll talk about is treatment. A treatment is a level of an explanatory
variable. Think back to the frog example. We had one explanatory variable
(temperature), and we had three levels (60, 70, and 80 °F). The temperature values are
treatments, so there is one explanatory variable and three treatments. Treatments are what
the researcher applies to the subject.
Another term is experimental unit (e.u.). An experimental unit is the smallest possible
unit in which the treatment can be applied. If we’re dealing with people, we’d often think
that the experimental unit would be one person, and that’s usually the case. There are
instances, however, where the experimental unit will not be an individual subject.
Assume I’m testing what type of diet will increase weight in pigs, with 5 sties, 3 pigs per
sty. Also assume I’m testing 5 different food slops (five treatments). If I place the
different types of food in troughs (one trough for each sty), the smallest individual unit
that receives the treatment is all the pigs in a given sty. So I would have five
experimental units since I have five troughs.
Factorial
The last type of design we’ll talk about is a factorial design. A factorial design has more
than one independent variable (usually two, but can be more). The diet and exercise
example we’ve been working with so far in this course has been an example of a
factorial design. In factorial designs, we are not only interested in the two independent
variables individually, but also in how the two act together (we call this an interaction term).
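The defining feature of a factorial design is that every level of one variable is crossed with every level of the other. A short illustrative Python sketch (the diet and exercise level names are hypothetical stand-ins for the course example):

```python
from itertools import product

# Hypothetical levels for the diet-and-exercise example.
diets = ["low_carb", "low_fat", "control"]
exercises = ["cardio", "weights"]

# A full factorial design crosses every diet with every exercise,
# which is what lets us estimate the interaction between the two.
treatments = list(product(diets, exercises))

print(len(treatments), "treatment combinations")
for d, e in treatments:
    print(d, "+", e)
```

With 3 diet levels and 2 exercise levels, the design has 3 × 2 = 6 treatment combinations.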