Introductory HIV/AIDS Data Analysis
Workshop
Pham Ngoc Thach University of Medicine
Rishi Chakraborty1,2
1Center for AIDS Research, Duke University, North Carolina, USA
2Department of Biostatistics and Bioinformatics, Duke University, North Carolina, USA
March 19-20, 2024
Biostatistics Review
MODELS
Relate a dependent variable, outcome, or response, Y, to some other variable(s).
These other variable(s) are called independent or regressor variables, the X’s.
LINEAR MODELS
Y is Normally distributed:  Y ~ N(µ, σ²)
If all the independent variables are numeric,
we have regression.
If all the independent variables are
categorical, we have analysis of variance.
If we have both types of independent
variables, we have analysis of covariance.
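As an illustration (not part of the original slides), a minimal Python sketch of all three cases using statsmodels formulas; the data frame and the column names y, age, and sex are made up:

import pandas as pd
import statsmodels.formula.api as smf

# hypothetical data: a numeric outcome y, a numeric covariate age,
# and a categorical covariate sex
df = pd.DataFrame({
    "y":   [120, 115, 130, 128, 110, 140, 125, 118],
    "age": [30, 25, 45, 50, 22, 60, 41, 33],
    "sex": ["F", "M", "F", "M", "F", "M", "F", "M"],
})

# regression: all independent variables numeric
reg = smf.ols("y ~ age", data=df).fit()

# analysis of variance: all independent variables categorical
anova = smf.ols("y ~ C(sex)", data=df).fit()

# analysis of covariance: both types of independent variables
ancova = smf.ols("y ~ age + C(sex)", data=df).fit()

print(ancova.summary())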
Exceptions
Correlation - not a model at all
Logistic regression - dichotomous outcome
Poisson regression - count outcome
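A similar sketch for the last two exceptions, again with made-up data and statsmodels:

import pandas as pd
import statsmodels.formula.api as smf

# hypothetical data: a dichotomous outcome and a count outcome
df = pd.DataFrame({
    "infected": [0, 1, 0, 1, 1, 0, 1, 0],   # dichotomous outcome
    "visits":   [2, 5, 1, 7, 4, 0, 6, 3],   # count outcome
    "age":      [30, 25, 45, 50, 22, 60, 41, 33],
})

# logistic regression for a dichotomous outcome
logit_fit = smf.logit("infected ~ age", data=df).fit(disp=0)

# Poisson regression for a count outcome
pois_fit = smf.poisson("visits ~ age", data=df).fit(disp=0)

print(logit_fit.params)
print(pois_fit.params)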
The sciences do not try to explain, they hardly
even try to interpret, they mainly make
models. By a model is meant a mathematical
construct which, with the addition of certain
verbal interpretations, describes observed
phenomena. The justification of such a
mathematical construct is solely and precisely
that it is expected to work.
John von Neumann
[Diagram: Population (parameters, N obs.) and Sample (statistics, n obs.). Probability reasons from the population to the sample; inferential statistics reasons from the sample back to the population.]
Statistics
Sample mean:  X̄ = (∑ Xᵢ)/n,  summing over i = 1, …, n

Sample median

Sample variance:  s² = ∑ (Xᵢ − X̄)² / (n − 1)

Sample standard deviation:  s = √s²
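A quick numeric check of these formulas in Python; the data vector x is made up:

import numpy as np

x = np.array([4.0, 7.0, 6.0, 3.0, 9.0, 5.0])   # hypothetical sample
n = len(x)

xbar = x.sum() / n                          # sample mean
med  = np.median(x)                         # sample median
s2   = ((x - xbar) ** 2).sum() / (n - 1)    # sample variance
s    = np.sqrt(s2)                          # sample standard deviation

# the built-in versions agree (ddof=1 gives the n - 1 divisor)
assert np.isclose(s2, np.var(x, ddof=1))
assert np.isclose(s,  np.std(x, ddof=1))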
Probability
0 ≤ p ≤ 1
Discrete Distributions
Binomial - number of successes in n trials
Poisson - number of events in an interval
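A sketch of how these probabilities can be computed with scipy.stats; the particular n, p, and mean values are just examples:

from scipy.stats import binom, poisson

# Binomial: number of successes in n trials
# e.g. P(exactly 3 successes in 10 trials with success probability 0.2)
print(binom.pmf(3, n=10, p=0.2))

# Poisson: number of events in an interval
# e.g. P(exactly 2 events when the mean count is 4)
print(poisson.pmf(2, mu=4))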
Continuous Distributions
Normal:  Y ~ N(µ, σ²)
[Figure: Normal density curve, with the horizontal axis marked at µ−σ, µ, and µ+σ]
A standard Normal distribution is one where
µ = 0 and σ² = 1. This is denoted by Z.
Z ~ N(0 , 1)
[Figure: standard Normal density curve, horizontal axis marked from −3 to 3]
Table A of the statistical tables gives cumulative
probabilities for a standard Normal distribution.
P(Z < 1.27)
[Figure: standard Normal curve with the area to the left of 1.27 shaded]
Table A (continued)
Cumulative Probabilities for the Standard Normal (Z) Distribution
Z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09 Z
0.00 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359 0.00
0.10 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753 0.10
0.20 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141 0.20
0.30 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517 0.30
0.40 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879 0.40
0.50 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224 0.50
0.60 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549 0.60
0.70 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852 0.70
0.80 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133 0.80
0.90 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389 0.90
1.00 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621 1.00
1.10 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830 1.10
1.20 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015 1.20
1.30 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177 1.30
1.40 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319 1.40
1.50 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441 1.50
From Table A, P(Z < 1.27) = .8980
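The same lookup can be done in software; a one-line sketch with scipy (not part of the original slides):

from scipy.stats import norm

# P(Z < 1.27) for a standard Normal
print(norm.cdf(1.27))   # approximately 0.8980, matching Table A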
For other Normal distributions, we can convert
to a standard Normal by standardizing.
Z = (Y − µ)/σ ~ N(0, 1)

Example:  Y = diastolic blood pressure,  Y ~ N(77, 11.6²)

P(Y < 60) = P(Z < (60 − 77)/11.6) = P(Z < −1.47) = .0708
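A sketch of the same blood-pressure calculation in Python; in scipy, loc and scale are the mean µ and standard deviation σ:

from scipy.stats import norm

mu, sigma = 77, 11.6                       # Y ~ N(77, 11.6^2)

z = (60 - mu) / sigma                      # standardize: z ≈ -1.4655
print(norm.cdf(round(z, 2)))               # 0.0708, the table lookup at z = -1.47
print(norm.cdf(60, loc=mu, scale=sigma))   # ≈ 0.0714 without rounding z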
Other Distributions
t
- one parameter, called the df
- similar to a Z, but with “fatter tails”
- specific percentiles are in Table B
t(12),.95
Table B
Percentiles of the t-Distribution
df t.60 t.70 t.80 t.90 t.95 t.975 t.99 t.995 t.9995
1 0.325 0.727 1.376 3.078 6.314 12.706 31.821 63.657 636.619
2 0.289 0.617 1.061 1.886 2.920 4.303 6.965 9.925 31.599
3 0.277 0.584 0.978 1.638 2.353 3.182 4.541 5.841 12.924
4 0.271 0.569 0.941 1.533 2.132 2.776 3.747 4.604 8.610
5 0.267 0.559 0.920 1.476 2.015 2.571 3.365 4.032 6.869
6 0.265 0.553 0.906 1.440 1.943 2.447 3.143 3.707 5.959
7 0.263 0.549 0.896 1.415 1.895 2.365 2.998 3.499 5.408
8 0.262 0.546 0.889 1.397 1.860 2.306 2.896 3.355 5.041
9 0.261 0.543 0.883 1.383 1.833 2.262 2.821 3.250 4.781
10 0.260 0.542 0.879 1.372 1.812 2.228 2.764 3.169 4.587
11 0.260 0.540 0.876 1.363 1.796 2.201 2.718 3.106 4.437
12 0.259 0.539 0.873 1.356 1.782 2.179 2.681 3.055 4.318
13 0.259 0.538 0.870 1.350 1.771 2.160 2.650 3.012 4.221
14 0.258 0.537 0.868 1.345 1.761 2.145 2.624 2.977 4.140
15 0.258 0.536 0.866 1.341 1.753 2.131 2.602 2.947 4.073
From Table B, t(12),.95 = 1.782

For “lower tail” values, t(df),α = −t(df),1−α
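A sketch of the same lookups with scipy instead of Table B:

from scipy.stats import t

# 95th percentile of a t-distribution with 12 df
print(t.ppf(0.95, df=12))    # ≈ 1.782

# "lower tail" value: t(12),.05 = -t(12),.95
print(t.ppf(0.05, df=12))    # ≈ -1.782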
χ2
- one parameter, called the df
- specific percentiles are in Table C
F
- two parameters, called the numerator df
and the denominator df
- specific percentiles are in Tables D1 – D3
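A sketch of the corresponding lookups with scipy; the particular df values (5 for χ², and 3 and 20 for F) are just examples:

from scipy.stats import chi2, f

# 95th percentile of a chi-square distribution with 5 df (Table C)
print(chi2.ppf(0.95, df=5))          # ≈ 11.07

# 95th percentile of an F-distribution with 3 and 20 df (Tables D1 - D3)
print(f.ppf(0.95, dfn=3, dfd=20))    # ≈ 3.10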
Sampling Distributions
The mean of a sampling distribution is called
the expected value of the statistic.
The standard deviation of a sampling distribution
is called the standard error of the statistic.
Sampling Distribution of X̄

E(X̄) = µ

Var(X̄) = σ²/n  ⇒  s.e.(X̄) = σ/√n

If X ~ N(µ, σ²), then X̄ ~ N(µ, σ²/n)

⇒ (X̄ − µ)/(σ/√n) ~ N(0, 1)
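A small simulation sketch of these facts; the population values µ = 77, σ = 11.6 and the sample size n = 25 are made up:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 77, 11.6, 25

# draw many samples of size n and keep each sample mean
xbars = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print(xbars.mean())           # ≈ mu            (E(X-bar) = mu)
print(xbars.std(ddof=1))      # ≈ sigma/sqrt(n) (the standard error)
print(sigma / np.sqrt(n))     # = 2.32, for comparison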
Central Limit Theorem
For n sufficiently large, the sampling
distribution of X̄ is at least approximately
Normal for any underlying distribution!

(X̄ − µ)/(σ/√n) ~ N(0, 1)
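A simulation sketch of the Central Limit Theorem using a clearly non-Normal (exponential) population; all values are made up for illustration:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 50                                   # "sufficiently large" sample size

# a non-Normal population: exponential with mean 2 (and sd 2)
samples = rng.exponential(scale=2.0, size=(10_000, n))
z = (samples.mean(axis=1) - 2.0) / (2.0 / np.sqrt(n))   # (X-bar - mu)/(sigma/sqrt(n))

# the standardized means behave approximately like Z ~ N(0, 1)
print(np.mean(z < 1.27), norm.cdf(1.27))   # both ≈ 0.9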
Statistical Inference
- Estimation
- Hypothesis Testing
A point estimate is a single statistic that is
used to estimate a population parameter.
We can also estimate a parameter by a
100(1-α)% confidence interval.
This has a probability of “capture” of (1-α).
[Figure: confidence intervals from many repeated samples drawn around a vertical line at µ; 100(1−α)% of these intervals will capture the parameter (µ)]
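A simulation sketch of this capture property for the t-based interval described below; the population values are made up:

import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)
mu, sigma, n, alpha = 50.0, 10.0, 20, 0.05

covered = 0
for _ in range(10_000):
    x = rng.normal(mu, sigma, size=n)
    half = t.ppf(1 - alpha / 2, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - half <= mu <= x.mean() + half)

print(covered / 10_000)    # ≈ 0.95 = 1 - alpha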
Form of most confidence intervals:
point estimate ± (table value)(std. error)
A 100(1-α)% C. I. for µ is:
X̄ ± t(n−1),1−α/2 · s/√n
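A sketch of this interval computed directly in Python; the sample x is made up:

import numpy as np
from scipy.stats import t

x = np.array([74, 81, 69, 90, 77, 85, 72, 79])   # hypothetical sample
n, alpha = len(x), 0.05

xbar, s = x.mean(), x.std(ddof=1)
half = t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)

print(xbar - half, xbar + half)    # 95% confidence interval for mu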
Hypothesis testing
Test a null hypothesis, H0,
against an alternative hypothesis, H1.
Two possible decisions:
- Reject H0 (in favor of H1)
- Fail to reject H0
TRUTH
DECISION H0 true H1 true
Reject H0 Type I error correct
Fail to reject H0 correct Type II error
α = P(Type I error) = P(Reject H0 | H0 true)
α is the significance level of the test
β = P(Type II error) = P(Fail to reject H0 | H1 true)
Power = P(Reject H0 | H1 true) = 1 − β
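A simulation sketch of these error rates and of power for a one-sample t-test; the sample size, σ, and the true means are invented:

import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(3)
n, alpha, mu0 = 30, 0.05, 100.0

def reject_rate(true_mu, sims=5_000):
    # fraction of simulated samples in which H0: mu = mu0 is rejected
    rejections = 0
    for _ in range(sims):
        x = rng.normal(true_mu, 15.0, size=n)
        rejections += ttest_1samp(x, mu0).pvalue < alpha
    return rejections / sims

print(reject_rate(100.0))   # H0 true: ≈ alpha = 0.05 (Type I error rate)
print(reject_rate(110.0))   # H1 true: power = 1 - beta (well above 0.9 here)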
p - values
The probability of getting a test statistic at
least as “extreme” (in the direction stated
by H1) as the one observed.
Reject H0 if the p-value < α.
Hypothesis Testing Steps
1) Determine hypotheses
2) Decide on α ( .01 , .05 , .10 )
3 & 4) State rejection region, calculate test statistic
(or)
Calculate test statistic and p-value
5) Make decision (reject or not reject)
6) Write conclusions (interpret results) in the context of the problem
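A worked sketch of these steps for a one-sample t-test in Python; the data and the hypothesized mean of 80 are invented:

import numpy as np
from scipy.stats import ttest_1samp

# 1) Hypotheses: H0: mu = 80 versus H1: mu != 80
# 2) Significance level
alpha = 0.05

# 3 & 4) Test statistic and p-value
x = np.array([74, 81, 69, 90, 77, 85, 72, 79])   # hypothetical sample
result = ttest_1samp(x, popmean=80)
print(result.statistic, result.pvalue)

# 5) Decision: reject H0 if the p-value < alpha
print("reject H0" if result.pvalue < alpha else "fail to reject H0")

# 6) The conclusion is then written in the context of the problem.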