5, 6, 7, 8) Principles of Statistical Analysis + Elements of Statistical Inference + Testing Statistical Hypothesis + Types of Errors in Statistical Inference - 10, 18 Dec 2

The document outlines the principles of statistical analysis, emphasizing the importance of using sample data to make inferences about a larger population. It discusses key concepts such as Bayes theorem, descriptive and inferential statistics, hypothesis testing, and the significance levels in statistical testing. Additionally, it highlights the potential for Type I and Type II errors in hypothesis testing and the importance of sample representativeness.

Uploaded by

khchncnxwn

Course 2

Lesson 5-6
-Principles of Statistical Analysis
-Elements of Statistical Inferences

Assist. Prof. Emrah Gökay ÖZGÜR


Department of Biostatistics
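Note: we solved this question at the start of the lesson.

Bayes theorem

We use Bayes theorem.

[Handwritten worked example, only partly legible: 40 doctors with P(D) = 0.06, 80 teachers with P(T) = 0.03, and the figure 240; the rest of the calculation cannot be recovered.]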
Introduction
• When we analyze medical data for research purposes, the intention is to extrapolate
the findings from a sample of individuals to the population of all similar
individuals.

• We see this most clearly in animal and laboratory studies as well as in much
epidemiological research, where the data cannot be identified with individual
subjects, but it applies equally to case-control studies, clinical trials and indeed to
clinical research in general.
Introduction
• While we may also be interested in each individual from a clinical point of view,
research is usually aimed at summarizing the experience of many individuals to
draw general conclusions.

• Thus one of the main ideas of statistics is this: the aim of statistical analysis is
to use the information gained from a sample of individuals to make inferences
about the relevant population.
Research Steps
• Problem: the effect of a vaccine on immunity. Study type: prospective.
• Defining inclusion and exclusion criteria: who will be included in the study and who will not.
• Calculating the sample size: how many people will be included in the study.
• Sample selection: probability sampling. 300 people were selected and the data were collected.
• Data checking: making sure the collected data were entered into the system correctly.
• Biostatistical data analysis
• Reaching a conclusion
• Interpretation and discussion

Biostatistical data analysis is of 2 types: descriptive statistics and comparison of groups.
Introduction
• In most research studies some data are collected for descriptive purposes, for example
information about the demographic and clinical characteristics of the subjects being studied.

• The first step in the analysis of a set of data is to describe such basic data, and simple
descriptive methods are used for this. Many observational studies, and most if not all trials and laboratory
experiments, are explicitly comparisons between different sets of observations.
Principles of Statistical Analysis
(Note: the order of these principles may be asked.)
• Statistical analysis involves the collection, interpretation, presentation, and organization
of data. It helps in drawing meaningful conclusions from data and making informed
decisions. There are several principles that guide statistical analysis:

1 Clearly Defined Objectives:

• Before starting any statistical analysis, it is essential to have a clear understanding of the
objectives. What are you trying to achieve or learn from the data?
2 Data Collection:

• Ensure that the data collected is relevant to the research question or problem at hand.
• Use appropriate methods for data collection to minimize bias and errors.

3 Randomization:
• Randomization helps in reducing bias and ensures that the sample is representative of the
population. Random sampling and assignment are crucial in many experimental designs.
4 Replication:

• Replicating experiments or studies helps in assessing the consistency and reliability of the
results. It adds strength to the findings and increases the generalizability of the results.

5 Descriptive Statistics:

• Begin with descriptive statistics to summarize and present the main features of the data.
This includes measures such as mean, median, mode, range, and standard deviation.

6 Inferential Statistics:

• Use inferential statistics to make predictions or inferences about a population
based on a sample of data. This involves hypothesis testing and estimation.

7 Statistical Significance:

• Statistical significance indicates whether an observed effect or relationship is
unlikely to be due to chance alone. It is often expressed through p-values, with a lower
p-value suggesting stronger evidence against a null hypothesis.
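
The descriptive measures named under principle 5 can be computed with Python's standard library. The following is a minimal sketch; the blood pressure readings are hypothetical values chosen for illustration and do not come from the lecture.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg); not from the lecture.
values = [118, 122, 122, 125, 130, 135, 140]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "range": max(values) - min(values),
    "sd": statistics.stdev(values),  # sample SD (n - 1 in the denominator)
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that `statistics.stdev` gives the sample standard deviation; `statistics.pstdev` would be the choice if the values were a whole population.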
Sampling Distribution
• The most important idea is that we take results obtained in the sample and use
them as our best estimate of what is true for the relevant population.
So, for example:
• if we find that a new treatment for psoriasis relieves the symptoms of patients
more often than the standard treatment,
• serum cholesterol is higher in men than in women, or
• a certain combination of temperature and light optimizes cell growth in a
laboratory experiment;
then in each case we would expect that the same is likely to be true in the population.

For this interpretation to be valid, the sample must be representative of the
population, as we stated previously.
Sampling Distribution

• The theoretical probability distribution of any statistic based on samples of size n
drawn at random from a population is referred to as a sampling distribution.

• The probability distribution of all possible sample means for a fixed sample size n, drawn from a given
population of measurements, is known as the sampling distribution of means.
• Let's have a data set of 25 serum albumin values, n=25.
• We are not dealing with the individual values. We are dealing with the mean
and SD from the sample of 25 people. Instead of dealing
with the original distribution of values, we must consider what would happen if
we repeatedly sampled 25 people and measured
their serum albumin (the sampling distribution of means).
• That is to say, suppose we did the study a million times, using 25 subjects each time,
calculated the mean and SD each time, and then displayed all these means and SDs.
• It should seem evident that these mean values for a sample size of 25 would be more tightly
distributed about the true mean (population mean) than would the original individual
values.

• So, sample means are reasonably good approximations to the population mean,
because the mean of a sampling distribution of means is equal to the population mean.

• The means of randomly selected samples are unbiased estimators of the population mean.
We select 100,000 samples from the population.

Samples                     Sample Means    Sample Standard Deviation

h1, h2, h3, ......., h25    x̄1              s1

h1, h2, h3, ......., h25    x̄2              s2

.                           .               .

.                           .               .

.                           .               .

h1, h2, h3, ......., h25    x̄100,000        s100,000

Sampling distribution of means:
We deal with the mean of all sample means and with the standard deviation of these
sample means.

Mean of the means of all possible samples = Population mean
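
The idea of repeating the study many times can be tried out with a small simulation. This is only a sketch under stated assumptions: the population of serum albumin values is generated artificially (mean 35 g/l, SD 5 g/l, both assumed), and 2,000 samples are drawn instead of 100,000 to keep the run short.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of serum albumin values (g/l); the parameters are assumptions.
population = [random.gauss(35.0, 5.0) for _ in range(20_000)]
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)

n = 25               # sample size used in the text
num_samples = 2_000  # far fewer than the 100,000 in the text, to keep the run short

# Draw many random samples of size n and record the mean of each one.
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"population mean        : {population_mean:.2f}")
print(f"mean of sample means   : {mean_of_means:.2f}")
print(f"population SD          : {population_sd:.2f}")
print(f"SD of the sample means : {sd_of_means:.2f}")
```

In a run like this the mean of the sample means should sit very close to the population mean, while the sample means themselves are spread much more tightly than the individual values.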
• What can we say about the variability of the means of these samples in relation to the population (i.e. true)
mean?
• In other words, is the variability among sample means the same as the variability among population
measurements?
• The answer is no.
• The variability among sample means is lower than the variability in the population measurements,
and it declines with increasing sample size.
• The expected value of the standard deviation of the means of several samples is called the standard
error of the mean (SEM), abbreviated to standard error (SE). It is calculated as:

\[
\mathrm{SEM} = \frac{SD}{\sqrt{\text{sample size}}} = \frac{SD}{\sqrt{n}}
\]
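
As a small illustration of the SEM formula, the sketch below computes SD divided by the square root of n for several sample sizes. The helper name `standard_error` and the SD of 5 g/l are assumptions made for the example, not values from the lecture.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


sd = 5.0  # hypothetical standard deviation (g/l)
for n in (25, 100, 400):
    print(f"n = {n:3d}  SEM = {standard_error(sd, n):.2f}")
```

Each fourfold increase in the sample size halves the standard error, which is why larger samples give more precise estimates of the population mean.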
Point and Interval Estimation
• If we do not know the population mean (as is usually the case), we'll be forced to
rely on sampling as a way of obtaining an estimate.

• Indeed, any one of 100 sample means can be regarded as an unbiased estimator
of the population mean.

• For example, if we select the first sample under n=5, our best estimate of the
population mean will be 35 g/l (serum albumin). This estimate, involving a single
value, is called a point estimate.
Point and Interval Estimation

• It is sometimes hard to frame findings in terms of absolutes or point
estimates, so we prefer to express them as interval estimates.

• For example: candidate A will gain 70% of the votes (a point estimate), or candidate A
will gain 60% - 70% of the votes (an interval estimate).
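
The text does not give a formula for an interval estimate. One common choice, used here purely as an illustration, is the normal-approximation 95% interval: the sample mean plus or minus about 1.96 times SD divided by the square root of n. In the sketch below, 35 g/l is the point estimate quoted in the text, while the SD of 4 g/l, the sample size of 25 and the function name `interval_estimate` are assumptions.

```python
import math
from statistics import NormalDist


def interval_estimate(mean: float, sd: float, n: int, level: float = 0.95):
    """Normal-approximation interval: mean +/- z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


# 35 g/l is the point estimate quoted in the text; the SD and n are assumed values.
low, high = interval_estimate(mean=35.0, sd=4.0, n=25)
print(f"point estimate: 35.0 g/l, interval estimate: {low:.1f} to {high:.1f} g/l")
```

For small samples a multiplier from the t distribution would usually be preferred over 1.96; the normal multiplier is used here only to keep the sketch within the standard library.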
Testing Statistical Hypotheses
• Much of the activity of professionals in biological and health sciences involves
testing either formal hypotheses or informal hunches about the effects of
independent variables.
Here is an example:
• The mean height of 38 newborn male babies was 52 cm and the standard deviation
was 1.87 cm. The mean height in the general population is 50.8 cm. Are the
heights of the 38 newborn boys different from the general population?
How, then, do we go about deciding whether or not the data support a hypothesis?

• Let's use the example above to illustrate the process of hypothesis testing.

• Are the heights of 38 newborn boys the same as the general population or not?

• To start with, the researcher sets up two mutually exclusive and exhaustive
hypotheses (mutually exclusive because both cannot be correct, exhaustive because one of
them must be correct), called the null hypothesis and the alternative hypothesis.
The null hypothesis (H0) states some expectation about the parameter, usually in the
negative form (hence the use of the term null). In the present example, the null
hypothesis might take the following form:
H0: The mean height of newborn male babies is not different from the mean
height in the general population.

Null and alternative hypotheses are always expressed in terms of population
parameters, so the symbolic form of H0, when dealing with means, is:

\[ H_0 : \mu_0 = \mu \]
where μ0 is the mean value hypothesized under the null hypothesis and μ is the population mean.
• By contrast, the alternative hypothesis (H1) denies the validity of the null
hypothesis. Thus, in the present example, it would be expressed as follows:

H1: The mean height of newborn male babies is different from the mean height in the
general population. Symbolically, it is expressed as:

\[ H_1 : \mu_0 \neq \mu \]
• How do we go about deciding which is correct? The answer is testing.

• Thus, statistical testing or proving of a scientific hypothesis comes in the form of
rejecting the null hypothesis. If the null hypothesis can be legitimately rejected,
then the alternative hypothesis must be asserted.
Level of Significance

• Some scientists are willing to reject the null hypothesis and assert the alternative hypothesis if the
result they obtained would have occurred by chance 5% of the time or less.

• In other words, after calculating a test statistic (a value which we can compare with the known
distribution of what we expect when the null hypothesis is true), the decision on the hypothesis is made
by comparing the p-value (the tail area of the distribution) with α, the designated significance
level.

• Thus, when the 5% significance level is used, α = 0.05 and the null hypothesis is rejected when p ≤ 0.05.


Level of Significance

• The 0.01 significance level is a more conservative criterion for rejecting H0 than the 0.05 level. At the 5% level, if the
probability is greater than 5% (p>0.05), H0 is not rejected.

• The P value, or calculated probability, is the probability of obtaining a result at least as extreme as the one observed, calculated on the assumption that the
null hypothesis (H0) of the study question is true.
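
The newborn height example can be worked through numerically. The sketch below uses the figures given in the text (n = 38, mean 52 cm, SD 1.87 cm, population mean 50.8 cm) and a significance level of 0.05. The lecture does not say which test to use, so the normal approximation for the p-value is an assumption made to stay within Python's standard library; a one-sample t-test would be the more usual choice.

```python
import math
from statistics import NormalDist

# Figures from the newborn example in the text.
n = 38
sample_mean = 52.0  # cm
sample_sd = 1.87    # cm
mu_0 = 50.8         # cm, mean hypothesized under H0
alpha = 0.05        # significance level chosen in advance

standard_error = sample_sd / math.sqrt(n)
test_statistic = (sample_mean - mu_0) / standard_error

# Two-sided p-value from the standard normal distribution (an approximation;
# a t distribution with n - 1 degrees of freedom would give a slightly larger value).
p_value = 2 * (1 - NormalDist().cdf(abs(test_statistic)))

print(f"test statistic = {test_statistic:.2f}")
print(f"p-value        = {p_value:.5f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The test statistic comes out at about 3.96, so the p-value is far below 0.05 and, under these assumptions, H0 would be rejected.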
Two Types of Errors of Inference

• Two possible errors can be made when using α or p to make a decision.

• Firstly, we can obtain a significant result (p<0.05), and thus reject the null
hypothesis, when it is in fact true. This is called a Type I error, and may be thought of as a "false
positive" result.
• Alternatively, we may obtain a non-significant result (p>0.05) when the null
hypothesis is not true, in which case we make a Type II error. In other words, we
do not reject (that is, we accept) H0 when the null hypothesis is not true.
Two Types of Errors of Inference

• The probability of a Type I error is called α and the probability of a Type II error is
called β.
• For any hypothesis test the value of α is determined in advance, usually as 5%.
• The value of β depends on the size of effect that one is interested in, and also on the
sample size.
• We often talk about the power of a study to detect an effect of a specified size,
where the power is 1-β or 100(1-β)%.
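
How β and power depend on the effect size and the sample size can be sketched for one simple case: a two-sided one-sample z-test with a known standard deviation. The test type, the function name `power_two_sided_z` and the numbers (a 1 cm difference, SD of 2 cm) are assumptions chosen for illustration; the lecture does not specify them.

```python
import math
from statistics import NormalDist


def power_two_sided_z(effect: float, sd: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided one-sample z-test.

    effect: true difference between the population mean and the hypothesized mean
    sd:     population standard deviation, assumed known
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sd
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# Hypothetical values: detect a 1 cm difference when the SD is 2 cm.
for n in (10, 20, 40, 80):
    power = power_two_sided_z(effect=1.0, sd=2.0, n=n)
    print(f"n = {n:2d}  power = {power:.2f}  beta = {1 - power:.2f}")
```

Power rises, and β falls, as the sample size grows. When the true effect is zero the function returns α, which is the Type I error rate.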