07. Notes: Application of Statistical Tools

FRW 408/ FRW 504

7. APPLICATION OF BIO-MATHEMATICAL AND STATISTICAL TOOLS

We carry out research to test hypotheses, and we do that by collecting data. If our
experiments are planned and executed correctly, we obtain good data that can tell
us something unique about the world.
While the first part of any experiment – the planning and execution – is critically important, it
is only half the battle. How the data is treated is just as important, and analyzing good data in
the right way can lead to groundbreaking findings and insights.
Data analysis is often seen as the most intimidating aspect of completing research, but it
doesn't have to be that way. While you'll need to understand what to do with the data and how
to interpret the results, software designed for statistical analysis can make this process as
smooth and as easy as possible.

BASIC STATISTICAL TOOLS

- Data input format for MINITAB/SPSS
- Knowing the menu bar of the app/software
- Performing/using menu functions

One tool for the quality assurance of research work is the set of statistical operations
necessary to control and verify the analytical procedures as well as the resulting data.
Making mistakes in analytical work is unavoidable. This is the reason why a complex system
of precautions to prevent errors, and to detect them, has to be set up. For the detection itself
as well as for the quantification of the errors, statistical treatment of data is indispensable.
A multitude of different statistical tools is available, some of them simple, some complicated,
and often very specific for certain purposes.
In analytical work, the most important common operation is the comparison of data, or
sets of data, to quantify accuracy and precision. Fortunately, most of the information needed
in regular laboratory work can be obtained by the "t-test", the "F-test", and regression
analysis. The value of statistics lies in organizing and simplifying data, to permit some
objective estimate showing that an analysis is under control or that a change has occurred.

Definitions of some Terms used in Statistics

Error
Error is the collective noun for any departure of the result from the "true" value. Analytical
errors can be:
1. Random or unpredictable deviations between replicates, quantified with the "standard
deviation".
2. Systematic or predictable regular deviation from the "true" value, quantified as "mean
difference" (i.e. the difference between the true value and the mean of replicate
determinations).
The "true" value of an attribute is by nature indeterminate and often has only a very relative
meaning. Particularly in soil science for several attributes there is no such thing as the true
value as any value obtained is method-dependent (e.g. cation exchange capacity).

Accuracy
The "trueness" or the closeness of the analytical result to the "true" value. It is constituted by
a combination of random and systematic errors (precision and bias) and cannot be quantified
directly. The test result may be a mean of several values. An accurate determination produces
a "true" quantitative value, i.e. it is precise and free of bias.

Precision
The closeness with which results of replicate analyses of a sample agree. It is a measure of
dispersion or scattering around the mean value and usually expressed in terms of standard
deviation, standard error or a range (difference between the highest and the lowest result).

Bias
The consistent deviation of analytical results from the "true" value caused by systematic
errors in a procedure. Bias is the inverse, but most commonly used, measure of "trueness",
which is the agreement of the mean of analytical results with the true value, i.e. excluding
the contribution of randomness represented in precision.

Basic Statistics

Some understanding of basic statistics is essential, and it is briefly discussed here.
The first step in data analysis, after collection of data, is to establish whether the set of
data has a normal distribution. (When the distribution is skewed, statistical treatment is
more complicated.) The primary parameters used to describe a data set are the mean (or
average) and the standard deviation; the other main tools are the F-test, the t-test, and
regression and correlation analysis.

Mean
The average of a set of n data xi:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$   (6.1)

Standard deviation
This is the most commonly used measure of the spread or dispersion of data around the mean.
The standard deviation is defined as the square root of the variance (V). The variance is
defined as the sum of the squared deviations from the mean, divided by n-1. Operationally,
there are several ways of calculation:

$s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

or

$s = \sqrt{\dfrac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n-1}}$

or

$s = \sqrt{\dfrac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}$

The calculation of the mean and the standard deviation can easily be done on a calculator but
most conveniently on a PC with computer programs such as dBASE, Lotus 123, Quattro-Pro,
Excel, and others, which have simple ready-to-use functions.
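As an illustration of the ready-to-use functions mentioned above, the mean and standard deviation can be computed with Python's standard library; the replicate values here are invented for the example:

```python
import statistics

# Hypothetical replicate results (e.g. % clay in a sample) for illustration
data = [12.4, 12.9, 12.6, 12.1, 12.8]

mean = statistics.mean(data)   # arithmetic mean, x-bar
s = statistics.stdev(data)     # sample standard deviation (divisor n - 1)

print(round(mean, 2), round(s, 3))
```

Note that `statistics.stdev` uses the n-1 divisor defined above; `statistics.pstdev` would divide by n instead.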

Relative standard deviation & Coefficient of variation


Although the standard deviation of analytical data may not vary much over limited ranges of
such data, it usually depends on the magnitude of such data: the larger the figures, the larger
s. Therefore, for comparison of variations (e.g. precision) it is often more convenient to use
the relative standard deviation (RSD) than the standard deviation itself. The RSD is expressed
as a fraction, but more usually as a percentage and is then called coefficient of variation
(CV). Often, however, these terms are confused.

Note. When needed (e.g. for the F-test) the variance can, of course, be calculated by squaring
the standard deviation:
V = s²
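The RSD, CV, and variance follow directly from the standard deviation; a short sketch with invented replicate data:

```python
import statistics

# Hypothetical replicate measurements for illustration
data = [19.8, 20.1, 19.9, 20.2, 20.0]

x_bar = statistics.mean(data)
s = statistics.stdev(data)

rsd = s / x_bar        # relative standard deviation (fraction)
cv = 100 * rsd         # coefficient of variation (%)
variance = s ** 2      # V = s^2, as used later in the F-test
```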

Confidence limits of a measurement


The more a measurement is replicated, the closer the mean x̄ of the results will approach the
"true" value of the analyte content (assuming absence of bias).

Example
For the determination of the clay content in the particle-size analysis, a semi-automatic
pipette installation is used with a 20 mL pipette. This volume is approximate and the
operation involves the opening and closing of taps. Therefore, the pipette has to be calibrated,
i.e. both the accuracy (trueness) and precision have to be established.
A tenfold measurement of the volume yielded the following set of data (in mL):
19.941 19.812 19.829 19.828 19.742
19.797 19.937 19.847 19.885 19.804
The mean is 19.842 mL and the standard deviation 0.0627 mL. For n = 10,
ttab = 2.26 (df = 9) and this calibration yields:

pipette volume = 19.842 ± 2.26 × (0.0627/√10) = 19.84 ± 0.04 mL
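The worked example above can be reproduced in a few lines of Python (ttab = 2.26 is taken from the t-table, as in the text):

```python
import math
import statistics

# The ten calibration volumes (mL) from the example above
vols = [19.941, 19.812, 19.829, 19.828, 19.742,
        19.797, 19.937, 19.847, 19.885, 19.804]

n = len(vols)
x_bar = statistics.mean(vols)          # mean pipette volume
s = statistics.stdev(vols)             # standard deviation
t_tab = 2.26                           # two-sided t, df = 9, 95% (from a t-table)

half_width = t_tab * s / math.sqrt(n)  # half-width of the confidence interval
```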

Statistical tests
In research work, a frequently recurring operation is the verification of the performance of
treatments by comparison of data. Some examples of comparisons of treatments are:
- performance of two instruments,
- performance of two methods,
- performance of a procedure in different periods,
- performance of two analysts or laboratories,
- results obtained for a reference or control sample with the "true", "target" or "assigned"
value of the sample.
Some of the most common and convenient statistical tools to quantify such comparisons are
the F-test, the t-tests, and regression analysis. These tests examine if two sets of normally
distributed data are similar or dissimilar (belong or not belong to the same "population") by
comparing their standard deviations and means respectively.

Two-sided vs. one-sided test


These tests for comparison, for instance between methods A and B, are based on the
assumption that there is no significant difference (the "null hypothesis"). In other words,
when the difference is so small that a tabulated critical value of F or t is not exceeded, we can
be confident (usually at 95% level) that A and B are not different.
Two fundamentally different questions can be asked concerning both the comparison of the
standard deviations s1 and s2 with the F-test, and of the means x̄1 and x̄2 with the t-test:
1. are A and B different? (two-sided test)
2. is A higher (or lower) than B? (one-sided test).
This distinction has an important practical implication as statistically the probabilities for the
two situations are different: the chance that A and B are only different ("it can go two ways")
is twice as large as the chance that A is higher (or lower) than B ("it can go only one way").
The most common case is the two-sided (also called two-tailed) test: there are no particular
reasons to expect that the means or the standard deviations of two data sets are different.

F-test for precision


The F-test (or Fisher's test) is a comparison of the spread of two sets of data to test if the sets
belong to the same population, in other words if the precisions are similar or dissimilar.
The test makes use of the ratio of the two variances:

$F = \dfrac{s_1^2}{s_2^2}$

where the larger variance must be the numerator by convention. If the performances are not very
different, then the estimates s1 and s2 do not differ much and their ratio (and that of their
squares) should not deviate much from unity. In practice, the calculated F is compared with
the applicable F value in the F-table (also called the critical value).

If Fcal < Ftab one can conclude with 95% confidence that there is no significant difference in
precision (the "null hypothesis" that s1 = s2 is accepted). Thus, there is still a 5% chance that
we draw the wrong conclusion. In certain cases more confidence may be needed; then a 99%
confidence table can be used, which can be found in statistical textbooks.
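The F-test can be sketched as follows; the duplicate series and the tabulated critical value (F = 5.05 for df1 = df2 = 5 at the 95% level, one-sided) are illustrative, not from the notes:

```python
import statistics

# Hypothetical duplicate series from two methods (illustration only)
method_a = [14.1, 14.5, 14.3, 13.9, 14.4, 14.2]
method_b = [14.0, 14.9, 13.6, 14.6, 13.8, 14.5]

v_a = statistics.variance(method_a)
v_b = statistics.variance(method_b)

# Larger variance in the numerator, by convention
f_cal = max(v_a, v_b) / min(v_a, v_b)

F_TAB = 5.05   # 95% one-sided critical value for df1 = df2 = 5 (from an F-table)
same_precision = f_cal < F_TAB
```

Here f_cal exceeds F_TAB, so the precisions of the two (invented) series would be judged significantly different.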

Example I (two-sided test)


Give an example with a real data set from forestry/range-related research.

Example 2 (one-sided test)

t-Tests for bias

Depending on the nature of two sets of data (n, s, sampling nature), the means of the sets can
be compared for bias by several types of the t-test.
1. Student's t-test for comparison of two independent sets of data with very similar standard
deviations;
2. the Cochran variant of the t-test when the standard deviations of the independent sets
differ significantly;
3. the paired t-test for comparison of strongly dependent sets of data.

Basically, for the t-tests the equation is written as:

$t = \dfrac{|\bar{x} - m|\sqrt{n}}{s}$

where
¯x = mean of test results of a sample
m = "true" or reference value
s = standard deviation of test results
n = number of test results of the sample.
To compare the mean of a data set with a reference value normally the "two-sided t-table of
critical values" is used. The applicable number of degrees of freedom here is: df = n-1
If a value for t calculated does not exceed the critical value in the table, the data are taken to
belong to the same population: there is no difference and the "null hypothesis" is accepted
(with the applicable probability, usually 95%).
As with the F-test, when it is expected or suspected that the obtained results are higher or
lower than that of the reference value, the one-sided t-test can be performed: if tcal > ttab, then
the results are significantly higher (or lower) than the reference value.
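Comparing a mean with a reference value, as described above, can be sketched as follows; the control-sample results and its assigned value are invented, and the critical t (2.78 for df = 4, two-sided, 95%) is taken from a t-table:

```python
import math
import statistics

# Hypothetical replicate results for a control sample (illustration only)
results = [10.2, 10.4, 10.1, 10.3, 10.5]
m = 10.0             # assigned "true" value of the control sample

x_bar = statistics.mean(results)
s = statistics.stdev(results)
n = len(results)

t_cal = abs(x_bar - m) * math.sqrt(n) / s
T_TAB = 2.78         # two-sided critical t, df = n - 1 = 4, 95% (from a t-table)
biased = t_cal > T_TAB
```

With these invented numbers t_cal exceeds ttab, so the results would be judged significantly different from the assigned value.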

Similarity or non-similarity of standard deviations


When using the t-test for two small sets of data (n1 and/or n2<30), a choice of the type of test
must be made depending on the similarity (or non-similarity) of the standard deviations of the
two sets. If the standard deviations are sufficiently similar they can be "pooled" and the
Student t-test can be used. When the standard deviations are not sufficiently similar an
alternative procedure for the t-test must be followed in which the standard deviations are not
pooled. A convenient alternative is the Cochran variant of the t-test. The criterion for the
choice is the passing or non-passing of the F-test, that is, if the variances do or do not
significantly differ. Therefore, for small data sets, the F-test should precede the t-test.
For dealing with large data sets (n1, n2 ≥ 30) the "normal" t-test is used.

Student's t-test
To be applied to small data sets (n1, n2 < 30) where s1 and s2 are similar according to the
F-test. When comparing two sets of data, the equation is rewritten as:

$t = \dfrac{|\bar{x}_1 - \bar{x}_2|}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

where
¯x1 = mean of data set 1
¯x2 = mean of data set 2
sp = "pooled" standard deviation of the sets
n1 = number of data in set 1
n2 = number of data in set 2.
The pooled standard deviation sp is calculated by:

$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
where
s1 = standard deviation of data set 1
s2 = standard deviation of data set 2
n1 = number of data in set 1
n2 = number of data in set 2.
To perform the t-test, the critical ttab has to be found in the table; the applicable number of
degrees of freedom df is here calculated by: df = n1 + n2 -2

Example
If for the two data sets tcal, is lower than the critical value ttab , the null hypothesis (no
difference between means) is accepted and the two data sets are assumed to belong to the
same population: there is no significant difference between the mean results of the two
treatments (with 95% confidence).
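Student's t-test with a pooled standard deviation can be sketched as follows; the two data sets and the critical value (t = 2.31 for df = 8, two-sided, 95%) are illustrative:

```python
import math
import statistics

# Two hypothetical independent data sets with similar spread (illustration only)
set1 = [10.1, 10.3, 10.2, 10.4, 10.0]
set2 = [10.4, 10.6, 10.5, 10.7, 10.3]

n1, n2 = len(set1), len(set2)
s1, s2 = statistics.stdev(set1), statistics.stdev(set2)

# Pooled standard deviation
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

t_cal = abs(statistics.mean(set1) - statistics.mean(set2)) / (
    sp * math.sqrt(1 / n1 + 1 / n2))

T_TAB = 2.31   # two-sided critical t, df = n1 + n2 - 2 = 8, 95% (from a t-table)
different = t_cal > T_TAB
```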
Note. Another illustrative way to perform this test for bias is to calculate whether the
difference between the means falls within or outside the range where this difference is still
not significantly large; in other words, whether this difference is less than the least
significant difference (lsd). This can be derived from the t-test equation:

$lsd = t_{tab} \cdot s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

If the measured difference between the means is smaller than the lsd, there is no significant
difference between the performance of the treatments.

Paired t-test
When two data sets are not independent, the paired t-test can be a better tool for comparison
than the "normal" t-test described in the previous sections. This is for instance the case when

two methods are compared by the same analyst using the same sample(s). It could, in fact,
also be applied, if the two analysts used the same analytical method at (about) the same time.

Example 1

If the calculated t value exceeds the critical value (df = n -1 = 9, one-sided), the null
hypothesis that the methods do not differ is rejected.
Note. Since such data sets do not have a normal distribution, the "normal" t-test which
compares means of sets cannot be used here (the means do not constitute a fair representation
of the sets). For the same reason no information about the precision of the two methods can
be obtained, nor can the F-test be applied.
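The paired t-test works on the per-sample differences rather than on the raw sets; a sketch with invented paired results (critical t = 2.57 for df = 5, two-sided, 95%, from a t-table):

```python
import math
import statistics

# Hypothetical paired results: the same 6 samples analysed by two methods
method_a = [5.2, 6.1, 4.8, 7.0, 5.5, 6.3]
method_b = [5.0, 5.8, 4.7, 6.6, 5.2, 6.1]

# Per-sample differences are the data set being tested against zero
diffs = [a - b for a, b in zip(method_a, method_b)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)

t_cal = abs(d_bar) * math.sqrt(n) / s_d
T_TAB = 2.57   # two-sided critical t, df = n - 1 = 5, 95% (from a t-table)
methods_differ = t_cal > T_TAB
```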

Example 2 (give an example from forestry related experiment)

Linear correlation and regression


These are among the most common and useful statistical tools for comparing effects and
performances X and Y. Although the technique is in principle the same for both, there is a
fundamental difference in concept: correlation analysis is applied to independent factors: if X
increases, what will Y do (increase, decrease, or perhaps not change at all)?
In regression analysis a unilateral response is assumed: changes in X result in changes in Y,
but changes in Y do not result in changes in X.
For example, ……..

Even more convenient are the regression programs included in statistical packages such as
Statistix, Mathcad, Eureka, Genstat, Statcal, SPSS, and others. Also, most spreadsheet
programs such as Lotus 123, Excel, and Quattro-Pro have functions for this.
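Underneath those spreadsheet functions is ordinary least squares; a minimal sketch of fitting y = a + b·x, with invented calibration-style data:

```python
import math

# Hypothetical calibration data: instrument response y vs. standard concentration x
x = [0.0, 1.0, 2.0, 3.0, 4.0]        # concentration of standards
y = [0.02, 0.21, 0.39, 0.61, 0.80]   # instrument response

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b = sxy / sxx                        # slope
a = y_bar - b * x_bar                # intercept
r = sxy / math.sqrt(sxx * syy)       # correlation coefficient
```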

Construction of calibration graph

During calculation, the maximum number of decimals is used; rounding off to the last significant
figure is done at the end.

Comparing two sets of data using many samples at different analyte levels
Although regression analysis assumes that one factor (on the x-axis) is constant, when certain
conditions are met the technique can also successfully be applied to comparing two variables
such as laboratories or methods.
Example

Analysis of variance (ANOVA)


When results are compared in which more than one factor can be of influence and must be
distinguished from random effects, ANOVA is a powerful statistical tool to be used.
Examples of such factors are: different analysts, samples with different pre-treatments,
different analyte levels, and different methods. Most statistical packages for the PC can
perform this analysis.
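A one-way ANOVA can be sketched from first principles; the layout below (the same sample analysed by three analysts) and the critical value (F ≈ 4.26 for df 2 and 9 at 95%) are illustrative:

```python
import statistics

# Hypothetical one-way layout: the same sample analysed by three analysts
groups = {
    "analyst_1": [10.2, 10.4, 10.3, 10.1],
    "analyst_2": [10.6, 10.8, 10.7, 10.5],
    "analyst_3": [10.1, 10.3, 10.2, 10.0],
}

all_data = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(all_data)
k = len(groups)            # number of groups
N = len(all_data)          # total number of observations

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in groups.values())
ss_within = sum((x - statistics.mean(g)) ** 2
                for g in groups.values() for x in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_cal = ms_between / ms_within

F_TAB = 4.26               # 95% critical F for df (2, 9), from an F-table
significant = f_cal > F_TAB
```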
For further discussion the reader is referred to statistical textbooks.

A great number of tools are available to carry out statistical analysis of data, and below we
list (in no particular order) seven packages well suited to research. Students are advised
to use any of these tools, preferably MINITAB and SPSS (to be installed on PCs).

1. SPSS (IBM)
SPSS (Statistical Package for the Social Sciences) is perhaps the most widely used statistics
software package within human behavior research. SPSS offers the ability to easily compile
descriptive statistics, parametric and non-parametric analyses, as well as graphical depictions
of results through the graphical user interface (GUI). It also includes the option to create
scripts to automate analysis, or to carry out more advanced statistical processing.

2. R (R Foundation for Statistical Computing)


R is a free statistical software package that is widely used across both human behavior
research and in other fields. Toolboxes (essentially plugins) are available for a great range of
applications, which can simplify various aspects of data processing. While R is a very
powerful software, it also has a steep learning curve, requiring a certain degree of coding. It
does however come with an active community engaged in building and improving R and the
associated plugins, which ensures that help is never too far away.

3. MATLAB (The Mathworks)

MATLAB is an analytical platform and programming language that is widely used by engineers
and scientists. As with R, the learning path is steep, and you will be required to create your
own code at some point. Plenty of toolboxes are also available to help answer
your research questions (such as EEGLab for analysing EEG data). While MATLAB can be
difficult to use for novices, it offers a massive amount of flexibility in terms of what you want
to do – as long as you can code it (or at least operate the toolbox you require).

4. Microsoft Excel

While not a cutting-edge solution for statistical analysis, MS Excel does offer a wide variety
of tools for data visualization and simple statistics. It’s simple to generate summary metrics
and customizable graphics and figures, making it a usable tool for many who want to see the
basics of their data. As many individuals and companies both own and know how to use
Excel, it is an accessible option for those looking to get started with statistics.

5. SAS (Statistical Analysis Software)

SAS is a statistical analysis platform that offers options to use either the GUI, or to create
scripts for more advanced analyses. It is a premium solution that is widely used in business,
healthcare, and human behavior research alike. It’s possible to carry out advanced analyses
and produce publication-worthy graphs and charts, although the coding can also be a difficult
adjustment for those not used to this approach.

6. GraphPad Prism

GraphPad Prism is premium software primarily used within statistics related to biology, but
offers a range of capabilities that can be used across various fields. Similar to SPSS, scripting
options are available to automate analyses, or carry out more complex statistical calculations,
but the majority of the work can be completed through the GUI.

7. Minitab

The Minitab software offers a range of both basic and fairly advanced statistical tools for data
analysis. Similar to GraphPad Prism, commands can be executed through both the GUI and scripted
commands, making it accessible to novices as well as users looking to carry out more complex
analyses.

There are a range of different software tools available, and each offers something slightly different to
the user – what you choose will depend on a range of factors, including your research question,
knowledge of statistics, and experience of coding.
These factors could mean that you are at the cutting-edge of data analysis, but as with any research,
the quality of the data obtained is reliant upon the quality of the study execution. It’s therefore
important to keep in mind that while you might have advanced statistical software (and the knowledge
to use it) available to you, the results won’t mean much if they weren’t collected in a valid way.
We’ve put together a guide to experimental design, helping you carry out quality research so that the
results you collect can be relied on.

Follow this link: https://www.youtube.com/watch?v=d-r8jxsJJCA

References:
1. Anonymous (2019). http://www.fao.org/3/W7295E/w7295e08.htm (accessed 8 July 2019).
2. Bryn Farnsworth (2018). The Top 7 Statistical Tools You Need to Make Your Data Shine.
https://imotions.com/blog/statistical-tools/ (accessed 8 July 2019).
