Ics 2328 Computer Oriented Statistical Modeling Assignment March 2024 Ms
QUESTION ONE
The most common measures of central tendency are the arithmetic mean, the median and the
mode. A measure of central tendency can be calculated for either a finite set of values or for
a theoretical distribution, such as the normal distribution.
The mean (or simply "the average" when the context is clear) is the sum of a collection of
numbers divided by the count of numbers in the collection.
The median is the value separating the higher half of a data sample, a population, or a
probability distribution, from the lower half. In simple terms, it may be thought of as the
"middle" value of a data set.
The mode is the value that appears most often in a set of data. The mode of a discrete
probability distribution is the value x at which its probability mass function takes its
maximum value. In other words, it is the value that is most likely to be sampled.
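As a minimal sketch of the three definitions above, the following uses Python's standard `statistics` module. The sample `scores` is hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]

# Mean: sum of the values divided by how many there are (30 / 6).
mean_value = statistics.mean(scores)

# Median: the middle of the sorted data; with an even count,
# it is the average of the two middle values (3 and 5).
median_value = statistics.median(scores)

# Mode: the value that appears most often (3 appears twice).
mode_value = statistics.mode(scores)

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Because the sample contains one large value (10), the mean is pulled above the median, which is one reason the two measures are usually reported together.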
A data type or simply type is a classification of data which tells the compiler or interpreter
how the programmer intends to use the data. Most programming languages support
various types of data, for example: real, integer or Boolean.
In statistics, the standard deviation (SD, also represented by the Greek letter sigma σ
or the Latin letter s) is a measure that is used to quantify the amount of variation or
dispersion of a set of data values.
A low standard deviation indicates that the data points tend to be close to the mean (also
called the expected value) of the set, while a high standard deviation indicates that the data
points are spread out over a wider range of values.
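The contrast between a low and a high standard deviation can be illustrated with a short sketch. Both data sets below are hypothetical and were chosen to share the same mean, so that only their spread differs.

```python
import statistics

# Two hypothetical data sets with the same mean (10) but different spread.
tight = [9, 10, 10, 11]
spread = [1, 5, 15, 19]

# Population standard deviation (sigma): divides by N.
sd_tight = statistics.pstdev(tight)
sd_spread = statistics.pstdev(spread)

# Sample standard deviation (s): divides by N - 1.
s_tight = statistics.stdev(tight)

print("sigma, tight data:", sd_tight)
print("sigma, spread data:", sd_spread)
print("s, tight data:", s_tight)
```

The choice between σ (divide by N) and s (divide by N − 1) depends on whether the data are treated as the whole population or as a sample from it.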
In statistics the frequency (or absolute frequency) of an event is the number of times the
event occurred in an experiment or study.
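A minimal sketch of counting absolute frequencies, assuming a small hypothetical record of coin-toss outcomes:

```python
from collections import Counter

# Hypothetical outcomes recorded in an experiment.
outcomes = ["heads", "tails", "heads", "heads", "tails"]

# Counter maps each distinct event to the number of times it occurred.
frequency = Counter(outcomes)

for event, count in frequency.items():
    print(event, count)
```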
c. The shape of a distribution is very important in data analysis. Using illustrations, explain what is meant
by skewness and kurtosis, giving their statistically significant values. [5 marks]
Illustrate with diagrams
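As a numerical companion to the diagrams, the sketch below computes moment-based skewness and excess kurtosis. It assumes the simple population-moment definitions (third and fourth central moments scaled by the variance); SPSS reports bias-adjusted versions, so its values may differ slightly on small samples. The data sets and function names are hypothetical. For reference, a normal distribution has skewness 0 and excess kurtosis 0 (kurtosis 3).

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Third central moment divided by variance ** 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth central moment divided by variance ** 2, minus 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical data: one symmetric set, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print("skewness, symmetric:", skewness(symmetric))
print("skewness, right-skewed:", skewness(right_skewed))
print("excess kurtosis, symmetric:", excess_kurtosis(symmetric))
```

A positive skewness indicates a longer right tail and a negative one a longer left tail; positive excess kurtosis indicates heavier tails than the normal curve.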
QUESTION TWO
a) How would you establish the following in SPSS for 2 variables? In each case clearly mention the
statistical procedure and the relevant statistics
i. Differences [2 marks]
The independent-samples t-test is similar to the one-sample t-test, except that rather than
testing a sample mean against a hypothesized value, we test whether the means of two groups
differ. The relevant statistics are the t value, its degrees of freedom and the two-tailed
significance (Sig.). For the grouping variable, you can choose a demographic trait (such as
gender, age group, ethnicity, etc.) or any other variable that classifies your cases into two
groups. (In an experimental design, it is a good way to test the difference between the control
group and the manipulation group.) In this example, we'll use gender.
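The comparison this test makes can be sketched outside SPSS. The following is a minimal illustration of the pooled-variance t statistic, which corresponds to the "Equal variances assumed" row of the SPSS output. The function name and the two groups of scores are hypothetical.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (N - 1)
    var_b = statistics.variance(group_b)

    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, df

# Hypothetical scores for the two gender groups.
male = [4, 5, 6, 5, 4]
female = [6, 7, 8, 7, 6]

t_stat, df = independent_t(male, female)
print("t =", round(t_stat, 3), "df =", df)
```

The t value would then be compared with the t distribution on `df` degrees of freedom to obtain the significance that SPSS reports.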
In the SPSS menu, select:
Analyze > Compare Means > Independent-Samples T Test
ii. Relationship [4 marks]
b) With respect to normality, explain skewness and kurtosis illustrating your answer with sketches.
[6 marks]
QUESTION THREE
Use the SPSS output for Linear Regression tables below to answer the following questions.
a) Write down the linear regression equation indicating what each letter represents in the equation.
[2 marks]
Linear regression is a way to model the relationship between two variables. ... The
equation has the form Y=a+bX, where Y is the dependent variable (that's the variable
that goes on the Y axis), X is the independent variable (i.e. it is plotted on the X axis), b
is the slope of the line and a is the y-intercept.
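As a sketch of how the equation is used, the snippet below plugs in the B values from the Coefficients table in this document (Constant = 2.129 for a, Additive = .338 for b). The function name and the input value 10.0 are hypothetical, and the name of the dependent variable is not given in the source.

```python
# Coefficients taken from the B column of the Coefficients table.
INTERCEPT = 2.129  # a: value of Y when X is zero ("Constant" row)
SLOPE = 0.338      # b: change in Y per unit change in X ("Additive" row)

def predict(additive):
    """Predicted Y for a given amount of additive, using Y = a + bX."""
    return INTERCEPT + SLOPE * additive

# Hypothetical input value, for illustration only.
predicted = predict(10.0)
print("predicted Y:", round(predicted, 3))
```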
The standard error of the estimate is a measure of the accuracy of predictions. Recall
that the regression line is the line that minimizes the sum of squared deviations of
prediction (also called the sum of squares error). The standard error of the estimate is
closely related to this quantity and is defined below:

$$\sigma_{est} = \sqrt{\frac{\sum (Y - Y')^2}{N}}$$

where σest is the standard error of the estimate, Y is an actual score, Y' is a predicted
score, and N is the number of pairs of scores. The numerator is the sum of squared
differences between the actual scores and the predicted scores.
Note the similarity of the formula for σest to the formula for σ. It turns out that σest is the
standard deviation of the errors of prediction (each Y - Y' is an error of prediction).
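A minimal sketch of this calculation: the sum of squared differences between actual and predicted scores is divided by N and square-rooted. The scores and the `ddof` parameter are hypothetical additions; `ddof=2` gives the N − 2 divisor used when estimating from a sample in simple regression, which is the form SPSS reports as "Std. Error of the Estimate".

```python
import math

def standard_error_of_estimate(actual, predicted, ddof=0):
    """Square root of (sum of squared prediction errors / (N - ddof))."""
    n = len(actual)
    # Each (y - y_hat) is an error of prediction.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(sse / (n - ddof))

# Hypothetical actual scores (Y) and predicted scores (Y').
actual = [1.0, 2.0, 1.3, 3.75, 2.25]
predicted = [1.21, 1.635, 2.06, 2.485, 2.91]

sigma_est = standard_error_of_estimate(actual, predicted)       # divides by N
s_est = standard_error_of_estimate(actual, predicted, ddof=2)   # divides by N - 2

print("sigma_est:", round(sigma_est, 3))
print("s_est:", round(s_est, 3))
```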
c) How many degrees of freedom are associated with the t-value for the line of regression?
[2 marks]
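Several commonly encountered statistical distributions (Student's t, chi-squared, F) have
parameters that are commonly referred to as degrees of freedom. This terminology simply
reflects that in many applications where these distributions occur, the parameter
corresponds to the degrees of freedom of an underlying random vector. A simple example:
if X_1, ..., X_n are independent normal random variables with mean μ and variance σ²,
the statistic

$$\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$$

follows a chi-squared distribution with n − 1 degrees of freedom. For the t-value of the
slope in a simple linear regression, the degrees of freedom are n − 2, where n is the
number of observations.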
0.941
f) What is the 95% confidence interval for the mean value of x? [1 mark]
4.801
g) What is the 95% prediction interval for the mean value of x? [1 mark]
0.886
Coefficients

Model       Unstandardized Coefficients    Standardized Coefficients    t        Sig.
            B          Std. Error          Beta
Constant    2.129      .250                                             8.505    .000
Additive    .338       .050                0.941                        6.821    .000
Model Summary
ANOVA