
MULTIMEDIA UNIVERSITY OF KENYA

FACULTY OF SCIENCE & TECHNOLOGY

UNIVERSITY EXAMINATIONS 2020/2021


THIRD YEAR FIRST SEMESTER ASSIGNMENT FOR THE DEGREE OF
BACHELOR OF SCIENCE IN APPLIED PHYSICS

UNIT CODE: ICS 2328    UNIT NAME: COMPUTER ORIENTED STATISTICAL MODELING
DATE: MARCH 2023    TIME: 1 WEEK
INSTRUCTIONS:
ANSWER ALL QUESTIONS

QUESTION ONE

a. Define the following terms as applied in statistical modeling:


i) Measures of central tendency [2 marks]

Measures of central tendency are summary values that identify the centre or typical value of a data set; the most common are the arithmetic mean, the median and the mode. A central tendency can be calculated for either a finite set of values or for a theoretical distribution, such as the normal distribution.

The mean (or average, when the context is clear) is the sum of a collection of numbers divided by the count of numbers in the collection.

The median is the value separating the higher half of a data sample, a population, or a
probability distribution, from the lower half. In simple terms, it may be thought of as the
"middle" value of a data set.

The mode is the value that appears most often in a set of data. The mode of a discrete
probability distribution is the value x at which its probability mass function takes its
maximum value. In other words, it is the value that is most likely to be sampled.
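
As a quick illustration (a minimal Python sketch using the built-in statistics module; the sample data are made up):

    import statistics

    data = [2, 3, 3, 5, 7, 10]

    print(statistics.mean(data))    # arithmetic mean: 5.0
    print(statistics.median(data))  # middle value of the sorted data: 4.0
    print(statistics.mode(data))    # most frequent value: 3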

ii) Data type [2 marks]

A data type or simply type is a classification of data which tells the compiler or interpreter
how the programmer intends to use the data. Most programming languages support
various types of data, for example: real, integer or Boolean.

In statistics, groups of individual data points may be classified as belonging to any of various statistical data types, e.g. categorical ("red", "blue", "green"), real number (1.68, -5, 1.7e+6), etc.

iii) Standard deviation [2 marks]

In statistics, the standard deviation (SD, also represented by the Greek letter sigma σ
or the Latin letter s) is a measure that is used to quantify the amount of variation or
dispersion of a set of data values.

A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.
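
For illustration, a short Python sketch (again using the built-in statistics module, with made-up data):

    import statistics

    data = [2, 4, 4, 4, 5, 5, 7, 9]

    print(statistics.pstdev(data))  # population standard deviation: 2.0
    print(statistics.stdev(data))   # sample standard deviation (n - 1 denominator)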

iv) Frequency [2 marks]

Frequency is the number of occurrences of a repeating event per unit time.


In statistics the frequency (or absolute frequency) of an event is the number of times the
event occurred in an experiment or study.
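
A small Python sketch of counting absolute frequencies (hypothetical observations):

    from collections import Counter

    observations = ["red", "blue", "red", "green", "red", "blue"]

    print(Counter(observations))  # Counter({'red': 3, 'blue': 2, 'green': 1})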

v) Statistical average [2 marks]

A statistical average is a single value that summarises or represents a set of data. In everyday usage it refers to the arithmetic mean, but the median and the mode are also averages, chosen according to the type of data and the purpose of the analysis.

b. Why is SPSS important for Statistical (Quantitative) data analysis? [5 marks]

Statistics is the body of mathematical techniques or processes for gathering, describing, organizing and interpreting numerical information. Since research usually yields such quantitative information, statistics is a basic tool of measurement and analysis.

SPSS is a widely used, all-purpose package for survey and quantitative data analysis: it is menu-driven, so common procedures (descriptive statistics, frequencies, t-tests, correlation, regression, charts) can be run without programming, and it produces ready-formatted output tables for reporting.

c. The shape of a distribution is very important in data analysis. Using illustrations, explain what is meant by skewness and kurtosis, giving their statistically significant values. [5 marks]

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. A symmetric (e.g. normal) distribution has skewness 0; negative values indicate a longer left tail and positive values a longer right tail.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. A normal distribution has kurtosis 3 (excess kurtosis 0); larger values indicate heavy tails and smaller values light tails.

Illustrate with diagrams
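
As a rough numeric illustration (assuming scipy is available; the data are made up, so the exact values are not meaningful):

    from scipy.stats import skew, kurtosis

    data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

    print(skew(data))                     # positive: long right tail
    print(kurtosis(data))                 # excess kurtosis (0 for a normal distribution)
    print(kurtosis(data, fisher=False))   # plain kurtosis (3 for a normal distribution)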

QUESTION TWO

a) How would you establish the following in SPSS for two variables? In each case, clearly mention the statistical procedure and the relevant statistics.
i. Differences [2 marks]

The independent-samples t-test is used. It is similar to the one-sample test, except that rather than testing a hypothesized mean, we are testing whether there is a difference between two groups. For the grouping variable, you can choose a demographic trait (such as gender, age group or ethnicity) or any other variable that classifies your cases; in an experimental design, it is a good way to test the differences between the control group and the manipulation group. In this example, we could use gender. The relevant statistics are the two group means, the t statistic, its degrees of freedom and the significance (p) value.
In the SPSS menu, select: Analyze > Compare Means > Independent-Samples T Test.
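
A hedged Python sketch of the same comparison (SPSS itself is menu-driven; this assumes scipy and made-up scores for two groups):

    from scipy import stats

    group_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3]   # e.g. males
    group_b = [10.8, 11.2, 10.5, 11.0, 11.6, 10.9]   # e.g. females

    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    print(t_stat, p_value)  # t statistic and two-tailed significance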

ii. Association [2 marks]

The bivariate Pearson Correlation is used. It produces a sample correlation coefficient, r, which measures the strength and direction of the linear relationship between pairs of continuous variables. By extension, the Pearson Correlation evaluates whether there is statistical evidence for a linear relationship among the same pairs of variables in the population, represented by a population correlation coefficient, ρ (“rho”). The Pearson Correlation is a parametric measure; the relevant statistics are r, its significance (p) value and the sample size N.
In the SPSS menu, select: Analyze > Correlate > Bivariate.
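
Again as an illustrative sketch only (assuming scipy and hypothetical paired measurements):

    from scipy import stats

    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3]

    r, p_value = stats.pearsonr(x, y)
    print(r, p_value)  # correlation coefficient r and its significance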

b) With respect to normality, explain skewness and kurtosis, illustrating your answer with sketches. [6 marks]

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.

Illustrate with diagrams

QUESTION THREE

Use the SPSS output for Linear Regression tables below to answer the following questions.

a) Write down the linear regression equation indicating what each letter represents in the equation.
[2 marks]

Linear regression is a way to model the relationship between two variables. The equation has the form Y = a + bX, where Y is the dependent variable (the variable plotted on the Y axis), X is the independent variable (plotted on the X axis), b is the slope of the line and a is the y-intercept. From the Coefficients table below, the fitted equation here is Y = 2.129 + 0.338X, where a = 2.129 is the constant and b = 0.338 is the coefficient of the Additive variable.
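
A minimal Python sketch of fitting such a line (assuming scipy; the variable names and data are hypothetical, not the data behind the tables below):

    from scipy import stats

    additive = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]   # X, independent variable
    response = [2.3, 2.5, 2.6, 2.8, 3.0, 3.1, 3.3, 3.5]   # Y, dependent variable

    result = stats.linregress(additive, response)
    print(result.intercept, result.slope)  # a (y-intercept) and b (slope) in Y = a + bX
    print(result.rvalue)                   # correlation coefficient r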

b) What is the value of the standard error of the estimate? [2 marks]

The standard error of the estimate is a measure of the accuracy of predictions. Recall that the regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error). The standard error of the estimate is closely related to this quantity and is defined as

σ_est = sqrt( Σ(Y − Y')² / N )

where σ_est is the standard error of the estimate, Y is an actual score, Y' is a predicted score, and N is the number of pairs of scores. The numerator is the sum of squared differences between the actual scores and the predicted scores.

Note the similarity of the formula for σ_est to the formula for σ: it turns out that σ_est is the standard deviation of the errors of prediction (each Y − Y' is an error of prediction).

From the Model Summary table below, the standard error of the estimate is .32121.
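
The definition can be computed directly; a small Python sketch with made-up actual and predicted scores:

    import math

    y_actual    = [2.0, 2.5, 3.1, 3.4, 4.0]
    y_predicted = [2.1, 2.4, 3.0, 3.6, 3.9]

    n = len(y_actual)
    sum_sq_err = sum((y - y_hat) ** 2 for y, y_hat in zip(y_actual, y_predicted))
    sigma_est = math.sqrt(sum_sq_err / n)   # standard error of the estimate
    print(sigma_est)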

c) How many degrees of freedom are associated with the t-value for the line of regression?
[2 marks]

Several commonly encountered statistical distributions (Student's t, chi-squared, F) have parameters that are commonly referred to as degrees of freedom. This terminology reflects that, in many applications where these distributions occur, the parameter corresponds to the degrees of freedom of an underlying random vector. A simple example: if X_1, ..., X_n are independent normal random variables with mean μ and standard deviation σ, the statistic Σ(X_i − X̄)² / σ² follows a chi-squared distribution with n − 1 degrees of freedom.

For the regression output here, the t-values in the Coefficients table are associated with the residual degrees of freedom shown in the ANOVA table, i.e. 6 (df = n − 2 = 8 − 2 = 6).

d) What is the value of the correlation coefficient? [1 marks]

0.941

e) What is the Confidence and Prediction Interval? [1 marks]

8.505 and 6.821

f) What is the 95% confidence interval for the mean value of x? [1 marks]

4.801

g) What is the 95% prediction interval for the mean value of x? [1 marks]

0.886

Coefficients

Model        Unstandardized Coefficients     Standardized Coefficients     t        Sig.
             B          Std. Error           Beta
Constant     2.129      .250                                               8.505    .000
Additive     .338       .050                 .941                          6.821    .000

Model Summary

Model    R       R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .941    .886        .867                 .32121                        2.321

ANOVA

Model 1        Sum of Squares    df    Mean Square    F         Sig.
Regression     4.801             1     4.801          46.532    .000
Residual       .619              6     .103
Total          5.420             7

