EXAM STATISTICS
SUALING
MPA Student
Student No.202340032
ASSESSMENT
PA 297 (Research Methods of Public Administration 1 – Statistics)
&
PA 200 (Applied Statistics for Public Administration)
Statistics plays a fundamental role in data analysis by providing the essential tools and
techniques for extracting meaningful insights from datasets. Through descriptive statistics, it
enables us to summarize the main features of the data, such as central tendency and
dispersion, giving us an initial understanding of the dataset's characteristics. Moreover,
inferential statistics empowers us to make inferences or predictions about populations based
on sample data, supporting decision-making processes and hypothesis testing. Statistical
techniques also aid in exploring relationships between variables, validating data, and building
predictive models, which are crucial for understanding patterns, making informed decisions,
and forecasting future trends. Ultimately, statistics serves as the backbone of data analysis,
guiding us in deriving reliable conclusions and actionable insights from complex datasets.
2. (20 points) Use the following information to answer the exercises below.
10; 11; 15; 15; 17; 22
a) Compute the mean and standard deviation for this data; use the sample formula for the
standard deviation.
b) What number is two standard deviations above the mean of this data?
c) Express the number 13.7 in terms of the mean and standard deviation of this data.
d) In a statistics class, the scores on the final exam were normally distributed, with a mean of
85, and a standard deviation of five. Susan got a final exam score of 95. Express her exam
results as a z-score, and interpret its meaning.
Susan’s z-score is z = (95 - 85) / 5 = 2, so her exam score is 2 standard deviations above
the mean. In a normally distributed class with a mean of 85 and a standard deviation of 5,
only about 2.3% of scores fall above 95, so her result is well above that of most of the class.
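The arithmetic for parts (a) to (d) can be checked with a short Python sketch. It uses only the standard library `statistics` module; the variable names are illustrative.

```python
import statistics

data = [10, 11, 15, 15, 17, 22]

# Part (a): mean and sample standard deviation (divides by n - 1)
mean = statistics.mean(data)        # 90 / 6 = 15
s = statistics.stdev(data)          # sqrt(94 / 5), about 4.34

# Part (b): the value two standard deviations above the mean
two_sd_above = mean + 2 * s         # about 23.67

# Part (c): 13.7 expressed as a number of standard deviations from the mean
z_13_7 = (13.7 - mean) / s          # about -0.30

# Part (d): Susan's score, with class mean 85 and standard deviation 5
z_susan = (95 - 85) / 5             # 2.0

print(f"mean = {mean}, s = {s:.2f}")
print(f"two SDs above the mean = {two_sd_above:.2f}")
print(f"13.7 = mean + ({z_13_7:.2f}) * s")
print(f"Susan's z-score = {z_susan}")
```

In other words, 13.7 lies about 0.30 standard deviations below the mean of 15.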
3. (20 points) Suppose you have collected data from the Fortune 500 companies regarding their
top 5 highest-paid employees (such as CEOs, presidents, etc.). You have the following
information of these 50 workers: demographics (gender, age, and education), time in months
at the company, and salary. Your research question is whether gender makes a difference in
salary. In detail, explain what descriptive and inferential statistics you would use to answer
this question. Also, specify the type of variables that you are using in the analysis. You may
provide an outline of tables or graphs that you would use in your analysis. Also, provide a
discussion of the limitations and strengths of the research design.
Descriptive Statistics:
Summary Statistics:
Calculate summary statistics such as mean, median, standard deviation, and range of
salaries for each gender group (male and female). This provides an overview of the
central tendency and variability of salaries within each gender category.
Frequency Distribution
Create a frequency table or histogram to display the distribution of salaries by gender,
showing the number of male and female employees falling into different salary ranges.
Cross-tabulation
Create a cross-tabulation (contingency table) to examine the distribution of gender and
educational level or other demographic variables, providing insights into any potential
associations between gender and other variables.
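As a minimal sketch of the descriptive step, the summary statistics and cross-tabulation could be produced as follows. The records are hypothetical placeholders, not data from the study, and the field layout (gender, education, salary in millions) is an assumption made for illustration.

```python
import statistics
from collections import Counter

# Hypothetical records (gender, education, salary in millions); not study data
records = [
    ("male", "MBA", 4.2), ("male", "PhD", 5.1), ("male", "MBA", 6.0),
    ("female", "MBA", 3.9), ("female", "PhD", 4.8), ("female", "BA", 4.4),
]

# Summary statistics of salary for each gender group
summary = {}
for gender in ("male", "female"):
    salaries = [s for g, _, s in records if g == gender]
    summary[gender] = {
        "mean": statistics.mean(salaries),
        "median": statistics.median(salaries),
        "sd": statistics.stdev(salaries),       # sample standard deviation
        "range": max(salaries) - min(salaries),
    }

# Cross-tabulation: count of employees in each gender-by-education cell
crosstab = Counter((g, e) for g, e, _ in records)

for gender, stats in summary.items():
    print(gender, {k: round(v, 2) for k, v in stats.items()})
print(dict(crosstab))
```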
Inferential Statistics:
Independent Samples t-test
Conduct an independent samples t-test to compare the mean salaries between male and
female employees. This test determines whether the difference in mean salaries
between genders is statistically significant.
Regression Analysis
Perform regression analysis with salary as the dependent variable and gender, age,
education, and time in the company as independent variables. This allows for controlling
potential confounding factors and assessing the unique contribution of gender to salary
differences.
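The independent samples t-test described above can be sketched with the standard library alone. This version computes Welch's t statistic, which does not assume equal variances in the two groups; that choice, the helper name `welch_t`, and the salary figures are all assumptions made for illustration.

```python
import math
import statistics

# Hypothetical salaries in millions; NOT data from the study
male = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5]
female = [3.9, 4.8, 4.1, 5.2, 4.4, 4.0]

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each group mean
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t(male, female)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The t statistic would then be compared with the critical value from a t table at the chosen significance level. In practice a statistics package reports the p-value directly, and the regression model would be fitted with such a package as well.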
Variables
Dependent Variable: Salary (continuous)
Independent Variables: Gender (categorical: male or female; the predictor of interest), with
Age (continuous), Education (ordinal categorical), and Time in the Company (continuous, in
months) as control variables
Tables/Graphs:
Table: Summary statistics of salary by gender
Provides mean, median, standard deviation, and range of salaries for male and
female employees.
Histogram or Frequency Distribution:
Displays the distribution of salaries by gender, showing the frequency of male and
female employees in different salary ranges.
Cross-tabulation Table:
Shows the distribution of gender by education level or other demographic variables.
4. (15 points) Use the following information to answer questions 4 and 5. A grocery
store is interested in how much money (on average) their customers spend each visit in the
produce department. Using their store records, they draw a sample of 1,000 visits and
calculate each customer’s average spending on produce.
a. Identify the population, sample, parameter, statistic, variable, and data for this example.
The "amount of money spent on produce per visit" is quantitative data. Specifically, it is
continuous quantitative data because it can take on any numerical value within a certain
range (e.g., Php 0 to any positive amount).
5. (10 points) Does all research data need to be analyzed using a statistical method? Why?
Not all research data requires analysis through statistical methods. The necessity for
statistical analysis largely hinges on the research objectives, the nature of the data, and the
specific questions being addressed. While statistical methods are instrumental in
quantitatively summarizing and interpreting numerical data, facilitating generalization, and
controlling for biases, there are instances where qualitative approaches may be more
appropriate. Qualitative research, for example, focuses on understanding meanings,
exploring phenomena, and generating theories, often employing methods like content
analysis or thematic coding. Therefore, the decision to utilize statistical analysis should be
guided by the research goals, the type of data collected, and the analytical framework best
suited to address the research questions effectively.
6. (20 points) How can we distinguish between parametric and non-parametric tests? What are
the appropriate statistical tools for parametric and non-parametric analysis?
a. The t-test, ANOVA, and Chi-Square test are all ways of detecting associations between
variables via examinations of group differences and associations. In what instance would
you expect each of the three tests to be used?
Each of the three tests - t-test, ANOVA, and Chi-Square test - is suitable for examining
group differences and associations, but they are typically used in different scenarios
based on the nature of the data and the research questions being addressed.
t-test:
The t-test is used to compare the means of two independent groups. It is appropriate
when the variable of interest is continuous and normally distributed, and we want to
assess whether there is a statistically significant difference between the means of
two groups.
Example: Comparing the mean scores of two groups of students (e.g., males vs.
females) on a math test to determine if there is a significant difference in
performance between the two groups.
ANOVA:
The analysis of variance (ANOVA) is used to compare the means of three or more
independent groups. It is appropriate when the dependent variable is continuous and
approximately normally distributed and the grouping variable is categorical.
Example: Comparing the mean math test scores of students taught with three
different teaching methods to determine if at least one group mean differs from the
others.
Chi-Square test:
The Chi-Square test is used to assess the association between two categorical
variables. It is appropriate when the variables of interest are categorical (nominal or
ordinal) and we want to determine if there is a significant association between them.
Example: Investigating whether there is a significant association between gender
(male vs. female) and smoking status (smoker vs. non-smoker) among a sample of
individuals.
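The Chi-Square test of association in the gender and smoking example can be sketched as follows. The counts are hypothetical and the helper name `chi_square` is illustrative; the function computes the Pearson chi-square statistic from the observed and expected cell counts.

```python
# Hypothetical counts; rows = gender, columns = smoking status
observed = [
    [30, 70],   # male: smoker, non-smoker
    [20, 80],   # female: smoker, non-smoker
]

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square(observed)
print(f"chi-square = {stat:.2f}, df = {df}")   # chi-square = 2.67, df = 1
```

With one degree of freedom the critical value at the 0.05 level is 3.84, so these hypothetical counts would not indicate a significant association between gender and smoking status.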
b. Suppose you must choose the one- or two-tailed version that pertains to the specific tests
mentioned above. In what cases would a one-tailed test be appropriate? At what point would
a two-tailed test be appropriate? Why?
The decision to use a one-tailed or two-tailed test depends on the specific research
hypothesis and the directionality of the expected effect.
One-tailed test:
A one-tailed test is appropriate when the research hypothesis predicts the direction
of the effect (i.e., whether it will be positive or negative). In other words, it is used
when there is a specific expectation about the relationship or difference between
groups.
For example, in a t-test comparing the mean scores of two groups, if the research
hypothesis states that Group A will have higher scores than Group B (i.e., directional
hypothesis), a one-tailed test would be used.
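ANOVA and Chi-Square tests, by contrast, are generally not run as one-tailed tests:
their F and chi-square statistics are built from squared deviations, so they detect
differences or associations in any direction. A directional hypothesis is then
examined through planned contrasts or follow-up pairwise comparisons.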
Two-tailed test:
A two-tailed test is used when the research hypothesis does not specify the direction
of the effect or when there is a possibility of effects in both directions.
In a t-test comparing the mean scores of two groups, if the research hypothesis is
non-directional or states that there will be a difference between the groups without
specifying the direction, a two-tailed test would be used.
ANOVA and Chi-Square tests are non-directional by construction, so they fit
research hypotheses that carry no specific expectation about the direction of the
difference or association among groups or categories.
8. (20 points) Suppose you are investigating how personality (openness, conscientiousness,
extroversion, agreeableness, neuroticism) and television viewing motive type (ritualistic,
instrumental) influence the dependent variable of television viewing levels (i.e. how
long/often someone watches television for). You want to look at the main effects of the two
independent variables (IVs) and how they interact to influence the dependent variable (DV).
What would be the most appropriate statistical technique to be used? Why?
In this scenario, where you want to investigate the main effects of personality traits and
television viewing motive type, as well as their interaction, on television viewing levels (the
single dependent variable), a two-way (factorial) analysis of variance (ANOVA) with an
interaction term would be appropriate; multiple regression analysis with interaction terms
is a closely related alternative.
ANOVA with interaction terms is suitable when the independent variables are
categorical (here, personality type with five categories and television viewing
motive type with two) and the dependent variable is continuous (television viewing
levels). If personality is instead measured as continuous trait scores, multiple
regression with interaction terms is the more natural choice.
By including interaction terms in the ANOVA model, you can assess whether the
effect of one independent variable (e.g., personality traits) on the dependent
variable depends on the levels of another independent variable (e.g., television
viewing motive type).
This approach allows you to examine both the main effects of each independent
variable and the interaction effects between them on the dependent variable.
Both multiple regression analysis and ANOVA with interaction terms provide a
comprehensive way to assess the main effects of personality traits and television viewing
motive type, as well as their interaction, on television viewing levels. These techniques allow
you to control for potential confounding variables, assess the unique contribution of each
independent variable, and understand how they interact to influence the dependent
variable.
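The ANOVA with interaction terms described above can be sketched for a balanced design using only the standard library. The viewing hours are hypothetical, only two of the five personality groups are shown to keep the example short, and the helper name `two_way_anova` is illustrative.

```python
from statistics import mean

# Hypothetical viewing hours per week, balanced design (3 people per cell).
# Keys: (personality group, motive type). Not data from any study.
cells = {
    ("extroversion", "ritualistic"):  [10, 12, 11],
    ("extroversion", "instrumental"): [6, 7, 8],
    ("neuroticism", "ritualistic"):   [14, 15, 16],
    ("neuroticism", "instrumental"):  [13, 12, 14],
}

def two_way_anova(cells):
    """F ratios for both main effects and the interaction (balanced design only)."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))          # observations per cell
    grand = mean(x for obs in cells.values() for x in obs)
    a_mean = {a: mean(x for (ai, _), obs in cells.items() if ai == a for x in obs)
              for a in a_levels}
    b_mean = {b: mean(x for (_, bi), obs in cells.items() if bi == b for x in obs)
              for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for each source of variation
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((cell_mean[a, b] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_within = sum((x - cell_mean[k]) ** 2 for k, obs in cells.items() for x in obs)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_within = len(a_levels) * len(b_levels) * (n - 1)
    ms_within = ss_within / df_within
    return {
        "F_personality": (ss_a / df_a) / ms_within,
        "F_motive": (ss_b / df_b) / ms_within,
        "F_interaction": (ss_ab / (df_a * df_b)) / ms_within,
    }

result = two_way_anova(cells)
print(result)
```

Each F ratio would be compared with the critical F value for its degrees of freedom; a large interaction F indicates that the effect of personality on viewing levels differs across motive types.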
9. (10 points) Describe a situation in which you would calculate a parameter, rather than a
statistic.
For example, consider a scenario where you are studying the average height of all adults in a
country. Instead of measuring the heights of a sample of adults and calculating the average
height of that sample (which would be a statistic), you might want to calculate the
population parameter, which would be the true average height of all adults in the country. To
do this, you would need to obtain data on the heights of every adult in the country, which
would represent the entire population.
Calculating a population parameter is often not feasible due to practical constraints such as
time, resources, and access to the entire population. In many cases, researchers rely on
statistical inference, where they use sample data to estimate population parameters and
make inferences about the population as a whole. However, in situations where it is possible
to access data for the entire population, such as a national census or an agency's complete
personnel records, the parameter can be calculated directly and is free of sampling error.
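The distinction between a parameter and a statistic can be illustrated with a small simulation. The population below is hypothetical (simulated heights in centimeters with an assumed mean of 165 and standard deviation of 8), and the sizes are chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights in cm of every adult in a small "country"
population = [random.gauss(165, 8) for _ in range(100_000)]

# Parameter: computed from the entire population
mu = statistics.mean(population)

# Statistic: computed from a random sample drawn from that population
sample = random.sample(population, 1_000)
x_bar = statistics.mean(sample)

print(f"parameter (population mean) = {mu:.2f}")
print(f"statistic (sample mean)     = {x_bar:.2f}")
```

The sample mean will be close to the population mean but not identical to it; the gap between the two is the sampling error that disappears when the whole population is measured.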