EXAM STATISTICS

The document discusses the importance of statistics in data analysis, detailing its role in summarizing data and making inferences. It covers various statistical methods, including descriptive and inferential statistics, and explains how to analyze data related to salary differences based on gender among Fortune 500 employees. Additionally, it addresses the distinction between parametric and non-parametric tests, the appropriate use of one-tailed and two-tailed tests, and the application of statistical techniques for analyzing personality traits and television viewing motives.

Uploaded by

simplyshesu
Copyright: © All Rights Reserved

SHERMINA B. SUALING
MPA Student
Student No. 202340032

ASSESSMENT
PA 297 (Research Methods of Public Administration 1 – Statistics)
&
PA 200 (Applied Statistics for Public Administration)

1. (10 points) Why is Statistics important in data analysis?

Statistics plays a fundamental role in data analysis by providing the essential tools and
techniques for extracting meaningful insights from datasets. Through descriptive statistics, it
enables us to summarize the main features of the data, such as central tendency and
dispersion, giving us an initial understanding of the dataset's characteristics. Moreover,
inferential statistics empowers us to make inferences or predictions about populations based
on sample data, supporting decision-making processes and hypothesis testing. Statistical
techniques also aid in exploring relationships between variables, validating data, and building
predictive models, which are crucial for understanding patterns, making informed decisions,
and forecasting future trends. Ultimately, statistics serves as the backbone of data analysis,
guiding us in deriving reliable conclusions and actionable insights from complex datasets.

2. (20 points) Use the following information to answer the exercises below.
10; 11; 15; 15; 17; 22
a) Compute the mean and standard deviation for this data; use the sample formula for the
standard deviation.
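
Part (a) can be checked with a minimal Python sketch that uses only the standard library. The variable names are illustrative; `statistics.stdev` applies the sample formula (division by n - 1), which is what the question asks for.

```python
import statistics

data = [10, 11, 15, 15, 17, 22]

# Mean: sum of the values divided by n (90 / 6).
mean = statistics.mean(data)

# Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1)) = sqrt(94 / 5).
s = statistics.stdev(data)

print(f"mean = {mean}")
print(f"sample standard deviation = {s:.2f}")  # about 4.34
```

Using `statistics.pstdev` instead would divide by n and give the population formula, which is not what part (a) requests.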

b) What number is two standard deviations above the mean of this data?

Two standard deviations above the mean = mean + 2s = 15 + 2(4.336) ≈ 23.67

c) Express the number 13.7 in terms of the mean and standard deviation of this data.
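
Parts (b) and (c) follow from the same two quantities. The sketch below is one way to check them, assuming the sample standard deviation from part (a); `k` is an illustrative name for the signed number of standard deviations between 13.7 and the mean.

```python
import statistics

data = [10, 11, 15, 15, 17, 22]
mean = statistics.mean(data)
s = statistics.stdev(data)  # sample formula, as in part (a)

# Part (b): the value two standard deviations above the mean.
two_sd_above = mean + 2 * s

# Part (c): solve 13.7 = mean + k * s for k.
k = (13.7 - mean) / s

print(f"two SDs above the mean: {two_sd_above:.2f}")
print(f"13.7 = mean + ({k:.2f}) * s")
```

A negative `k` means the value lies below the mean, so 13.7 sits about 0.3 standard deviations below the mean of this data.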

d) In a statistics class, the scores on the final exam were normally distributed, with a mean of
85, and a standard deviation of five. Susan got a final exam score of 95. Express her exam
results as a z-score, and interpret its meaning.
z = (95 - 85) / 5 = 2. Susan’s exam score is 2 standard deviations above the mean, indicating
that her score is high compared to the rest of the class. In a normally distributed dataset
with a mean of 85 and a standard deviation of 5, about 97.7% of scores fall below 95.
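
The z-score and the share of the class scoring below Susan can be verified with the standard library's `NormalDist`. This is a sketch that takes the normal model stated in the question at face value.

```python
from statistics import NormalDist

mean, sd, score = 85, 5, 95

# z-score: distance from the mean measured in standard deviations.
z = (score - mean) / sd

# Share of scores expected to fall below Susan's under a normal model.
share_below = NormalDist().cdf(z)

print(f"z = {z}")
print(f"share of scores below: {share_below:.4f}")
```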

3. (20 points) Suppose you have collected data from the Fortune 500 companies regarding their
top 5 highest-paid employees (such as CEOs, presidents, etc.). You have the following
information of these 50 workers: demographics (gender, age, and education), time in months
at the company, and salary. Your research question is whether gender makes a difference in
salary. In detail, explain what descriptive and inferential statistics you would use to answer
this question. Also, specify the type of variables that you are using in the analysis. You may
provide an outline of tables or graphs that you would use in your analysis. Also, provide a
discussion of the limitations and strengths of the research design.

Descriptive Statistics:
Summary Statistics:
• Calculate summary statistics such as mean, median, standard deviation, and range of
salaries for each gender group (male and female). This provides an overview of the
central tendency and variability of salaries within each gender category.

Frequency Distribution:
• Create a frequency table or histogram to display the distribution of salaries by gender,
showing the number of male and female employees falling into different salary ranges.

Cross-tabulation:
• Create a cross-tabulation (contingency table) to examine the distribution of gender and
educational level or other demographic variables, providing insights into any potential
associations between gender and other variables.
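
The descriptive steps above can be sketched in a few lines of Python. The records below are hypothetical values invented for illustration, not real Fortune 500 data, and the field layout (gender, education, salary) is an assumption.

```python
import statistics
from collections import Counter

# Hypothetical records: (gender, education, salary in $ millions).
records = [
    ("female", "masters", 2.1),
    ("female", "doctorate", 3.4),
    ("female", "bachelors", 1.8),
    ("male", "masters", 2.6),
    ("male", "bachelors", 2.2),
    ("male", "masters", 4.0),
]

def summarize(values):
    """Summary statistics for one group of salaries."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "range": max(values) - min(values),
    }

# Group salaries by gender, then summarize each group.
by_gender = {}
for gender, _, salary in records:
    by_gender.setdefault(gender, []).append(salary)
summary = {gender: summarize(values) for gender, values in by_gender.items()}

# Cross-tabulation: counts for each (gender, education) combination.
crosstab = Counter((gender, education) for gender, education, _ in records)

for gender, stats in summary.items():
    print(gender, stats)
print(dict(crosstab))
```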

Inferential Statistics:
Independent Samples t-test:
• Conduct an independent samples t-test to compare the mean salaries of male and
female employees. This test determines whether the difference in mean salaries
between genders is statistically significant.
Regression Analysis:
• Perform regression analysis with salary as the dependent variable and gender, age,
education, and time in the company as independent variables. This controls for
potential confounding factors, so the coefficient on gender estimates the salary
difference between genders with the other variables held constant.
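
As an illustration of the two-group comparison, the sketch below computes the t statistic by hand. It uses Welch's version of the independent samples t-test, which does not assume equal variances; that choice, the function name `welch_t`, and the salary figures are all assumptions made for the example.

```python
import math
import statistics

# Hypothetical salaries (in $ millions), invented for illustration.
male = [10, 12, 14]
female = [8, 9, 10]

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t(male, female)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The p-value would then be read from a t distribution with `df` degrees of freedom, for example from a t table or a statistics package.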

Variables:
• Dependent Variable: Salary (continuous)
• Independent Variables: Gender (categorical: male or female), Age (continuous), Education
(categorical or ordinal), Time in the Company (continuous, in months)

Tables/Graphs:
Table: Summary statistics of salary by gender
• Provides mean, median, standard deviation, and range of salaries for male and
female employees.
Histogram or Frequency Distribution:
• Displays the distribution of salaries by gender, showing the frequency of male and
female employees in different salary ranges.
Cross-tabulation Table:
• Shows the distribution of gender by education level or other demographic variables.

Discussion of Limitations and Strengths:

Limitations:
• Limited Sample Size: The analysis is based on a sample of 50 employees from
Fortune 500 companies, which may not be representative of the entire population.
• Causality: Due to the observational nature of the study, causality cannot be inferred.
Other unmeasured variables may confound the relationship between gender and
salary.
Strengths:
• Detailed Data: The dataset includes information on demographics, time at the
company, and salary, allowing for a comprehensive analysis.
• High Profile Companies: The dataset consists of employees from Fortune 500
companies, which adds credibility and relevance to the study.

In conclusion, a combination of descriptive and inferential statistics, along with careful
consideration of variables and potential limitations, would be used to investigate the
relationship between gender and salary among the top 5 highest-paid employees of Fortune
500 companies.

4. (15 points) Use the following information to answer questions 4 and 5. A grocery
store is interested in how much money (on average) its customers spend each visit in the
product department. Using the store records, they draw a sample of 1,000 visits and
calculate each customer’s average spending on product.

a. Identify the population, sample, parameter, statistic, variable, and data for this example.

Population: All visits made by the grocery store's customers.
Sample: The 1,000 visits that were selected for analysis.
Parameter: The average spending per visit across all visits (population mean).
Statistic: The average spending per visit across the 1,000 visits in the sample.
Variable: Customer spending on products during each visit.
Data: The recorded spending amounts for each of the 1,000 visits in the sample.
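
The relationship between these terms can be illustrated with a small simulation. Everything below is hypothetical: the population of 100,000 visits, the uniform spending range, and the seed are assumptions chosen only to make the sketch repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: spending (Php) recorded for every visit.
population = [random.uniform(0, 500) for _ in range(100_000)]

# Parameter: the mean over the whole population (normally unknown).
parameter = statistics.mean(population)

# Sample: 1,000 visits drawn at random from the store records.
sample = random.sample(population, 1000)

# Statistic: the sample mean, used as an estimate of the parameter.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The statistic will differ from the parameter by a small amount of sampling error that changes from one sample to the next.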

b. What kind of data is “amount of money spent on product per visit”?

The "amount of money spent on product per visit" is quantitative data. Specifically, it is
continuous quantitative data because it can take on any numerical value within a certain
range (e.g., Php0 to any positive amount).

5. (10 points) Does all research data need to be analyzed using a statistical method? Why?
Not all research data requires analysis through statistical methods. The necessity for
statistical analysis largely hinges on the research objectives, the nature of the data, and the
specific questions being addressed. While statistical methods are instrumental in
quantitatively summarizing and interpreting numerical data, facilitating generalization, and
controlling for biases, there are instances where qualitative approaches may be more
appropriate. Qualitative research, for example, focuses on understanding meanings,
exploring phenomena, and generating theories, often employing methods like content
analysis or thematic coding. Therefore, the decision to utilize statistical analysis should be
guided by the research goals, the type of data collected, and the analytical framework best
suited to address the research questions effectively.

6. (20 points) How can we distinguish parametric and non-parametric test? What is the
appropriate statistical tool for parametric and non-parametric?

Parametric and non-parametric tests are distinguished primarily by the underlying
assumptions about the data distribution.
1. Parametric Tests: Parametric tests assume that the data come from a specific distribution,
typically the normal distribution. They also assume homogeneity of variances across groups
being compared. Parametric tests are more powerful and efficient when the data meet these
assumptions. Examples of parametric tests include t-tests, ANOVA, Pearson correlation, and
linear regression.
2. Non-parametric Tests: Non-parametric tests make fewer assumptions about the data
distribution and are more robust to violations of distributional assumptions. They are
suitable for non-normally distributed data or data with unequal variances. Non-parametric
tests often rely on rank-ordering the data rather than using the raw values. Examples of non-
parametric tests include the Wilcoxon signed-rank test, Mann-Whitney U test, Kruskal-Wallis
test, and Spearman correlation.
Choosing the appropriate statistical tool depends on several factors:
• Data Distribution: If the data are normally distributed and meet the assumptions of
parametric tests, then parametric tests are appropriate. However, if the data are
non-normally distributed or the assumptions of parametric tests are violated,
non-parametric tests may be more suitable.
• Type of Variables: Parametric tests are typically used for interval or ratio data, while
non-parametric tests can be used for ordinal, interval, or ratio data.
• Sample Size: With larger samples, parametric tests tolerate moderate departures from
normality because sample means tend toward a normal distribution. Non-parametric tests
may be preferred for smaller sample sizes or when data are highly skewed.
In summary, the choice between parametric and non-parametric tests depends on the
assumptions about the data distribution, the type of variables being analyzed, and the
sample size. It's essential to assess the appropriateness of each test based on these factors
to ensure valid and reliable statistical analysis.
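
The rank-based idea behind non-parametric tests can be shown with a hand-rolled Mann-Whitney U statistic. This is a sketch under stated assumptions: the function name and the data are invented for illustration, and no p-value is computed.

```python
import statistics

def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a's value exceeds b's (ties count 0.5)."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

group_a = [12, 15, 14, 10]
group_b = [18, 20, 19, 21]
group_b_outlier = [18, 20, 19, 200]  # same ordering, one extreme value

# U depends only on the ordering of the values, so the outlier leaves it unchanged.
u = mann_whitney_u(group_a, group_b)
u_outlier = mann_whitney_u(group_a, group_b_outlier)

# The mean, which parametric tests rely on, shifts considerably.
mean_shift = statistics.mean(group_b_outlier) - statistics.mean(group_b)

print(f"U without outlier: {u}, U with outlier: {u_outlier}")
print(f"shift in group B mean caused by the outlier: {mean_shift}")
```

Because the test statistic uses ranks rather than raw values, a single extreme observation affects the non-parametric result far less than it affects a comparison of means.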

7. (20 points) Answer the following questions:

a. The t-test, ANOVA, and Chi-Square test are all ways of detecting associations between
variables by examining group differences and associations. In what instance would you
expect each of the three tests to be used?
Each of the three tests - t-test, ANOVA, and Chi-Square test - is suitable for examining
group differences and associations, but they are typically used in different scenarios
based on the nature of the data and the research questions being addressed.

t-test:
• The t-test is used to compare the means of two independent groups. It is appropriate
when the variable of interest is continuous and normally distributed, and we want to
assess whether there is a statistically significant difference between the means of
two groups.
• Example: Comparing the mean scores of two groups of students (e.g., males vs.
females) on a math test to determine if there is a significant difference in
performance between the two groups.

ANOVA (Analysis of Variance):
• ANOVA is used to compare the means of three or more independent groups
simultaneously. It is suitable when the variable of interest is continuous and normally
distributed, and we want to assess whether there are statistically significant
differences among the means of multiple groups.
• Example: Comparing the mean scores of students across different grade levels (e.g.,
grades 9, 10, and 11) on a standardized test to determine if there is a significant
difference in performance among the grade levels.
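
The F statistic used by one-way ANOVA can be computed by hand as the ratio of between-group to within-group variance. The sketch below uses hypothetical test scores for three grade levels; the function name `one_way_f` is illustrative.

```python
import statistics

def one_way_f(groups):
    """Return the one-way ANOVA F statistic with its degrees of freedom."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum(
        (x - statistics.mean(group)) ** 2 for group in groups for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for grades 9, 10, and 11.
scores = [[70, 75, 80], [75, 80, 85], [80, 85, 90]]
f, df_between, df_within = one_way_f(scores)
print(f"F({df_between}, {df_within}) = {f:.2f}")
```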

Chi-Square test:
• The Chi-Square test is used to assess the association between two categorical
variables. It is appropriate when the variables of interest are categorical (nominal or
ordinal) and we want to determine if there is a significant association between them.
• Example: Investigating whether there is a significant association between gender
(male vs. female) and smoking status (smoker vs. non-smoker) among a sample of
individuals.
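
For a 2 x 2 table such as gender by smoking status, the Chi-Square statistic can be computed directly from observed and expected counts. The counts below are hypothetical, no continuity correction is applied, and the p-value shortcut relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
from statistics import NormalDist

def chi_square_2x2(table):
    """Chi-square test of independence for a 2 x 2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With df = 1, P(chi-square > c) equals P(|Z| > sqrt(c)) for standard normal Z.
    p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p_value

# Hypothetical counts: rows = male, female; columns = smoker, non-smoker.
table = [[30, 20], [20, 30]]
chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```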

b. Suppose you must choose the one- or two-tailed version that pertains to the specific tests
mentioned above. In what cases would a one-tailed test be appropriate? At what point would
a two-tailed test be appropriate? Why?

The decision to use a one-tailed or two-tailed test depends on the specific research
hypothesis and the directionality of the expected effect.

One-tailed test:
• A one-tailed test is appropriate when the research hypothesis predicts the direction
of the effect (i.e., whether it will be positive or negative). In other words, it is used
when there is a specific expectation about the relationship or difference between
groups.
• For example, in a t-test comparing the mean scores of two groups, if the research
hypothesis states that Group A will have higher scores than Group B (i.e., directional
hypothesis), a one-tailed test would be used.
• ANOVA and Chi-Square tests do not offer this choice in the same way. The F and
chi-square statistics are built from squared deviations, so the rejection region is
always in the upper tail and the hypothesis being tested is non-directional. A
directional prediction is instead tested with a planned contrast or a one-tailed t-test
between two specific groups.

Two-tailed test:
• A two-tailed test is used when the research hypothesis does not specify the direction
of the effect or when there is a possibility of effects in both directions. It is the
safer default because it can detect a difference in either direction.
• In a t-test comparing the mean scores of two groups, if the research hypothesis is
non-directional or states that there will be a difference between the groups without
specifying the direction, a two-tailed test would be used.
• ANOVA and Chi-Square tests are non-directional by construction: they ask whether any
difference or association exists among groups or categories. This corresponds to the
two-tailed case, even though only the upper tail of the F or chi-square distribution
is used for rejection.
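
The practical difference between the two versions shows up in the p-value. The sketch below assumes a hypothetical test statistic of z = 1.8 from a normal-based test and a significance level of 0.05.

```python
from statistics import NormalDist

z = 1.8        # hypothetical observed test statistic
alpha = 0.05   # conventional significance level, assumed here

# One-tailed: only the upper tail counts (hypothesis predicts a positive effect).
p_one_tailed = 1 - NormalDist().cdf(z)

# Two-tailed: both tails count (hypothesis predicts a difference in either direction).
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"one-tailed p = {p_one_tailed:.4f}, reject H0: {p_one_tailed < alpha}")
print(f"two-tailed p = {p_two_tailed:.4f}, reject H0: {p_two_tailed < alpha}")
```

The same data lead to rejection under the one-tailed test but not under the two-tailed test, which is why the direction must be justified before the data are examined.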

8. (20 points) Suppose you are investigating how personality (openness, conscientiousness,
extroversion, agreeableness, neuroticism) and television viewing motive type (ritualistic,
instrumental) influence the dependent variable of television viewing levels (i.e. how
long/often someone watches television for). You want to look at the main effects of the two
independent variables (IVs) and how they interact to influence the dependent variables (DV).
What would be the most appropriate statistical technique to be used? Why?

In this scenario, where you want to investigate the main effects of personality traits and
television viewing motive type, as well as their interaction on television viewing levels (the
dependent variable), a factorial (two-way) analysis of variance (ANOVA) or a multiple
regression analysis with interaction terms would be appropriate. Both handle a single
continuous dependent variable with two independent variables and their interaction.

Multiple Regression Analysis:

• Multiple regression allows you to examine the relationship between multiple
independent variables (personality traits and television viewing motive type) and a
continuous dependent variable (television viewing levels).
• By including interaction terms (the product of personality traits and television
viewing motive type), you can assess whether the relationship between personality
traits and television viewing levels depends on the type of motive for television
viewing.
• This approach allows you to test for main effects of each independent variable
(personality traits and television viewing motive type) as well as interaction effects
between them.

Analysis of Variance (ANOVA) with Interaction Terms:

• ANOVA with interaction terms is suitable when the independent variables are
categorical and the dependent variable (television viewing levels) is continuous.
Television viewing motive type is already categorical; personality must also be coded
as a category (for example, each participant's dominant trait, giving a 5 × 2
factorial design).
• By including interaction terms in the ANOVA model, you can assess whether the
effect of one independent variable (e.g., personality traits) on the dependent
variable depends on the levels of another independent variable (e.g., television
viewing motive type).
• This approach allows you to examine both the main effects of each independent
variable and the interaction effects between them on the dependent variable.

Both multiple regression analysis and ANOVA with interaction terms provide a
comprehensive way to assess the main effects of personality traits and television viewing
motive type, as well as their interaction, on television viewing levels. These techniques allow
you to control for potential confounding variables, assess the unique contribution of each
independent variable, and understand how they interact to influence the dependent
variable.
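
What an interaction means can be seen from cell means alone. The sketch below simplifies the design to a hypothetical 2 × 2 case (low vs. high extroversion by motive type) with invented viewing hours; it illustrates the idea and is not a full ANOVA.

```python
import statistics

# Hypothetical daily viewing hours per cell: (extroversion level, motive type).
hours = {
    ("low", "ritualistic"): [2, 3, 4],
    ("low", "instrumental"): [1, 2, 3],
    ("high", "ritualistic"): [5, 6, 7],
    ("high", "instrumental"): [2, 3, 4],
}

cell_mean = {cell: statistics.mean(values) for cell, values in hours.items()}

# Simple effect of motive type within each personality level.
motive_effect_low = cell_mean[("low", "ritualistic")] - cell_mean[("low", "instrumental")]
motive_effect_high = cell_mean[("high", "ritualistic")] - cell_mean[("high", "instrumental")]

# Interaction: how much the motive effect changes across personality levels.
interaction = motive_effect_high - motive_effect_low

print(f"motive effect (low extroversion):  {motive_effect_low}")
print(f"motive effect (high extroversion): {motive_effect_high}")
print(f"interaction contrast:              {interaction}")
```

A non-zero interaction contrast indicates that the effect of motive type depends on the personality level, which is what the interaction term in a factorial ANOVA or regression model tests formally.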

9. (10 points) Describe a situation in which you would calculate a parameter, rather than a
statistic.

For example, consider a scenario where you are studying the average height of all adults in a
country. Instead of measuring the heights of a sample of adults and calculating the average
height of that sample (which would be a statistic), you might want to calculate the
population parameter, which would be the true average height of all adults in the country. To
do this, you would need to obtain data on the heights of every adult in the country, which
would represent the entire population.

Calculating a population parameter is often not feasible due to practical constraints such as
time, resources, and access to the entire population. In many cases, researchers rely on
statistical inference, where they use sample data to estimate population parameters and
make inferences about the population as a whole. However, when data for the entire
population are available, the parameter can be calculated directly and describes the
population without sampling error.
